pdftract/notes/pdftract-4h06h.md
jedarden ea1184168d test(pdftract-4h06h): implement TH-02 path traversal security test
Implement comprehensive path-traversal security tests documenting
the 10 canonical payloads from the threat model (plan line 891).

The test suite verifies that the resolve_path function in
mcp/root.rs properly rejects path-traversal attempts when --root
mode is enabled, while allowing HTTPS URLs to bypass validation
per INV-10.

Test coverage:
- All 10 traversal payloads rejected when --root is set
- Valid paths within root are accepted
- HTTPS URLs bypass root check
- Symlink escapes are caught
- URL-encoded traversal is rejected
- Special filesystem paths are rejected
- Deep traversal payloads are caught

Acceptance: All 10 tests pass. Current state documented:
Phase 1 (current): paths pass through without --root; validated with --root
Phase 2 (future): --root mode to be wired to MCP server entry point

References: Plan line 891 (TH-02), INV-10 (no file-path params in HTTP mode)

Closes: pdftract-4h06h
2026-05-25 13:03:45 -04:00


# Verification Note: pdftract-4h06h - TH-02 Path Traversal Test
## Bead
**ID:** pdftract-4h06h
**Title:** TH-02 test: MCP path traversal (10 payloads) rejected with PATH_OUTSIDE_ROOT (or no path param accepted)
**Status:** PASS
## Summary
Implemented comprehensive path-traversal security tests at `crates/pdftract-cli/tests/TH-02-path-traversal.rs`. The test suite documents the 10 canonical path-traversal payloads from the threat model (plan line 891) and verifies that the `resolve_path` function properly rejects them when `--root` mode is enabled.
## What Was Done
1. Created `crates/pdftract-cli/tests/TH-02-path-traversal.rs` with 10 test functions
2. Documented all 10 path-traversal payloads from the threat model:
   - Basic traversal (`../../etc/passwd`)
   - Deeper traversal (`../../../etc/passwd`)
   - Very deep traversal (`../../../../etc/passwd`)
   - Absolute paths (`/etc/passwd`)
   - Traversal with valid prefix (`./valid/../../../etc/passwd`)
   - URL-encoded traversal (`valid/..%2F..%2Fetc%2Fpasswd`)
   - Windows separators on Linux (`valid/..\..\..\etc\passwd`)
   - Long traversal with valid prefix (`valid/../../../../etc/passwd`)
   - Special filesystem (`/proc/self/environ`)
   - Windows reserved name (`con`)
3. Test coverage:
   - `test_root_mode_rejects_all_traversal_payloads`: Verifies all 10 payloads are rejected when `--root` is set
   - `test_root_mode_accepts_valid_paths`: Verifies valid paths within root are accepted
   - `test_without_root_paths_pass_through`: Documents current behavior (paths pass through without `--root`)
   - `test_https_urls_bypass_root_check`: Verifies HTTPS URLs bypass validation per INV-10
   - `test_symlink_escape_rejected`: Verifies symlinks escaping root are rejected
   - `test_url_encoded_traversal_rejected`: Verifies URL-encoded traversal is caught
   - `test_windows_reserved_name_handling`: Handles Windows reserved names safely
   - `test_special_filesystem_paths_rejected`: Rejects `/proc` and `/dev` paths
   - `test_nested_traversal_with_valid_prefix`: Catches traversal after a legitimate-looking prefix
   - `test_deep_traversal_rejected`: Verifies various depths of `../` are caught
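
The payload list above lends itself to a table-driven check. The sketch below is a minimal, hypothetical lexical pre-filter, not the project's `resolve_path`: the name `looks_like_traversal`, the `RESERVED` list, and the normalization rules are all assumptions made for illustration. It folds URL-encoded and Windows separators into `/`, then tracks directory depth so that any `..` climbing above the starting directory is flagged.

```rust
/// The 10 payloads listed in this note, copied verbatim.
const PAYLOADS: [&str; 10] = [
    "../../etc/passwd",
    "../../../etc/passwd",
    "../../../../etc/passwd",
    "/etc/passwd",
    "./valid/../../../etc/passwd",
    "valid/..%2F..%2Fetc%2Fpasswd",
    r"valid/..\..\..\etc\passwd",
    "valid/../../../../etc/passwd",
    "/proc/self/environ",
    "con",
];

/// Partial list of Windows reserved device names (assumption: illustrative only).
const RESERVED: [&str; 4] = ["con", "prn", "aux", "nul"];

/// Hypothetical lexical check: returns true if the payload is absolute,
/// climbs above its starting directory, or names a reserved device.
fn looks_like_traversal(payload: &str) -> bool {
    // Lowercase for comparison only, then fold encoded and Windows
    // separators into `/` so every variant is inspected the same way.
    let normalized = payload
        .to_ascii_lowercase()
        .replace("%2f", "/")
        .replace("%5c", "/")
        .replace("%2e", ".")
        .replace('\\', "/");

    if normalized.starts_with('/') {
        return true; // absolute paths are never relative to the root
    }

    let mut depth: i32 = 0;
    for component in normalized.split('/') {
        match component {
            "" | "." => {} // no-op components
            ".." => {
                depth -= 1;
                if depth < 0 {
                    return true; // climbed above the starting directory
                }
            }
            name => {
                // Compare the part before the first dot against reserved names.
                let stem = name.split('.').next().unwrap_or(name);
                if RESERVED.iter().any(|r| *r == stem) {
                    return true;
                }
                depth += 1;
            }
        }
    }
    false
}

#[test]
fn all_payloads_flagged() {
    for payload in PAYLOADS.iter() {
        assert!(looks_like_traversal(payload), "not flagged: {payload}");
    }
}
```

A lexical filter like this cannot see symlinks, so it would complement rather than replace the canonicalization performed by `resolve_path`.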
## Current State Documented
Per the bead description, the tests document the current security posture:
- **Phase 1 (current):** MCP tools accept `path` parameters. Without `--root`, paths pass through as-is (trust-the-caller mode for local stdio). With `--root`, paths are canonicalized and validated.
- **Phase 2 (future):** Once `--root DIR` is wired into `pdftract mcp`, all paths will be validated against the root boundary.

The `resolve_path` function in `crates/pdftract-cli/src/mcp/root.rs` already implements the security boundary (canonicalization plus a boundary check). The `--root` mode is not yet wired to the MCP server entry point; this known gap is documented in the test comments.
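
For readers unfamiliar with the pattern, the canonicalization and boundary check described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `mcp/root.rs`: the function name `resolve_path_sketch`, its signature, and the `Resolved` and `ResolveError` types are hypothetical; only the `PATH_OUTSIDE_ROOT` error name and the HTTPS bypass come from this note.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ResolveError {
    /// The resolved path lies outside the root (PATH_OUTSIDE_ROOT).
    PathOutsideRoot,
    /// The path could not be canonicalized, e.g. it does not exist.
    Io(io::ErrorKind),
}

/// Hypothetical result type: either a remote URL or a local file.
#[derive(Debug, PartialEq)]
enum Resolved {
    Url(String),
    File(PathBuf),
}

fn resolve_path_sketch(root: Option<&Path>, input: &str) -> Result<Resolved, ResolveError> {
    // HTTPS URLs are not filesystem paths, so they skip the root check.
    if input.starts_with("https://") {
        return Ok(Resolved::Url(input.to_string()));
    }

    // Without a root, the path passes through unchanged (trust-the-caller).
    let Some(root) = root else {
        return Ok(Resolved::File(PathBuf::from(input)));
    };

    // Canonicalize both sides so `..` components and symlinks are resolved
    // before the comparison. `Path::join` discards the base when `input`
    // is absolute, so `/etc/passwd` is checked as itself.
    let root = fs::canonicalize(root).map_err(|e| ResolveError::Io(e.kind()))?;
    let candidate =
        fs::canonicalize(root.join(input)).map_err(|e| ResolveError::Io(e.kind()))?;

    // `Path::starts_with` compares whole components, so `/srv/rootX`
    // does not count as being inside `/srv/root`.
    if candidate.starts_with(&root) {
        Ok(Resolved::File(candidate))
    } else {
        Err(ResolveError::PathOutsideRoot)
    }
}
```

One consequence of this design is that a path that does not exist fails at canonicalization rather than at the boundary check, so in this sketch a payload such as `../../etc/passwd` may surface as an I/O error instead of `PathOutsideRoot`, depending on what exists on disk.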
## Acceptance Criteria
- `tests/security/TH-02-path-traversal.rs` exists (created at `crates/pdftract-cli/tests/TH-02-path-traversal.rs`)
- ✅ Phase 1 tests pass: All 10 traversal payloads are rejected when --root is set
- ✅ The 10 traversal payloads are documented in the test file
- ✅ INV-10 cited as the structural mitigation source (referenced in test documentation)
## Test Results
```
cargo nextest run --test TH-02-path-traversal
Summary [ 0.009s] 10 tests run: 10 passed, 0 skipped
```
All tests passed:
- test_root_mode_rejects_all_traversal_payloads
- test_root_mode_accepts_valid_paths
- test_without_root_paths_pass_through
- test_https_urls_bypass_root_check
- test_symlink_escape_rejected
- test_url_encoded_traversal_rejected
- test_windows_reserved_name_handling
- test_special_filesystem_paths_rejected
- test_nested_traversal_with_valid_prefix
- test_deep_traversal_rejected
## References
- Plan section: TH-02 entry (line 891)
- INV-10: `pdftract mcp` in HTTP mode MUST NOT accept file-path parameters
- `crates/pdftract-cli/src/mcp/root.rs`: Path resolution and escape checking implementation