Changes from Phase 6.7 child beads that were not committed earlier:
- Add the `subtle` crate as a dependency for constant-time token comparison
- Add root directory for path-traversal protection in HTTP+SSE transport
- Update MCP server state to support --root flag
- Minor fixes and improvements across MCP modules
These changes support the 7 closed child beads:
- pdftract-5xq16: JSON-RPC 2.0 framing layer
- pdftract-67tm8: stdio transport
- pdftract-g0ro2: HTTP+SSE transport
- pdftract-24kut: transport mutual exclusion enforcement
- pdftract-1rami: tool catalog (10 tools)
- pdftract-6696g: path-traversal protection
- pdftract-zltqd: bearer-token auth
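For reviewers unfamiliar with constant-time comparison, the idea behind the bearer-token check can be sketched in plain Rust as below. This is a hand-rolled illustration, not the code in this commit, which relies on the subtle crate; the function name `constant_time_eq` is hypothetical.

```rust
/// Compare two byte strings without returning early on the first mismatch,
/// so the comparison time does not reveal how many leading bytes matched.
/// Hypothetical helper for illustration only.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // Assumption: the token length is not secret, so a length mismatch
    // may fail immediately.
    if a.len() != b.len() {
        return false;
    }
    // Accumulate the XOR of every byte pair; any difference sets a bit.
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

A dedicated crate is still preferable in production because an optimizing compiler is free to reintroduce an early exit into a loop like this one.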
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The --root DIR flag was already fully implemented in the codebase.
All 25 tests pass (12 unit + 13 integration tests).
Acceptance criteria verified:
- Path traversal rejected with -32602
- Absolute paths rejected when --root is set
- HTTPS URLs bypass the check
- Symlink escapes detected via canonicalize
- Startup validation for root directory
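The criteria above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: `resolve_target`, `Target`, and `INVALID_PARAMS` are hypothetical names, and the canonicalize step only runs when both paths exist on disk.

```rust
use std::path::{Component, Path, PathBuf};

/// JSON-RPC "Invalid params" code used for rejected paths.
pub const INVALID_PARAMS: i32 = -32602;

/// Hypothetical result type: either a remote URL or a file under the root.
#[derive(Debug, PartialEq)]
pub enum Target {
    Url(String),
    File(PathBuf),
}

pub fn resolve_target(root: &Path, requested: &str) -> Result<Target, i32> {
    // HTTPS URLs are not filesystem paths, so they bypass the check.
    if requested.starts_with("https://") {
        return Ok(Target::Url(requested.to_string()));
    }
    // Reject absolute paths and any `..` component before touching the disk.
    let rel = Path::new(requested);
    for component in rel.components() {
        match component {
            Component::Normal(_) | Component::CurDir => {}
            _ => return Err(INVALID_PARAMS),
        }
    }
    let joined = root.join(rel);
    // When both paths exist, resolve symlinks and confirm the real path
    // is still inside the real root.
    if let (Ok(real_root), Ok(real)) = (root.canonicalize(), joined.canonicalize()) {
        if !real.starts_with(&real_root) {
            return Err(INVALID_PARAMS);
        }
        return Ok(Target::File(real));
    }
    Ok(Target::File(joined))
}
```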
Co-Authored-By: Claude Code <noreply@anthropic.com>
Per ADR-006: stdio and HTTP transports are mutually exclusive because they
have opposite stdout discipline (stdio: JSON-RPC sink; HTTP: log channel).
Changes:
- Add clap ArgGroup with multiple(false) to enforce --stdio XOR --bind
- Default to stdio mode when neither flag is specified
- Change --bind from required String to Option<String>
- Add ADR-006 reference to help text and doc comments
- Add unit tests for CLI argument validation
Acceptance criteria:
- pdftract mcp → launches in stdio mode (default)
- pdftract mcp --stdio → launches in stdio mode
- pdftract mcp --bind ADDR → launches in HTTP+SSE mode
- pdftract mcp --stdio --bind ADDR → exits 2 with clap conflict error
- pdftract mcp --help shows mutual exclusivity note
- Unit test verifies ArgGroup conflict on dual-transport invocation
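The selection rules in the acceptance criteria can be restated as a small decision function. This sketch uses only the standard library and stands in for the clap ArgGroup, which does the real enforcement; `Transport`, `UsageError`, and `select_transport` are hypothetical names.

```rust
/// Hypothetical transport choice derived from the parsed flags.
#[derive(Debug, PartialEq)]
pub enum Transport {
    Stdio,
    Http { bind: String },
}

/// Hypothetical usage error carrying the process exit code.
#[derive(Debug, PartialEq)]
pub struct UsageError {
    pub exit_code: i32,
    pub message: String,
}

pub fn select_transport(stdio: bool, bind: Option<&str>) -> Result<Transport, UsageError> {
    match (stdio, bind) {
        // Both flags given: the transports disagree about who owns stdout.
        (true, Some(_)) => Err(UsageError {
            exit_code: 2,
            message: "--stdio and --bind are mutually exclusive (see ADR-006)".to_string(),
        }),
        // --bind alone selects HTTP+SSE.
        (false, Some(addr)) => Ok(Transport::Http { bind: addr.to_string() }),
        // --stdio alone, or no flag at all, selects stdio.
        (_, None) => Ok(Transport::Stdio),
    }
}
```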
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements the HTTP+SSE transport for the MCP server per bead pdftract-g0ro2.
All acceptance criteria PASS.
Routes:
- POST /: JSON-RPC requests (single or batch)
- GET /sse: Server-Sent Events for notifications
- GET /health: Health check (auth-exempt)
Key features:
- Reuses axum/tokio/tower-http from Phase 6.4 (no new deps)
- Bearer token auth (from sibling bead 6.7.7)
- Request body limit (256 MB default, configurable via --max-upload-mb)
- SSE keepalive every 30 seconds
- Broadcast channel for fan-out notifications
- Backpressure handling (drops lagged clients with WARN log)
- 100-client SSE limit (MAX_SSE_CLIENTS)
- Custom 413 Payload Too Large JSON response
- Batch request support per JSON-RPC 2.0 spec
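One possible way to enforce the 100-client SSE limit listed above is an atomic slot counter. The sketch below is an assumption about the approach, not the code in this commit; only the constant name MAX_SSE_CLIENTS comes from the change, and `SseSlots` with its methods is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

pub const MAX_SSE_CLIENTS: usize = 100;

/// Hypothetical counter of currently connected SSE clients.
pub struct SseSlots {
    active: AtomicUsize,
}

impl SseSlots {
    pub const fn new() -> Self {
        Self { active: AtomicUsize::new(0) }
    }

    /// Claim a slot; returns false when the limit is already reached.
    pub fn try_acquire(&self) -> bool {
        self.active
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < MAX_SSE_CLIENTS { Some(n + 1) } else { None }
            })
            .is_ok()
    }

    /// Give a slot back when a client disconnects; never underflows.
    pub fn release(&self) {
        let _ = self
            .active
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1));
    }

    pub fn active(&self) -> usize {
        self.active.load(Ordering::Acquire)
    }
}
```

The compare-and-swap loop inside `fetch_update` keeps the check and the increment atomic, so two clients racing for the last slot cannot both succeed.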
All 10 integration tests pass:
- test_post_tools_list: POST / returns tool catalog
- test_get_sse_stream: GET /sse opens SSE stream with keepalive
- test_50_concurrent_clients: 50 concurrent clients succeed
- test_health_during_load: GET /health returns 200 under load
- test_post_batch_request: Batch requests return batch responses
- test_post_payload_too_large: POST / over limit returns 413 with JSON body
- test_auth_required_for_non_loopback: Request without a valid bearer token returns 401 with WWW-Authenticate
- test_post_single_request_returns_single_response: Single request returns single response
- test_unknown_method: Unknown method returns method_not_found error
- test_get_health: GET /health returns 200 with version info
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements the stdio transport for the MCP server, enabling communication
with local agents (Claude Desktop, Claude Code, Continue, Cursor) over
standard input/output with Content-Length framing.
Core features:
- LSP-style Content-Length framing with \r\n terminators
- JSON-RPC 2.0 message parsing and serialization
- INV-9 compliance: stdout contains only JSON-RPC frames
- Panic hook redirects panics to stderr
- SIGTERM handler for graceful shutdown
- Parse errors return -32700 with id: null, then continue
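The Content-Length framing listed above can be sketched as a pair of helpers over generic readers and writers. This is a minimal illustration assuming a single required header; `write_frame` and `read_frame` are hypothetical names, not the functions in this commit.

```rust
use std::io::{self, BufRead, Read, Write};

/// Write one frame: header, blank line, then the raw JSON body.
pub fn write_frame<W: Write>(out: &mut W, body: &[u8]) -> io::Result<()> {
    write!(out, "Content-Length: {}\r\n\r\n", body.len())?;
    out.write_all(body)?;
    out.flush()
}

/// Read one frame. Returns Ok(None) if EOF arrives before the header block ends.
pub fn read_frame<R: BufRead>(input: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut content_length: Option<usize> = None;
    let mut line = String::new();
    loop {
        line.clear();
        if input.read_line(&mut line)? == 0 {
            return Ok(None);
        }
        let header = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
        if header.is_empty() {
            break; // blank line terminates the header block
        }
        if let Some((name, value)) = header.split_once(':') {
            if name.eq_ignore_ascii_case("Content-Length") {
                let n = value
                    .trim()
                    .parse::<usize>()
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
                content_length = Some(n);
            }
        }
    }
    let len = content_length.ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "missing Content-Length header")
    })?;
    // Read exactly `len` bytes; the body is not newline-terminated.
    let mut body = vec![0u8; len];
    input.read_exact(&mut body)?;
    Ok(Some(body))
}
```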
Acceptance criteria:
- ✅ Piping a framed tools/list request produces the expected response in under 50ms
- ✅ EOF on stdin → clean exit within 100ms
- ✅ Malformed JSON → -32700 error, subsequent requests work
- ✅ No println!/log output to stdout (INV-9 enforced)
- ✅ Panics go to stderr, no partial JSON on stdout
- ✅ SIGTERM → exit 0, SIGINT → immediate non-zero exit
Tests added:
- crates/pdftract-cli/tests/mcp-stdio.rs (8 integration tests, all pass)
- All 49 existing unit tests continue to pass
Refs: pdftract-67tm8, plan Phase 6.7.2
Add hand-rolled JSON-RPC 2.0 implementation for MCP server transports.
Module: crates/pdftract-cli/src/mcp/framing/
- Id enum with Number/String/Null variants preserving JSON type
- Request, Response, Notification, ErrorObject structs
- BatchMessage for batch request handling
- Strict jsonrpc version validation (must be "2.0")
- All spec-defined error codes: the 5 predefined codes (-32700, -32600, -32601, -32602, -32603) plus the reserved server-error range (-32099..-32000)
- Constructor helpers for common patterns
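The error-code handling described above might look roughly like this. The sketch omits serialization and is not the module's actual API: `ErrorCode`, its methods, and the `ErrorObject` constructors shown here are hypothetical, with messages taken from the JSON-RPC 2.0 specification.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    ParseError,
    InvalidRequest,
    MethodNotFound,
    InvalidParams,
    InternalError,
    /// Implementation-defined code inside the reserved server-error range.
    ServerError(i32),
}

impl ErrorCode {
    pub fn code(self) -> i32 {
        match self {
            ErrorCode::ParseError => -32700,
            ErrorCode::InvalidRequest => -32600,
            ErrorCode::MethodNotFound => -32601,
            ErrorCode::InvalidParams => -32602,
            ErrorCode::InternalError => -32603,
            ErrorCode::ServerError(c) => c,
        }
    }

    /// Map a wire value back to a known code; None for anything unreserved.
    pub fn from_code(code: i32) -> Option<Self> {
        match code {
            -32700 => Some(ErrorCode::ParseError),
            -32600 => Some(ErrorCode::InvalidRequest),
            -32601 => Some(ErrorCode::MethodNotFound),
            -32602 => Some(ErrorCode::InvalidParams),
            -32603 => Some(ErrorCode::InternalError),
            -32099..=-32000 => Some(ErrorCode::ServerError(code)),
            _ => None,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct ErrorObject {
    pub code: i32,
    pub message: String,
}

impl ErrorObject {
    pub fn parse_error() -> Self {
        Self { code: ErrorCode::ParseError.code(), message: "Parse error".to_string() }
    }

    pub fn method_not_found(method: &str) -> Self {
        Self {
            code: ErrorCode::MethodNotFound.code(),
            message: format!("Method not found: {method}"),
        }
    }
}
```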
Acceptance criteria verified:
- Round-trip serialization/deserialization
- ID type preservation (number/string/null)
- Parse error responses with null id
- Method not found error construction
- Notification detection (no id field)
- Batch request handling
- Rejection of invalid jsonrpc versions
- Empty batch rejection
16 unit tests covering all spec requirements.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixed compilation error in xref.rs where u64 literal 0x5DEECE66D was used
with u32 state, causing overflow. Changed state to u64 for proper Java
Random algorithm behavior.
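For context, the 48-bit linear congruential generator behind java.util.Random needs a state wider than 32 bits, which is why the u32 state could not hold the multiplier. The sketch below follows the published Java algorithm and is not the code in xref.rs; the type and method names are hypothetical.

```rust
const MULTIPLIER: u64 = 0x5DEECE66D;
const INCREMENT: u64 = 0xB;
const MASK: u64 = (1u64 << 48) - 1;

/// Hypothetical re-implementation of the java.util.Random generator.
pub struct JavaRandom {
    state: u64,
}

impl JavaRandom {
    /// Scramble the seed the same way the Java constructor does.
    pub fn new(seed: i64) -> Self {
        Self { state: ((seed as u64) ^ MULTIPLIER) & MASK }
    }

    /// Advance the 48-bit state and return its top `bits` bits.
    fn next(&mut self, bits: u32) -> i32 {
        self.state = self
            .state
            .wrapping_mul(MULTIPLIER)
            .wrapping_add(INCREMENT)
            & MASK;
        (self.state >> (48 - bits)) as i32
    }

    pub fn next_int(&mut self) -> i32 {
        self.next(32)
    }
}
```

With seed 0 the first `next_int()` call yields -1155484576, matching `new Random(0).nextInt()` in Java.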
The OCG /OCProperties parsing implementation was already complete and
all tests pass. See notes/pdftract-2a6rk.md for verification.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit implements the Cargo.lock policy for reproducible builds
across all workspace members (pdftract-core, pdftract-cli, pdftract-py).
Changes:
- Add CONTRIBUTING.md with lockfile-update workflow documentation
- Add .renovaterc.json for weekly lockfile-only PRs (human-gated)
- Add crates/pdftract-core/README.md with rationale for checked-in lockfiles
- Add notes/pdftract-49f8.md with verification note
The Argo workflow updates (pdftract-ci.yaml) are committed separately
in the declarative-config repo.
Acceptance criteria:
- PASS: Cargo.lock tracked by git, not in .gitignore
- PASS: Argo workflow templates document --locked/--frozen requirements
- WARN: Enforcement to be completed when placeholder templates are implemented
- WARN: Binary reproducibility verification deferred to pdftract-build-binaries implementation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implement TH-07 password ingress channels for CLI:
- --password-stdin flag (reads one line from stdin)
- PDFTRACT_PASSWORD env var
- --password VALUE (rejected unless PDFTRACT_INSECURE_CLI_PASSWORD=1)
Exit code 64 for insecure password usage with stderr hint.
Stderr warning emitted when --password VALUE accepted via opt-in.
Priority order: stdin > env var > value (opt-in) > none.
Empty password (bare newline) treated as no password.
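The priority order above can be sketched as a pure function over the three channels. This is an illustration under two assumptions that the commit does not state: the --password VALUE rejection is checked before anything else, and an empty value from the selected channel resolves to no password instead of falling through to the next channel. `PasswordInputs` and `resolve_password` are hypothetical names.

```rust
pub const EXIT_INSECURE_PASSWORD: i32 = 64;

/// Hypothetical snapshot of the three ingress channels.
#[derive(Debug, Default)]
pub struct PasswordInputs<'a> {
    /// Line read from stdin when --password-stdin was given.
    pub stdin_line: Option<&'a str>,
    /// Value of PDFTRACT_PASSWORD, if set.
    pub env_var: Option<&'a str>,
    /// Value passed via --password VALUE.
    pub cli_value: Option<&'a str>,
    /// True when PDFTRACT_INSECURE_CLI_PASSWORD=1.
    pub insecure_opt_in: bool,
}

pub fn resolve_password(inputs: &PasswordInputs<'_>) -> Result<Option<String>, i32> {
    // Assumption: --password VALUE without the opt-in is rejected up front.
    if inputs.cli_value.is_some() && !inputs.insecure_opt_in {
        return Err(EXIT_INSECURE_PASSWORD);
    }
    // Priority: stdin > env var > value (opt-in) > none.
    let chosen = inputs.stdin_line.or(inputs.env_var).or(inputs.cli_value);
    Ok(chosen
        .map(|s| s.trim_end_matches(|c: char| c == '\r' || c == '\n'))
        // A bare newline leaves an empty string, treated as no password.
        .filter(|s| !s.is_empty())
        .map(str::to_owned))
}
```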
Acceptance criteria:
- --password-stdin: PASS
- PDFTRACT_PASSWORD: PASS
- --password VALUE rejection (exit 64): PASS
- Stderr warning on opt-in: PASS
- Exit codes: PASS
- Python/MCP/Serve: N/A (crates don't exist yet)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix Token::Keyword to use b"...".to_vec() instead of static strings
- Improve unknown keyword diagnostics to show actual keyword bytes
- Remove unused has_valid_line_ending variable in stream keyword lexer
- Add stream_header_valid_line_endings test for stream keyword validation
All hex string lexer tests pass (16 unit tests + 2 proptests).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: pdftract-2hm4
Add verify_receipt method support to Go templates:
- client.go.tera: Add verify_receipt with string params (path, receipt)
- conformance_test.go.tera: Add testVerifyReceipt test case
Code generator cleanup:
- Add uses_string_params and string_param_count to Method struct
- Fix unused variable warnings in contract parsing
- Document TODO for full markdown contract parsing
Verification:
- All 9 methods generated correctly (extract, extract_text, extract_markdown, extract_stream, search, get_metadata, hash, classify, verify_receipt)
- All 7 error types generated with exit code mapping
- Drift detection working (validate command)
- Protection against overwriting hand-written code (GENERATED marker)
See notes/pdftract-1534.md for full acceptance criteria status.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two fixes:
1. Hex string lexer now flushes dangling nibble when encountering invalid
characters. For `<4X8Y>`, the X and Y are invalid, so we flush nibble 4
as 0x40, then flush nibble 8 as 0x80, producing `\x40\x80`.
2. Fixed skip_whitespace_and_comments() to properly handle whitespace
after comments. The previous logic only continued looping if the next
byte was `%`, missing cases where whitespace follows a comment.
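The nibble-flush behavior from the first fix can be sketched as follows. This is a standalone illustration operating on the bytes between `<` and `>`, not the lexer's actual code; `decode_hex_body` and `is_pdf_whitespace` are hypothetical names.

```rust
/// PDF whitespace bytes: NUL, tab, LF, form feed, CR, space.
fn is_pdf_whitespace(b: u8) -> bool {
    matches!(b, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

/// Decode the bytes between `<` and `>` of a PDF hex string.
pub fn decode_hex_body(body: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    // High nibble waiting for its partner, if any.
    let mut pending: Option<u8> = None;
    for &b in body {
        match (b as char).to_digit(16) {
            Some(d) => match pending.take() {
                Some(hi) => out.push((hi << 4) | d as u8),
                None => pending = Some(d as u8),
            },
            // Whitespace between digits is ignored.
            None if is_pdf_whitespace(b) => {}
            // Invalid character: flush a dangling nibble padded with zero.
            None => {
                if let Some(hi) = pending.take() {
                    out.push(hi << 4);
                }
            }
        }
    }
    // An odd trailing digit is also padded with zero.
    if let Some(hi) = pending {
        out.push(hi << 4);
    }
    out
}
```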
All 52 lexer tests pass.
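The second fix amounts to looping until neither whitespace nor a comment consumes any input. A minimal sketch of that shape, assuming a byte-slice-and-offset interface that may differ from the real lexer (the signature here is hypothetical):

```rust
fn is_whitespace(b: u8) -> bool {
    matches!(b, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

/// Return the offset of the first byte that is neither whitespace
/// nor part of a `%` comment, starting the scan at `pos`.
pub fn skip_whitespace_and_comments(input: &[u8], mut pos: usize) -> usize {
    loop {
        let start = pos;
        // Consume a run of whitespace.
        while pos < input.len() && is_whitespace(input[pos]) {
            pos += 1;
        }
        // Consume one comment up to, but not including, its line ending.
        if pos < input.len() && input[pos] == b'%' {
            while pos < input.len() && input[pos] != b'\n' && input[pos] != b'\r' {
                pos += 1;
            }
        }
        // Stop only when a full pass made no progress.
        if pos == start {
            return pos;
        }
    }
}
```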
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements the conformance test runner pattern for all 10 SDKs as specified
in the plan (line 3547). Each SDK now has a dedicated conformance test runner.
Created:
- tests/sdk-conformance/report-schema.json: JSON schema for conformance reports
- docs/notes/sdk-conformance-runner.md: Pattern documentation and reference
- crates/pdftract-cli/tests/conformance.rs: Rust cargo test target
- tests/conformance/test_conformance.py: Python pytest harness
- tests/conformance/conformance.test.ts: Node.js vitest runner
- tests/conformance/conformance_test.go: Go go test runner
- tests/conformance/ConformanceTest.java: Java JUnit 5 runner
- tests/conformance/ConformanceTests.cs: .NET xUnit runner
- tests/conformance/conformance.c: C standalone binary
- tests/conformance/conformance_test.rb: Ruby minitest runner
- tests/conformance/ConformanceTest.php: PHP PHPUnit runner
- tests/conformance/ConformanceTests.swift: Swift XCTest runner
All runners implement:
- Loading of tests/sdk-conformance/cases.json
- Execution of test cases with language-native method invocations
- Comparison of results against expected values with numeric tolerances
- Emission of machine-readable conformance-report.json
- Non-zero exit on failures/errors for CI gating
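The tolerance comparison and CI gating steps shared by the runners can be sketched as below. This is an illustration only: `Outcome`, `compare_numeric`, and `exit_code` are hypothetical names, and the absolute-tolerance rule is an assumption since the commit does not specify how tolerances are applied.

```rust
/// Hypothetical per-case result.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Pass,
    Fail(String),
}

/// Compare a numeric result against its expected value.
/// Assumption: tolerance is absolute, and NaN never passes.
pub fn compare_numeric(actual: f64, expected: f64, tolerance: f64) -> Outcome {
    if (actual - expected).abs() <= tolerance {
        Outcome::Pass
    } else {
        Outcome::Fail(format!(
            "expected {expected} within {tolerance}, got {actual}"
        ))
    }
}

/// Process exit code for CI gating: zero only if every case passed.
pub fn exit_code(outcomes: &[Outcome]) -> i32 {
    if outcomes.iter().all(|o| *o == Outcome::Pass) {
        0
    } else {
        1
    }
}
```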
Acceptance criteria:
- PASS: All 10 SDKs have language-specific runners
- PASS: Runners consume shared cases.json
- PASS: Runners emit JSON reports matching schema
- PASS: Runners exit non-zero on failure
- WARN: README integration pending SDK repo creation
- WARN: Stub implementations return placeholder results
References:
- Plan line 3547: "Every SDK has a pdftract-sdk-conformance test runner"
- Plan line 3589: "Conformance suite results published as Argo artifact"
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: pdftract-5omc
Implement the conformance test runner pattern that every SDK will
implement to validate against the shared test suite.
- Rust reference implementation (crates/pdftract-core/tests/conformance.rs)
* Full test suite loader and executor
* Comparison engine with min/max, string constraints, tolerances
* Skip logic for unsupported features and schema versions
* Report generation in JSON format
- CLI compare subcommand (crates/pdftract-cli/src/main.rs)
* pdftract compare - Compare actual vs expected with tolerances
* Cross-language comparison tool to avoid reimplementations
- Documentation (docs/conformance/sdk-contract.md)
* Complete pattern specification with pseudocode
* Per-language runner locations
* CI integration requirements
- Python reference stub (tests/python-conformance/test_conformance.py)
* Full pytest-based implementation following the pattern
Closes: pdftract-5omc