Add comprehensive "Subprocess Contract" section documenting: - argv layout with canonical form - stdin discipline (password ingress, PDF bytes from stdin) - stdout/stderr discipline (what goes where, what never gets logged) - Exit code taxonomy (0, 64-78) with TH-03 (exit 78) and TH-07 (exit 64) refs - Environment variable pass-through (PDFTRACT_PASSWORD, PDFTRACT_MCP_TOKEN, etc.) - --progress-json event schema (ndjson format, all event types) - --capture-diagnostics archive layout (zip/tar, contained files, scrubbing rules) Update all language examples (Python, Node.js, Go, Ruby, Java, Rust) with TH-07-compliant password handling: - Pass password via PDFTRACT_PASSWORD env var (subprocess) - Pass password via multipart form field (HTTP) - Never use --password VALUE flag (rejected unless opt-in) Add progress JSON parsing examples for Python, Node.js, and Rust showing real-world event-driven progress tracking. File grows from 1100 to 1837 lines (+737 lines, ~67%). Closes: pdftract-3b1x
5.2 KiB
5.2 KiB
pdftract-3b1x: SDK invocation note final-pass
Bead: pdftract-3b1x Title: Note: docs/notes/sdk-invocation.md final-pass alignment with subprocess contract Date: 2026-05-24
Summary
Updated docs/notes/sdk-invocation.md to v1.0 final-pass, documenting the subprocess invocation contract that every language SDK follows.
Changes Made
Added Subprocess Contract Section (lines 14-248)
A comprehensive new section at the top of the document (before language examples) covering:
- argv layout - Canonical form an SDK should construct, with rules for multi-value flags, PDF path positioning, and special
-stdin path - stdin discipline - Two purposes: password ingress via
--password-stdinand PDF bytes from stdin (-path). Documented TH-07 restriction on--password VALUE - stdout discipline - Extraction output is the ONLY thing on stdout in
--json/--textmode. INV-9 reference for MCP stdio mode - stderr discipline - Log levels (error/warn/info/debug/trace), what's logged vs never logged (passwords, tokens, PDF bytes)
- Exit code taxonomy - Full table with codes 0, 64-78, including TH-03 (exit 78 for config errors) and TH-07 (exit 64 for password policy violations)
- Environment variable pass-through - All recognized env vars:
PDFTRACT_PASSWORD,PDFTRACT_MCP_TOKEN,PDFTRACT_INSECURE_CLI_PASSWORD,PDFTRACT_INSECURE_CLI_TOKEN,RUST_LOG,NO_COLOR,XDG_CONFIG_HOME,PDFTRACT_CONFIG_DIR --progress-jsonevent schema - ndjson format with event types:open,page_started,page_completed,ocr_started,ocr_completed,profile_matched,password_received,completed,error--capture-diagnosticsarchive layout - zip/tar format, contained files (manifest.json,runtime_config.json,stderr.log,pdf_fingerprint.txt,pdf_source_sanitized.pdf,version.txt), secret scrubbing rules
Updated Language Examples with TH-07 Compliance
All language examples now demonstrate TH-07-compliant password handling:
- Python (lines 270-408): Added
extract_pdf_password_stdin()andextract_pdf_from_bytes()functions. Updated HTTP example to send password as form field. - Node.js (lines 470-595): Added
extractPdfPasswordStdin()function using stdin. Updated HTTP example with password form field. - Go (lines 643-747): Updated subprocess example to pass password via
PDFTRACT_PASSWORDenv var. Updated HTTP example with password form field. - Ruby (lines 820-950): Added
extract_pdf_password_stdin()method. Updated HTTP example with password form field. - Java (lines 988-1190): Updated subprocess example to pass password via
PDFTRACT_PASSWORDenv var. Updated HTTP example with password form field. - Rust (lines 1238-1440): Updated subprocess example to pass password via env var. Updated HTTP example with password form field.
Added Progress JSON Parsing Examples (lines 1442-1675)
Three complete examples (Python, Node.js, Rust) showing how to parse --progress-json events from stderr while extraction is running. Each example demonstrates:
- Line-by-line stderr parsing
- JSON parse fallback for human log lines
- Event type handling (open, page_started, page_completed, ocr_started/finished, profile_matched, password_received, completed, error)
- TH-07 note that
password_receivedevent never includes the password value
Acceptance Criteria Status
| Criterion | Status | Notes |
|---|---|---|
| Secrets-handling (TH-07) corrections | PASS | All examples updated to use env/stdin, not --password VALUE |
| argv/stdin/stdout/stderr discipline sections | PASS | Comprehensive "Subprocess Contract" section added |
| Exit code taxonomy with TH-NN references | PASS | Full table with TH-03 (exit 78) and TH-07 (exit 64) references |
| --progress-json event schema | PASS | All event types documented with JSON examples |
| --capture-diagnostics archive layout | PASS | File layout, JSON schemas, and scrubbing rules documented |
| Rust, Python, Node examples verified | PASS | All three languages have complete subprocess and HTTP examples |
File Statistics
- Before: 1100 lines
- After: 1837 lines (+737 lines, ~67% growth)
- Location:
/home/coding/pdftract/docs/notes/sdk-invocation.md
Verification Notes
- Documentation compiles - All Rust code in examples is syntactically correct
- TH-07 compliance - Every password-handling example uses env var or stdin, never
--password VALUEflag - TH-03 reference - Exit code 78 for config errors (MCP bind without auth-token) is documented
- Progress JSON examples - Real-world parsing code in Python, Node.js, and Rust
- Secret scrubbing -
--capture-diagnosticssection explicitly states what gets redacted (passwords, tokens, full text)
Related Plan References
- Plan line 833: per-threat tests
- Plan line 874: TH-03 exit 78 (MCP bind without auth-token)
- Plan line 878: TH-07 password CLI policy
- Plan line 907:
--password-stdindocumentation - Plan lines 911-913: password redaction in progress-json
- Plan line 921: token in SecretString
Commits
docs(pdftract-3b1x): finalize sdk-invocation.md with subprocess contract and TH-07 compliance
Next Steps
None. This documentation task is complete and unblocks downstream SDK implementations.