From 255d9c593beff256c9e8e37fbc7cd7a25593d55d Mon Sep 17 00:00:00 2001
From: jedarden
Date: Thu, 28 May 2026 14:36:11 -0400
Subject: [PATCH] docs(pdftract-4em4l): audit logging implementation verification

Add verification note documenting that all acceptance criteria for the
--audit-log flag and audit logging infrastructure are already
implemented in the codebase.

Acceptance criteria verified:
- --audit-log FILE flag on serve, mcp, and inspect subcommands
- Per-request NDJSON line with all documented fields
- Stdio MCP omits client_ip field
- Log-policy enforcement (compile-time CI gate + runtime redaction)
- TH-08 test for log policy verification
- Rotation policy documented in --help
- Fingerprint logged instead of path/URL
- AuditLogWriter is crash-safe

All audit module tests pass (6/6).

Co-Authored-By: Claude Opus 4.7
---
 notes/pdftract-4em4l.md | 122 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 notes/pdftract-4em4l.md

diff --git a/notes/pdftract-4em4l.md b/notes/pdftract-4em4l.md
new file mode 100644
index 0000000..608d063
--- /dev/null
+++ b/notes/pdftract-4em4l.md
@@ -0,0 +1,122 @@
+# pdftract-4em4l: Audit Logging Implementation Verification
+
+## Summary
+
+The `--audit-log FILE` flag and audit logging infrastructure was already fully implemented in the codebase. This note documents the verification of the acceptance criteria.
+
+## Acceptance Criteria Status
+
+### ✅ --audit-log FILE flag implemented
+- **serve**: `crates/pdftract-cli/src/main.rs:322` - AuditLogWriter passed to serve::run
+- **mcp**: `crates/pdftract-cli/src/main.rs:385` - AuditLogWriter passed to mcp::run_stdio/run_http
+- **inspect**: `crates/pdftract-cli/src/inspect/args.rs:44-62` - audit_log field on InspectArgs
+
+### ✅ Per-request NDJSON line with all documented fields
+**AuditRecord schema** (`crates/pdftract-core/src/audit.rs:33-52`):
+- `ts`: ISO-8601 RFC3339 UTC timestamp
+- `client_ip`: Option - HTTP peer or stdio (absent)
+- `tool`: String - Tool name (extract, classify, mcp.extract, etc.)
+- `fingerprint`: Option - PDF structural fingerprint (pdftract-v1:hex)
+- `duration_ms`: u64 - Request duration in milliseconds
+- `status`: u16 - HTTP-style status code (200 ok, 4xx client error, 5xx server error)
+- `diagnostics`: Vec - Diagnostic codes (XREF_REPAIRED, STREAM_BOMB, etc.)
+
+### ✅ Stdio MCP requests OMIT the client_ip field
+**Implementation**: `crates/pdftract-cli/src/mcp/stdio.rs:364,476`
+```rust
+None, // No client_ip for stdio mode
+```
+
+### ✅ Log-policy enforcement (compile-time)
+**CI Gate**: `.ci/scripts/check-log-policy.sh`
+- Scans for credential patterns in log/println/eprintln calls
+- Checks for password, token, secret, api_key patterns
+- Checks for content leaks (body, content, text, data)
+- Runs in CI to prevent violations from being merged
+
+### ✅ Log-policy enforcement (runtime)
+**Runtime redaction mechanisms**:
+1. **Password redaction**: `crates/pdftract-core/src/parser/stream.rs:3167` - Debug/Serialize redacts password
+2. **Header redaction**: `crates/pdftract-cli/src/mcp/http.rs:641-657` - `redact_headers_for_log()` redacts Authorization/Cookie/Proxy-Authorization
+3. **Panic hook redaction**: `crates/pdftract-cli/src/panic_hook.rs:58-79` - Redacts SecretString from backtraces
+4. **Encryption debug redaction**: `crates/pdftract-core/src/encryption/aes_256.rs:423-427` - Debug redacts encryption keys
+
+### ✅ TH-08 test for log policy
+**Test**: `tests/security/TH-08-log-audit.rs`
+- Runs extraction with `RUST_LOG=trace` (maximum verbosity)
+- Captures stderr (log output)
+- Verifies no sensitive content appears in logs
+- Tests for password, bearer token, PDF bytes, and sensitive headers
+
+### ✅ Rotation policy documented in --help output
+**Documentation**: `crates/pdftract-cli/src/main.rs:306-320`
+```text
+## Audit log rotation
+
+pdftract does NOT rotate the audit log. Operators MUST configure logrotate(8)
+to manage log file size and retention. A typical logrotate configuration:
+```
+
+### ✅ Fingerprint logged, NOT path/URL
+**Implementation**:
+- `crates/pdftract-core/src/audit.rs:44` - `fingerprint: Option` stores structural fingerprint
+- `crates/pdftract-cli/src/serve.rs:582` - Extracts fingerprint from result
+- `crates/pdftract-cli/src/serve.rs:604` - Logs fingerprint instead of path
+
+### ✅ AuditLogWriter is crash-safe
+**Implementation**: `crates/pdftract-core/src/audit.rs:135-143`
+```rust
+pub fn write_record(&self, record: &AuditRecord) -> Result<()> {
+    let json = serde_json::to_string(record)?;
+    let mut writer = self.writer.lock()?;
+    writeln!(writer, "{}", json)?;
+    writer.flush()?; // Flush after each line for crash safety
+    Ok(())
+}
+```
+
+## Middleware Integration
+
+### HTTP Serve
+**File**: `crates/pdftract-cli/src/middleware/audit.rs`
+- Extracts client IP from peer address or X-Forwarded-For
+- Stores RequestMetadata in request extensions
+- Handlers write audit logs after extraction completes
+
+### MCP HTTP
+**File**: `crates/pdftract-cli/src/mcp/http.rs:254-291`
+- Writes audit log after handling POST requests
+- Includes client_ip, duration_ms, status, diagnostics
+
+### MCP Stdio
+**File**: `crates/pdftract-cli/src/mcp/stdio.rs:361-370`
+- Writes audit log for tools/call requests
+- Omits client_ip (stdio mode has no client)
+
+### Inspect
+**File**: `crates/pdftract-cli/src/inspect/inspect.rs:79-94`
+- Creates AuditLogWriter from args.audit_log
+- Stores in InspectorState with AuditState wrapper
+
+## NEVER-Log Policy
+
+The following are NEVER logged at any level:
+- Password values (redacted via SecretString)
+- Bearer tokens (redacted via header redaction)
+- PDF bytes (not logged; only fingerprint)
+- Extracted text content (not logged; only metadata)
+- Cookie/Authorization/Proxy-Authorization headers (redacted)
+
+## Test Results
+
+**Audit module tests**: All 6 tests passed
+- test_audit_record_new
+- test_audit_record_with_client_ip
+- test_audit_record_with_diagnostics
+- test_audit_record_add_diagnostic
+- test_audit_record_serialize
+- test_audit_log_writer_memory
+
+## Conclusion
+
+All acceptance criteria for pdftract-4em4l are satisfied. The audit logging infrastructure is fully implemented and tested.
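
For readers of this note, the NDJSON record shape and the flush-per-line behaviour it describes can be sketched with the standard library alone. This is a minimal sketch and not the code in `crates/pdftract-core/src/audit.rs`. The concrete field types, the hand-rolled JSON encoding (standing in for `serde_json`), the choice to omit `fingerprint` when it is absent, and the `into_inner` helper are all assumptions made for illustration.

```rust
use std::io::{self, Write};
use std::sync::Mutex;

/// Hypothetical mirror of the AuditRecord fields listed in the note.
/// The String payload types are assumptions.
pub struct AuditRecord {
    pub ts: String,
    pub client_ip: Option<String>,
    pub tool: String,
    pub fingerprint: Option<String>,
    pub duration_ms: u64,
    pub status: u16,
    pub diagnostics: Vec<String>,
}

/// Minimal JSON string encoder: escapes quotes, backslashes and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl AuditRecord {
    /// Encodes the record as one JSON object without a trailing newline.
    /// Optional fields that are `None` are left out entirely (assumption for
    /// `fingerprint`; the note states this behaviour only for `client_ip`).
    pub fn to_json_line(&self) -> String {
        let mut fields = vec![format!("\"ts\":{}", json_str(&self.ts))];
        if let Some(ip) = &self.client_ip {
            fields.push(format!("\"client_ip\":{}", json_str(ip)));
        }
        fields.push(format!("\"tool\":{}", json_str(&self.tool)));
        if let Some(fp) = &self.fingerprint {
            fields.push(format!("\"fingerprint\":{}", json_str(fp)));
        }
        fields.push(format!("\"duration_ms\":{}", self.duration_ms));
        fields.push(format!("\"status\":{}", self.status));
        let diags: Vec<String> = self.diagnostics.iter().map(|d| json_str(d)).collect();
        fields.push(format!("\"diagnostics\":[{}]", diags.join(",")));
        format!("{{{}}}", fields.join(","))
    }
}

/// Hypothetical writer: one record per line, flushed after every line.
pub struct AuditLogWriter<W: Write> {
    writer: Mutex<W>,
}

impl<W: Write> AuditLogWriter<W> {
    pub fn new(inner: W) -> Self {
        Self { writer: Mutex::new(inner) }
    }

    pub fn write_record(&self, record: &AuditRecord) -> io::Result<()> {
        let mut w = self
            .writer
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "audit log mutex poisoned"))?;
        writeln!(w, "{}", record.to_json_line())?;
        // Flush after each line so a completed request is not left in a user-space buffer.
        w.flush()
    }

    /// Returns the underlying sink, mainly useful for in-memory tests.
    pub fn into_inner(self) -> W {
        self.writer.into_inner().unwrap_or_else(|e| e.into_inner())
    }
}
```

Writing a record with `client_ip: None` into a `Vec<u8>` produces a single line with no `client_ip` key, which matches the stdio behaviour described in the note. `flush()` only hands buffered bytes to the operating system; it does not force them to disk the way `File::sync_all` does, so flushing per line guards against a process crash rather than a power loss.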