Add verification note documenting that all acceptance criteria for the --audit-log flag and audit logging infrastructure are already implemented in the codebase.
Acceptance criteria verified:
- --audit-log FILE flag on serve, mcp, and inspect subcommands
- Per-request NDJSON line with all documented fields
- Stdio MCP omits client_ip field
- Log-policy enforcement (compile-time CI gate + runtime redaction)
- TH-08 test for log policy verification
- Rotation policy documented in --help
- Fingerprint logged instead of path/URL
- AuditLogWriter is crash-safe
All audit module tests pass (6/6).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pdftract-4em4l: Audit Logging Implementation Verification
Summary
The --audit-log FILE flag and the audit logging infrastructure were already fully implemented in the codebase. This note records how each acceptance criterion was verified.
Acceptance Criteria Status
✅ --audit-log FILE flag implemented
- serve: crates/pdftract-cli/src/main.rs:322 - AuditLogWriter passed to serve::run
- mcp: crates/pdftract-cli/src/main.rs:385 - AuditLogWriter passed to mcp::run_stdio/run_http
- inspect: crates/pdftract-cli/src/inspect/args.rs:44-62 - audit_log field on InspectArgs
✅ Per-request NDJSON line with all documented fields
AuditRecord schema (crates/pdftract-core/src/audit.rs:33-52):
- ts: ISO-8601 / RFC 3339 UTC timestamp
- client_ip: Option - HTTP peer or stdio (absent)
- tool: String - Tool name (extract, classify, mcp.extract, etc.)
- fingerprint: Option - PDF structural fingerprint (pdftract-v1:hex)
- duration_ms: u64 - Request duration in milliseconds
- status: u16 - HTTP-style status code (200 ok, 4xx client error, 5xx server error)
- diagnostics: Vec - Diagnostic codes (XREF_REPAIRED, STREAM_BOMB, etc.)
✅ Stdio MCP requests OMIT the client_ip field
Implementation: crates/pdftract-cli/src/mcp/stdio.rs:364,476
None, // No client_ip for stdio mode
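To make the "one NDJSON line per request, client_ip omitted for stdio" behavior concrete, here is a minimal std-only sketch. The struct and function names (AuditRecordSketch, to_ndjson_line, json_escape) are hypothetical and hand-roll the JSON; the real AuditRecord in audit.rs presumably derives its serialization, and its exact field types are not shown in this note. Omitting fingerprint when it is None is also an assumption.

```rust
/// Hypothetical mirror of the audit record; field names follow the schema above.
#[derive(Debug, Clone)]
pub struct AuditRecordSketch {
    pub ts: String,
    pub client_ip: Option<String>,
    pub tool: String,
    pub fingerprint: Option<String>,
    pub duration_ms: u64,
    pub status: u16,
    pub diagnostics: Vec<String>,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl AuditRecordSketch {
    /// Render one JSON object with no trailing newline.
    /// Optional fields that are None are left out entirely (assumption).
    pub fn to_ndjson_line(&self) -> String {
        let mut fields = vec![format!("\"ts\":\"{}\"", json_escape(&self.ts))];
        if let Some(ip) = &self.client_ip {
            fields.push(format!("\"client_ip\":\"{}\"", json_escape(ip)));
        }
        fields.push(format!("\"tool\":\"{}\"", json_escape(&self.tool)));
        if let Some(fp) = &self.fingerprint {
            fields.push(format!("\"fingerprint\":\"{}\"", json_escape(fp)));
        }
        fields.push(format!("\"duration_ms\":{}", self.duration_ms));
        fields.push(format!("\"status\":{}", self.status));
        let diags: Vec<String> = self
            .diagnostics
            .iter()
            .map(|d| format!("\"{}\"", json_escape(d)))
            .collect();
        fields.push(format!("\"diagnostics\":[{}]", diags.join(",")));
        format!("{{{}}}", fields.join(","))
    }
}
```

A record built with client_ip set to None, as in the stdio path, produces a line with no client_ip key at all rather than a null value.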
✅ Log-policy enforcement (compile-time)
CI Gate: .ci/scripts/check-log-policy.sh
- Scans for credential patterns in log/println/eprintln calls
- Checks for password, token, secret, api_key patterns
- Checks for content leaks (body, content, text, data)
- Runs in CI to prevent violations from being merged
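The contents of check-log-policy.sh are not reproduced in this note. As a rough illustration of the scan described above, the following Rust sketch flags lines that both call a logging macro and mention a credential word. It is a line-based heuristic written for this note, not a port of the script; the macro list and the function name find_log_policy_violations are assumptions.

```rust
/// Logging macros the sketch looks for (assumed list).
const LOG_MACROS: [&str; 7] = [
    "println!", "eprintln!", "trace!", "debug!", "info!", "warn!", "error!",
];

/// Credential words named in the CI gate description above.
const CREDENTIAL_WORDS: [&str; 4] = ["password", "token", "secret", "api_key"];

/// Return (1-based line number, matched word) for every line that contains
/// both a logging macro call and a credential word. Case-insensitive.
pub fn find_log_policy_violations(source: &str) -> Vec<(usize, String)> {
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let lower = line.to_lowercase();
        let is_log_call = LOG_MACROS.iter().any(|m| lower.contains(m));
        if !is_log_call {
            continue;
        }
        if let Some(word) = CREDENTIAL_WORDS.iter().find(|w| lower.contains(**w)) {
            hits.push((idx + 1, word.to_string()));
        }
    }
    hits
}
```

A heuristic like this will produce false positives (for example, a log message that merely says "token refreshed"), so a real gate would likely need an allowlist or an annotation mechanism.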
✅ Log-policy enforcement (runtime)
Runtime redaction mechanisms:
- Password redaction: crates/pdftract-core/src/parser/stream.rs:3167 - Debug/Serialize redacts password
- Header redaction: crates/pdftract-cli/src/mcp/http.rs:641-657 - redact_headers_for_log() redacts Authorization/Cookie/Proxy-Authorization
- Panic hook redaction: crates/pdftract-cli/src/panic_hook.rs:58-79 - Redacts SecretString from backtraces
- Encryption debug redaction: crates/pdftract-core/src/encryption/aes_256.rs:423-427 - Debug redacts encryption keys
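Two of these mechanisms, a Debug impl that hides a secret and a header filter, can be sketched as follows. Redacted and redact_headers_sketch are hypothetical stand-ins for SecretString and redact_headers_for_log(); the real signatures and the exact placeholder text are not shown in this note.

```rust
use std::fmt;

/// Hypothetical stand-in for a secret wrapper: the value is only reachable
/// through an explicit accessor and never through Debug formatting.
pub struct Redacted(String);

impl Redacted {
    pub fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }

    /// Deliberate, greppable access to the underlying secret.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Redacted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Header names whose values must not reach the logs (from the list above).
const SENSITIVE_HEADERS: [&str; 3] = ["authorization", "cookie", "proxy-authorization"];

/// Copy headers for logging, replacing sensitive values with a placeholder.
/// Header names are compared case-insensitively.
pub fn redact_headers_sketch(headers: &[(&str, &str)]) -> Vec<(String, String)> {
    headers
        .iter()
        .map(|(name, value)| {
            let sensitive = SENSITIVE_HEADERS
                .iter()
                .any(|h| name.eq_ignore_ascii_case(h));
            let shown = if sensitive { "[REDACTED]" } else { *value };
            (name.to_string(), shown.to_string())
        })
        .collect()
}
```

Putting the redaction in the Debug impl means that a stray {:?} in a log call or a panic message cannot print the secret by accident.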
✅ TH-08 test for log policy
Test: tests/security/TH-08-log-audit.rs
- Runs extraction with RUST_LOG=trace (maximum verbosity)
- Captures stderr (log output)
- Verifies no sensitive content appears in logs
- Tests for password, bearer token, PDF bytes, and sensitive headers
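The shape of such a test can be sketched with the standard library alone. The helper names below (capture_trace_stderr, leaked_secrets) are hypothetical and the body of TH-08-log-audit.rs is not reproduced here; this only illustrates the "run at trace level, capture stderr, assert nothing sensitive appears" pattern.

```rust
use std::process::Command;

/// Run a program with RUST_LOG=trace and return everything it wrote to stderr.
/// `program` and `args` are placeholders for the binary under test.
pub fn capture_trace_stderr(program: &str, args: &[&str]) -> std::io::Result<String> {
    let output = Command::new(program)
        .args(args)
        .env("RUST_LOG", "trace")
        .output()?;
    Ok(String::from_utf8_lossy(&output.stderr).into_owned())
}

/// Return every secret that appears verbatim in the captured log output.
/// An empty result means the log policy held for these inputs.
pub fn leaked_secrets<'a>(captured_stderr: &str, secrets: &[&'a str]) -> Vec<&'a str> {
    secrets
        .iter()
        .copied()
        .filter(|s| !s.is_empty() && captured_stderr.contains(*s))
        .collect()
}
```

A test would feed the known password, bearer token, a recognizable slice of the PDF bytes, and sensitive header values into leaked_secrets and assert that the returned list is empty.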
✅ Rotation policy documented in --help output
Documentation: crates/pdftract-cli/src/main.rs:306-320
## Audit log rotation
pdftract does NOT rotate the audit log. Operators MUST configure logrotate(8)
to manage log file size and retention. A typical logrotate configuration:
✅ Fingerprint logged, NOT path/URL
Implementation:
- crates/pdftract-core/src/audit.rs:44 - fingerprint: Option<String> stores structural fingerprint
- crates/pdftract-cli/src/serve.rs:582 - Extracts fingerprint from result
- crates/pdftract-cli/src/serve.rs:604 - Logs fingerprint instead of path
✅ AuditLogWriter is crash-safe
Implementation: crates/pdftract-core/src/audit.rs:135-143
pub fn write_record(&self, record: &AuditRecord) -> Result<()> {
    let json = serde_json::to_string(record)?;
    let mut writer = self.writer.lock()?;
    writeln!(writer, "{}", json)?;
    writer.flush()?; // Flush after each line for crash safety
    Ok(())
}
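The same lock, write, flush pattern can be exercised in isolation with an in-memory buffer. The sketch below is std-only and generic over any Write; LineWriterSketch and its methods are hypothetical names, not the real AuditLogWriter API. Note that flush() hands the bytes to the operating system, which protects against a process crash; whether the real writer also calls something like File::sync_all for power-loss durability is not shown in the excerpt above.

```rust
use std::io::{self, Write};
use std::sync::Mutex;

/// Hypothetical line-oriented writer: one lock, one line, one flush per record.
pub struct LineWriterSketch<W: Write> {
    writer: Mutex<W>,
}

impl<W: Write> LineWriterSketch<W> {
    pub fn new(inner: W) -> Self {
        Self { writer: Mutex::new(inner) }
    }

    /// Append one already-serialized JSON object followed by a newline,
    /// then flush so the line leaves any userspace buffer immediately.
    pub fn write_line(&self, json: &str) -> io::Result<()> {
        let mut guard = self
            .writer
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "audit writer lock poisoned"))?;
        writeln!(guard, "{}", json)?;
        guard.flush()
    }

    /// Recover the underlying writer, even if a previous holder panicked.
    pub fn into_inner(self) -> W {
        self.writer
            .into_inner()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
    }
}
```

Holding the mutex across the whole write keeps lines from concurrent requests from interleaving, so each record stays a complete NDJSON line.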
Middleware Integration
HTTP Serve
File: crates/pdftract-cli/src/middleware/audit.rs
- Extracts client IP from peer address or X-Forwarded-For
- Stores RequestMetadata in request extensions
- Handlers write audit logs after extraction completes
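The client IP selection can be illustrated with a small std-only function. The precedence used here (first parseable X-Forwarded-For entry, otherwise the peer address) is an assumption, as is the name client_ip_sketch; the note does not say which source the middleware prefers.

```rust
use std::net::{IpAddr, SocketAddr};

/// Pick the client IP for the audit record.
/// Assumed precedence: first entry of X-Forwarded-For if it parses as an IP,
/// otherwise the TCP peer address.
pub fn client_ip_sketch(forwarded_for: Option<&str>, peer: SocketAddr) -> IpAddr {
    forwarded_for
        .and_then(|value| value.split(',').next())
        .map(str::trim)
        .and_then(|first| first.parse::<IpAddr>().ok())
        .unwrap_or_else(|| peer.ip())
}
```

X-Forwarded-For is client-controlled unless a trusted proxy overwrites it, so a deployment exposed directly to the network would want to ignore the header and log only the peer address.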
MCP HTTP
File: crates/pdftract-cli/src/mcp/http.rs:254-291
- Writes audit log after handling POST requests
- Includes client_ip, duration_ms, status, diagnostics
MCP Stdio
File: crates/pdftract-cli/src/mcp/stdio.rs:361-370
- Writes audit log for tools/call requests
- Omits client_ip (stdio mode has no client)
Inspect
File: crates/pdftract-cli/src/inspect/inspect.rs:79-94
- Creates AuditLogWriter from args.audit_log
- Stores in InspectorState with AuditState wrapper
NEVER-Log Policy
The following are NEVER logged at any level:
- Password values (redacted via SecretString)
- Bearer tokens (redacted via header redaction)
- PDF bytes (not logged; only fingerprint)
- Extracted text content (not logged; only metadata)
- Cookie/Authorization/Proxy-Authorization headers (redacted)
Test Results
Audit module tests: All 6 tests passed
- test_audit_record_new
- test_audit_record_with_client_ip
- test_audit_record_with_diagnostics
- test_audit_record_add_diagnostic
- test_audit_record_serialize
- test_audit_log_writer_memory
Conclusion
All acceptance criteria for pdftract-4em4l are satisfied. The audit logging infrastructure is fully implemented and tested.