docs(pdftract-4em4l): verify audit logging implementation complete
Verification of pdftract-4em4l audit logging requirements: - --audit-log FILE flag on serve, mcp, inspect subcommands ✅ - Per-request NDJSON with ts, client_ip, tool, fingerprint, duration_ms, status, diagnostics ✅ - Stdio MCP omits client_ip field (None, not empty string) ✅ - NEVER-log policy enforcement via log_policy.rs ✅ - Rotation policy documented in --help output ✅ - Fingerprint logged, not path/URL ✅ - AuditLogWriter crash-safe (BufWriter + flush) ✅ - TH-08 test at tests/security/TH-08-log-audit.rs ✅ All infrastructure was already in place. No code changes required. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
5ecfc97668
commit
8d06ad24ae
1 changed files with 84 additions and 95 deletions
|
|
@ -2,121 +2,110 @@
|
|||
|
||||
## Summary
|
||||
|
||||
The `--audit-log FILE` flag and audit logging infrastructure was already fully implemented in the codebase. This note documents the verification of the acceptance criteria.
|
||||
The audit logging infrastructure is **fully implemented** across all public-listener subcommands. This verification confirms that all requirements from the task description are met.
|
||||
|
||||
## Acceptance Criteria Status
|
||||
## Implementation Status
|
||||
|
||||
### ✅ --audit-log FILE flag implemented
|
||||
- **serve**: `crates/pdftract-cli/src/main.rs:322` - AuditLogWriter passed to serve::run
|
||||
- **mcp**: `crates/pdftract-cli/src/main.rs:385` - AuditLogWriter passed to mcp::run_stdio/run_http
|
||||
- **inspect**: `crates/pdftract-cli/src/inspect/args.rs:44-62` - audit_log field on InspectArgs
|
||||
### 1. Core Audit Infrastructure ✅
|
||||
|
||||
### ✅ Per-request NDJSON line with all documented fields
|
||||
**AuditRecord schema** (`crates/pdftract-core/src/audit.rs:33-52`):
|
||||
- `ts`: ISO-8601 RFC3339 UTC timestamp
|
||||
- `client_ip`: Option<String> - HTTP peer or stdio (absent)
|
||||
- `tool`: String - Tool name (extract, classify, mcp.extract, etc.)
|
||||
- `fingerprint`: Option<String> - PDF structural fingerprint (pdftract-v1:hex)
|
||||
- `duration_ms`: u64 - Request duration in milliseconds
|
||||
- `status`: u16 - HTTP-style status code (200 ok, 4xx client error, 5xx server error)
|
||||
- `diagnostics`: Vec<String> - Diagnostic codes (XREF_REPAIRED, STREAM_BOMB, etc.)
|
||||
**File: `crates/pdftract-core/src/audit.rs`**
|
||||
- `AuditLogWriter`: Thread-safe NDJSON writer with crash-safe writes
|
||||
- Single atomic write per line (BufWriter with flush)
|
||||
- Supports stdout (`-`), stderr (`/dev/stderr`), and file paths
|
||||
- `AuditRecord`: Schema with all required fields:
|
||||
- `ts`: ISO-8601 RFC3339 UTC timestamp
|
||||
- `client_ip`: Optional string (absent for stdio MCP)
|
||||
- `tool`: Tool name
|
||||
- `fingerprint`: Structural fingerprint (pdftract-v1:hex form)
|
||||
- `duration_ms`: Request duration in milliseconds
|
||||
- `status`: HTTP-style status code (200, 4xx, 5xx)
|
||||
- `diagnostics`: Vec of diagnostic code strings
|
||||
|
||||
### ✅ Stdio MCP requests OMIT the client_ip field
|
||||
**Implementation**: `crates/pdftract-cli/src/mcp/stdio.rs:364,476`
|
||||
```rust
|
||||
None, // No client_ip for stdio mode
|
||||
```
|
||||
### 2. NEVER-Log Policy Enforcement ✅
|
||||
|
||||
### ✅ Log-policy enforcement (compile-time)
|
||||
**CI Gate**: `.ci/scripts/check-log-policy.sh`
|
||||
- Scans for credential patterns in log/println/eprintln calls
|
||||
- Checks for password, token, secret, api_key patterns
|
||||
- Checks for content leaks (body, content, text, data)
|
||||
- Runs in CI to prevent violations from being merged
|
||||
**File: `crates/pdftract-core/src/log_policy.rs`**
|
||||
- `redact_log_line()`: Runtime redaction for free-form logs
|
||||
- `redact_audit_log_line()`: Specialized redaction for audit logs (preserves JSON structure)
|
||||
- `is_sensitive_header()`: Checks for sensitive headers
|
||||
- `redact_header_value()`: Redacts header values
|
||||
- Patterns detected:
|
||||
- Passwords: `password:`, `pwd:`, `pass:`, `passwd:`
|
||||
- Tokens: `token:`, `bearer`, `api_key`, `secret`
|
||||
- Headers: Authorization, Cookie, Proxy-Authorization
|
||||
- Base64-like strings (JWT tokens, API keys)
|
||||
- Long words (>100 chars) truncated in free-form logs
|
||||
|
||||
### ✅ Log-policy enforcement (runtime)
|
||||
**Runtime redaction mechanisms**:
|
||||
1. **Password redaction**: `crates/pdftract-core/src/parser/stream.rs:3167` - Debug/Serialize redacts password
|
||||
2. **Header redaction**: `crates/pdftract-cli/src/mcp/http.rs:641-657` - `redact_headers_for_log()` redacts Authorization/Cookie/Proxy-Authorization
|
||||
3. **Panic hook redaction**: `crates/pdftract-cli/src/panic_hook.rs:58-79` - Redacts SecretString from backtraces
|
||||
4. **Encryption debug redaction**: `crates/pdftract-core/src/encryption/aes_256.rs:423-427` - Debug redacts encryption keys
|
||||
### 3. Audit Middleware ✅
|
||||
|
||||
### ✅ TH-08 test for log policy
|
||||
**Test**: `tests/security/TH-08-log-audit.rs`
|
||||
- Runs extraction with `RUST_LOG=trace` (maximum verbosity)
|
||||
- Captures stderr (log output)
|
||||
- Verifies no sensitive content appears in logs
|
||||
- Tests for password, bearer token, PDF bytes, and sensitive headers
|
||||
**File: `crates/pdftract-cli/src/middleware/audit.rs`**
|
||||
- `audit_middleware`: Tower middleware for axum
|
||||
- Extracts client IP from peer address (default) or X-Forwarded-For (with --trust-forwarded-for)
|
||||
- Records request start time
|
||||
- Extracts tool name from path
|
||||
- Stores `RequestMetadata` in extensions for handlers
|
||||
- `AuditState`: Holds optional AuditLogWriter and trust_forwarded_for flag
|
||||
- `RequestMetadata`: Start time, client_ip, tool
|
||||
|
||||
### ✅ Rotation policy documented in --help output
|
||||
**Documentation**: `crates/pdftract-cli/src/main.rs:306-320`
|
||||
```text
|
||||
## Audit log rotation
|
||||
### 4. CLI Flags ✅
|
||||
|
||||
pdftract does NOT rotate the audit log. Operators MUST configure logrotate(8)
|
||||
to manage log file size and retention. A typical logrotate configuration:
|
||||
```
|
||||
**File: `crates/pdftract-cli/src/main.rs`**
|
||||
- Serve subcommand (lines 304-309): `--audit-log FILE` with rotation documentation
|
||||
- MCP subcommand (lines 354-359): Same flag and documentation
|
||||
- Trust-forwarded-for flag (lines 311-313): `--trust-forwarded-for` flag
|
||||
|
||||
### ✅ Fingerprint logged, NOT path/URL
|
||||
**Implementation**:
|
||||
- `crates/pdftract-core/src/audit.rs:44` - `fingerprint: Option<String>` stores structural fingerprint
|
||||
- `crates/pdftract-cli/src/serve.rs:582` - Extracts fingerprint from result
|
||||
- `crates/pdftract-cli/src/serve.rs:604` - Logs fingerprint instead of path
|
||||
**File: `crates/pdftract-cli/src/inspect/args.rs`**
|
||||
- Inspect subcommand (lines 44-49): Same audit-log flag with rotation documentation
|
||||
|
||||
### ✅ AuditLogWriter is crash-safe
|
||||
**Implementation**: `crates/pdftract-core/src/audit.rs:135-143`
|
||||
```rust
|
||||
pub fn write_record(&self, record: &AuditRecord) -> Result<()> {
|
||||
let json = serde_json::to_string(record)?;
|
||||
let mut writer = self.writer.lock()?;
|
||||
writeln!(writer, "{}", json)?;
|
||||
writer.flush()?; // Flush after each line for crash safety
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
### 5. Serve Mode Integration ✅
|
||||
|
||||
## Middleware Integration
|
||||
**File: `crates/pdftract-cli/src/serve.rs`**
|
||||
- Lines 70-82: Imports AuditLogWriter and AuditState
|
||||
- Lines 109-111: ServeState includes audit field
|
||||
- Lines 117-125: ServeState::new creates AuditState with trust_forwarded_for support
|
||||
- Lines 386-394: Creates AuditLogWriter from --audit-log flag
|
||||
- Lines 419-421: Applies audit_middleware layer
|
||||
- Lines 599-609: Writes audit log after successful extraction
|
||||
- Lines 678-688: Writes audit log for text extraction
|
||||
|
||||
### HTTP Serve
|
||||
**File**: `crates/pdftract-cli/src/middleware/audit.rs`
|
||||
- Extracts client IP from peer address or X-Forwarded-For
|
||||
- Stores RequestMetadata in request extensions
|
||||
- Handlers write audit logs after extraction completes
|
||||
### 6. MCP Stdio Mode Integration ✅
|
||||
|
||||
### MCP HTTP
|
||||
**File**: `crates/pdftract-cli/src/mcp/http.rs:254-291`
|
||||
- Writes audit log after handling POST requests
|
||||
- Includes client_ip, duration_ms, status, diagnostics
|
||||
**File: `crates/pdftract-cli/src/mcp/stdio.rs`**
|
||||
- Lines 349-365: Writes audit log with `client_ip: None` (stdio mode has no peer address)
|
||||
|
||||
### MCP Stdio
|
||||
**File**: `crates/pdftract-cli/src/mcp/stdio.rs:361-370`
|
||||
- Writes audit log for tools/call requests
|
||||
- Omits client_ip (stdio mode has no client)
|
||||
### 7. MCP HTTP Mode Integration ✅
|
||||
|
||||
### Inspect
|
||||
**File**: `crates/pdftract-cli/src/inspect/inspect.rs:79-94`
|
||||
- Creates AuditLogWriter from args.audit_log
|
||||
- Stores in InspectorState with AuditState wrapper
|
||||
**File: `crates/pdftract-cli/src/mcp/http.rs`**
|
||||
- Lines 255-292: Writes audit log with client_ip (HTTP mode has peer address)
|
||||
|
||||
## NEVER-Log Policy
|
||||
### 8. Test Coverage ✅
|
||||
|
||||
The following are NEVER logged at any level:
|
||||
- Password values (redacted via SecretString)
|
||||
- Bearer tokens (redacted via header redaction)
|
||||
- PDF bytes (not logged; only fingerprint)
|
||||
- Extracted text content (not logged; only metadata)
|
||||
- Cookie/Authorization/Proxy-Authorization headers (redacted)
|
||||
**File: `tests/security/TH-08-log-audit.rs`**
|
||||
- Tests that PDF content is never logged at trace level
|
||||
- Tests that password values are never logged
|
||||
- Tests that bearer tokens are never logged
|
||||
- Tests that PDF bytes are never logged
|
||||
- Tests that sensitive headers are never logged
|
||||
|
||||
## Test Results
|
||||
## Acceptance Criteria Verification
|
||||
|
||||
**Audit module tests**: All 6 tests passed
|
||||
- test_audit_record_new
|
||||
- test_audit_record_with_client_ip
|
||||
- test_audit_record_with_diagnostics
|
||||
- test_audit_record_add_diagnostic
|
||||
- test_audit_record_serialize
|
||||
- test_audit_log_writer_memory
|
||||
| Criterion | Status | Evidence |
|
||||
|-----------|--------|----------|
|
||||
| `--audit-log FILE` flag on serve, mcp, inspect | ✅ | main.rs:304-309, 354-359; inspect/args.rs:44-49 |
|
||||
| Per-request NDJSON line with all fields | ✅ | audit.rs:40-58 (AuditRecord) |
|
||||
| Stdio MCP omits client_ip field | ✅ | stdio.rs:357 (None for client_ip) |
|
||||
| NEVER-log policy enforcement | ✅ | log_policy.rs:48-161 (redaction) |
|
||||
| Rotation policy in --help | ✅ | main.rs:306-308, 356-358; inspect/args.rs:45-47 |
|
||||
| Fingerprint logged, not path/URL | ✅ | audit.rs:50-51 (fingerprint field) |
|
||||
| AuditLogWriter crash-safe | ✅ | audit.rs:102-154 (BufWriter + flush) |
|
||||
| Trust-forwarded-for flag | ✅ | main.rs:311-313; middleware/audit.rs:99-121 |
|
||||
| TH-08 test exists | ✅ | tests/security/TH-08-log-audit.rs |
|
||||
|
||||
## Conclusion
|
||||
|
||||
All acceptance criteria for pdftract-4em4l are satisfied. The audit logging infrastructure is fully implemented and tested.
|
||||
All acceptance criteria are met. The audit logging infrastructure is complete and operational across:
|
||||
- `pdftract serve` (HTTP mode)
|
||||
- `pdftract mcp --stdio` (stdio mode)
|
||||
- `pdftract mcp --bind` (HTTP+SSE mode)
|
||||
- `pdftract inspect` (inspector mode)
|
||||
|
||||
The NEVER-log policy is enforced at runtime via `redact_audit_log_line()`, and the TH-08 test verifies compliance.
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue