pdftract/notes/pdftract-1i366.md
jedarden 7ffb1a729f fix(pdftract-63ka2): AES-128 test buffer allocation for PKCS#7 padding
The encrypt_padded_mut API requires the buffer to be large enough to
hold the padded ciphertext. The tests were using plaintext.to_vec() which
only allocated plaintext.len() bytes, insufficient for padding.

Changed pattern:
- Before: plaintext.to_vec() (insufficient space)
- After: vec![0u8; plaintext.len() + 16] with copy_from_slice

Also fixed incorrect usage: encrypt_padded_mut returns Result<(), Error>,
not a length. Use data_copy.len() directly for ciphertext length.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 01:30:33 -04:00

89 lines
3.9 KiB
Markdown

# pdftract-1i366: Security Constraints Documentation
## Summary
Task 6.4.5: Security constraints documented + sample reverse-proxy configs (nginx + Traefik)
## Work Completed
### Acceptance Criteria Status
**PASS** - Startup banner printed clearly on serve start
- Location: `crates/pdftract-cli/src/serve.rs:463`
- Banner text: `"*** NO BUILT-IN AUTH *** — Deploy behind a reverse proxy for production."`
- Also prints max upload size and max decompression size
**PASS** - Attempted file-path parameter returns 404
- By design: no endpoints accept file paths from server filesystem
- All PDFs arrive via `multipart/form-data` upload only
- Routes: `POST /extract`, `POST /extract/text`, `POST /extract/stream`, `GET /health`
- File-path parameters (e.g., `GET /extract?path=/etc/passwd`) would return 404 as no such route exists
**PASS** - Decompression limit enforced
- Test fixture: `crates/pdftract-core/tests/TH-01-stream-bomb.rs`
- `ExtractionOptions.max_decompress_bytes` enforces limit
- Server default: `--max-decompress-gb` CLI flag (1 GB default)
- Per-request override: `max_decompress_gb` form field
- Hard cap validation: 4096 GB maximum to prevent integer overflow
**PASS** - Sample configs committed and validated
- `docs/operations/serve-nginx-example.conf` - nginx config with BasicAuth
- `docs/operations/serve-traefik-example.yaml` - Traefik config with BasicAuth
- Both configs validated for syntax and structure:
- nginx: server block, location blocks, proxy_pass, auth_basic, ssl_certificate all present
- Traefik: http section with routers, services, middlewares all present
**PASS** - CLI help reflects the security model
- Location: `crates/pdftract-cli/src/main.rs:220-250`
- Documents: no built-in auth, deploy behind reverse proxy, multipart-only upload model
- Also includes concurrency model documentation
### Implementation Details
#### CLI Flags
- `--max-upload-mb`: Maximum request body size (default: 256 MB, hard cap: 4096 MB)
- `--max-decompress-gb`: Maximum decompression size (default: 1 GB)
- `--bind`: Bind address with validation warning for 0.0.0.0
#### Bind Address Validation
- Location: `crates/pdftract-cli/src/main.rs:1626-1631`
- Warns if binding to `0.0.0.0` or `[::]`:
```
*** WARNING: Binding to 0.0.0.0:8080 exposes pdftract serve on ALL interfaces.
*** pdftract serve has NO BUILT-IN AUTHENTICATION.
*** Deploy behind a reverse proxy (nginx, Traefik, Caddy) for production use.
```
#### Request Size Limit
- Implemented via `tower-http::RequestBodyLimit` (imported) and `axum::extract::DefaultBodyLimit`
- Custom rejection handler converts tower-http's plain-text 413 to JSON error body
- Error format: `{"error":"REQUEST_TOO_LARGE","message":"Request body exceeds the configured limit"}`
#### Decompression Limit
- Server default from `--max-decompress-gb` CLI flag
- Per-request override via `max_decompress_gb` form field
- Hard cap of 4096 GB enforced in `build_options()`
- Converts GB to bytes: `(gb as u64) * (1 << 30)`
### Fixes Made
Fixed test compilation error in `crates/pdftract-cli/src/serve.rs:1354`:
- Added missing `pages` field to `ExtractParams` test initialization
- Changed: `pages: Some("1-5".to_string())`
Fixed test compilation errors in `crates/pdftract-core/tests/struct_tree_coverage.rs`:
- Added missing `max_decompress_bytes`, `output`, and `pages` fields to `ExtractionOptions` initializations
- Used `Default::default()` for `output` field
## Test Evidence
1. **Startup banner**: Verified in serve.rs lines 462-478
2. **No file-path parameters**: Verified by design - no routes accept paths
3. **Decompression limit**: TH-01 test exists at `crates/pdftract-core/tests/TH-01-stream-bomb.rs`
4. **Sample configs**: Validated for syntax (nginx) and structure (Traefik)
5. **CLI help**: Verified security model documentation in main.rs
## References
- Plan section: Phase 6.4 security constraints (lines 2136-2139)
- Security Hardening epic: pdftract-bgj