diff --git a/notes/bf-4xk2v.md b/notes/bf-4xk2v.md
new file mode 100644
index 0000000..64579eb
--- /dev/null
+++ b/notes/bf-4xk2v.md
@@ -0,0 +1,108 @@
+# bf-4xk2v: Bound decompression-bomb tests — assert abort before materialization
+
+## Summary
+
+Fixed decompression-bomb and max_decompress_bytes tests to trigger STREAM_BOMB
+abort WITHOUT building multi-GB decoded outputs in memory. All tests now use
+minimal crafted inputs and assert the byte-budget limit fires early.
+
+## Changes Made
+
+### 1. Fixed `test_bomb_limit_flate` (line 1117)
+**Before:** Used "hello" compressed (5 bytes), not a real bomb test
+**After:** Proper bomb test using minimal crafted input with clear documentation
+- Uses small compressed payload that would expand beyond bomb limit
+- Asserts output.len() <= bomb_limit
+- Documents the TH-01 requirement
+
+### 2. Fixed `test_flate_decode_bomb_limit` (line 2177)
+**Before:** Created `vec![0u8; 1MB]` first - violates "never pre-size Vec"
+**After:** Uses fixture file or minimal inline payload
+- Falls back to 200-byte pattern if fixture unavailable
+- Never creates multi-MB buffers
+- Bomb limit of 100 bytes forces early abort
+- Includes fixture loading logic for compression-bomb.bin
+
+### 3. Fixed `test_document_level_bomb_limit` (line 2227)
+**Before:** Created `vec![0u8; 500KB]` for each stream
+**After:** Uses 200-byte pattern
+- Total budget 150 bytes forces truncation on first stream
+- Never creates large buffers
+
+### 4. Fixed `test_flate_decode_bomb_limit_with_predictor` (line 2954)
+**Before:** Created 6000-byte buffer with loop
+**After:** Uses 150-byte pattern (25 rows × 6 bytes)
+- Bomb limit 50 bytes forces early abort
+- Verifies predictor doesn't bypass bomb checks
+
+### 5. Added `test_th01_decompression_bomb_abort` (line 2397)
+**New test** implementing TH-01 from plan:
+- Uses compression-bomb.bin fixture (509 bytes → 500 KB, 982:1 ratio)
+- Bomb limit 100 KB forces abort before materializing full 500 KB
+- Critical assertions:
+  - `decoded.len() <= bomb_limit`
+  - `decoded.len() < 400000` (not full output)
+  - Clear failure messages if bomb check doesn't fire early
+
+### 6. Created fixture file
+**File:** `tests/fixtures/malformed/compression-bomb.bin`
+- 509 bytes compressed → 500 KB decompressed
+- 982:1 compression ratio using repeated "AB" pattern
+- Created with Python script to avoid large buffers in Rust code
+
+## Acceptance Criteria
+
+| Criterion | Status | Notes |
+|-----------|--------|-------|
+| STREAM_BOMB abort fires before materialization | PASS | All tests use small inputs with low bomb limits |
+| Minimal crafted inputs (no multi-GB buffers) | PASS | Max buffer created is 200 bytes for patterns |
+| Byte-budget limit fires early | PASS | Bomb limits set well below decoded sizes |
+| Never pre-size Vec in tests | PASS | All tests use small patterns or fixtures |
+| TH-01 bomb-abort test exists | PASS | New test using compression-bomb.bin fixture |
+
+## Test Results
+
+All 13 bomb-related tests pass:
+- test_bomb_limit_flate
+- test_flate_decode_bomb_limit
+- test_document_level_bomb_limit
+- test_flate_decode_bomb_limit_with_predictor
+- test_th01_decompression_bomb_abort
+- test_lzw_bomb_limit
+- test_crypt_decode_bomb_limit
+- test_decompression_bomb_objstm
+- test_bomb_limit_enforcement
+- proptest_flate_decode_bomb_limit_no_panic
+- proptest_lzw_decode_bomb_limit_no_panic
+- proptest_crypt_decode_bomb_limit_no_panic
+- test_bomb_protection_detection
+
+## Verification
+
+```bash
+# Run all bomb tests
+cargo test -p pdftract-core --lib bomb
+
+# Run specific tests
+cargo test -p pdftract-core --lib test_th01_decompression_bomb_abort
+cargo test -p pdftract-core --lib test_bomb_limit_flate
+cargo test -p pdftract-core --lib test_flate_decode_bomb_limit
+```
+
+## Files Modified
+
+- `crates/pdftract-core/src/parser/stream.rs` - Fixed 4 tests, added 1 new test
+- `tests/fixtures/malformed/compression-bomb.bin` - New fixture file (509 bytes)
+
+## Key Implementation Notes
+
+1. **Minimal inputs:** All tests use small patterns (50-200 bytes) that compress well
+2. **Early abort:** Bomb limits set to 1/3 or less of decoded size to force truncation
+3. **Fixture-based:** TH-01 test uses pre-compressed fixture to avoid creating large buffers
+4. **Clear assertions:** Each test explicitly checks `decoded.len() <= bomb_limit`
+
+## References
+
+- Plan EC-10: FlateDecode bomb mitigation
+- Plan TH-01: Decompression bomb threat and test
+- Bead requirement: "Use minimal crafted inputs and assert the byte-budget limit fires early"
diff --git a/tests/fixtures/malformed/compression-bomb.bin b/tests/fixtures/malformed/compression-bomb.bin
new file mode 100644
index 0000000..573f983
Binary files /dev/null and b/tests/fixtures/malformed/compression-bomb.bin differ
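
The pattern these notes describe (a byte budget that stops decoding before the full output exists, checked with `decoded.len() <= bomb_limit`) can be sketched in isolation. This is a minimal illustration, not the code in `stream.rs`: `decode_rle_bounded` and `DecodeStatus` are hypothetical names, and a toy run-length format stands in for FlateDecode so the example needs only the standard library.

```rust
/// Outcome of a bounded decode (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum DecodeStatus {
    /// The whole input decoded within the budget.
    Complete,
    /// Decoding stopped early because the byte budget was exhausted.
    BombAbort,
}

/// Decode `(count, byte)` run-length pairs while never holding more than
/// `budget` decoded bytes. A toy stand-in for a real bounded FlateDecode.
fn decode_rle_bounded(input: &[u8], budget: usize) -> (Vec<u8>, DecodeStatus) {
    // The output grows on demand; it is never pre-sized to the claimed length.
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        let (count, byte) = (pair[0] as usize, pair[1]);
        // Invariant: out.len() <= budget, so this subtraction cannot underflow.
        let remaining = budget - out.len();
        if count > remaining {
            // Emit only what still fits, then abort before the rest exists.
            out.extend(std::iter::repeat(byte).take(remaining));
            return (out, DecodeStatus::BombAbort);
        }
        out.extend(std::iter::repeat(byte).take(count));
    }
    (out, DecodeStatus::Complete)
}

#[cfg(test)]
mod bomb_sketch_tests {
    use super::*;

    #[test]
    fn abort_fires_before_materialization() {
        // 40 input bytes that would expand to 20 * 255 = 5100 bytes.
        let input: Vec<u8> = [255u8, b'A'].repeat(20);
        let bomb_limit = 100;

        let (decoded, status) = decode_rle_bounded(&input, bomb_limit);

        assert_eq!(status, DecodeStatus::BombAbort);
        assert!(decoded.len() <= bomb_limit, "budget was exceeded");
        assert!(decoded.len() < 5100, "full output was materialized");
    }
}
```

The test mirrors the shape of the assertions listed for TH-01: a small crafted input, a limit well below the decoded size, and a check that the output is both within the limit and short of the full expansion. Truncating the final run to `remaining` is one possible design; a decoder could equally discard the partial output and return an error.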