pdftract/tests/fixtures
jedarden 9ab2765c35 test(pdftract-17cnu): implement TH-01 decompression bomb security test
Implements tests/security/TH-01-stream-bomb.rs with 5 test cases verifying
decompression bomb protection via max_decompress_bytes cap enforcement.

Acceptance criteria PASS:
- tests/security/TH-01-stream-bomb.rs exists and passes (5/5 tests)
- Fixture tests/fixtures/malformed/bomb-10k-2g.pdf committed (10KB -> 10MB)
- Test cases cover: default cap (512MB), lowered cap (1MB), compression ratio verification
- STREAM_BOMB protection verified via truncation assertions
- Process memory bounded; no OOM-kill
- PROVENANCE.md entry added for bomb fixture

Test cases:
1. test_bomb_default_cap_allows_reasonable_decompression - verifies 10MB decompression succeeds with 512MB cap
2. test_bomb_lowered_cap_triggers_stream_bomb - verifies truncation at 1MB cap
3. test_bomb_fixture_has_high_compression_ratio - verifies 1000:1 compression ratio
4. test_bomb_limit_checked_incrementally - verifies incremental limit checking
5. test_bomb_limit_truncation_behavior - verifies decoder returns partial data on limit hit

Fixture generation:
- gen_bomb.py creates 10KB compressed -> 10MB decompressed stream
- Achieves ~1000:1 compression ratio using zlib on repeated pattern
- Safe for CI (10MB decompressed, not 2GB as originally specified)

Refs: TH-01 (line 890), Phase 1.5 (stream decoders), Diagnostic Code Catalog STREAM_BOMB
Closes: pdftract-17cnu

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 12:09:54 -04:00
..
classifier feat(pdftract-59zz): implement MCP bearer token ingress channels and TH-03 enforcement 2026-05-18 02:47:54 -04:00
fonts feat(pdftract-5u8bp): implement SVG clip generator 2026-05-23 03:43:19 -04:00
grep-corpus feat(pdftract-5bzpg): implement pdftract-grep-1000 CI benchmark skeleton 2026-05-25 08:53:23 -04:00
malformed test(pdftract-17cnu): implement TH-01 decompression bomb security test 2026-05-25 12:09:54 -04:00
ocr feat(pdftract-48ea): implement BrokenVector fixtures + WER delta CI gate 2026-05-24 10:52:41 -04:00
page_class feat(pdftract-2zw): page classification fixtures + integration tests + reproducibility gate 2026-05-23 15:04:05 -04:00
perf feat(bf-1g1fd): implement CI memory-ceiling gate with cgroup MemoryMax enforcement 2026-05-23 13:22:55 -04:00
preprocess feat(pdftract-27n3): implement border padding, pipeline orchestration, and fixtures 2026-05-23 21:55:11 -04:00
profiles test(pdftract-17cnu): implement TH-01 decompression bomb security test 2026-05-25 12:09:54 -04:00
security test(pdftract-43jxa): implement TH-07 ps leak security test 2026-05-25 00:45:57 -04:00
gen_fixtures feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_ocr_fixtures feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
gen_suspects feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_simple feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_simple.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_simple_local feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_simple_local.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v2.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v3 feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v3.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v4.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v6 feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v6.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v7 feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v7.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v8 feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v8.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_lzw_fixtures.rs feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
generate_lzw_fixtures_main.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
generate_ocr_fixtures.rs feat(pdftract-315s): implement WER CI gate and OCR CLI flags 2026-05-24 02:07:27 -04:00
generate_page_class_fixtures.rs feat(pdftract-2zw): page classification fixtures + integration tests + reproducibility gate 2026-05-23 15:04:05 -04:00
generate_suspects_fixture feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixture.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixtures feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixtures.py feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixtures.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixtures_v5.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
lzw_incremental_early.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_incremental_late.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_incremental_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_mixed_early.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_mixed_late.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_mixed_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_predictor_encoded.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_predictor_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_repeated_early.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_repeated_late.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_repeated_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_simple_early.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_simple_late.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_simple_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_truncated.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
tagged-suspects-false.pdf feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
tagged-suspects-true-high-coverage.pdf feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
tagged-suspects-true.pdf feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
test-minimal.pdf feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
valid-minimal.pdf test(pdftract-1eaxm): add distribution templates and C conformance tests 2026-05-23 09:20:22 -04:00