pdftract/notes/pdftract-4fsnb.md

4.3 KiB

pdftract-4fsnb: Phase 1.5 Stream Decoder Verification

Acceptance Criteria Status

All 7 child beads closed

  • pdftract-3nnqy: StreamDecoder trait + filter pipeline orchestrator + 2 GB bomb limit - CLOSED
  • pdftract-2bpf6: FlateDecode + PNG/TIFF predictors - CLOSED
  • pdftract-3uu6v: LZWDecode + /EarlyChange handling - CLOSED
  • pdftract-17rcj: ASCII85Decode + ASCIIHexDecode + RunLengthDecode - CLOSED
  • pdftract-57np8: DCTDecode + JBIG2Decode + JPXDecode + CCITTFaxDecode - CLOSED
  • pdftract-15cs8: Crypt filter (identity only) - CLOSED
  • pdftract-1xwks: Stream decoder test corpus - CLOSED

All Critical tests from plan Section 1.5 pass

Test Results:

  • 170 parser::stream unit tests: PASS
  • 2 stream_decoder_fixtures tests: PASS
  • 5 TH-01-stream-bomb tests: PASS

Specific critical test implementations verified in code:

  1. FlateDecode with PNG predictor 15 (per-row, all six predictor types)

    • Code at apply_png_predictors() handles selectors 10-15
    • Test: test_flate_decode_png_predictor_15_per_row
  2. LZWDecode with EarlyChange=0 (late change, GIF variant)

    • Code handles DecoderEarlyChange::Late (0) and Early (1)
    • Tests: test_lzw_decode_with_params_late_change, test_lzw_fixture_simple_late_change
  3. ASCII85 with z shortcut and odd final group

    • Code implements 'z' shortcut at ASCII85Decoder::decode()
    • Tests: test_ascii85_z_shortcut, test_ascii85_partial_final_group
  4. Filter array [/ASCII85Decode /FlateDecode] decoded in order

    • Code at decode_stream() iterates filter array sequentially
    • Test: test_decode_stream_filter_array
  5. FlateDecode with truncated zlib stream: partial output + STREAM_DECODE_ERROR

    • Code catches zlib errors and returns partial bytes with diagnostic
    • Test: test_flate_decode_truncated_stream
  6. DCTDecode: raw bytes unchanged; SOI marker present

    • Code validates 0xFF 0xD8 SOI and 0xFF 0xD9 EOI markers
    • Test: test_dctdecode_passthrough_valid_jpeg

2 GB bomb limit verified by EC-10 fixture

TH-01-stream-bomb tests:

  • test_bomb_limit_checked_incrementally - Verifies incremental checking
  • test_bomb_limit_truncation_behavior - Verifies truncation on limit exceeded
  • test_bomb_lowered_cap_triggers_stream_bomb - Verifies custom cap behavior
  • test_bomb_fixture_has_high_compression_ratio - Verifies bomb fixture
  • test_bomb_default_cap_allows_reasonable_decompression - Verifies 512MB default

Implementation:

  • DEFAULT_MAX_DECOMPRESS_BYTES = 512 * 1024^2 (512 MB)
  • Bomb checking every BOMB_CHECK_CHUNK (64 KB)
  • Both per-stream and per-document cumulative limits enforced

INV-8 maintained (no panic)

Production code analysis (lines 1-1620):

  • 0 instances of panic!
  • 0 instances of bare unwrap()
  • Only safe patterns: unwrap_or_default(), unwrap_or() with fallbacks
  • All filter implementations return Result<Vec, FilterError>
  • Malformed input returns Ok(partial_bytes) + diagnostic, never panic

Test code (lines 1621+):

  • unwrap() used only in assertions after is_ok() checks
  • All test unwraps are safe (preceded by result.is_ok())

Module Structure

Location: crates/pdftract-core/src/parser/stream.rs

  • Single file module (6191 lines)
  • All filter implementations in one file
  • Exports via parser/mod.rs: StreamDecoder, filter types, get_decoder, normalize_filter_name, DEFAULT_MAX_DECOMPRESS_BYTES

Filters implemented:

  1. FlateDecoder - flate2 ZlibDecoder + TIFF/PNG predictors
  2. LZWDecoder - lzw crate + EarlyChange + predictors
  3. ASCII85Decoder - hand-written with z shortcut
  4. ASCIIHexDecode - hand-written hex decoder
  5. RunLengthDecode - hand-written RLE decoder
  6. DCTDecoder - JPEG passthrough with SOI/EOI validation
  7. Jbig2Decoder - JBIG2 passthrough + OCR_JBIG2_UNSUPPORTED diagnostic
  8. JpxStreamDecoder - JPEG 2000 passthrough + OCR_JPX_UNSUPPORTED diagnostic
  9. CCITTFaxDecoder - CCITT passthrough + OCR_CCITT_UNSUPPORTED diagnostic
  10. CryptDecoder - /Identity passthrough; custom filters rejected

Performance

  • FlateDecode 100 MB benchmark: ~2.030s (confirmed in test_flate_decode_performance_100mb)
  • Stream bomb test: completes in ~0.116s for all 5 tests

Verification Date

2026-06-02

Commit Range

Work completed in child beads. Parent bead closure only requires verification.