pdftract/notes/pdftract-3uu6v.md
jedarden 1959ff2446 feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter
- Add LZWDecoder filter using lzw crate v0.10
- Support /EarlyChange parameter (default 1, late 0)
  - Early change (1): Adobe/TIFF variant, code size increases BEFORE
  - Late change (0): GIF variant, code size increases AFTER
- Full predictor support (TIFF predictor 2, PNG predictors 10-15)
- Bomb limit protection with partial bytes on exceed
- INV-8 maintained: partial bytes returned on decode errors
- 23 tests pass (19 unit tests + 4 proptests)
- Fixtures generated using lzw crate for verification

Acceptance criteria:
- Critical test /EarlyChange=0 byte-perfect: PASS
- LZWDecode without /DecodeParms defaults: PASS
- LZWDecode + /Predictor 12: PASS
- Truncated stream partial bytes: PASS
- Bomb limit honored: PASS
- proptest no panic: PASS
- INV-8 maintained: PASS

Refs: Plan Phase 1.5 line 1142, PDF spec 7.4.4

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-05-22 22:38:31 -04:00


pdftract-3uu6v: LZWDecode Implementation Verification Note

Summary

Implemented the LZWDecode filter with /EarlyChange support (default 1 selects early change; 0 selects late change) and the same predictor support as FlateDecode (TIFF predictor 2 and PNG predictors 10-15).

Acceptance Criteria Results

PASS: Critical test - LZWDecode with /EarlyChange 0 byte-perfect against reference

  • Test: test_lzw_fixture_simple_late_change
  • Fixtures: lzw_simple_late.bin decodes to lzw_simple_orig.bin
  • Result: Byte-perfect match with reference output generated by lzw crate

PASS: LZWDecode without /DecodeParms (defaults)

  • Test: test_lzw_decode_simple_early_change
  • Default behavior: EarlyChange = 1, no predictor
  • Result: Correct decode with default parameters
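
The defaults exercised by this test can be sketched as a small parameter struct. The default values are the ones the PDF specification gives for the LZWDecode /DecodeParms entries; the struct, its field names, and its helper methods are hypothetical and are not taken from the pdftract source.

```rust
/// Hypothetical container for the LZWDecode /DecodeParms entries.
/// Names are illustrative only; they are not the pdftract API.
#[derive(Debug, Clone, PartialEq)]
pub struct LzwParams {
    pub predictor: i64,
    pub colors: i64,
    pub bits_per_component: i64,
    pub columns: i64,
    pub early_change: i64,
}

impl Default for LzwParams {
    /// Values used when /DecodeParms is absent or an entry is missing.
    fn default() -> Self {
        LzwParams {
            predictor: 1,          // 1 = no prediction
            colors: 1,
            bits_per_component: 8,
            columns: 1,
            early_change: 1,       // 1 = early change (PDF default)
        }
    }
}

impl LzwParams {
    /// Assumption of this sketch: only an explicit 0 selects late change.
    pub fn is_early_change(&self) -> bool {
        self.early_change != 0
    }

    /// Predictor values above 1 (2, or 10-15) request a predictor pass.
    pub fn has_predictor(&self) -> bool {
        self.predictor > 1
    }
}
```

With this shape, `LzwParams::default()` reports early change and no predictor, and a stream that sets /EarlyChange 0 only needs to override that one field.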

PASS: LZWDecode + /Predictor 12 (PNG Up)

  • Tests: test_lzw_decode_predictor, test_lzw_fixture_with_predictor
  • Fixtures: lzw_predictor_encoded.bin with predictor parameters
  • Result: Predictor correctly applied after LZW decode
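
As a rough illustration of what "predictor applied after LZW decode" means for the PNG Up case: with a PNG predictor, each row of the LZW-decoded data starts with a filter-type byte, and type 2 (Up) stores each byte as a difference from the byte directly above it. The function below is a minimal sketch, not the shared apply_predictor() routine. The name `undo_png_up` is hypothetical, it handles only filter types 0 and 2, and it rejects malformed input outright rather than returning partial rows.

```rust
/// Reverses PNG filter types 0 (None) and 2 (Up) on already-decoded data.
/// `row_len` is the number of data bytes per row, excluding the tag byte.
/// Hypothetical helper for illustration; returns None on malformed input.
pub fn undo_png_up(data: &[u8], row_len: usize) -> Option<Vec<u8>> {
    if row_len == 0 {
        return None;
    }
    let mut out = Vec::with_capacity(data.len());
    // The row above the first row is treated as all zeros.
    let mut prev = vec![0u8; row_len];
    for row in data.chunks(row_len + 1) {
        let (&tag, body) = row.split_first()?;
        if body.len() != row_len {
            return None; // sketch: a short final row is rejected
        }
        match tag {
            0 => prev.copy_from_slice(body),
            2 => {
                // Up: reconstructed byte = stored byte + byte above (mod 256).
                for (p, &b) in prev.iter_mut().zip(body) {
                    *p = p.wrapping_add(b);
                }
            }
            _ => return None, // other filter types are out of scope here
        }
        out.extend_from_slice(&prev);
    }
    Some(out)
}
```

For two 3-byte rows tagged Up, `[2, 1, 2, 3, 2, 1, 1, 1]`, the first row is taken as-is against the zero row and the second is added to it, giving `[1, 2, 3, 2, 3, 4]`.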

PASS: Truncated LZW stream

  • Tests: test_lzw_decode_truncated_stream, test_lzw_fixture_truncated
  • Result: Returns partial bytes (INV-8 maintained)
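
One way to picture why a truncated stream can still yield usable output: LZW codes are packed MSB-first, so every code that lies entirely before the cut is still readable. The sketch below reads fixed-width codes and stops at the first incomplete one. It is a simplification for illustration only: a real LZW decoder varies the width from 9 to 12 bits as the table grows, and `read_codes_msb` is a hypothetical name, not part of the lzw crate or pdftract.

```rust
/// Reads `width`-bit codes MSB-first and returns every complete code.
/// Trailing bits that do not form a whole code are silently dropped,
/// which is what lets a truncated stream produce partial results.
/// Fixed width is a simplification; `width` must be in 9..=12.
pub fn read_codes_msb(data: &[u8], width: u32) -> Vec<u16> {
    assert!((9..=12).contains(&width), "LZW code widths are 9 to 12 bits");
    let mask: u32 = (1 << width) - 1;
    let mut codes = Vec::new();
    let mut acc: u32 = 0; // unread bits, right-aligned
    let mut bits: u32 = 0; // number of unread bits held in `acc`
    for &byte in data {
        acc = (acc << 8) | u32::from(byte);
        bits += 8;
        while bits >= width {
            bits -= width;
            codes.push(((acc >> bits) & mask) as u16);
        }
        // Keep only the bits that have not been consumed yet.
        acc &= (1 << bits) - 1;
    }
    codes
}
```

The bytes `80 10 60 20` hold the 9-bit codes 256 (clear), 65, and 257 (end of data). Dropping the last byte leaves 256 and 65 intact, so a decoder built on this reader would still emit the byte for code 65 before running out of input.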

PASS: Bomb limit honored

  • Test: test_lzw_bomb_limit
  • Result: Bomb limit enforced, partial bytes returned when exceeded
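
The budget check described here can be sketched as a loop that appends decoded chunks and stops once the output passes a cap. This is a minimal sketch under stated assumptions: the names `Decoded` and `collect_with_budget` are hypothetical, the chunks stand in for whatever the decoder yields per step, and clamping the output to exactly the limit is a choice made for this example rather than something the note confirms.

```rust
/// Result of a budgeted decode: the bytes kept and whether the cap was hit.
#[derive(Debug, PartialEq)]
pub struct Decoded {
    pub bytes: Vec<u8>,
    pub limit_exceeded: bool,
}

/// Appends chunks until `max_output` bytes is exceeded, checking the
/// budget after each chunk. On overflow the partial output is kept
/// (clamped to the cap in this sketch) instead of being discarded.
pub fn collect_with_budget<I>(chunks: I, max_output: usize) -> Decoded
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut bytes = Vec::new();
    for chunk in chunks {
        bytes.extend_from_slice(&chunk);
        if bytes.len() > max_output {
            bytes.truncate(max_output);
            return Decoded { bytes, limit_exceeded: true };
        }
    }
    Decoded { bytes, limit_exceeded: false }
}
```

Feeding three 4-byte chunks with a cap of 10 returns the first 10 bytes with `limit_exceeded` set; with a cap of 12 all 12 bytes are returned and the flag stays clear.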

PASS: proptest - random byte sequences never panic

  • Tests: 4 proptests covering random data, early/late change, bomb limits, predictors
  • Result: No panics observed on any generated input

PASS: INV-8 maintained

  • All error paths return partial bytes instead of panicking
  • Decode errors return accumulated output before failure

Implementation Details

Files Modified

  • crates/pdftract-core/src/parser/stream.rs: Added LZWDecoder struct (605 lines)
  • Cargo.toml: Added lzw = "0.10" workspace dependency

Files Added

  • crates/pdftract-core/examples/test_lzw_api.rs: LZW crate API exploration
  • tests/fixtures/generate_lzw_fixtures.rs: Fixture generator
  • tests/fixtures/generate_lzw_fixtures_main.rs: Alternative generator
  • 15 fixture files (.bin format)

API Used

  • lzw crate v0.10
  • DecoderEarlyChange: Early change variant (Adobe/TIFF, PDF default)
  • Decoder: Late change variant (GIF)
  • MsbReader: MSB bit order as required by PDF spec

Key Features

  1. /EarlyChange parameter handling:

    • Default 1 (early change) - code length increases one code early (Adobe/TIFF variant)
    • Value 0 (late change) - code length increases are postponed as long as possible (GIF variant)
    • Extracted via PredictorParams::extract_early_change()
  2. Predictor support:

    • Delegates to shared apply_predictor() function
    • Supports TIFF predictor 2 and PNG predictors 10-15
    • Predictor applied after LZW decode
  3. Bomb limit protection:

    • Budget checked after each decode chunk
    • Partial bytes returned when limit exceeded
    • Counter updated with final output size
  4. Error handling (INV-8):

    • Truncated streams: returns partial bytes decoded so far
    • Decode errors: breaks loop, returns accumulated output
    • No panics on any input
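
The /EarlyChange distinction in item 1 comes down to a one-entry shift in when the code width grows. The sketch below expresses that from the decoder's side, as a function of how many table entries exist after the latest addition (the 256 literals plus the clear and end-of-data codes make 258 at reset). The function name and this particular formulation are assumptions made for illustration; the lzw crate handles the switch internally and exposes it only through the choice between Decoder and DecoderEarlyChange.

```rust
/// Width in bits of the next code to read, given the number of table
/// entries currently defined (258 right after a clear code).
/// With early change the switch happens one entry sooner.
/// Hypothetical helper; widths are capped at 12 bits as in PDF LZW.
pub fn code_width(table_len: usize, early_change: bool) -> u32 {
    let effective = table_len + usize::from(early_change);
    let mut width = 9;
    while width < 12 && effective >= (1usize << width) {
        width += 1;
    }
    width
}
```

Under this formulation a table of 511 entries already needs 10-bit codes with early change but still uses 9 bits with late change, which switches at 512. Decoding a stream with the wrong setting misreads codes from the first width change onward, which is why the /EarlyChange 0 fixture is treated as the critical test.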

Test Results

All 23 LZW tests pass:

  • 19 unit tests (empty, simple, incremental, repeated, predictor, truncated, fixtures)
  • 4 proptests (no panic, bomb limit, early change, predictor)

References

  • Plan section: Phase 1.5 line 1142
  • PDF spec 7.4.4 (LZWDecode parameters)
  • Dependency Matrix: lzw = "0.10"