- Add LZWDecoder filter using lzw crate v0.10
- Support /EarlyChange parameter (default 1, late 0)
- Early change (1): Adobe/TIFF variant, code size increases BEFORE
- Late change (0): GIF variant, code size increases AFTER
- Full predictor support (TIFF predictor 2, PNG predictors 10-15)
- Bomb limit protection with partial bytes on exceed
- INV-8 maintained: partial bytes returned on decode errors
- 23 tests pass (19 unit tests + 4 proptests)
- Fixtures generated using lzw crate for verification
Acceptance criteria:
- Critical test /EarlyChange=0 byte-perfect: PASS
- LZWDecode without /DecodeParms defaults: PASS
- LZWDecode + /Predictor 12: PASS
- Truncated stream partial bytes: PASS
- Bomb limit honored: PASS
- proptest no panic: PASS
- INV-8 maintained: PASS
Refs: Plan Phase 1.5 line 1142, PDF spec 7.4.4
Co-Authored-By: Claude Code <noreply@anthropic.com>
pdftract-3uu6v: LZWDecode Implementation Verification Note
Summary
Implemented the LZWDecode filter with /EarlyChange parameter support (default 1 for early change, 0 for late change) and the same predictor support as FlateDecode (TIFF predictor 2 and PNG predictors 10-15).
Acceptance Criteria Results
PASS: Critical test - LZWDecode with /EarlyChange 0 byte-perfect against reference
- Test: test_lzw_fixture_simple_late_change
- Fixtures: lzw_simple_late.bin decodes to lzw_simple_orig.bin
- Result: Byte-perfect match with reference output generated by lzw crate
PASS: LZWDecode without /DecodeParms (defaults)
- Test: test_lzw_decode_simple_early_change
- Default behavior: EarlyChange = 1, no predictor
- Result: Correct decode with default parameters
PASS: LZWDecode + /Predictor 12 (PNG Up)
- Tests: test_lzw_decode_predictor, test_lzw_fixture_with_predictor
- Fixtures: lzw_predictor_encoded.bin with predictor parameters
- Result: Predictor correctly applied after LZW decode
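To illustrate what "predictor applied after LZW decode" means for /Predictor 12, here is a minimal sketch of undoing the PNG Up filter on already-decompressed bytes. It is not the note's apply_predictor() implementation: the function name undo_png_up and its row_len parameter are hypothetical, and it only handles rows tagged None (0) or Up (2), whereas a full implementation would also cover Sub, Average, and Paeth.

```rust
/// Undo PNG prediction for rows tagged None (0) or Up (2).
///
/// `data` is the output of the LZW stage: each row is one tag byte followed
/// by up to `row_len` data bytes. Returns `None` if `row_len` is zero or a
/// row carries a filter type this sketch does not handle.
fn undo_png_up(data: &[u8], row_len: usize) -> Option<Vec<u8>> {
    if row_len == 0 {
        return None;
    }
    let mut out: Vec<u8> = Vec::with_capacity(data.len());
    // The row "above" the first row is treated as all zeros.
    let mut prev = vec![0u8; row_len];

    for chunk in data.chunks(row_len + 1) {
        let (tag, row) = chunk.split_first()?;
        let start = out.len();
        match *tag {
            // None: bytes are stored as-is.
            0 => out.extend_from_slice(row),
            // Up: each byte is stored as a difference from the byte above it.
            2 => out.extend(row.iter().zip(&prev).map(|(&b, &up)| b.wrapping_add(up))),
            // Sub (1), Average (3), Paeth (4) are out of scope here.
            _ => return None,
        }
        // Remember the reconstructed row (possibly short if the input was
        // truncated) so the next row can refer to it.
        let n = out.len() - start;
        prev[..n].copy_from_slice(&out[start..]);
    }
    Some(out)
}
```

For example, with row_len = 3 the input [2, 1, 2, 3, 2, 1, 1, 1] reconstructs to [1, 2, 3, 2, 3, 4]: the second row stores only the differences from the first.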
PASS: Truncated LZW stream
- Tests: test_lzw_decode_truncated_stream, test_lzw_fixture_truncated
- Result: Returns partial bytes (INV-8 maintained)
PASS: Bomb limit honored
- Test: test_lzw_bomb_limit
- Result: Bomb limit enforced, partial bytes returned when exceeded
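The bomb-limit behaviour can be sketched as a loop that appends decoded chunks and checks the budget after each one. The names below (BudgetedOutput, collect_with_budget, max_output) are hypothetical and not taken from the pdftract code. Clamping the output to exactly the limit is an assumption; the note only states that partial bytes are returned.

```rust
/// Outcome of a budgeted decode (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct BudgetedOutput {
    /// Bytes produced before the limit was hit (or all bytes if it was not).
    bytes: Vec<u8>,
    /// True if the decoder produced more than `max_output` bytes.
    limit_exceeded: bool,
}

/// Append decoded chunks until the source is exhausted or the budget is blown.
fn collect_with_budget<I>(chunks: I, max_output: usize) -> BudgetedOutput
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut bytes = Vec::new();
    for chunk in chunks {
        bytes.extend_from_slice(&chunk);
        // Budget is checked after each chunk, so memory use overshoots the
        // limit by at most one chunk.
        if bytes.len() > max_output {
            // Assumption: clamp to the limit rather than keep the overshoot.
            bytes.truncate(max_output);
            return BudgetedOutput { bytes, limit_exceeded: true };
        }
    }
    BudgetedOutput { bytes, limit_exceeded: false }
}
```

Returning the partial bytes together with a flag, rather than an error alone, lets the caller keep whatever was recovered, which is the behaviour the bomb-limit test checks.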
PASS: proptest - random byte sequences never panic
- Tests: 4 proptests covering random data, early/late change, bomb limits, predictors
- Result: No panics on any input
PASS: INV-8 maintained
- All error paths return partial bytes instead of panicking
- Decode errors return accumulated output before failure
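A minimal sketch of the INV-8 pattern follows, assuming the decoder yields a sequence of chunk results. The function decode_keep_partial and its generic error type are hypothetical; the real LZWDecoder loop in stream.rs may be structured differently.

```rust
/// Drain a fallible chunk source, keeping everything decoded before the
/// first error. Returns the accumulated bytes plus the error, if any.
fn decode_keep_partial<I, E>(chunks: I) -> (Vec<u8>, Option<E>)
where
    I: IntoIterator<Item = Result<Vec<u8>, E>>,
{
    let mut out = Vec::new();
    for chunk in chunks {
        match chunk {
            Ok(bytes) => out.extend_from_slice(&bytes),
            // Stop at the first failure, but hand back what we already have
            // instead of discarding it or panicking.
            Err(e) => return (out, Some(e)),
        }
    }
    (out, None)
}
```

A truncated stream then looks like any other error from the caller's point of view: the bytes decoded so far are returned, and the error value says why decoding stopped.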
Implementation Details
Files Modified
- crates/pdftract-core/src/parser/stream.rs: Added LZWDecoder struct (605 lines)
- Cargo.toml: Added lzw = "0.10" workspace dependency
Files Added
- crates/pdftract-core/examples/test_lzw_api.rs: LZW crate API exploration
- tests/fixtures/generate_lzw_fixtures.rs: Fixture generator
- tests/fixtures/generate_lzw_fixtures_main.rs: Alternative generator
- 15 fixture files (.bin format)
API Used
- lzw crate v0.10
- DecoderEarlyChange: Early change variant (Adobe/TIFF, PDF default)
- Decoder: Late change variant (GIF)
- MsbReader: MSB bit order as required by PDF spec
Key Features
- /EarlyChange parameter handling:
  - Default 1 (early change): code width increases one code early, before the table outgrows the current width
  - Value 0 (late change): code width increase is postponed as long as possible (GIF variant)
  - Extracted via PredictorParams::extract_early_change()
- Predictor support:
  - Delegates to shared apply_predictor() function
  - Supports TIFF predictor 2 and PNG predictors 10-15
  - Predictor applied after LZW decode
- Bomb limit protection:
  - Budget checked after each decode chunk
  - Partial bytes returned when limit exceeded
  - Counter updated with final output size
- Error handling (INV-8):
  - Truncated streams: returns partial bytes decoded so far
  - Decode errors: breaks loop, returns accumulated output
  - No panics on any input
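The difference between the two /EarlyChange settings comes down to when the decoder starts reading wider codes. The sketch below shows one common decoder-side formulation and is not the lzw crate's internal logic, which the crate handles itself. The function code_width is hypothetical, and it assumes next_code is the decoder's next free table index (starting at 258 after a clear code) once the entry for the code just read has been added. Encoder-side bookkeeping differs by one entry.

```rust
const MIN_WIDTH: u32 = 9; // PDF LZW codes start at 9 bits
const MAX_WIDTH: u32 = 12; // and never grow beyond 12 bits

/// Width in bits of the next code to read.
///
/// With `early_change` set, the width grows one table entry sooner than
/// strictly necessary (TIFF/PDF default); without it, the width grows only
/// once the next free index no longer fits (GIF-style).
fn code_width(next_code: u32, early_change: bool) -> u32 {
    let threshold = next_code + u32::from(early_change);
    let mut width = MIN_WIDTH;
    while width < MAX_WIDTH && threshold >= (1 << width) {
        width += 1;
    }
    width
}
```

Under these assumptions the switch from 9 to 10 bits happens at next_code = 511 with early change and at next_code = 512 without it. Decoding a stream with the wrong setting therefore desynchronizes the bit reader at the first width change, which is why the /EarlyChange 0 fixture is checked byte for byte.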
Test Results
All 23 LZW tests pass:
- 19 unit tests (empty, simple, incremental, repeated, predictor, truncated, fixtures)
- 4 proptests (no panic, bomb limit, early change, predictor)
References
- Plan section: Phase 1.5 line 1142
- PDF spec 7.4.4 (LZWDecode parameters)
- Dependency Matrix: lzw = "0.10"