# pdftract-3uu6v: LZWDecode Implementation Verification Note

## Summary

Implemented LZWDecode filter with /EarlyChange parameter support (default 1, late 0) and full predictor support (predictors 2, 10-15) matching FlateDecode.

## Acceptance Criteria Results

### PASS: Critical test - LZWDecode with /EarlyChange 0 byte-perfect against reference

- Test: `test_lzw_fixture_simple_late_change`
- Fixtures: `lzw_simple_late.bin` decodes to `lzw_simple_orig.bin`
- Result: Byte-perfect match with reference output generated by lzw crate

### PASS: LZWDecode without /DecodeParms (defaults)

- Test: `test_lzw_decode_simple_early_change`
- Default behavior: EarlyChange = 1, no predictor
- Result: Correct decode with default parameters

### PASS: LZWDecode + /Predictor 12 (PNG Up)

- Tests: `test_lzw_decode_predictor`, `test_lzw_fixture_with_predictor`
- Fixtures: `lzw_predictor_encoded.bin` with predictor parameters
- Result: Predictor correctly applied after LZW decode

### PASS: Truncated LZW stream

- Tests: `test_lzw_decode_truncated_stream`, `test_lzw_fixture_truncated`
- Result: Returns partial bytes (INV-8 maintained)

### PASS: Bomb limit honored

- Test: `test_lzw_bomb_limit`
- Result: Bomb limit enforced, partial bytes returned when exceeded

### PASS: proptest - random byte sequences never panic

- Tests: 4 proptests covering random data, early/late change, bomb limits, predictors
- Result: No panics on any input

### PASS: INV-8 maintained

- All error paths return partial bytes instead of panicking
- Decode errors return accumulated output before failure

## Implementation Details

### Files Modified

- `crates/pdftract-core/src/parser/stream.rs`: Added LZWDecoder struct (605 lines)
- `Cargo.toml`: Added `lzw = "0.10"` workspace dependency

### Files Added

- `crates/pdftract-core/examples/test_lzw_api.rs`: LZW crate API exploration
- `tests/fixtures/generate_lzw_fixtures.rs`: Fixture generator
- `tests/fixtures/generate_lzw_fixtures_main.rs`: Alternative generator
- 15 fixture files (.bin format)

### API Used

- `lzw` crate v0.10
- `DecoderEarlyChange`: Early change variant (Adobe/TIFF, PDF default)
- `Decoder`: Late change variant (GIF)
- `MsbReader`: MSB bit order as required by PDF spec

### Key Features

1. **/EarlyChange parameter handling**:
   - Default 1 (early change) - code size increases BEFORE exceeding current size
   - Value 0 (late change) - code size increases AFTER (GIF variant)
   - Extracted via `PredictorParams::extract_early_change()`
2. **Predictor support**:
   - Delegates to shared `apply_predictor()` function
   - Supports TIFF predictor 2 and PNG predictors 10-15
   - Predictor applied after LZW decode
3. **Bomb limit protection**:
   - Budget checked after each decode chunk
   - Partial bytes returned when limit exceeded
   - Counter updated with final output size
4. **Error handling (INV-8)**:
   - Truncated streams: returns partial bytes decoded so far
   - Decode errors: breaks loop, returns accumulated output
   - No panics on any input

## Test Results

All 23 LZW tests pass:

- 19 unit tests (empty, simple, incremental, repeated, predictor, truncated, fixtures)
- 4 proptests (no panic, bomb limit, early change, predictor)

## References

- Plan section: Phase 1.5 line 1142
- PDF spec 7.4.4 (LZWDecode parameters)
- Dependency Matrix: lzw = "0.10"
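
The /EarlyChange rule listed under Key Features can be made concrete with a short sketch. The function below is illustrative only and is not taken from `stream.rs`: `code_width` and `next_code` are hypothetical names, and the exact off-by-one depends on when a given decoder adds its dictionary entry. It assumes the usual PDF conventions of 9-bit initial codes, a 12-bit maximum, and 258 as the first free table index.

```rust
/// Bit width used to read the next LZW code (illustrative sketch).
///
/// `next_code` is the index of the next free dictionary slot
/// (hypothetical name; 258 right after a clear-table code).
/// `early_change` mirrors /EarlyChange: true for 1, false for 0.
fn code_width(next_code: u16, early_change: bool) -> u8 {
    // With early change the width switches one code sooner,
    // which is modelled here by adding 1 before comparing.
    let n = u32::from(next_code) + u32::from(early_change);

    // Pick the smallest width in 9..=12 with n < 2^width,
    // capping at 12 bits as the assumed maximum code length.
    let mut width = 9u8;
    while width < 12 && n >= (1u32 << width) {
        width += 1;
    }
    width
}
```

Under these assumptions, `code_width(511, true)` yields 10 while `code_width(511, false)` still yields 9, which is the one-code difference between the early-change and late-change variants.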