pdftract/notes/pdftract-3uu6v.md
jedarden 1959ff2446 feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter
- Add LZWDecoder filter using lzw crate v0.10
- Support /EarlyChange parameter (default 1, late 0)
  - Early change (1): Adobe/TIFF variant, code length increases one code early
  - Late change (0): GIF variant, code length increases are postponed as long as possible
- Full predictor support (TIFF predictor 2, PNG predictors 10-15)
- Bomb limit protection with partial bytes on exceed
- INV-8 maintained: partial bytes returned on decode errors
- 23 tests pass (19 unit tests + 4 proptests)
- Fixtures generated using lzw crate for verification

Acceptance criteria:
- Critical test /EarlyChange=0 byte-perfect: PASS
- LZWDecode without /DecodeParms defaults: PASS
- LZWDecode + /Predictor 12: PASS
- Truncated stream partial bytes: PASS
- Bomb limit honored: PASS
- proptest no panic: PASS
- INV-8 maintained: PASS

Refs: Plan Phase 1.5 line 1142, PDF spec 7.4.4

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-05-22 22:38:31 -04:00


# pdftract-3uu6v: LZWDecode Implementation Verification Note
## Summary
Implemented the LZWDecode filter with /EarlyChange parameter support (default 1 = early change, 0 = late change) and the same predictor support as FlateDecode (TIFF predictor 2, PNG predictors 10-15).
## Acceptance Criteria Results
### PASS: Critical test - LZWDecode with /EarlyChange 0 byte-perfect against reference
- Test: `test_lzw_fixture_simple_late_change`
- Fixtures: `lzw_simple_late.bin` decodes to `lzw_simple_orig.bin`
- Result: Byte-perfect match with reference output generated by lzw crate
### PASS: LZWDecode without /DecodeParms (defaults)
- Test: `test_lzw_decode_simple_early_change`
- Default behavior: EarlyChange = 1, no predictor
- Result: Correct decode with default parameters
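
The defaulting rule above can be sketched as follows. This is a minimal illustration, not the project's code: `LzwVariant`, `variant_from_parms`, and the `HashMap` stand-in for the /DecodeParms dictionary are hypothetical names, and treating any non-zero value as early change is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical enum naming the two LZW code-length behaviours.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LzwVariant {
    EarlyChange, // /EarlyChange 1 (PDF default)
    LateChange,  // /EarlyChange 0
}

/// Pick the variant from an optional /DecodeParms dictionary.
/// The HashMap is a stand-in for a real PDF dictionary type (assumption).
fn variant_from_parms(parms: Option<&HashMap<String, i64>>) -> LzwVariant {
    // A missing dictionary or a missing key both fall back to the default of 1.
    let early = parms
        .and_then(|p| p.get("EarlyChange"))
        .copied()
        .unwrap_or(1);
    // Assumption: any value other than 0 is treated as early change.
    if early == 0 {
        LzwVariant::LateChange
    } else {
        LzwVariant::EarlyChange
    }
}
```

Calling `variant_from_parms(None)` yields `LzwVariant::EarlyChange`, which mirrors the "no /DecodeParms" case tested here.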
### PASS: LZWDecode + /Predictor 12 (PNG Up)
- Tests: `test_lzw_decode_predictor`, `test_lzw_fixture_with_predictor`
- Fixtures: `lzw_predictor_encoded.bin` with predictor parameters
- Result: Predictor correctly applied after LZW decode
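
To illustrate what "predictor applied after LZW decode" means for the PNG Up case, here is a rough sketch that reverses the Up filter on already-decoded bytes. It is not the shared `apply_predictor()` function, whose signature this note does not show. The name `unfilter_png_up` and the `row_len` parameter (bytes per row, excluding the tag byte) are assumptions, and only filter tags 0 (None) and 2 (Up) are handled.

```rust
/// Reverse PNG row filtering for tags 0 (None) and 2 (Up) only.
/// `row_len` is the number of data bytes per row, not counting the
/// leading filter-tag byte. Hypothetical helper for illustration.
fn unfilter_png_up(data: &[u8], row_len: usize) -> Result<Vec<u8>, String> {
    if row_len == 0 {
        return Err("row_len must be non-zero".to_string());
    }
    let mut out = Vec::with_capacity(data.len());
    // The row above the first row is defined as all zeros.
    let mut prev = vec![0u8; row_len];
    for row in data.chunks(row_len + 1) {
        let tag = row[0];
        let body = &row[1..];
        let mut cur = Vec::with_capacity(body.len());
        for (i, &b) in body.iter().enumerate() {
            let value = match tag {
                0 => b,                       // None: bytes stored as-is
                2 => b.wrapping_add(prev[i]), // Up: add the byte directly above
                other => return Err(format!("unsupported PNG filter tag {}", other)),
            };
            cur.push(value);
        }
        out.extend_from_slice(&cur);
        // A short final row only overwrites the bytes it actually contains.
        prev[..cur.len()].copy_from_slice(&cur);
    }
    Ok(out)
}
```

With `row_len = 3`, the input `[2, 1, 2, 3, 2, 1, 1, 1]` describes two Up-filtered rows and reconstructs to `[1, 2, 3, 2, 3, 4]`.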
### PASS: Truncated LZW stream
- Tests: `test_lzw_decode_truncated_stream`, `test_lzw_fixture_truncated`
- Result: Returns partial bytes (INV-8 maintained)
### PASS: Bomb limit honored
- Test: `test_lzw_bomb_limit`
- Result: Bomb limit enforced, partial bytes returned when exceeded
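
The bomb-limit behaviour can be pictured with a sketch like the one below. The names `Bounded` and `collect_with_budget` are hypothetical, the chunk iterator stands in for successive decoder outputs, and truncating the output to exactly the limit is an assumption about what "partial bytes" means here.

```rust
/// Output of a budget-limited decode (hypothetical type).
#[derive(Debug, PartialEq)]
struct Bounded {
    bytes: Vec<u8>,
    limit_hit: bool,
}

/// Accumulate decoded chunks, checking the budget after each chunk.
/// On overflow, keep what fits and stop instead of failing outright.
fn collect_with_budget<I>(chunks: I, max_out: usize) -> Bounded
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut bytes = Vec::new();
    for chunk in chunks {
        bytes.extend_from_slice(&chunk);
        if bytes.len() > max_out {
            // Assumption: the partial result is cut back to the limit.
            bytes.truncate(max_out);
            return Bounded { bytes, limit_hit: true };
        }
    }
    Bounded { bytes, limit_hit: false }
}
```

Checking after each chunk means the buffer can briefly exceed the limit by at most one chunk, so a sketch like this assumes the chunk size itself is bounded.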
### PASS: proptest - random byte sequences never panic
- Tests: 4 proptests covering random data, early/late change, bomb limits, predictors
- Result: No panics on any input
### PASS: INV-8 maintained
- All error paths return partial bytes instead of panicking
- Decode errors return accumulated output before failure
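
One way to picture the INV-8 pattern is a loop that stops at the first failed step and hands back whatever was produced before it. This is a sketch only: `drain_partial` is a hypothetical name, and the iterator of `Result` values stands in for repeated calls into the real decoder.

```rust
/// Run decode steps until one fails; return the bytes accumulated so far
/// together with the error, if any. Never panics on a failed step.
fn drain_partial<I, E>(steps: I) -> (Vec<u8>, Option<E>)
where
    I: IntoIterator<Item = Result<Vec<u8>, E>>,
{
    let mut out = Vec::new();
    for step in steps {
        match step {
            Ok(chunk) => out.extend_from_slice(&chunk),
            // Break out of the loop but keep the output decoded before the error.
            Err(e) => return (out, Some(e)),
        }
    }
    (out, None)
}
```

Returning the error alongside the bytes is a design choice of this sketch; it lets a caller record a diagnostic while still using the partial output.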
## Implementation Details
### Files Modified
- `crates/pdftract-core/src/parser/stream.rs`: Added LZWDecoder struct (605 lines)
- `Cargo.toml`: Added `lzw = "0.10"` workspace dependency
### Files Added
- `crates/pdftract-core/examples/test_lzw_api.rs`: LZW crate API exploration
- `tests/fixtures/generate_lzw_fixtures.rs`: Fixture generator
- `tests/fixtures/generate_lzw_fixtures_main.rs`: Alternative generator
- 15 fixture files (.bin format)
### API Used
- `lzw` crate v0.10
- `DecoderEarlyChange`: Early change variant (Adobe/TIFF, PDF default)
- `Decoder`: Late change variant (GIF)
- `MsbReader`: MSB bit order as required by PDF spec
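
For readers unfamiliar with MSB-first packing, the sketch below reads fixed-width codes from a byte slice with the most significant bit first. It is an independent illustration and not the `lzw` crate's `MsbReader`; the names `MsbBits` and `read_codes` are hypothetical, and a real LZW decoder would vary the width between 9 and 12 bits as the table grows.

```rust
/// Hypothetical MSB-first bit reader over a byte slice.
struct MsbBits<'a> {
    data: &'a [u8],
    pos: usize, // index of the next unread byte
    acc: u32,   // bit accumulator; the low `nbits` bits are valid
    nbits: u8,
}

impl<'a> MsbBits<'a> {
    fn new(data: &'a [u8]) -> Self {
        MsbBits { data, pos: 0, acc: 0, nbits: 0 }
    }

    /// Read one code of `width` bits (1..=16), or None if input runs out.
    fn read(&mut self, width: u8) -> Option<u16> {
        while self.nbits < width {
            let byte = *self.data.get(self.pos)?;
            self.pos += 1;
            // New bytes enter at the low end, so older bits stay more significant.
            self.acc = (self.acc << 8) | u32::from(byte);
            self.nbits += 8;
        }
        self.nbits -= width;
        let mask = (1u32 << width) - 1;
        Some(((self.acc >> self.nbits) & mask) as u16)
    }
}

/// Read as many whole `width`-bit codes as the input contains.
fn read_codes(data: &[u8], width: u8) -> Vec<u16> {
    let mut bits = MsbBits::new(data);
    let mut codes = Vec::new();
    while let Some(code) = bits.read(width) {
        codes.push(code);
    }
    codes
}
```

Reading 9-bit codes from the bytes `80 0B 60 50 22 0C 0C 85 01` yields 256, 45, 258, 258, 65, 259, 66, 257: a clear-table code (256) first and an end-of-data code (257) last.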
### Key Features
1. **/EarlyChange parameter handling**:
   - Default 1 (early change): code length increases one code early (Adobe/TIFF variant)
   - Value 0 (late change): code length increases are postponed as long as possible (GIF variant)
   - Extracted via `PredictorParams::extract_early_change()`
2. **Predictor support**:
   - Delegates to shared `apply_predictor()` function
   - Supports TIFF predictor 2 and PNG predictors 10-15
   - Predictor applied after LZW decode
3. **Bomb limit protection**:
   - Budget checked after each decode chunk
   - Partial bytes returned when limit exceeded
   - Counter updated with final output size
4. **Error handling (INV-8)**:
   - Truncated streams: returns the partial bytes decoded so far
   - Decode errors: breaks out of the decode loop and returns the accumulated output
   - No panics on any input
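
The difference between early and late change in item 1 comes down to when the code width grows. The sketch below shows one common decoder-side convention, in which the width grows once `next_code + early_change` reaches a power of two. The function name `code_width` is hypothetical, and the exact table index that triggers the switch depends on how a given decoder counts its entries, so treat the thresholds as an assumption rather than a description of the `lzw` crate.

```rust
/// Width in bits (9..=12) of the next code to read, given the index of
/// the next table entry the decoder will add. Hypothetical helper.
fn code_width(next_code: u16, early_change: bool) -> u8 {
    // Early change shifts every threshold down by one code.
    let n = u32::from(next_code) + u32::from(early_change);
    let mut width = 9u8;
    // LZW codes in PDF never exceed 12 bits.
    while width < 12 && n >= (1u32 << width) {
        width += 1;
    }
    width
}
```

Under this convention the two variants disagree only at the boundaries. At `next_code = 511`, for example, early change already reads 10-bit codes while late change still reads 9-bit codes, which is why decoding with the wrong /EarlyChange value corrupts the output from that point on.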
## Test Results
All 23 LZW tests pass:
- 19 unit tests (empty, simple, incremental, repeated, predictor, truncated, fixtures)
- 4 proptests (no panic, bomb limit, early change, predictor)
## References
- Plan section: Phase 1.5 line 1142
- PDF spec 7.4.4 (LZWDecode parameters)
- Dependency Matrix: lzw = "0.10"