- Add LZWDecoder filter using lzw crate v0.10
- Support /EarlyChange parameter (default 1, late 0)
- Early change (1): Adobe/TIFF variant, code size increases BEFORE
- Late change (0): GIF variant, code size increases AFTER
- Full predictor support (TIFF predictor 2, PNG predictors 10-15)
- Bomb limit protection with partial bytes on exceed
- INV-8 maintained: partial bytes returned on decode errors
- 23 tests pass (19 unit tests + 4 proptests)
- Fixtures generated using lzw crate for verification

Acceptance criteria:
- Critical test /EarlyChange=0 byte-perfect: PASS
- LZWDecode without /DecodeParms defaults: PASS
- LZWDecode + /Predictor 12: PASS
- Truncated stream partial bytes: PASS
- Bomb limit honored: PASS
- proptest no panic: PASS
- INV-8 maintained: PASS

Refs: Plan Phase 1.5 line 1142, PDF spec 7.4.4

Co-Authored-By: Claude Code <noreply@anthropic.com>

# pdftract-3uu6v: LZWDecode Implementation Verification Note

## Summary
Implemented the LZWDecode filter with /EarlyChange parameter support (default 1 for early change, 0 for late change) and full predictor support (TIFF predictor 2, PNG predictors 10-15), matching FlateDecode.

## Acceptance Criteria Results

### PASS: Critical test - LZWDecode with /EarlyChange 0 byte-perfect against reference
- Test: `test_lzw_fixture_simple_late_change`
- Fixtures: `lzw_simple_late.bin` decodes to `lzw_simple_orig.bin`
- Result: Byte-perfect match with reference output generated by lzw crate

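The early and late variants differ only in when the decoder widens its codes. The following is a minimal sketch of that rule, not the code under test. It assumes `next_code` is the index of the next free table slot (258 right after a clear code) and a 12-bit maximum width. `code_width` is a hypothetical helper, and real decoders differ in exactly where they start counting.

```rust
/// Width in bits of the next code to read, given the next free table index.
/// Hypothetical helper for illustration only.
fn code_width(next_code: u32, early_change: bool) -> u32 {
    // Early change (/EarlyChange 1, the PDF default) widens one code sooner.
    let lookahead = if early_change { 1 } else { 0 };
    let mut width = 9; // PDF LZW codes start at 9 bits
    while width < 12 && next_code + lookahead >= (1 << width) {
        width += 1;
    }
    width
}
```

Under these assumptions, slot 511 is the first one where the variants disagree: the early variant already reads 10-bit codes there, while the late variant stays at 9 bits until slot 512.
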
### PASS: LZWDecode without /DecodeParms (defaults)
- Test: `test_lzw_decode_simple_early_change`
- Default behavior: EarlyChange = 1, no predictor
- Result: Correct decode with default parameters

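As an illustration of the defaults above, here is a small sketch of a parameter bundle that falls back to /EarlyChange 1 and /Predictor 1 (no prediction) when an entry is absent. `LzwParams` and `from_entries` are hypothetical names; the real `PredictorParams` type is not shown in this note, and the handling of out-of-range values here is an assumption.

```rust
/// Hypothetical parameter bundle for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LzwParams {
    early_change: bool,
    predictor: u32,
}

impl Default for LzwParams {
    fn default() -> Self {
        // PDF defaults: /EarlyChange 1, /Predictor 1 (no prediction).
        LzwParams { early_change: true, predictor: 1 }
    }
}

impl LzwParams {
    /// Build from optional /DecodeParms entries; `None` means the key was absent.
    fn from_entries(early_change: Option<i64>, predictor: Option<i64>) -> Self {
        let d = Self::default();
        LzwParams {
            // Assumption: any nonzero value selects early change.
            early_change: early_change.map_or(d.early_change, |v| v != 0),
            // Assumption: values below 1 fall back to the default.
            predictor: predictor
                .filter(|v| *v >= 1)
                .map_or(d.predictor, |v| v as u32),
        }
    }
}
```
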
### PASS: LZWDecode + /Predictor 12 (PNG Up)
- Tests: `test_lzw_decode_predictor`, `test_lzw_fixture_with_predictor`
- Fixtures: `lzw_predictor_encoded.bin` with predictor parameters
- Result: Predictor correctly applied after LZW decode

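To make the post-decode predictor step concrete, here is a minimal sketch of undoing PNG row filtering on already LZW-decoded bytes. It is not the project's `apply_predictor()`. It assumes 8 bits per component and a single color component, and it handles only row tags 0 (None) and 2 (Up). `undo_png_rows` is a hypothetical name.

```rust
/// Undo PNG row filtering for rows tagged None (0) or Up (2).
/// Each row is one tag byte followed by `columns` data bytes.
/// Hypothetical helper for illustration only.
fn undo_png_rows(decoded: &[u8], columns: usize) -> Vec<u8> {
    let mut out = Vec::new();
    if columns == 0 {
        return out;
    }
    let mut prev = vec![0u8; columns]; // the row above the first row is all zeros
    for row in decoded.chunks(columns + 1) {
        if row.len() < columns + 1 {
            break; // incomplete trailing row: keep what was reconstructed so far
        }
        let (tag, data) = (row[0], &row[1..]);
        let cur: Vec<u8> = match tag {
            0 => data.to_vec(),
            // Up: add the byte directly above, modulo 256.
            2 => data.iter().zip(&prev).map(|(d, up)| d.wrapping_add(*up)).collect(),
            _ => break, // other filter types are out of scope for this sketch
        };
        out.extend_from_slice(&cur);
        prev = cur;
    }
    out
}
```

With `columns = 3`, the input `[2, 1, 2, 3, 2, 1, 1, 1]` reconstructs to `[1, 2, 3, 2, 3, 4]`: the second row adds each byte to the one above it.
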
### PASS: Truncated LZW stream
- Tests: `test_lzw_decode_truncated_stream`, `test_lzw_fixture_truncated`
- Result: Returns partial bytes (INV-8 maintained)

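The partial-bytes behavior can be sketched as a loop that keeps everything produced before the first failure. This is an illustration under assumptions, not the `LZWDecoder` code: `steps` stands in for repeated calls into a decoder, and `decode_keep_partial` is a hypothetical name.

```rust
/// Drain a fallible chunk source, keeping everything produced before the first error.
/// Hypothetical helper for illustration only.
fn decode_keep_partial<I, E>(steps: I) -> (Vec<u8>, Option<E>)
where
    I: IntoIterator<Item = Result<Vec<u8>, E>>,
{
    let mut out = Vec::new();
    for step in steps {
        match step {
            Ok(chunk) => out.extend_from_slice(&chunk),
            // Stop at the first error, but hand back the bytes decoded so far.
            Err(e) => return (out, Some(e)),
        }
    }
    (out, None)
}
```

Returning the error alongside the bytes, instead of replacing them, lets a caller record a diagnostic and still use the recovered prefix.
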
### PASS: Bomb limit honored
- Test: `test_lzw_bomb_limit`
- Result: Bomb limit enforced, partial bytes returned when exceeded

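A budget check of this kind might look like the sketch below, which appends decoded chunks and stops once the output exceeds a limit. `collect_with_budget` is a hypothetical name, and trimming the output to exactly `limit` bytes is an assumption; the note only states that partial bytes are returned.

```rust
/// Append decoded chunks until the output budget is exceeded.
/// Returns the (possibly trimmed) output and whether the limit was hit.
/// Hypothetical helper for illustration only.
fn collect_with_budget<I>(chunks: I, limit: usize) -> (Vec<u8>, bool)
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut out = Vec::new();
    for chunk in chunks {
        out.extend_from_slice(&chunk);
        if out.len() > limit {
            // The budget is checked after each chunk, so trim the overshoot.
            out.truncate(limit);
            return (out, true);
        }
    }
    (out, false)
}
```

Because the check runs per chunk, the worst-case overshoot before trimming is bounded by the chunk size, not by the size of the whole stream.
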
### PASS: proptest - random byte sequences never panic
- Tests: 4 proptests covering random data, early/late change, bomb limits, predictors
- Result: No panics on any input

### PASS: INV-8 maintained
- All error paths return partial bytes instead of panicking
- Decode errors return accumulated output before failure

## Implementation Details

### Files Modified
- `crates/pdftract-core/src/parser/stream.rs`: Added LZWDecoder struct (605 lines)
- `Cargo.toml`: Added `lzw = "0.10"` workspace dependency

### Files Added
- `crates/pdftract-core/examples/test_lzw_api.rs`: LZW crate API exploration
- `tests/fixtures/generate_lzw_fixtures.rs`: Fixture generator
- `tests/fixtures/generate_lzw_fixtures_main.rs`: Alternative generator
- 15 fixture files (.bin format)

### API Used
- `lzw` crate v0.10
- `DecoderEarlyChange`: Early change variant (Adobe/TIFF, PDF default)
- `Decoder`: Late change variant (GIF)
- `MsbReader`: MSB bit order as required by PDF spec

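To show what MSB bit order means for code extraction, here is a self-contained sketch of an MSB-first bit reader. It is an illustrative stand-in written with the standard library only, not the lzw crate's `MsbReader`. `MsbBits` and its methods are hypothetical names.

```rust
/// Reads fixed-width codes from a byte slice, most significant bit first.
/// Illustrative stand-in; not the lzw crate's `MsbReader`.
struct MsbBits<'a> {
    data: &'a [u8],
    bit_pos: usize, // absolute position in bits from the start of `data`
}

impl<'a> MsbBits<'a> {
    fn new(data: &'a [u8]) -> Self {
        MsbBits { data, bit_pos: 0 }
    }

    /// Returns the next `width`-bit code, or `None` if too few bits remain.
    fn read(&mut self, width: usize) -> Option<u16> {
        if width == 0 || width > 16 || self.bit_pos + width > self.data.len() * 8 {
            return None;
        }
        let mut code = 0u16;
        for _ in 0..width {
            let byte = self.data[self.bit_pos / 8];
            // Bit 7 of each byte is consumed first.
            let bit = (byte >> (7 - self.bit_pos % 8)) & 1;
            code = (code << 1) | u16::from(bit);
            self.bit_pos += 1;
        }
        Some(code)
    }
}
```

Reading 9 bits from the bytes `0x80 0x40` yields 256, which is the clear-table code in PDF LZW streams; the remaining 7 bits read as 64.
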
### Key Features
1. **/EarlyChange parameter handling**:
   - Default 1 (early change): code width increases one code BEFORE the current width is exhausted (Adobe/TIFF variant)
   - Value 0 (late change): code width increases only AFTER the current width is exhausted (GIF variant)
   - Extracted via `PredictorParams::extract_early_change()`

2. **Predictor support**:
   - Delegates to shared `apply_predictor()` function
   - Supports TIFF predictor 2 and PNG predictors 10-15
   - Predictor applied after LZW decode

3. **Bomb limit protection**:
   - Budget checked after each decode chunk
   - Partial bytes returned when limit exceeded
   - Counter updated with final output size

4. **Error handling (INV-8)**:
   - Truncated streams: returns partial bytes decoded so far
   - Decode errors: breaks loop, returns accumulated output
   - No panics on any input

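The predictor item above lists TIFF predictor 2 alongside the PNG predictors. A minimal sketch of undoing TIFF horizontal differencing on decoded bytes follows. It assumes 8 bits per component, and `undo_tiff_predictor2` is a hypothetical name, not the shared `apply_predictor()` function.

```rust
/// Undo TIFF predictor 2 (horizontal differencing) in place.
/// `colors` is the number of components per pixel; 8 bits per component is assumed.
/// Hypothetical helper for illustration only.
fn undo_tiff_predictor2(data: &mut [u8], columns: usize, colors: usize) {
    let row_len = columns * colors;
    if row_len == 0 {
        return;
    }
    for row in data.chunks_mut(row_len) {
        // Each sample stores the difference from the same component one pixel to the left.
        for i in colors..row.len() {
            row[i] = row[i].wrapping_add(row[i - colors]);
        }
    }
}
```

Unlike the PNG predictors, this scheme has no per-row tag byte, so the row length is just `columns * colors` and each row restarts from its own first pixel.
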
## Test Results
All 23 LZW tests pass:
- 19 unit tests (empty, simple, incremental, repeated, predictor, truncated, fixtures)
- 4 proptests (no panic, bomb limit, early change, predictor)

## References
- Plan section: Phase 1.5 line 1142
- PDF spec 7.4.4 (LZWDecode parameters)
- Dependency Matrix: `lzw = "0.10"`