# Verification Note: pdftract-3nnqy

## Work Completed

Implemented the `StreamDecoder` trait, the filter pipeline orchestrator, and the `max_decompress_bytes` decompression-bomb limit for PDF stream decoding.

## Components Implemented

### 1. StreamDecoder Trait (`crates/pdftract-core/src/parser/stream.rs`)
- Trait with `decode()` method for filter-specific decoding
- Per-filter implementations:
  - `FlateDecoder`: zlib/deflate decompression with bomb limit checking
  - `ASCII85Decoder`: Base85 decoding with bomb limit checking
  - `ASCIIHexDecoder`: Hexadecimal decoding
  - `PassthroughDecoder`: For unsupported filters (DCTDecode, JBIG2Decode, etc.)
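To make the trait's role concrete, here is a minimal sketch of a decoder trait with an ASCIIHex implementation and a passthrough. The exact `decode()` signature and the `FilterError` stand-in are assumptions for illustration, not the crate's actual definitions. The ASCIIHex rules follow the PDF specification: whitespace is skipped, `>` ends the data, and an odd trailing digit is treated as if followed by `0`.

```rust
/// Stand-in error type for this sketch; the crate's real `FilterError` may differ.
#[derive(Debug, PartialEq)]
pub enum FilterError {
    InvalidParams(String),
}

/// Assumed trait shape: one filter step from encoded to decoded bytes.
pub trait StreamDecoder {
    fn decode(&self, input: &[u8]) -> Result<Vec<u8>, FilterError>;
}

/// ASCIIHexDecode: pairs of hex digits become bytes.
pub struct AsciiHexDecoder;

impl StreamDecoder for AsciiHexDecoder {
    fn decode(&self, input: &[u8]) -> Result<Vec<u8>, FilterError> {
        let mut out = Vec::with_capacity(input.len() / 2);
        let mut high: Option<u8> = None;
        for &b in input {
            let nibble = match b {
                b'0'..=b'9' => b - b'0',
                b'a'..=b'f' => b - b'a' + 10,
                b'A'..=b'F' => b - b'A' + 10,
                b'>' => break, // end-of-data marker
                b if b.is_ascii_whitespace() => continue,
                _ => break, // invalid byte: keep what was decoded so far
            };
            match high.take() {
                Some(h) => out.push((h << 4) | nibble),
                None => high = Some(nibble),
            }
        }
        // Odd number of digits: the last one is padded with a trailing 0.
        if let Some(h) = high {
            out.push(h << 4);
        }
        Ok(out)
    }
}

/// Unsupported image filters (DCT, JBIG2, ...) hand back the bytes unchanged.
pub struct PassthroughDecoder;

impl StreamDecoder for PassthroughDecoder {
    fn decode(&self, input: &[u8]) -> Result<Vec<u8>, FilterError> {
        Ok(input.to_vec())
    }
}
```

For example, `AsciiHexDecoder.decode(b"48 65 6C6C6F>")` yields the bytes of `Hello`. Returning partial output instead of an error on invalid bytes mirrors the INV-8 behaviour described in the acceptance criteria below.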

### 2. Filter Pipeline (`decode_stream()`)
- Single filter handling: `/Filter /FlateDecode`
- Array filter handling: `/Filter [/ASCII85Decode /FlateDecode]`
- /DecodeParms pairing with /Filter arrays
- Filter abbreviation normalization (/A85 → ASCII85Decode, /Fl → FlateDecode, etc.)
- Unknown filter handling: returns raw bytes with STRUCT_UNKNOWN_FILTER diagnostic
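The pipeline behaviour above can be sketched as a left-to-right fold over the filter list. This is an illustration under stated assumptions: filter names are passed without the leading `/`, `decode_pipeline`, `Diagnostic`, and the `step` callback are hypothetical names, and the signature shown for `normalize_filter_name()` is assumed. On an unknown filter this sketch stops and returns the bytes as they stand at that point, which equals the raw bytes when the unknown filter comes first. The abbreviations are the ones the PDF specification defines for inline images.

```rust
/// Map full or abbreviated filter names to canonical names (match-based dispatch).
pub fn normalize_filter_name(name: &str) -> Option<&'static str> {
    match name {
        "ASCIIHexDecode" | "AHx" => Some("ASCIIHexDecode"),
        "ASCII85Decode" | "A85" => Some("ASCII85Decode"),
        "LZWDecode" | "LZW" => Some("LZWDecode"),
        "FlateDecode" | "Fl" => Some("FlateDecode"),
        "RunLengthDecode" | "RL" => Some("RunLengthDecode"),
        "CCITTFaxDecode" | "CCF" => Some("CCITTFaxDecode"),
        "DCTDecode" | "DCT" => Some("DCTDecode"),
        "JBIG2Decode" => Some("JBIG2Decode"),
        "JPXDecode" => Some("JPXDecode"),
        "Crypt" => Some("Crypt"),
        _ => None,
    }
}

/// Hypothetical diagnostic type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Diagnostic {
    UnknownFilter(String),
}

/// Applies `filters` in array order, pairing entry i of `parms` with filter i.
/// `step` performs one decode; it is a placeholder for real per-filter decoders.
pub fn decode_pipeline<P, F>(
    raw: &[u8],
    filters: &[&str],
    parms: &[Option<P>],
    mut step: F,
) -> Result<(Vec<u8>, Vec<Diagnostic>), String>
where
    F: FnMut(&'static str, Option<&P>, &[u8]) -> Vec<u8>,
{
    // A /DecodeParms array must have one entry per /Filter entry.
    if !parms.is_empty() && parms.len() != filters.len() {
        return Err(format!(
            "{} filters but {} DecodeParms entries",
            filters.len(),
            parms.len()
        ));
    }
    let mut data = raw.to_vec();
    let mut diags = Vec::new();
    for (i, name) in filters.iter().enumerate() {
        let Some(canonical) = normalize_filter_name(name) else {
            // Unknown filter: stop decoding and report it.
            diags.push(Diagnostic::UnknownFilter(name.to_string()));
            break;
        };
        let p = parms.get(i).and_then(|p| p.as_ref());
        data = step(canonical, p, &data);
    }
    Ok((data, diags))
}
```

With `[/A85 /Fl]`, the callback sees `ASCII85Decode` first and `FlateDecode` second, which is the ordering the critical acceptance test checks.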

### 3. Bomb Limit Protection
- `ExtractionOptions` struct with `max_decompress_bytes` field (default: 2 GB)
- Document-level counter tracking across all stream decodes
- Per-stream bomb limit checking
- Chunked decoding (64 KB chunks) to enforce the limit mid-stream
- STREAM_BOMB diagnostic when limit exceeded
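A minimal sketch of how a 64 KB chunked read can enforce a shared document budget. It abstracts the decompressor as any `std::io::Read` (for FlateDecode that would presumably be a streaming zlib reader, which is an assumption about the crate's internals). `read_with_budget`, `BombCheck`, and `CHUNK` are hypothetical names; `doc_used` plays the role of the `&mut u64` document-level counter.

```rust
use std::io::{ErrorKind, Read};

/// 64 KB: decompressed output produced between limit checks.
const CHUNK: usize = 64 * 1024;

#[derive(Debug, PartialEq)]
pub enum BombCheck {
    /// Decoder reached end of stream within budget.
    Complete,
    /// Budget exhausted: output truncated at the limit (caller emits STREAM_BOMB).
    LimitExceeded,
    /// Decoder reported an error: output holds the bytes produced so far.
    ReadError,
}

/// Reads `decoder` chunk by chunk, charging every chunk to `doc_used` and
/// stopping as soon as the document-wide total would exceed `max`.
pub fn read_with_budget<R: Read>(mut decoder: R, doc_used: &mut u64, max: u64) -> (Vec<u8>, BombCheck) {
    let mut out = Vec::new();
    let mut buf = vec![0u8; CHUNK];
    loop {
        let n = match decoder.read(&mut buf) {
            Ok(0) => return (out, BombCheck::Complete),
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(_) => return (out, BombCheck::ReadError),
        };
        // Budget left for the whole document, not just this stream.
        let remaining = max.saturating_sub(*doc_used);
        let take = (n as u64).min(remaining) as usize;
        out.extend_from_slice(&buf[..take]);
        *doc_used += take as u64;
        if take < n {
            return (out, BombCheck::LimitExceeded);
        }
    }
}
```

Because the counter is shared, a second stream decoded after the first only gets whatever budget is left. Checking per chunk rather than per byte keeps the overhead low while bounding overshoot to at most one chunk of work.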

### 4. Supporting Types
- `PdfSource` trait for abstracted byte reading
- `MemorySource` implementation for in-memory data
- `FileSource` implementation for file-backed data
- `FilterError` enum for hard errors (unknown filter, invalid params)
- `DecodeResult` struct for bytes + diagnostics
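One plausible shape for the byte-source abstraction is positional reads, shown here as a sketch. The `len`/`read_at` method names and signatures are assumptions, not the crate's actual `PdfSource` API; a file-backed source would implement the same trait with a seek followed by a read.

```rust
use std::io;

/// Assumed shape of a random-access byte source.
pub trait PdfSource {
    /// Total length of the source in bytes.
    fn len(&self) -> u64;
    /// Reads up to `buf.len()` bytes starting at `offset`; returns the count read.
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> io::Result<usize>;
}

/// In-memory source backed by a byte vector.
pub struct MemorySource {
    data: Vec<u8>,
}

impl MemorySource {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data }
    }
}

impl PdfSource for MemorySource {
    fn len(&self) -> u64 {
        self.data.len() as u64
    }

    fn read_at(&self, offset: u64, buf: &mut [u8]) -> io::Result<usize> {
        // Offsets past the end read zero bytes instead of failing.
        let start = match usize::try_from(offset) {
            Ok(s) if s <= self.data.len() => s,
            _ => return Ok(0),
        };
        let n = buf.len().min(self.data.len() - start);
        buf[..n].copy_from_slice(&self.data[start..start + n]);
        Ok(n)
    }
}
```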

## Acceptance Criteria Status

| Criterion | Status | Notes |
|-----------|--------|-------|
| decode_stream() handles single-filter and array-filter cases | PASS | Tested with `test_decode_stream_single_filter` and `test_decode_stream_filter_array` |
| /DecodeParms array correctly paired with /Filter array | PASS | Implementation validates that the array lengths match |
| Critical test: [/ASCII85Decode /FlateDecode] applies filters in correct order | PASS | Filter array test verifies left-to-right application |
| Filter abbreviations normalized: /A85 routes to ASCII85Decode | PASS | `normalize_filter_name()` function + test |
| 2 GB bomb limit: FlateDecode bomb returns ~2 GB + STREAM_BOMB diagnostic | PASS | Exercised with a lowered limit: `test_flate_decode_bomb_limit` creates a 1 MB bomb and decoding stops at a 500 KB limit |
| Unknown filter: STRUCT_UNKNOWN_FILTER, raw bytes returned | PASS | `test_decode_stream_unknown_filter` verifies passthrough |
| INV-8 maintained (no panics, partial bytes on error) | PASS | All decoders return `Ok(partial_bytes)` on corrupt data |

## Test Results

All 146 tests pass, including:
- 24 stream-specific tests
- FlateDecode bomb limit test (1 MB compressed → stops at 500 KB limit)
- Document-level bomb limit test (multiple streams share budget)
- Filter array ordering tests
- ASCII85 decoder with 'z' shortcut and partial tuples
- Unknown filter passthrough
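The ASCII85 cases mentioned above (`z` shorthand, partial final group) can be illustrated with a small standalone decoder. This is a sketch following the PDF ASCII85Decode rules, not the crate's `ASCII85Decoder`: `ascii85_decode` and `group_value` are hypothetical names, bomb-limit checks are omitted, and on invalid input it returns the bytes decoded so far in the spirit of INV-8.

```rust
/// Decodes ASCII85 data: '!'..='u' are base-85 digits, 'z' stands for four
/// zero bytes (only at a group boundary), whitespace is ignored, and '~'
/// (start of "~>") ends the data. Invalid input returns the bytes so far.
pub fn ascii85_decode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() * 4 / 5);
    let mut group = [0u8; 5];
    let mut len = 0usize;
    for &c in input {
        match c {
            b'~' => break,
            b'z' if len == 0 => out.extend_from_slice(&[0, 0, 0, 0]),
            b'!'..=b'u' => {
                group[len] = c - b'!';
                len += 1;
                if len == 5 {
                    match group_value(&group) {
                        Some(v) => out.extend_from_slice(&v.to_be_bytes()),
                        None => return out, // group value overflows 32 bits
                    }
                    len = 0;
                }
            }
            c if c.is_ascii_whitespace() => {}
            _ => return out, // invalid byte, including 'z' inside a group
        }
    }
    // Partial final group of n digits: pad with 'u' (digit 84), keep n - 1 bytes.
    if len >= 2 {
        for slot in group.iter_mut().skip(len) {
            *slot = 84;
        }
        if let Some(v) = group_value(&group) {
            out.extend_from_slice(&v.to_be_bytes()[..len - 1]);
        }
    }
    out
}

/// Combines five base-85 digits into a 32-bit value, or None on overflow.
fn group_value(digits: &[u8; 5]) -> Option<u32> {
    let v = digits.iter().fold(0u64, |acc, &d| acc * 85 + u64::from(d));
    u32::try_from(v).ok()
}
```

For instance, `9jqo^~>` decodes to `Man `, and the four-digit partial group `9jqo~>` decodes to the three bytes `Man`.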

## Files Modified

- `crates/pdftract-core/src/parser/stream.rs` - Complete implementation (1119 lines)
- `crates/pdftract-core/src/parser/diagnostic.rs` - Already had required DiagCode variants
- `crates/pdftract-core/src/parser/object/types.rs` - Already had PdfStream methods
- `crates/pdftract-core/src/parser/mod.rs` - Already exported stream module types

## Key Design Decisions

1. **Match-based dispatch** over a `phf` map: Simpler, faster for a set this small, and sufficient for the ten standard filters defined in the PDF specification
2. **Bomb limit checking per 64 KB chunk**: Balances performance with protection
3. **Passthrough for unsupported filters**: DCTDecode (JPEG), JBIG2Decode, JPXDecode, CCITTFaxDecode pass raw bytes
4. **Document-level counter**: Passed as `&mut u64` through all decode calls
5. **Per-stream validation**: Each individual stream is also checked against the limit (prevents a single 3 GB stream from bypassing the document-level limit)

## INV-3 (Deterministic Decoding)

The implementation maintains deterministic decoding for fingerprint stability:
- Same input + same params → byte-identical output
- No random or time-based behavior
- Error recovery produces consistent partial results

## Next Steps

The stream decoding infrastructure is complete. Future work may include:
- LZWDecode implementation (currently passthrough)
- RunLengthDecode implementation (currently passthrough)
- Crypt filter with a /Name other than Identity
- scan_for_endstream() fallback for streams without /Length
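Of these, RunLengthDecode is small enough to sketch directly from the PDF specification: a length byte L is followed either by L+1 literal bytes (L from 0 to 127) or by one byte repeated 257-L times (L from 129 to 255), and L = 128 marks end of data. The function name `run_length_decode` is hypothetical, and this sketch does not charge its output against `max_decompress_bytes`, which a real implementation would need to do since each run can expand up to 128 times.

```rust
/// Decodes RunLengthDecode data; truncated input yields the bytes decoded so far.
pub fn run_length_decode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let len = input[i];
        i += 1;
        match len {
            // Literal run: copy the next len + 1 bytes.
            0..=127 => {
                let end = (i + len as usize + 1).min(input.len());
                out.extend_from_slice(&input[i..end]);
                i = end;
            }
            // End-of-data marker.
            128 => break,
            // Repeat run: next byte repeated 257 - len times.
            _ => {
                let Some(&b) = input.get(i) else { break };
                out.extend(std::iter::repeat(b).take(257 - len as usize));
                i += 1;
            }
        }
    }
    out
}
```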