The parse_indirect_object() function was already implemented in
crates/pdftract-core/src/parser/object/parser.rs with all required
functionality:
- Reads 3-token preamble (Integer Integer Obj)
- Parses direct object body
- Expects EndObj token
- Returns PdfIndirect { id, obj }
All acceptance criteria PASS:
- Simple null object test ✅
- Stream object test ✅
- Missing endobj recovery ✅
- Integer overflow clamping ✅
- proptest: random bytes never panic ✅
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pdftract-4ymy: Indirect Object Wrapper Parser Implementation
Summary
Implement ObjectParser::parse_indirect_object(), which reads the three-token preamble (Integer Integer Obj), parses one direct object, expects Token::EndObj, and returns PdfIndirect { id: ObjRef, obj: PdfObject }.
Implementation Details
The implementation was already present in crates/pdftract-core/src/parser/object/parser.rs (lines 413-660). The function:
- Reads 3 tokens for the header: Integer(N), Integer(G), Token::Obj
- Validates and constructs ObjRef, with overflow handling for both the object number (clamps to u32::MAX) and the generation number (clamps to u16::MAX)
- Parses the direct object body via parse_direct_object()
- Expects Token::EndObj, with comprehensive error recovery
- Returns PdfIndirect { id, obj }
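The header/body/trailer flow above can be sketched over a pre-lexed token slice. This is a minimal illustration under stated assumptions: Token, ObjRef, PdfObject, and PdfIndirect here are simplified, hypothetical stand-ins for the crate's types, the sketch returns None where the real parser recovers, and rejecting negative numbers as an invalid header is an assumption.

```rust
/// Hypothetical, simplified token model (the real lexer's Token is richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Integer(i64),
    Null,
    Obj,
    EndObj,
}

/// Hypothetical stand-ins for the crate's ObjRef / PdfObject / PdfIndirect.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ObjRef {
    pub num: u32,
    pub gen: u16,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PdfObject {
    Null,
    Integer(i64),
}

#[derive(Debug, Clone, PartialEq)]
pub struct PdfIndirect {
    pub id: ObjRef,
    pub obj: PdfObject,
}

/// Parses one direct object at `pos`; only null and integers in this sketch.
fn parse_direct_object(tokens: &[Token], pos: &mut usize) -> Option<PdfObject> {
    let obj = match tokens.get(*pos)? {
        Token::Null => PdfObject::Null,
        Token::Integer(n) => PdfObject::Integer(*n),
        _ => return None,
    };
    *pos += 1;
    Some(obj)
}

/// `N G obj <body> endobj` -> PdfIndirect. On success `pos` ends just past
/// `endobj`; on failure it is left untouched (the real parser recovers instead).
pub fn parse_indirect_object(tokens: &[Token], pos: &mut usize) -> Option<PdfIndirect> {
    // 1. Three-token header: Integer(N) Integer(G) obj (negatives rejected here).
    let (n, g) = match tokens.get(*pos..*pos + 3)? {
        [Token::Integer(n), Token::Integer(g), Token::Obj] if *n >= 0 && *g >= 0 => (*n, *g),
        _ => return None,
    };
    // 2. Build the ObjRef, clamping out-of-range values to the type's maximum.
    let id = ObjRef {
        num: u32::try_from(n).unwrap_or(u32::MAX),
        gen: u16::try_from(g).unwrap_or(u16::MAX),
    };
    // 3. Parse the body with a local cursor.
    let mut p = *pos + 3;
    let obj = parse_direct_object(tokens, &mut p)?;
    // 4. Require the trailer, then commit the new position.
    if tokens.get(p) != Some(&Token::EndObj) {
        return None;
    }
    *pos = p + 1;
    Some(PdfIndirect { id, obj })
}
```

Using a local cursor and committing `pos` only at the end keeps a failed parse from leaving the caller at a half-consumed position.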
Error Recovery
- Invalid header (e.g., 1 X obj): emits STRUCT_INVALID_INDIRECT_HEADER and scans forward to the next obj keyword
- Missing endobj: emits STRUCT_MISSING_KEY and scans forward to the next endobj, obj, or EOF
- Integer overflow: emits STRUCT_INTEGER_OVERFLOW and clamps to the maximum value
- Multi-object skip recovery: if the scan for endobj finds obj first (the start of the next indirect object), scans backward to find the preceding integer (object number)
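The missing-endobj scan and the backward step over the next object's header can be sketched as follows. recover_missing_endobj is a hypothetical helper over a simplified token slice, and the sketch assumes that "scanning backward" means stepping over the N G pair in front of the obj keyword so the next object is parsed rather than skipped.

```rust
/// Simplified, hypothetical token model.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Integer(i64),
    Null,
    Obj,
    EndObj,
    Other,
}

/// Called after an object body when the expected `endobj` is absent.
/// Records the diagnostic and returns the index where parsing should resume.
pub fn recover_missing_endobj(
    tokens: &[Token],
    start: usize,
    diags: &mut Vec<&'static str>,
) -> usize {
    diags.push("STRUCT_MISSING_KEY");
    for (i, tok) in tokens.iter().enumerate().skip(start) {
        match tok {
            // Found the late `endobj`: resume just past it.
            Token::EndObj => return i + 1,
            // Hit the next object's `obj` keyword first: step back over its
            // `N G` header so that object is not swallowed by the recovery.
            Token::Obj => {
                if i >= start + 2
                    && matches!(tokens[i - 2], Token::Integer(_))
                    && matches!(tokens[i - 1], Token::Integer(_))
                {
                    return i - 2;
                }
                // No usable header in front of it: skip the stray keyword.
                return i + 1;
            }
            _ => {}
        }
    }
    // Neither keyword found: resume at EOF.
    tokens.len()
}
```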
Position Tracking
The lexer's position counter is valid on all return paths (both success and recovery), ensuring the xref resolver can correctly track object positions.
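As a rough illustration of why this guarantee matters to a caller such as the xref resolver, the sketch below drives a hypothetical parse_at callback across the input, records the start offset of each parsed object, and checks that every call, whether it parsed or recovered, moved the position forward. collect_offsets, Step, and parse_at are illustrative names, not the crate's API.

```rust
/// Outcome of one hypothetical parse attempt.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// An object was parsed; `start` is the offset of its header.
    Parsed { start: usize },
    /// Malformed input was skipped by a recovery path.
    Recovered,
    /// Nothing left to parse.
    Eof,
}

/// Repeatedly calls `parse_at`, which must leave `pos` at the first byte it
/// did not consume, and returns the start offset of every parsed object.
pub fn collect_offsets<F>(mut parse_at: F) -> Vec<usize>
where
    F: FnMut(&mut usize) -> Step,
{
    let mut pos = 0;
    let mut offsets = Vec::new();
    loop {
        let before = pos;
        match parse_at(&mut pos) {
            Step::Eof => break,
            Step::Parsed { start } => offsets.push(start),
            Step::Recovered => {}
        }
        // A return path that left `pos` stale would make this loop spin
        // forever; fail loudly instead of hanging.
        assert!(pos > before, "parser did not advance on some return path");
    }
    offsets
}
```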
Acceptance Criteria Status
| Criteria | Status | Test |
|---|---|---|
| Simple test: 1 0 obj null endobj → PdfIndirect { ObjRef{1,0}, Null } | ✅ PASS | test_parse_indirect_object_simple |
| Stream test: 12 0 obj << /Length 5 >> stream\n12345endstream endobj → PdfIndirect with Stream | ✅ PASS | test_parse_indirect_object_with_stream |
| Recovery: 1 0 obj null (no endobj) → emit STRUCT_MISSING_KEY, position advances | ✅ PASS | test_parse_indirect_object_missing_endobj |
| Recovery: 999999999999 0 obj null endobj → ObjRef{u32::MAX, 0} + STRUCT_INTEGER_OVERFLOW | ✅ PASS | test_parse_indirect_object_integer_overflow |
| proptest: random byte sequences never panic | ✅ PASS | proptest_random_bytes_no_panic_indirect |
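The "random bytes never panic" property is tested in the crate by proptest_random_bytes_no_panic_indirect, presumably via the proptest crate. The dependency-free sketch below shows the same idea against a hypothetical byte-level header reader: it feeds pseudo-random, PDF-flavoured byte strings (from an xorshift64 generator) to parse_header under std::panic::catch_unwind and counts panics. parse_header, read_uint, and fuzz_no_panic are illustrative names; saturating arithmetic stands in for the clamp-to-max overflow policy.

```rust
use std::panic;

fn skip_ws(input: &[u8], i: &mut usize) {
    while *i < input.len() && input[*i].is_ascii_whitespace() {
        *i += 1;
    }
}

/// Reads an unsigned decimal, saturating at u64::MAX instead of overflowing.
fn read_uint(input: &[u8], i: &mut usize) -> Option<u64> {
    skip_ws(input, i);
    let start = *i;
    let mut value: u64 = 0;
    while *i < input.len() && input[*i].is_ascii_digit() {
        value = value.saturating_mul(10).saturating_add(u64::from(input[*i] - b'0'));
        *i += 1;
    }
    if *i == start { None } else { Some(value) }
}

/// Hypothetical `N G obj` header reader; out-of-range numbers clamp to max.
pub fn parse_header(input: &[u8]) -> Option<(u32, u16)> {
    let mut i = 0;
    let num = read_uint(input, &mut i)?;
    let gen = read_uint(input, &mut i)?;
    skip_ws(input, &mut i);
    if !input[i..].starts_with(b"obj") {
        return None;
    }
    Some((u32::try_from(num).unwrap_or(u32::MAX), u16::try_from(gen).unwrap_or(u16::MAX)))
}

/// Dependency-free stand-in for a "never panics" property test.
/// Returns how many of the generated inputs made `parse_header` panic.
pub fn fuzz_no_panic(iterations: u32, seed: u64) -> u32 {
    // Bias toward PDF-ish bytes so the interesting branches get exercised.
    const ALPHABET: &[u8] = b"0123456789 objend\n\t-X";
    let mut state = seed.max(1); // xorshift64 state must be non-zero
    let mut next = || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    let mut panics = 0;
    for _ in 0..iterations {
        let len = (next() % 32) as usize;
        let bytes: Vec<u8> = (0..len)
            .map(|_| ALPHABET[(next() % ALPHABET.len() as u64) as usize])
            .collect();
        if panic::catch_unwind(|| parse_header(&bytes)).is_err() {
            panics += 1;
        }
    }
    panics
}
```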
Test Results
All 11 indirect object tests pass:
- test_parse_indirect_object_simple ✅
- test_parse_indirect_object_with_integer ✅
- test_parse_indirect_object_with_stream ✅
- test_parse_indirect_object_missing_endobj ✅
- test_parse_indirect_object_integer_overflow ✅
- test_parse_indirect_object_generation_overflow ✅
- test_parse_indirect_object_invalid_header ✅
- test_parse_indirect_object_negative_object_number ✅
- test_parse_indirect_object_eof_returns_none ✅
- test_parse_indirect_object_with_dict ✅
- test_parse_indirect_object_with_array ✅
Property-based test:
- proptest_random_bytes_no_panic_indirect ✅
References
- Plan section: Phase 1.2 line 1071 (indirect object parsing)
- Phase 1.6 (error recovery for missing endobj)
- INV-8 (no panics at public boundaries)
Files Modified
No files were modified; the implementation was already present and complete.
Verification
Run tests with:
cargo test --package pdftract-core --lib parser::object::parser::tests::test_parse_indirect_object
cargo test --package pdftract-core --lib proptest_random_bytes_no_panic_indirect