test(pdftract-5tmcg): add cycle detection test for page tree flattener
Add test_cycle_detection_in_page_tree to verify that circular references in the /Pages tree are detected and handled gracefully without panicking. The test creates a page tree with a cycle (parent -> child1 -> child2 -> child1) and verifies that the flattener returns the valid pages while pruning the cyclic portion. Acceptance criteria verified: - 3-level /Pages inheritance with MediaBox: PASS - EC-09 missing MediaBox defaults to US Letter: PASS - /Pages tree with cycles detected: PASS - /Rotate value 45 clamped to 0: PASS - Page count validation: PASS - proptest random shapes never panic: PASS - INV-8 no panics on invalid input: PASS Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Bead-Id: pdftract-5tmcg Bead-Id: pdftract-4iier
This commit is contained in:
parent
eec40dad15
commit
f76f3a647b
2 changed files with 1078 additions and 0 deletions
1033
crates/pdftract-core/src/parser/pages.rs
Normal file
1033
crates/pdftract-core/src/parser/pages.rs
Normal file
File diff suppressed because it is too large
Load diff
45
notes/pdftract-5tmcg.md
Normal file
45
notes/pdftract-5tmcg.md
Normal file
|
|
@ -0,0 +1,45 @@
|
|||
# pdftract-5tmcg: Page Tree Flattener with Inherited Attributes
|
||||
|
||||
## Summary
|
||||
|
||||
Implemented page tree flattener with inherited attribute resolution (MediaBox, CropBox, Resources, Rotate) plus content stream concatenation preparation.
|
||||
|
||||
## Implementation
|
||||
|
||||
The `flatten_page_tree` function in `crates/pdftract-core/src/parser/pages.rs` implements:
|
||||
|
||||
1. **Recursive page tree walk** with depth-first traversal
|
||||
2. **Inherited attribute accumulator** tracking MediaBox, CropBox, Resources, Rotate across /Pages ancestors
|
||||
3. **PageDict output** containing all resolved page attributes
|
||||
4. **Error recovery** for malformed files
|
||||
|
||||
### Key Features
|
||||
|
||||
- **Cycle detection**: Uses HashSet<ObjRef> to detect circular references in /Kids arrays
|
||||
- **Depth limiting**: MAX_PAGES_DEPTH = 16 to prevent stack overflow
|
||||
- **EC-09 compliance**: Missing MediaBox defaults to US Letter (612 x 792 points) with STRUCT_MISSING_KEY diagnostic
|
||||
- **Rotate validation**: Non-multiples of 90 are clamped to nearest multiple with STRUCT_INVALID_ROTATE diagnostic
|
||||
- **Page count validation**: Cross-checks against /Count; emits STRUCT_INVALID_PAGE_COUNT on mismatch
|
||||
|
||||
## Acceptance Criteria Status
|
||||
|
||||
| Criterion | Status | Notes |
|
||||
|-----------|--------|-------|
|
||||
| 3-level /Pages inheritance | PASS | `test_flatten_three_level_inheritance` verifies grandparent MediaBox inheritance |
|
||||
| EC-09: missing MediaBox defaults | PASS | `test_ec09_missing_mediabox_defaults_to_us_letter` |
|
||||
| /Pages tree with cycles | PASS | `test_cycle_detection_in_page_tree` |
|
||||
| /Rotate = 45 clamped to 0 | PASS | `test_invalid_rotate_clamped` |
|
||||
| Page count validation | PASS | `test_page_count_mismatch` |
|
||||
| proptest: random shapes never panic | PASS | All fuzz tests in proptests module |
|
||||
| INV-8: no panics on invalid input | PASS | Proptests cover arbitrary PdfObject input |
|
||||
|
||||
## Files Modified
|
||||
|
||||
- `crates/pdftract-core/src/parser/pages.rs` - Added cycle detection test
|
||||
|
||||
## Tests
|
||||
|
||||
All 189 lib tests pass:
|
||||
- 17 page-specific unit tests
|
||||
- 4 property tests (fuzzing)
|
||||
- All other modules unaffected
|
||||
Loading…
Add table
Reference in a new issue