pdftract/notes/pdftract-5tmcg.md
jedarden f76f3a647b test(pdftract-5tmcg): add cycle detection test for page tree flattener
Add test_cycle_detection_in_page_tree to verify that circular references
in the /Pages tree are detected and handled gracefully without panicking.
The test creates a page tree with a cycle (parent -> child1 -> child2 -> child1)
and verifies that the flattener returns the valid pages while pruning the
cyclic portion.

Acceptance criteria verified:
- 3-level /Pages inheritance with MediaBox: PASS
- EC-09 missing MediaBox defaults to US Letter: PASS
- /Pages tree with cycles detected: PASS
- /Rotate value 45 clamped to 0: PASS
- Page count validation: PASS
- proptest random shapes never panic: PASS
- INV-8 no panics on invalid input: PASS

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: pdftract-5tmcg
Bead-Id: pdftract-4iier
2026-05-18 00:38:44 -04:00


# pdftract-5tmcg: Page Tree Flattener with Inherited Attributes
## Summary
Implemented the page tree flattener, which resolves inherited attributes (MediaBox, CropBox, Resources, Rotate) and prepares each page's content streams for concatenation.
## Implementation
The `flatten_page_tree` function in `crates/pdftract-core/src/parser/pages.rs` implements:
1. **Recursive page tree walk** with depth-first traversal
2. **Inherited attribute accumulator** tracking MediaBox, CropBox, Resources, Rotate across /Pages ancestors
3. **PageDict output** containing all resolved page attributes
4. **Error recovery** for malformed files
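
The walk and accumulator described above can be sketched roughly as follows. This is a minimal illustration, not the code in `pages.rs`: the `Node` and `Inherited` types and the `flatten` function are hypothetical simplifications, only MediaBox and Rotate are modelled, and the assumption is that a value set on a node overrides the value inherited from its ancestors.

```rust
/// Hypothetical, simplified set of inheritable attributes
/// (the real flattener also tracks CropBox and Resources).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Inherited {
    pub media_box: Option<[f64; 4]>,
    pub rotate: Option<i32>,
}

impl Inherited {
    /// Combine ancestor values with a node's own values.
    /// A value set on the node itself wins over the inherited one.
    fn merged_with(&self, own: &Inherited) -> Inherited {
        Inherited {
            media_box: own.media_box.or(self.media_box),
            rotate: own.rotate.or(self.rotate),
        }
    }
}

/// Hypothetical in-memory page tree: intermediate /Pages nodes and leaf pages.
pub enum Node {
    Pages { attrs: Inherited, kids: Vec<Node> },
    Page { attrs: Inherited },
}

/// Depth-first walk that carries the accumulated attributes downward
/// and pushes one fully resolved entry per leaf page, in document order.
pub fn flatten(node: &Node, inherited: &Inherited, out: &mut Vec<Inherited>) {
    match node {
        Node::Pages { attrs, kids } => {
            let acc = inherited.merged_with(attrs);
            for kid in kids {
                flatten(kid, &acc, out);
            }
        }
        Node::Page { attrs } => out.push(inherited.merged_with(attrs)),
    }
}
```

Calling `flatten(&root, &Inherited::default(), &mut out)` on a three-level tree whose root sets a MediaBox yields leaf entries that carry that MediaBox unless a closer node overrides it.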
### Key Features
- **Cycle detection**: Uses a `HashSet<ObjRef>` to detect circular references in /Kids arrays
- **Depth limiting**: `MAX_PAGES_DEPTH = 16` to prevent stack overflow
- **EC-09 compliance**: Missing MediaBox defaults to US Letter (612 x 792 points) with a `STRUCT_MISSING_KEY` diagnostic
- **Rotate validation**: Values that are not multiples of 90 are clamped to a multiple of 90 (45 becomes 0, per `test_invalid_rotate_clamped`) with a `STRUCT_INVALID_ROTATE` diagnostic
- **Page count validation**: Cross-checks the number of flattened pages against /Count; emits `STRUCT_INVALID_PAGE_COUNT` on mismatch
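
As an illustration of how cycle detection and depth limiting can combine, here is a minimal sketch. It is not the actual implementation: `ObjRef` is reduced to a plain integer, `Node` and `collect_pages` are hypothetical names, and diagnostics are omitted. Only the `HashSet` of visited references and the `MAX_PAGES_DEPTH = 16` cap come from the notes above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for an indirect object reference.
type ObjRef = u32;

/// Depth cap, as stated in the notes.
const MAX_PAGES_DEPTH: usize = 16;

/// Hypothetical simplified node: an intermediate /Pages node with its
/// /Kids references, or a leaf page.
pub enum Node {
    Pages(Vec<ObjRef>),
    Page,
}

/// Collect leaf pages reachable from `current`, pruning any branch that
/// revisits a reference or exceeds the depth cap. Never panics.
pub fn collect_pages(
    objects: &HashMap<ObjRef, Node>,
    current: ObjRef,
    depth: usize,
    visited: &mut HashSet<ObjRef>,
    out: &mut Vec<ObjRef>,
) {
    // `insert` returns false when the reference was already seen, which
    // covers cycles (and, in this sketch, any duplicate reference too).
    if depth > MAX_PAGES_DEPTH || !visited.insert(current) {
        return;
    }
    match objects.get(&current) {
        Some(Node::Pages(kids)) => {
            for &kid in kids {
                collect_pages(objects, kid, depth + 1, visited, out);
            }
        }
        Some(Node::Page) => out.push(current),
        // Dangling reference: skip it instead of failing.
        None => {}
    }
}
```

With a tree shaped like the one in the cycle test (parent -> child1 -> child2 -> child1), the second visit to child1 is pruned and the pages outside the cyclic branch are still returned.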
## Acceptance Criteria Status
| Criterion | Status | Notes |
|-----------|--------|-------|
| 3-level /Pages inheritance | PASS | `test_flatten_three_level_inheritance` verifies grandparent MediaBox inheritance |
| EC-09: missing MediaBox defaults | PASS | `test_ec09_missing_mediabox_defaults_to_us_letter` |
| /Pages tree with cycles | PASS | `test_cycle_detection_in_page_tree` |
| /Rotate = 45 clamped to 0 | PASS | `test_invalid_rotate_clamped` |
| Page count validation | PASS | `test_page_count_mismatch` |
| proptest: random shapes never panic | PASS | All fuzz tests in proptests module |
| INV-8: no panics on invalid input | PASS | Proptests cover arbitrary PdfObject input |
## Files Modified
- `crates/pdftract-core/src/parser/pages.rs` - Added cycle detection test
## Tests
All 189 lib tests pass:
- 17 page-specific unit tests
- 4 property tests (fuzzing)
- All other modules unaffected