pdftract/notes/pdftract-5tmcg.md
jedarden f76f3a647b test(pdftract-5tmcg): add cycle detection test for page tree flattener
Add test_cycle_detection_in_page_tree to verify that circular references
in the /Pages tree are detected and handled gracefully without panicking.
The test creates a page tree with a cycle (parent -> child1 -> child2 -> child1)
and verifies that the flattener returns the valid pages while pruning the
cyclic portion.

Acceptance criteria verified:
- 3-level /Pages inheritance with MediaBox: PASS
- EC-09 missing MediaBox defaults to US Letter: PASS
- /Pages tree with cycles detected: PASS
- /Rotate value 45 clamped to 0: PASS
- Page count validation: PASS
- proptest random shapes never panic: PASS
- INV-8 no panics on invalid input: PASS

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: pdftract-5tmcg
Bead-Id: pdftract-4iier
2026-05-18 00:38:44 -04:00


pdftract-5tmcg: Page Tree Flattener with Inherited Attributes

Summary

Implemented a page tree flattener that resolves inherited attributes (MediaBox, CropBox, Resources, Rotate) and prepares content streams for concatenation.

Implementation

The flatten_page_tree function in crates/pdftract-core/src/parser/pages.rs implements:

  1. Recursive page tree walk with depth-first traversal
  2. Inherited attribute accumulator tracking MediaBox, CropBox, Resources, Rotate across /Pages ancestors
  3. PageDict output containing all resolved page attributes
  4. Error recovery for malformed files
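The walk and the accumulator in steps 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not the code in pages.rs: the Node, Inherited, and ResolvedPage types, the walk function, and the three_level_example helper are all hypothetical names, object references are modeled as plain u32 ids in a HashMap, and only MediaBox and Rotate are tracked (CropBox and Resources would follow the same pattern). Guards against malformed trees are left out to keep the inheritance logic visible.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a page tree node.
#[derive(Debug)]
enum Node {
    Pages { media_box: Option<[f64; 4]>, rotate: Option<i32>, kids: Vec<u32> },
    Page { media_box: Option<[f64; 4]>, rotate: Option<i32> },
}

/// Attributes accumulated from /Pages ancestors on the way down.
#[derive(Clone, Copy, Debug, Default)]
struct Inherited {
    media_box: Option<[f64; 4]>,
    rotate: Option<i32>,
}

/// Hypothetical output record with inherited values filled in.
#[derive(Debug, PartialEq)]
struct ResolvedPage {
    id: u32,
    media_box: Option<[f64; 4]>,
    rotate: i32,
}

/// Depth-first walk: a value set on a node overrides the inherited one.
fn walk(objects: &HashMap<u32, Node>, id: u32, inherited: Inherited, out: &mut Vec<ResolvedPage>) {
    match objects.get(&id) {
        Some(Node::Pages { media_box, rotate, kids }) => {
            let next = Inherited {
                media_box: (*media_box).or(inherited.media_box),
                rotate: (*rotate).or(inherited.rotate),
            };
            for &kid in kids {
                walk(objects, kid, next, out);
            }
        }
        Some(Node::Page { media_box, rotate }) => out.push(ResolvedPage {
            id,
            media_box: (*media_box).or(inherited.media_box),
            // Assumption: an absent /Rotate resolves to 0.
            rotate: (*rotate).or(inherited.rotate).unwrap_or(0),
        }),
        // Dangling reference: skipped in this sketch.
        None => {}
    }
}

/// Three levels: root (sets MediaBox) -> intermediate (sets Rotate) -> pages.
fn three_level_example() -> HashMap<u32, Node> {
    let mut objects = HashMap::new();
    objects.insert(1, Node::Pages {
        media_box: Some([0.0, 0.0, 612.0, 792.0]),
        rotate: None,
        kids: vec![2],
    });
    objects.insert(2, Node::Pages { media_box: None, rotate: Some(90), kids: vec![3, 4] });
    objects.insert(3, Node::Page { media_box: None, rotate: None });
    objects.insert(4, Node::Page { media_box: Some([0.0, 0.0, 100.0, 100.0]), rotate: Some(0) });
    objects
}
```

Calling walk on the example with id 1 and Inherited::default() yields two pages: page 3 picks up the root's MediaBox and the intermediate node's Rotate, while page 4 keeps its own values.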

Key Features

  • Cycle detection: Uses a HashSet of visited nodes to detect circular references in /Kids arrays
  • Depth limiting: MAX_PAGES_DEPTH = 16 to prevent stack overflow
  • EC-09 compliance: Missing MediaBox defaults to US Letter (612 x 792 points) with STRUCT_MISSING_KEY diagnostic
  • Rotate validation: Values that are not multiples of 90 are clamped to the nearest multiple (for example, 45 becomes 0) with STRUCT_INVALID_ROTATE diagnostic
  • Page count validation: Cross-checks against /Count; emits STRUCT_INVALID_PAGE_COUNT on mismatch
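The cycle detection and depth limiting described above might look something like the sketch below. It reuses the MAX_PAGES_DEPTH name and value from these notes, but everything else is assumed for illustration: the Node enum, the collect_pages function, the cyclic_example helper, the use of u32 ids for object references, and the choice to prune silently rather than emit a diagnostic. The example tree mirrors the parent -> child1 -> child2 -> child1 shape from the cycle detection test.

```rust
use std::collections::{HashMap, HashSet};

/// Depth limit named in the notes.
const MAX_PAGES_DEPTH: usize = 16;

/// Hypothetical node: an intermediate /Pages node with kids, or a leaf page.
enum Node {
    Pages(Vec<u32>),
    Page,
}

/// Collects leaf page ids depth-first, pruning cycles and overly deep subtrees.
fn collect_pages(
    objects: &HashMap<u32, Node>,
    id: u32,
    depth: usize,
    visited: &mut HashSet<u32>,
    out: &mut Vec<u32>,
) {
    // Assumption: nodes deeper than the limit are dropped.
    if depth > MAX_PAGES_DEPTH {
        return;
    }
    // HashSet::insert returns false when the id was already present,
    // which means this node has been reached before: prune it.
    if !visited.insert(id) {
        return;
    }
    match objects.get(&id) {
        Some(Node::Pages(kids)) => {
            for &kid in kids {
                collect_pages(objects, kid, depth + 1, visited, out);
            }
        }
        Some(Node::Page) => out.push(id),
        None => {} // dangling reference: skipped
    }
}

/// parent(1) -> child1(2) -> child2(3) -> child1(2), with leaves 10, 11, 12.
fn cyclic_example() -> HashMap<u32, Node> {
    let mut objects = HashMap::new();
    objects.insert(1, Node::Pages(vec![2, 10]));
    objects.insert(2, Node::Pages(vec![3, 11]));
    objects.insert(3, Node::Pages(vec![2, 12])); // back-reference to child1
    for leaf in [10, 11, 12] {
        objects.insert(leaf, Node::Page);
    }
    objects
}
```

In this sketch the visited set is shared across the whole walk, so a node reachable from two parents is kept only once. That is a simplification that relies on each page tree node having a single parent.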

Acceptance Criteria Status

| Criterion | Status | Notes |
|---|---|---|
| 3-level /Pages inheritance | PASS | test_flatten_three_level_inheritance verifies grandparent MediaBox inheritance |
| EC-09: missing MediaBox defaults | PASS | test_ec09_missing_mediabox_defaults_to_us_letter |
| /Pages tree with cycles | PASS | test_cycle_detection_in_page_tree |
| /Rotate = 45 clamped to 0 | PASS | test_invalid_rotate_clamped |
| Page count validation | PASS | test_page_count_mismatch |
| proptest: random shapes never panic | PASS | All fuzz tests in proptests module |
| INV-8: no panics on invalid input | PASS | Proptests cover arbitrary PdfObject input |
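The EC-09, /Rotate, and page count criteria can be made concrete with a small sketch. The diagnostic codes and the US Letter size come from these notes. The function names, the use of string codes collected in a Vec, the normalization of /Rotate into the 0..360 range, and the tie-breaking rule (a value exactly halfway between two multiples goes to the lower one, which is what makes 45 become 0) are assumptions, not confirmed details of pages.rs.

```rust
/// US Letter in PDF points, the EC-09 fallback.
const US_LETTER: [f64; 4] = [0.0, 0.0, 612.0, 792.0];

/// Returns the MediaBox to use, recording a diagnostic on fallback.
fn resolve_media_box(found: Option<[f64; 4]>, diags: &mut Vec<&'static str>) -> [f64; 4] {
    match found {
        Some(media_box) => media_box,
        None => {
            diags.push("STRUCT_MISSING_KEY");
            US_LETTER
        }
    }
}

/// Returns a /Rotate value that is a multiple of 90 in 0..360.
fn resolve_rotate(raw: i32, diags: &mut Vec<&'static str>) -> i32 {
    // Assumption: negative and large values are first wrapped into 0..360.
    let wrapped = raw.rem_euclid(360);
    if wrapped % 90 == 0 {
        return wrapped;
    }
    diags.push("STRUCT_INVALID_ROTATE");
    // Nearest multiple of 90, ties going down (45 -> 0); 360 wraps to 0.
    (((wrapped + 44) / 90) * 90) % 360
}

/// Cross-checks the declared /Count against the number of pages found.
fn check_page_count(declared: i64, found: usize, diags: &mut Vec<&'static str>) -> bool {
    let matches = declared >= 0 && declared as u64 == found as u64;
    if !matches {
        diags.push("STRUCT_INVALID_PAGE_COUNT");
    }
    matches
}
```

With these definitions, resolve_rotate(45, ...) returns 0 and records STRUCT_INVALID_ROTATE, and resolve_media_box(None, ...) returns the 612 x 792 box and records STRUCT_MISSING_KEY.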

Files Modified

  • crates/pdftract-core/src/parser/pages.rs - Added cycle detection test

Tests

All 189 lib tests pass:

  • 17 page-specific unit tests
  • 4 property tests (fuzzing)
  • All other modules unaffected