pdftract-3o9fu: 7.7.2 Bead chain walker with cycle detection + page/rect resolution
Summary
Implemented the walk_beads function in crates/pdftract-core/src/threads/mod.rs to walk PDF article thread bead chains with cycle detection and page/rect resolution.
Changes Made
Fixed Tests (7 tests)
All seven previously failing tests now pass:
- discover tests (5 tests): Fixed test setup to cache the /Threads array directly at the catalog's threads_ref, not wrapped in a dictionary with a "Threads" key.
  - test_discover_thread_no_info_dict
  - test_discover_thread_missing_f_skipped
  - test_discover_thread_empty_title
  - test_discover_thread_utf16_title
  - test_discover_three_threads
- walk_beads tests (2 tests): Fixed infinite loop when beads are skipped by adding termination and cycle checks after updating current_ref.
  - test_walk_beads_invalid_rect_shape
  - test_walk_beads_page_ref_not_in_tree
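For context on test_discover_thread_utf16_title, here is a minimal sketch of decoding a UTF-16BE PDF text string. The function name decode_utf16be_bom is hypothetical and is not the crate's decoder; the byte values are the standard UTF-16BE encoding of "日本語" (U+65E5, U+672C, U+8A9E) preceded by the FE FF byte order mark.

```rust
/// Decode a PDF text string that starts with the UTF-16BE byte order mark.
/// Hypothetical helper for illustration only, not the crate's own decoder.
fn decode_utf16be_bom(bytes: &[u8]) -> Option<String> {
    // Require the FE FF BOM and an even number of payload bytes.
    if !bytes.starts_with(&[0xFE, 0xFF]) {
        return None;
    }
    let payload = &bytes[2..];
    if payload.len() % 2 != 0 {
        return None;
    }
    // Combine each big-endian byte pair into one UTF-16 code unit.
    let units: Vec<u16> = payload
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}

fn main() {
    // BOM, then U+65E5, U+672C, U+8A9E in big-endian order.
    let title = [0xFE, 0xFF, 0x65, 0xE5, 0x67, 0x2C, 0x8A, 0x9E];
    assert_eq!(decode_utf16be_bom(&title).as_deref(), Some("日本語"));
}
```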
Code Changes
- check_and_handle_termination helper function: Added to check for termination (next points back to first) and malformed cycles (bead revisited). Returns false to terminate the walk, true to continue.
- Fixed bead skip logic: When a bead is skipped (invalid page ref, missing rect, etc.), the code now:
  - Gets the next bead ref
  - Checks for termination and malformed cycles
  - Updates current_ref only if continuing

  This prevents infinite loops when /N points back to first.
- Changed diagnostic codes: Changed invalid /R and /P cases from StructUnexpectedEof to StructMissingKey to treat them as non-fatal (bead is skipped, walk continues).
- Fixed UTF-16 test bytes: Corrected the UTF-16BE bytes for "日本語" in test_discover_thread_utf16_title.
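The termination check and skip handling described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in mod.rs: ObjRef as a plain u32, the RawBead struct, the walk function, the string diagnostics, and the MAX_BEADS value are all hypothetical. Only the helper's name and its false/true contract come from the notes above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for an indirect object reference.
type ObjRef = u32;

/// Hypothetical simplified bead: `next` models /N, `page` models a resolved /P.
struct RawBead {
    next: Option<ObjRef>,
    page: Option<usize>, // None models a bead that has to be skipped
}

/// Assumed iteration cap; the real value is not stated in these notes.
const MAX_BEADS: usize = 10_000;

/// Returns false to terminate the walk, true to continue.
fn check_and_handle_termination(
    next: ObjRef,
    first: ObjRef,
    visited: &HashSet<ObjRef>,
    diagnostics: &mut Vec<String>,
) -> bool {
    if next == first {
        return false; // normal termination: chain closed back on the first bead
    }
    if visited.contains(&next) {
        diagnostics.push(format!("malformed cycle: bead {next} revisited"));
        return false;
    }
    true
}

fn walk(
    first: ObjRef,
    beads: &HashMap<ObjRef, RawBead>,
    diagnostics: &mut Vec<String>,
) -> Vec<(ObjRef, usize)> {
    let mut out = Vec::new();
    let mut visited = HashSet::new();
    let mut current = first;
    for _ in 0..MAX_BEADS {
        visited.insert(current);
        let Some(bead) = beads.get(&current) else { break };
        match bead.page {
            Some(page) => out.push((current, page)),
            None => diagnostics.push(format!("bead {current} skipped")),
        }
        // Kept and skipped beads share the same advance step:
        // get next, check termination, then update the current ref.
        let Some(next) = bead.next else { break };
        if !check_and_handle_termination(next, first, &visited, diagnostics) {
            break;
        }
        current = next;
    }
    out
}

fn main() {
    // 1 -> 2 (skipped) -> 3 -> back to 1
    let mut beads = HashMap::new();
    beads.insert(1, RawBead { next: Some(2), page: Some(0) });
    beads.insert(2, RawBead { next: Some(3), page: None });
    beads.insert(3, RawBead { next: Some(1), page: Some(1) });
    let mut diagnostics = Vec::new();
    let walked = walk(1, &beads, &mut diagnostics);
    assert_eq!(walked, vec![(1, 0), (3, 1)]);
    assert_eq!(diagnostics.len(), 1);
}
```

Routing skipped beads through the same advance step as kept beads is what keeps a skipped bead whose /N points back to first from looping forever.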
Acceptance Criteria
Critical tests (from plan)
- ✅ PASS: PDF with two article threads: both reconstructed with correct bead order and page references (test_walk_beads_two_threads)
- ✅ PASS: Thread with no /I info dict: title, author, subject all null; bead chain still reconstructed (test_discover_thread_no_info_dict)
- ✅ PASS: Bead /R rect correctly converted to PDF user-space coordinates for the referenced page (test_walk_beads_single_bead)
- ✅ PASS: Circular bead chain termination: chain walk stops after visiting all beads without infinite loop (test_walk_beads_circular_termination)
Unit tests
- ✅ PASS: Pathological cycle (diagnostic) (test_walk_beads_malformed_cycle)
- ✅ PASS: Missing /N (terminates chain) (test_walk_beads_missing_next)
- ✅ PASS: Missing /P (skip bead) (test_walk_beads_missing_page_ref)
- ✅ PASS: /Pg fallback (test_walk_beads_pg_fallback)
- ✅ PASS: Bead with invalid rect shape skips bead (test_walk_beads_invalid_rect_shape)
- ✅ PASS: Page ref outside document range skips bead (test_walk_beads_page_ref_not_in_tree)
- ✅ PASS: Maximum iteration cap enforced (test_walk_beads_max_iterations)
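The two skip conditions in the list above (invalid rect shape, page ref not in the page tree) could be checked along these lines. This is a hedged sketch: parse_rect, resolve_bead, and the u32 ObjRef alias are hypothetical names, and the corner normalization is an assumption about what a well-formed rect should look like, not a description of the actual conversion in the crate.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an indirect object reference.
type ObjRef = u32;

/// Accept only a rect with exactly four finite numbers.
/// Returns it as [x_min, y_min, x_max, y_max] (normalization is an assumption).
fn parse_rect(values: &[f64]) -> Option<[f64; 4]> {
    let &[x0, y0, x1, y1] = values else {
        return None; // wrong number of elements: invalid rect shape
    };
    if ![x0, y0, x1, y1].iter().all(|v| v.is_finite()) {
        return None;
    }
    Some([x0.min(x1), y0.min(y1), x0.max(x1), y0.max(y1)])
}

/// Resolve a bead's page index and rect, or None when the bead should be skipped.
fn resolve_bead(
    page_ref: ObjRef,
    rect: &[f64],
    pages: &HashMap<ObjRef, usize>,
) -> Option<(usize, [f64; 4])> {
    let page_index = *pages.get(&page_ref)?; // page ref not in the page tree
    let rect = parse_rect(rect)?;
    Some((page_index, rect))
}

fn main() {
    let pages = HashMap::from([(7, 0usize)]);
    // Valid bead: known page, four numbers (corners given in reverse order).
    assert_eq!(
        resolve_bead(7, &[10.0, 20.0, 0.0, 5.0], &pages),
        Some((0, [0.0, 5.0, 10.0, 20.0]))
    );
    // Skipped: rect has three elements.
    assert_eq!(resolve_bead(7, &[0.0, 0.0, 10.0], &pages), None);
    // Skipped: page ref is not in the map.
    assert_eq!(resolve_bead(99, &[0.0, 0.0, 1.0, 1.0], &pages), None);
}
```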
Public API
- ✅ PASS: threads::walk_beads(ThreadHeader, &XrefResolver, &HashMap<ObjRef, usize>) -> Vec<Bead> is public and documented
Test Results
All 28 threads module tests pass:
running 28 tests
test threads::tests::test_bead_new ... ok
test threads::tests::test_decode_pdf_string_empty ... ok
test threads::tests::test_decode_pdf_string_latin1 ... ok
test threads::tests::test_decode_pdf_string_ascii ... ok
test threads::tests::test_decode_pdf_string_utf16be_bom ... ok
test threads::tests::test_decode_pdfdocencoding_ascii ... ok
test threads::tests::test_decode_pdfdocencoding_empty ... ok
test threads::tests::test_decode_utf16be_invalid_length ... ok
test threads::tests::test_discover_no_threads_field ... ok
test threads::tests::test_discover_empty_threads ... ok
test threads::tests::test_discover_thread_empty_title ... ok
test threads::tests::test_discover_thread_missing_f_skipped ... ok
test threads::tests::test_discover_three_threads ... ok
test threads::tests::test_discover_thread_no_info_dict ... ok
test threads::tests::test_walk_beads_circular_termination ... ok
test threads::tests::test_thread_header_new ... ok
test threads::tests::test_walk_beads_invalid_rect_shape ... ok
test threads::tests::test_thread_header_with_fields ... ok
test threads::tests::test_discover_thread_utf16_title ... ok
test threads::tests::test_walk_beads_malformed_cycle ... ok
test threads::tests::test_walk_beads_missing_page_ref ... ok
test threads::tests::test_walk_beads_missing_rect ... ok
test threads::tests::test_walk_beads_missing_next ... ok
test threads::tests::test_walk_beads_page_ref_not_in_tree ... ok
test threads::tests::test_walk_beads_pg_fallback ... ok
test threads::tests::test_walk_beads_single_bead ... ok
test threads::tests::test_walk_beads_two_threads ... ok
test threads::tests::test_walk_beads_max_iterations ... ok
test result: ok. 28 passed; 0 failed; 0 ignored; 0 measured; 1916 filtered out
Code Quality
- ✅ cargo check --all-targets passes
- ✅ cargo fmt applied (no formatting changes needed)
- ✅ All public functions documented with rustdoc
- ✅ No unwrap() or expect() in non-test code
- ✅ Exhaustive match arms on enums
Files Modified
crates/pdftract-core/src/threads/mod.rs: Fixed tests, added check_and_handle_termination helper, fixed bead skip logic
Commit Message
fix(pdftract-3o9fu): fix bead chain walker tests and skip logic
- Fixed discover tests: cache /Threads array directly, not wrapped in dict
- Fixed walk_beads tests: added termination/cycle checks when skipping beads
- Added check_and_handle_termination helper to prevent infinite loops
- Changed invalid /R and /P diagnostic codes to StructMissingKey (non-fatal)
- Fixed UTF-16BE test bytes for "日本語"
All 28 threads module tests now pass.
Closes: pdftract-3o9fu