jedarden 3ac47215cf fix(pdftract-3o9fu): fix bead chain walker tests and skip logic
2026-05-25 09:02:42 -04:00

pdftract-3o9fu: 7.7.2 Bead chain walker with cycle detection + page/rect resolution

Summary

Implemented the walk_beads function in crates/pdftract-core/src/threads/mod.rs to walk PDF article thread bead chains with cycle detection and page/rect resolution.
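The sketch below shows one way such a walk can be structured. It assumes a simplified in-memory model and does not use pdftract's real XrefResolver and ThreadHeader types. ObjRef, BeadDict, Bead, should_continue, string diagnostics and the MAX_ITERATIONS value are all hypothetical. Beads with an unresolvable /P or a malformed /R are skipped with a diagnostic. Every bead, skipped or not, then goes through the same termination and cycle check before the walk advances.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for an indirect reference (object number, generation).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ObjRef(u32, u16);

/// Hypothetical, already-resolved view of one bead dictionary.
struct BeadDict {
    next: Option<ObjRef>,   // /N
    page: Option<ObjRef>,   // /P
    rect: Option<Vec<f64>>, // /R
}

#[derive(Debug, PartialEq)]
struct Bead {
    page_index: usize,
    rect: [f64; 4],
}

/// Assumed safety cap; the real limit used by pdftract is not stated here.
const MAX_ITERATIONS: usize = 10_000;

/// Plays the role of check_and_handle_termination (body is an assumption):
/// false stops the walk, true lets it advance to `next`.
fn should_continue(next: ObjRef, first: ObjRef, visited: &HashSet<ObjRef>, diags: &mut Vec<String>) -> bool {
    if next == first {
        return false; // normal end: /N points back to the first bead
    }
    if visited.contains(&next) {
        diags.push(format!("malformed bead cycle at {:?}", next));
        return false; // a bead other than the first was revisited
    }
    true
}

fn walk_beads(
    first: ObjRef,
    beads: &HashMap<ObjRef, BeadDict>,
    pages: &HashMap<ObjRef, usize>, // page object ref -> page index
    diags: &mut Vec<String>,
) -> Vec<Bead> {
    let mut out = Vec::new();
    let mut visited = HashSet::new();
    let mut current = first;
    for _ in 0..MAX_ITERATIONS {
        visited.insert(current);
        let Some(dict) = beads.get(&current) else {
            diags.push(format!("unresolvable bead {:?}", current));
            break;
        };
        // Resolve /P against the page map and require /R to hold exactly four numbers.
        let page_index = dict.page.and_then(|p| pages.get(&p).copied());
        let rect = dict.rect.as_deref().and_then(|r| <[f64; 4]>::try_from(r).ok());
        match (page_index, rect) {
            (Some(page_index), Some(rect)) => out.push(Bead { page_index, rect }),
            _ => diags.push(format!("skipped bead {:?}", current)), // non-fatal
        }
        // Skipped or not, check termination/cycles before advancing current.
        let Some(next) = dict.next else { break }; // missing /N ends the chain
        if !should_continue(next, first, &visited, diags) {
            break;
        }
        current = next;
    }
    out
}
```

In this sketch the visited set catches every revisit, so the iteration cap only bounds chains with an unusually large number of distinct beads.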

Changes Made

Fixed Tests (7 tests)

All seven previously failing tests now pass:

  1. discover tests (5 tests): Fixed the test setup to cache the /Threads array directly at the catalog's threads_ref instead of wrapping it in a dictionary with a "Threads" key.

    • test_discover_thread_no_info_dict
    • test_discover_thread_missing_f_skipped
    • test_discover_thread_empty_title
    • test_discover_thread_utf16_title
    • test_discover_three_threads
  2. walk_beads tests (2 tests): Fixed an infinite loop when beads are skipped by adding termination and cycle checks before current_ref is advanced.

    • test_walk_beads_invalid_rect_shape
    • test_walk_beads_page_ref_not_in_tree
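The discover fixture fix depends on the shape of the catalog's /Threads entry. In the PDF specification, that entry is an indirect reference to an array of thread dictionaries. The sketch below uses a hypothetical object model (Obj, ObjRef and thread_refs are illustrative, not pdftract's types). It shows why a fixture that wraps the array in a dictionary yields no threads, at least in this model.

```rust
use std::collections::HashMap;

/// Hypothetical indirect reference (object number, generation).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ObjRef(u32, u16);

/// Hypothetical, minimal PDF object model for the sketch.
#[derive(Clone, Debug)]
enum Obj {
    Ref(ObjRef),
    Array(Vec<Obj>),
    Dict(HashMap<String, Obj>),
}

/// Resolve threads_ref and collect the thread dictionary references.
/// Anything other than an array at threads_ref yields no threads.
fn thread_refs(cache: &HashMap<ObjRef, Obj>, threads_ref: ObjRef) -> Vec<ObjRef> {
    match cache.get(&threads_ref) {
        Some(Obj::Array(items)) => items
            .iter()
            .filter_map(|item| match item {
                Obj::Ref(r) => Some(*r),
                _ => None,
            })
            .collect(),
        _ => Vec::new(),
    }
}
```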

Code Changes

  1. check_and_handle_termination helper function: Added to detect normal termination (/N points back to the first bead) and malformed cycles (a bead is revisited). Returns false to terminate the walk, true to continue.

  2. Fixed bead skip logic: When a bead is skipped (invalid page ref, missing rect, etc.), the code now:

    • Gets the next bead ref
    • Checks for termination and malformed cycles
    • Updates current_ref only if the walk continues

    This prevents infinite loops when /N points back to the first bead.
  3. Changed diagnostic codes: Invalid /R and /P values now report StructMissingKey instead of StructUnexpectedEof, so they are non-fatal (the bead is skipped and the walk continues).

  4. Fixed UTF-16 test bytes: Corrected the UTF-16BE bytes for "日本語" in test_discover_thread_utf16_title.
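For reference, a PDF text string in UTF-16BE starts with the byte-order mark FE FF, followed by big-endian code units. "日本語" is U+65E5 U+672C U+8A9E, so its encoded bytes are FE FF 65 E5 67 2C 8A 9E. The helpers below are hypothetical (pdf_utf16be and decode_pdf_utf16be are not pdftract functions). They sketch the encoding and a strict decoder that rejects an odd-length body.

```rust
/// Encode `s` as a PDF UTF-16BE text string: BOM FE FF, then each
/// UTF-16 code unit in big-endian byte order.
fn pdf_utf16be(s: &str) -> Vec<u8> {
    let mut out = vec![0xFE, 0xFF];
    for unit in s.encode_utf16() {
        out.extend_from_slice(&unit.to_be_bytes());
    }
    out
}

/// Decode a BOM-prefixed UTF-16BE string; None if the BOM is missing,
/// the body has an odd number of bytes, or the code units are invalid.
fn decode_pdf_utf16be(bytes: &[u8]) -> Option<String> {
    if !bytes.starts_with(&[0xFE, 0xFF]) {
        return None;
    }
    let body = &bytes[2..];
    if body.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```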

Acceptance Criteria

Critical tests (from plan)

  • PASS: PDF with two article threads: both reconstructed with correct bead order and page references (test_walk_beads_two_threads)
  • PASS: Thread with no /I info dict: title, author, subject all null; bead chain still reconstructed (test_discover_thread_no_info_dict)
  • PASS: Bead /R rect correctly converted to PDF user-space coordinates for the referenced page (test_walk_beads_single_bead)
  • PASS: Circular bead chain termination: the walk stops after visiting every bead, without looping indefinitely (test_walk_beads_circular_termination)

Unit tests

  • PASS: Pathological cycle (diagnostic) (test_walk_beads_malformed_cycle)
  • PASS: Missing /N (terminates chain) (test_walk_beads_missing_next)
  • PASS: Missing /P (skip bead) (test_walk_beads_missing_page_ref)
  • PASS: /Pg fallback (test_walk_beads_pg_fallback)
  • PASS: Bead with invalid rect shape skips bead (test_walk_beads_invalid_rect_shape)
  • PASS: Page ref outside document range skips bead (test_walk_beads_page_ref_not_in_tree)
  • PASS: Maximum iteration cap enforced (test_walk_beads_max_iterations)
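The invalid-rect-shape case can be illustrated with a small validator. In the PDF specification, a rectangle is an array of four numbers giving two diagonally opposite corners. This sketch therefore accepts exactly four values and reorders them to [llx, lly, urx, ury]. normalize_rect is a hypothetical name, and rejecting non-finite values is an extra assumption not stated in this note.

```rust
/// Validate a bead /R value and normalize it to [llx, lly, urx, ury].
/// Returns None unless the slice holds exactly four finite numbers.
fn normalize_rect(values: &[f64]) -> Option<[f64; 4]> {
    let [x1, y1, x2, y2] = <[f64; 4]>::try_from(values).ok()?;
    if !(x1.is_finite() && y1.is_finite() && x2.is_finite() && y2.is_finite()) {
        return None;
    }
    // Any two opposite corners are allowed, so sort each axis.
    Some([x1.min(x2), y1.min(y2), x1.max(x2), y1.max(y2)])
}
```

A bead whose /R fails such a check would be skipped rather than ending the walk, in line with the non-fatal handling of invalid /R described under Code Changes.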

Public API

  • PASS: threads::walk_beads(ThreadHeader, &XrefResolver, &HashMap<ObjRef, usize>) -> Vec<Bead> is public and documented

Test Results

All 28 threads module tests pass:

running 28 tests
test threads::tests::test_bead_new ... ok
test threads::tests::test_decode_pdf_string_empty ... ok
test threads::tests::test_decode_pdf_string_latin1 ... ok
test threads::tests::test_decode_pdf_string_ascii ... ok
test threads::tests::test_decode_pdf_string_utf16be_bom ... ok
test threads::tests::test_decode_pdfdocencoding_ascii ... ok
test threads::tests::test_decode_pdfdocencoding_empty ... ok
test threads::tests::test_decode_utf16be_invalid_length ... ok
test threads::tests::test_discover_no_threads_field ... ok
test threads::tests::test_discover_empty_threads ... ok
test threads::tests::test_discover_thread_empty_title ... ok
test threads::tests::test_discover_thread_missing_f_skipped ... ok
test threads::tests::test_discover_three_threads ... ok
test threads::tests::test_discover_thread_no_info_dict ... ok
test threads::tests::test_walk_beads_circular_termination ... ok
test threads::tests::test_thread_header_new ... ok
test threads::tests::test_walk_beads_invalid_rect_shape ... ok
test threads::tests::test_thread_header_with_fields ... ok
test threads::tests::test_discover_thread_utf16_title ... ok
test threads::tests::test_walk_beads_malformed_cycle ... ok
test threads::tests::test_walk_beads_missing_page_ref ... ok
test threads::tests::test_walk_beads_missing_rect ... ok
test threads::tests::test_walk_beads_missing_next ... ok
test threads::tests::test_walk_beads_page_ref_not_in_tree ... ok
test threads::tests::test_walk_beads_pg_fallback ... ok
test threads::tests::test_walk_beads_single_bead ... ok
test threads::tests::test_walk_beads_two_threads ... ok
test threads::tests::test_walk_beads_max_iterations ... ok

test result: ok. 28 passed; 0 failed; 0 ignored; 0 measured; 1916 filtered out

Code Quality

  • cargo check --all-targets passes
  • cargo fmt run (no formatting changes needed)
  • All public functions documented with rustdoc
  • No unwrap() or expect() in non-test code
  • Exhaustive match arms on enums

Files Modified

  • crates/pdftract-core/src/threads/mod.rs: Fixed tests, added check_and_handle_termination helper, fixed bead skip logic

Commit Message

fix(pdftract-3o9fu): fix bead chain walker tests and skip logic

- Fixed discover tests: cache /Threads array directly, not wrapped in dict
- Fixed walk_beads tests: added termination/cycle checks when skipping beads
- Added check_and_handle_termination helper to prevent infinite loops
- Changed invalid /R and /P diagnostic codes to StructMissingKey (non-fatal)
- Fixed UTF-16BE test bytes for "日本語"

All 28 threads module tests now pass.

Closes: pdftract-3o9fu