# pdftract-3o9fu: 7.7.2 Bead chain walker with cycle detection + page/rect resolution

## Summary

Implemented the `walk_beads` function in `crates/pdftract-core/src/threads/mod.rs`, which walks PDF article thread bead chains with cycle detection and page/rect resolution.

## Changes Made

### Fixed Tests (7 tests)

The following seven previously failing tests now pass:

1. **`discover` tests (5 tests)**: Fixed the test setup to cache the `/Threads` array directly at the catalog's `threads_ref` instead of wrapping it in a dictionary under a `Threads` key.
   - `test_discover_thread_no_info_dict`
   - `test_discover_thread_missing_f_skipped`
   - `test_discover_thread_empty_title`
   - `test_discover_thread_utf16_title`
   - `test_discover_three_threads`

2. **`walk_beads` tests (2 tests)**: Fixed an infinite loop on the bead-skip path. The fix runs the termination and cycle checks on the next bead reference before advancing `current_ref`.
   - `test_walk_beads_invalid_rect_shape`
   - `test_walk_beads_page_ref_not_in_tree`

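In the PDF specification, the catalog's `/Threads` entry is an indirect reference to an array of thread dictionaries. Resolving `threads_ref` should therefore yield that array itself. The minimal sketch below shows why the old fixture failed. `ObjRef`, `Obj`, and `discover_thread_refs` are hypothetical stand-ins for illustration, not pdftract's actual types or object cache.

```rust
use std::collections::HashMap;

/// Hypothetical indirect-object reference: (object number, generation).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ObjRef(pub u32, pub u16);

/// Minimal stand-in for a parsed PDF object (hypothetical).
#[derive(Debug, Clone)]
pub enum Obj {
    Array(Vec<Obj>),
    Dict(HashMap<String, Obj>),
    Ref(ObjRef),
}

/// Resolve `threads_ref` and return the thread dictionary refs it lists.
/// The cached object must be the `/Threads` array itself. Anything else,
/// such as a dictionary wrapping the array, yields no threads.
pub fn discover_thread_refs(cache: &HashMap<ObjRef, Obj>, threads_ref: ObjRef) -> Vec<ObjRef> {
    match cache.get(&threads_ref) {
        Some(Obj::Array(items)) => items
            .iter()
            .filter_map(|item| match item {
                Obj::Ref(r) => Some(*r),
                _ => None, // non-reference entries are ignored in this sketch
            })
            .collect(),
        _ => Vec::new(),
    }
}
```

With the array cached directly, both thread refs are found. With the array wrapped in a `Threads` dictionary, the lookup finds a `Dict` instead of an `Array` and returns nothing, which matches the failure the five `discover` tests hit.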
### Code Changes

1. **`check_and_handle_termination` helper function**: Added to detect normal termination (`/N` points back to the first bead) and malformed cycles (an already visited bead is reached again). Returns `false` to stop the walk and `true` to continue.

2. **Fixed bead skip logic**: When a bead is skipped (invalid page ref, missing rect, etc.), the code now:
   - Gets the next bead ref
   - Checks for termination and malformed cycles
   - Updates `current_ref` only if the walk continues

   This prevents an infinite loop when `/N` points back to the first bead.

3. **Changed diagnostic codes**: Changed the invalid `/R` and `/P` cases from `StructUnexpectedEof` to `StructMissingKey`, so they are treated as non-fatal: the bead is skipped and the walk continues.

4. **Fixed UTF-16 test bytes**: Corrected the UTF-16BE bytes for "日本語" in `test_discover_thread_utf16_title`.
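The skip path in item 2 can be sketched as a self-contained loop. This is a simplified model under stated assumptions, not the code in `threads/mod.rs`. `BeadDict`, `Bead`, `DiagCode::MalformedCycle`, `MAX_BEADS`, and the parameter list of `check_and_handle_termination` are all hypothetical. The sketch shows that a skipped bead still goes through the termination and cycle check before `current_ref` advances.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical object reference (object number only, for brevity).
pub type ObjRef = u32;

/// Toy bead dictionary holding the /N, /P, and /R entries (hypothetical).
#[derive(Debug, Clone)]
pub struct BeadDict {
    pub next: Option<ObjRef>,   // /N
    pub page: Option<ObjRef>,   // /P
    pub rect: Option<Vec<f64>>, // /R
}

/// A resolved bead: page index plus rectangle.
#[derive(Debug, Clone, PartialEq)]
pub struct Bead {
    pub page_index: usize,
    pub rect: [f64; 4],
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiagCode {
    StructMissingKey, // non-fatal: bead skipped, walk continues
    MalformedCycle,   // hypothetical code for a revisited bead
}

/// Hypothetical backstop cap on the number of beads visited.
pub const MAX_BEADS: usize = 10_000;

/// Accept only a four-number /R array.
fn to_rect(r: &[f64]) -> Option<[f64; 4]> {
    match r {
        [a, b, c, d] => Some([*a, *b, *c, *d]),
        _ => None,
    }
}

/// Returns `false` to stop the walk (chain closed back to `first`, or a
/// malformed cycle) and `true` to continue with `next`.
fn check_and_handle_termination(
    next: ObjRef,
    first: ObjRef,
    visited: &mut HashSet<ObjRef>,
    diags: &mut Vec<DiagCode>,
) -> bool {
    if next == first {
        return false; // normal end of a circular chain
    }
    if !visited.insert(next) {
        diags.push(DiagCode::MalformedCycle); // a non-first bead was revisited
        return false;
    }
    true
}

pub fn walk_beads(
    first: ObjRef,
    objects: &HashMap<ObjRef, BeadDict>,
    page_index: &HashMap<ObjRef, usize>,
    diags: &mut Vec<DiagCode>,
) -> Vec<Bead> {
    let mut beads = Vec::new();
    let mut visited = HashSet::from([first]);
    let mut current_ref = first;

    for _ in 0..MAX_BEADS {
        let dict = match objects.get(&current_ref) {
            Some(d) => d,
            None => break, // dangling reference: stop the walk
        };

        // Resolve /P and /R. If either fails, skip the bead (non-fatal).
        let page = dict.page.and_then(|p| page_index.get(&p).copied());
        let rect = dict.rect.as_deref().and_then(to_rect);
        match (page, rect) {
            (Some(idx), Some(rect)) => beads.push(Bead { page_index: idx, rect }),
            _ => diags.push(DiagCode::StructMissingKey),
        }

        // Whether or not the bead was skipped, check /N before advancing.
        let next = match dict.next {
            Some(n) => n,
            None => break, // missing /N ends the chain
        };
        if !check_and_handle_termination(next, first, &mut visited, diags) {
            break;
        }
        current_ref = next;
    }
    beads
}
```

The first bead is seeded into `visited`, so reaching it again counts as normal termination rather than a malformed cycle. `MAX_BEADS` is only a backstop iteration cap.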
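For reference, "日本語" is U+65E5 U+672C U+8A9E. As a PDF text string in UTF-16BE with the `FE FF` byte-order mark, it is `FE FF 65 E5 67 2C 8A 9E`. Below is a minimal standard-library decoder sketch for checking such fixtures. `decode_utf16be_bom` and `NIHONGO_UTF16BE` are hypothetical names, not the module's `decode_pdf_string`.

```rust
/// Decode a PDF text string that starts with the UTF-16BE BOM (FE FF).
/// Returns None if the BOM is missing, the byte length is odd, or the
/// code units are not valid UTF-16.
pub fn decode_utf16be_bom(bytes: &[u8]) -> Option<String> {
    let body = bytes.strip_prefix(&[0xFE_u8, 0xFF][..])?;
    if body.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}

/// "日本語" as UTF-16BE with BOM.
pub const NIHONGO_UTF16BE: [u8; 8] = [0xFE, 0xFF, 0x65, 0xE5, 0x67, 0x2C, 0x8A, 0x9E];
```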
## Acceptance Criteria

### Critical tests (from plan)

- ✅ **PASS**: PDF with two article threads: both reconstructed with correct bead order and page references (`test_walk_beads_two_threads`)
- ✅ **PASS**: Thread with no `/I` info dict: `title`, `author`, `subject` all null; bead chain still reconstructed (`test_discover_thread_no_info_dict`)
- ✅ **PASS**: Bead `/R` rectangle correctly converted to PDF user-space coordinates for the referenced page (`test_walk_beads_single_bead`)
- ✅ **PASS**: Circular bead chain termination: the chain walk stops after visiting all beads, without an infinite loop (`test_walk_beads_circular_termination`)

### Unit tests

- ✅ **PASS**: Pathological cycle (diagnostic) (`test_walk_beads_malformed_cycle`)
- ✅ **PASS**: Missing `/N` (terminates chain) (`test_walk_beads_missing_next`)
- ✅ **PASS**: Missing `/P` (skip bead) (`test_walk_beads_missing_page_ref`)
- ✅ **PASS**: `/Pg` fallback (`test_walk_beads_pg_fallback`)
- ✅ **PASS**: Bead with invalid rect shape skips bead (`test_walk_beads_invalid_rect_shape`)
- ✅ **PASS**: Page ref outside document range skips bead (`test_walk_beads_page_ref_not_in_tree`)
- ✅ **PASS**: Maximum iteration cap enforced (`test_walk_beads_max_iterations`)

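A bead's `/R` entry is a PDF rectangle: four numbers giving two diagonally opposite corners in default user space. The PDF specification says readers should be prepared to normalize these. The sketch below shows one way to validate and normalize such an array, in the spirit of `test_walk_beads_invalid_rect_shape`. `parse_bead_rect`, the finiteness check, and the `[llx, lly, urx, ury]` output layout are assumptions, not necessarily what pdftract does.

```rust
/// Hypothetical helper: validate a bead's /R array and normalize it to
/// [llx, lly, urx, ury]. Returns None (bead skipped) unless the array holds
/// exactly four finite numbers.
pub fn parse_bead_rect(values: &[f64]) -> Option<[f64; 4]> {
    let [x0, y0, x1, y1] = match values {
        [a, b, c, d] => [*a, *b, *c, *d],
        _ => return None, // wrong shape: not a rectangle
    };
    if ![x0, y0, x1, y1].iter().all(|v| v.is_finite()) {
        return None; // NaN or infinite coordinates
    }
    // Any two opposite corners are allowed, so order each axis.
    Some([x0.min(x1), y0.min(y1), x0.max(x1), y0.max(y1)])
}
```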
### Public API

- ✅ **PASS**: `threads::walk_beads(ThreadHeader, &XrefResolver, &HashMap<ObjRef, usize>) -> Vec<Bead>` is public and documented

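The third parameter of `walk_beads` maps page object references to page indices. Presumably this is how a bead's `/P` is resolved and how "page ref not in tree" is detected. Below is a minimal sketch of building such a map from the page tree's leaves in document order. `ObjRef` and `build_page_index` are hypothetical stand-ins, not pdftract's types.

```rust
use std::collections::HashMap;

/// Hypothetical indirect-object reference: (object number, generation).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ObjRef(pub u32, pub u16);

/// Map each page object ref (in document order) to its 0-based index.
/// If a ref appears twice, the first occurrence wins.
pub fn build_page_index(pages_in_order: &[ObjRef]) -> HashMap<ObjRef, usize> {
    let mut index = HashMap::with_capacity(pages_in_order.len());
    for (i, page) in pages_in_order.iter().enumerate() {
        index.entry(*page).or_insert(i);
    }
    index
}
```

A bead whose `/P` is absent from this map would then be skipped, as in `test_walk_beads_page_ref_not_in_tree`.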
## Test Results

All 28 threads module tests pass:

```
running 28 tests
test threads::tests::test_bead_new ... ok
test threads::tests::test_decode_pdf_string_empty ... ok
test threads::tests::test_decode_pdf_string_latin1 ... ok
test threads::tests::test_decode_pdf_string_ascii ... ok
test threads::tests::test_decode_pdf_string_utf16be_bom ... ok
test threads::tests::test_decode_pdfdocencoding_ascii ... ok
test threads::tests::test_decode_pdfdocencoding_empty ... ok
test threads::tests::test_decode_utf16be_invalid_length ... ok
test threads::tests::test_discover_no_threads_field ... ok
test threads::tests::test_discover_empty_threads ... ok
test threads::tests::test_discover_thread_empty_title ... ok
test threads::tests::test_discover_thread_missing_f_skipped ... ok
test threads::tests::test_discover_three_threads ... ok
test threads::tests::test_discover_thread_no_info_dict ... ok
test threads::tests::test_walk_beads_circular_termination ... ok
test threads::tests::test_thread_header_new ... ok
test threads::tests::test_walk_beads_invalid_rect_shape ... ok
test threads::tests::test_thread_header_with_fields ... ok
test threads::tests::test_discover_thread_utf16_title ... ok
test threads::tests::test_walk_beads_malformed_cycle ... ok
test threads::tests::test_walk_beads_missing_page_ref ... ok
test threads::tests::test_walk_beads_missing_rect ... ok
test threads::tests::test_walk_beads_missing_next ... ok
test threads::tests::test_walk_beads_page_ref_not_in_tree ... ok
test threads::tests::test_walk_beads_pg_fallback ... ok
test threads::tests::test_walk_beads_single_bead ... ok
test threads::tests::test_walk_beads_two_threads ... ok
test threads::tests::test_walk_beads_max_iterations ... ok

test result: ok. 28 passed; 0 failed; 0 ignored; 0 measured; 1916 filtered out
```

## Code Quality

- ✅ `cargo check --all-targets` passes
- ✅ `cargo fmt` applied (no formatting changes needed)
- ✅ All public functions documented with rustdoc
- ✅ No `unwrap()` or `expect()` in non-test code
- ✅ Exhaustive `match` arms on enums

## Files Modified

- `crates/pdftract-core/src/threads/mod.rs`: Fixed tests, added `check_and_handle_termination` helper, fixed bead skip logic

## Commit Message

```
fix(pdftract-3o9fu): fix bead chain walker tests and skip logic

- Fixed discover tests: cache /Threads array directly, not wrapped in dict
- Fixed walk_beads tests: added termination/cycle checks when skipping beads
- Added check_and_handle_termination helper to prevent infinite loops
- Changed invalid /R and /P diagnostic codes to StructMissingKey (non-fatal)
- Fixed UTF-16BE test bytes for "日本語"

All 28 threads module tests now pass.

Closes: pdftract-3o9fu
```