Phase 7.7.3: Add threads field to ExtractionResult with ThreadJson schema integration.
Changes:
- Added ThreadJson and BeadJson structs to schema/mod.rs
- Added thread_to_json() function to threads/mod.rs
- Added build_page_ref_to_index() helper to parser/pages.rs
- Added threads field to ExtractionResult in extract.rs
- Implemented Phase 7.7 extraction logic with discover_threads/walk_beads
- Added threads_to_markdown() and collapse_page_ranges() to markdown.rs
- Updated JSON schema with ThreadJson and BeadJson definitions
- Added thread_to_py() and bead_to_py() conversions in pdftract-py
- Exported ThreadJson, BeadJson from lib.rs
All 32 threads module tests pass. All 35 markdown tests pass.
Verification: notes/pdftract-3h9xo.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add Cargo bench target for grep performance measurement across 1000-PDF corpus.
Includes result structure, CI gate validation (50 MB/s), smart corpus path
resolution, and development-friendly empty-corpus handling.
Corpus infrastructure created at tests/fixtures/grep-corpus/ with regenerate
script, manifest template, and documentation. Benchmark ready to wire to
actual grep implementation once 7.8.3-7.8.8 sub-tasks complete.
Closes: pdftract-5bzpg
Files:
- crates/pdftract-cli/Cargo.toml: Add [[bench]] grep_1000 + chrono, criterion deps
- crates/pdftract-cli/benches/grep_1000.rs: Benchmark implementation (280 lines)
- tests/fixtures/grep-corpus/: Corpus infrastructure (regenerate.sh, manifest, README)
- notes/pdftract-5bzpg.md: Verification note with acceptance criteria status
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>