pdftract/crates/pdftract-py
jedarden 9abc386cce feat(pdftract-3h9xo): implement threads JSON output + schema integration
Phase 7.7.3: Add threads field to ExtractionResult with ThreadJson schema integration.

Changes:
- Added ThreadJson and BeadJson structs to schema/mod.rs
- Added thread_to_json() function to threads/mod.rs
- Added build_page_ref_to_index() helper to parser/pages.rs
- Added threads field to ExtractionResult in extract.rs
- Implemented Phase 7.7 extraction logic with discover_threads/walk_beads
- Added threads_to_markdown() and collapse_page_ranges() to markdown.rs
- Updated JSON schema with ThreadJson and BeadJson definitions
- Added thread_to_py() and bead_to_py() conversions in pdftract-py
- Exported ThreadJson, BeadJson from lib.rs

All 32 threads module tests pass. All 35 markdown tests pass.

Verification: notes/pdftract-3h9xo.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:40:15 -04:00
python/pdftract feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00
src feat(pdftract-3h9xo): implement threads JSON output + schema integration 2026-05-25 13:40:15 -04:00
tests feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00
Cargo.toml feat(pdftract-3j2u): implement 50 MB size limit + base64 encoding for attachments 2026-05-25 11:42:28 -04:00
pdftract-py.cdx.json feat(pdftract-67tm8): implement MCP stdio transport with integration tests 2026-05-23 00:16:42 -04:00
pyproject.toml feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00