pdftract/crates
jedarden 9d662aec25 feat(pdftract-bnba5): implement PyO3 extract_stream entry point with StreamIterator
Add callback-based streaming API to pdftract-core and PyO3 bindings that
return a Python iterator yielding page dicts incrementally. This provides
memory-efficient extraction for large PDFs via the iterator protocol.

Core changes:
- Add extract_pdf_streaming() callback-based function to pdftract-core
- Export extract_pdf_streaming in lib.rs

PyO3 bindings:
- Add StreamIterator PyClass with __iter__/__next__ methods
- Add extract_stream_fn() spawning background thread with mpsc channel
- Add *Frame types for efficient Python dict serialization
- Integrate into pdftract Python module

Closes: pdftract-bnba5
2026-05-24 07:35:03 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-dtpwa): implement contract profile per Phase 7.10 schema 2026-05-24 07:10:32 -04:00
pdftract-core feat(pdftract-bnba5): implement PyO3 extract_stream entry point with StreamIterator 2026-05-24 07:35:03 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py feat(pdftract-bnba5): implement PyO3 extract_stream entry point with StreamIterator 2026-05-24 07:35:03 -04:00