jedarden/pdftract

Commit graph

Author: jedarden
SHA1: 9d662aec25
Date: 2026-05-24 07:35:03 -04:00

feat(pdftract-bnba5): implement PyO3 extract_stream entry point with StreamIterator

Add callback-based streaming API to pdftract-core and PyO3 bindings that
return a Python iterator yielding page dicts incrementally. This provides
memory-efficient extraction for large PDFs via the iterator protocol.

Core changes:
- Add extract_pdf_streaming() callback-based function to pdftract-core
- Export extract_pdf_streaming in lib.rs

PyO3 bindings:
- Add StreamIterator PyClass with __iter__/__next__ methods
- Add extract_stream_fn() spawning background thread with mpsc channel
- Add *Frame types for efficient Python dict serialization
- Integrate into pdftract Python module

Closes: pdftract-bnba5
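
The commit message describes bridging a callback-based extractor to an iterator through a background thread and an mpsc channel. The following is a minimal, std-only Rust sketch of that general pattern, not the actual pdftract code: `PageFrame`, `extract_streaming`, `StreamIter`, and `extract_stream` are hypothetical stand-ins, the "PDF" is just a list of strings, and the PyO3 layer is left out entirely.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for one extracted page (not a pdftract type).
#[derive(Debug, Clone, PartialEq)]
pub struct PageFrame {
    pub index: usize,
    pub text: String,
}

/// Hypothetical callback-based producer: calls `on_page` once per page
/// and stops early when the callback returns `false`.
pub fn extract_streaming<F>(pages: &[&str], mut on_page: F)
where
    F: FnMut(PageFrame) -> bool,
{
    for (index, text) in pages.iter().enumerate() {
        let frame = PageFrame { index, text: (*text).to_owned() };
        if !on_page(frame) {
            break;
        }
    }
}

/// Iterator that yields pages as the background thread produces them.
pub struct StreamIter {
    rx: Receiver<PageFrame>,
}

impl Iterator for StreamIter {
    type Item = PageFrame;

    fn next(&mut self) -> Option<PageFrame> {
        // recv() fails once the sender is dropped and the channel is empty,
        // which ends the iteration.
        self.rx.recv().ok()
    }
}

/// Spawns the producer on a background thread and returns the iterator.
pub fn extract_stream(pages: Vec<String>) -> StreamIter {
    // A bounded channel of capacity 1 keeps at most one page buffered,
    // so the producer cannot run far ahead of the consumer.
    let (tx, rx) = sync_channel(1);
    thread::spawn(move || {
        let refs: Vec<&str> = pages.iter().map(String::as_str).collect();
        // send() fails if the iterator was dropped; returning false
        // tells the producer to stop.
        extract_streaming(&refs, |frame| tx.send(frame).is_ok());
    });
    StreamIter { rx }
}
```

Calling `extract_stream(vec!["a".to_string(), "b".to_string()])` and iterating over the result yields two `PageFrame` values in page order. The bounded channel is one possible design choice for keeping memory use flat; the commit message does not say which channel type the real bindings use.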

1 commit