jedarden
225f96c241
fix(pyo3): correct extract_text_fn call in extract_markdown stub
The extract_markdown stub was calling extract_text instead of
extract_text_fn, causing a compilation error. This fixes the
function name to match the exported function from extract_text.rs.
This completes the extract_text PyO3 entry point implementation,
which was already present in extract_text.rs and lib.rs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 20:28:25 -04:00
jedarden
f85e5149dd
feat(pdftract-91e1i): HTTP fetch sequence implementation
Implement orchestration layer connecting HttpRangeSource to Phase 1.3
xref resolver and Phase 1.4 document model for remote PDF access:
- Document::open_remote() public API for remote PDF loading
- Progressive tail fetch (16 KB → 1 MB) for startxref location
- Xref forward-scan disabled for remote sources (via is_remote check)
- Page-by-page on-demand fetch via HttpRangeSource caching
- Resource lazy load through XrefResolver cache
- HEAD probe with fallbacks for 405 responses and a missing Content-Length header
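
As an illustration of the progressive tail fetch listed above, here is a minimal sketch, not the code in remote.rs. It assumes the window doubles on each retry (the commit only states the 16 KB and 1 MB bounds), and it uses an in-memory byte slice as a stand-in for HttpRangeSource. The names fetch_tail, rfind and locate_startxref are hypothetical.

```rust
// Bounds taken from the commit message; the doubling schedule is an assumption.
const INITIAL_TAIL: usize = 16 * 1024;
const MAX_TAIL: usize = 1024 * 1024;

/// Hypothetical stand-in for a ranged fetch: returns the last `len` bytes.
/// A real remote source would issue an HTTP Range request here instead.
fn fetch_tail(data: &[u8], len: usize) -> &[u8] {
    let start = data.len().saturating_sub(len);
    &data[start..]
}

/// Position of the last occurrence of `needle` in `haystack`, if any.
fn rfind(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() || haystack.len() < needle.len() {
        return None;
    }
    haystack.windows(needle.len()).rposition(|w| w == needle)
}

/// Grow the tail window until `startxref` is found, then parse the byte
/// offset that follows the keyword. Every failure is returned as an Err.
fn locate_startxref(data: &[u8]) -> Result<u64, String> {
    let keyword = b"startxref";
    let mut window = INITIAL_TAIL;
    loop {
        let tail = fetch_tail(data, window);
        if let Some(pos) = rfind(tail, keyword) {
            let rest = &tail[pos + keyword.len()..];
            let digits: String = rest
                .iter()
                .skip_while(|b| b.is_ascii_whitespace())
                .take_while(|b| b.is_ascii_digit())
                .map(|&b| b as char)
                .collect();
            return digits
                .parse::<u64>()
                .map_err(|e| format!("bad startxref offset: {e}"));
        }
        // Stop once the cap is reached or the whole file has been searched.
        if window >= MAX_TAIL || window >= data.len() {
            return Err("startxref not found in tail".to_string());
        }
        window = (window * 2).min(MAX_TAIL);
    }
}
```

Searching from the end of the window picks the last startxref, which is the one that matters for incrementally updated files. Capping the window keeps a malformed remote file from triggering an unbounded download.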
Acceptance criteria:
✅ open_remote(url) returns Document with correct page count
✅ HEAD failure modes (405, no Content-Length, 401) handled
✅ xref forward-scan disabled for remote (is_remote check)
✅ Page-by-page on-demand fetch (HttpRangeSource LRU cache)
✅ INV-8 maintained (all errors return Result)
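
The HEAD failure modes above could be classified along these lines. This is a sketch under stated assumptions, not the logic in remote.rs: it assumes the fallback for 405 or a missing Content-Length is a ranged GET whose Content-Range header carries the total size, which the commit does not spell out. Probe, classify_head and total_from_content_range are hypothetical names.

```rust
/// Hypothetical outcome of the length probe (not a pdftract type).
#[derive(Debug, PartialEq)]
enum Probe {
    /// HEAD succeeded and reported a usable Content-Length.
    Length(u64),
    /// HEAD was rejected (405) or gave no length; try a ranged GET instead.
    NeedsRangedGet,
    /// The server demands credentials (401); retrying will not help.
    Denied,
    /// Any other status is surfaced to the caller as an error.
    Failed(u16),
}

/// Map a HEAD response (status plus raw Content-Length value) to a Probe.
fn classify_head(status: u16, content_length: Option<&str>) -> Probe {
    match status {
        200..=299 => match content_length.and_then(|v| v.trim().parse::<u64>().ok()) {
            Some(n) => Probe::Length(n),
            None => Probe::NeedsRangedGet,
        },
        405 => Probe::NeedsRangedGet,
        401 => Probe::Denied,
        other => Probe::Failed(other),
    }
}

/// Extract the total size from a Content-Range value such as
/// "bytes 0-0/12345". Returns None when the total is unknown ("*").
fn total_from_content_range(value: &str) -> Option<u64> {
    let rest = value.trim().strip_prefix("bytes ")?;
    let (_, total) = rest.split_once('/')?;
    total.trim().parse().ok()
}
```

Keeping the classification a pure function of status and headers makes each failure mode testable without a network, and every branch yields a value rather than panicking, in line with INV-8.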
Files modified:
- crates/pdftract-core/src/document.rs (Document::open_remote, from_source)
- crates/pdftract-core/src/remote.rs (progressive tail fetch)
- crates/pdftract-core/src/lib.rs (re-exports)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 13:17:00 -04:00