jedarden
|
e10919018c
|
docs(pdftract-6096u): Add Phase 1.8 Remote Source Adapter verification note
Phase 1.8 is complete and verified:
- All 7 child beads closed
- All 30 remote-related tests pass
- All acceptance criteria pass
- All critical tests pass
Components:
- PdfSource trait with Read+Seek+Send+Sync bounds
- MmapSource, FileSource, HttpRangeSource implementations
- HTTP Range requests with an LRU cache of 64 × 64 KB blocks (4 MB total)
- --header and --pages CLI flags
- Fallback for non-Range servers
- Error classification for network failures
Closes pdftract-6096u
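The trait bounds listed under Components can be illustrated with a minimal sketch. Only the `Read + Seek + Send + Sync` bounds come from the commit message; the `total_len` helper, the `DemoSource` type, and `read_tail` are hypothetical names introduced here for illustration and may not match the actual crate.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Sketch of a source trait. Only the supertrait bounds are taken from the
/// commit message; everything else is an assumption.
pub trait PdfSource: Read + Seek + Send + Sync {
    /// Total size in bytes (hypothetical helper). Restores the cursor afterwards.
    fn total_len(&mut self) -> io::Result<u64> {
        let pos = self.stream_position()?;
        let end = self.seek(SeekFrom::End(0))?;
        self.seek(SeekFrom::Start(pos))?;
        Ok(end)
    }
}

/// In-memory stand-in for illustration; a real source would wrap an mmap,
/// a file handle, or an HTTP Range client.
pub struct DemoSource(Cursor<Vec<u8>>);

impl DemoSource {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self(Cursor::new(bytes))
    }
}

impl Read for DemoSource {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

impl Seek for DemoSource {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.0.seek(pos)
    }
}

impl PdfSource for DemoSource {}

/// Reads up to the last `n` bytes, which is where a PDF keeps its
/// `startxref` pointer and `%%EOF` marker.
pub fn read_tail(src: &mut dyn PdfSource, n: u64) -> io::Result<Vec<u8>> {
    let len = src.total_len()?;
    let start = len.saturating_sub(n);
    src.seek(SeekFrom::Start(start))?;
    let mut out = vec![0u8; (len - start) as usize];
    src.read_exact(&mut out)?;
    Ok(out)
}
```

Because the parser only sees `dyn PdfSource`, the same tail read works whether the bytes come from memory, disk, or a remote server; the `Send + Sync` bounds let a source be shared across threads.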
|
2026-06-02 22:09:22 -04:00 |
|
jedarden
|
6f107d1369
|
docs(pdftract-6096u): Add Phase 1.8 Remote Source Adapter verification note
Summary: Phase 1.8 (Remote Source Adapter) implementation complete
Verification Summary:
- All 8 child beads closed
- Module structure: crates/pdftract-core/src/source/ (mmap.rs, file_source.rs, http_range.rs)
- Feature remote: adds ureq + rustls (~500 KB binary size delta)
Critical tests (5/5 pass):
1. critical_1_range_support_bandwidth_efficient - transfers < 150 KB to extract page 5 of a 100-page PDF
2. critical_2_no_range_support_fallback - emits REMOTE_NO_RANGE_SUPPORT, downloads full file
3. critical_3_416_retry_without_range - retries without Range header on 416
4. critical_4_linearized_hint_stream_prefetch - utilizes hint stream for prefetch
5. critical_5_connection_drop_interrupted - emits REMOTE_FETCH_INTERRUPTED, partial result
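Critical tests 2 and 3 above hinge on how the client reacts to the status of a ranged GET. The following is a rough sketch of that decision, assuming standard HTTP semantics (206 for a partial body, 200 when the server ignores `Range`, 416 for an unsatisfiable range). The `RangeAction` enum and both function names are hypothetical; only the `REMOTE_NO_RANGE_SUPPORT` code comes from the commit message.

```rust
/// Next step after a ranged GET (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum RangeAction {
    /// 206 Partial Content: the body holds exactly the requested bytes.
    UseRangedBody,
    /// 200 OK: the server ignored the Range header and sent the whole file.
    FallBackToFullDownload,
    /// 416 Range Not Satisfiable: retry the request without a Range header.
    RetryWithoutRange,
    /// Any other status is treated as a failure in this sketch.
    Fail(u16),
}

/// Maps an HTTP status code to the next action.
pub fn classify_range_response(status: u16) -> RangeAction {
    match status {
        206 => RangeAction::UseRangedBody,
        200 => RangeAction::FallBackToFullDownload,
        416 => RangeAction::RetryWithoutRange,
        other => RangeAction::Fail(other),
    }
}

/// Diagnostic code to emit for an action, if any. The code string is taken
/// from the commit message; attaching it to this action is an assumption.
pub fn warning_code(action: &RangeAction) -> Option<&'static str> {
    match action {
        RangeAction::FallBackToFullDownload => Some("REMOTE_NO_RANGE_SUPPORT"),
        _ => None,
    }
}
```

Keeping the classification a pure function of the status code makes it testable without a mock server.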
Additional tests:
- 13/13 mock server tests pass
- 5/5 remote integration tests pass
- All unit tests pass (pages, mmap, file_source, http_range)
Implementation details:
- PdfSource trait with MmapSource, FileSource, HttpRangeSource, MemorySource
- HttpRangeSource: 64 KB blocks × 64 LRU cache (4 MB total)
- HTTP fetch sequence: HEAD → tail Range fetch → page-by-page on-demand
- Server fallback: downloads to temp file for non-Range servers
- Authentication: basic auth via URL, custom headers via --header
- CLI: --pages flag (comma-separated 1-based ranges)
- Linearized PDF hint stream parser for prefetch optimization
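The cache figures in the list above (64 KB blocks, 64 entries, 4 MB total) can be sketched as a small LRU keyed by block index. This is an illustrative sketch only: `BlockCache`, `blocks_for_range`, and the `VecDeque`-based recency list are assumptions, not the crate's actual data structure.

```rust
use std::collections::{HashMap, VecDeque};
use std::ops::Range;

pub const BLOCK_SIZE: u64 = 64 * 1024; // 64 KB per block
pub const MAX_BLOCKS: usize = 64; // 64 blocks, 4 MB in total

/// Minimal LRU cache of fixed-size blocks (hypothetical type).
pub struct BlockCache {
    blocks: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // front = least recently used
    capacity: usize,
}

impl BlockCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { blocks: HashMap::new(), order: VecDeque::new(), capacity }
    }

    /// Returns the block and marks it as most recently used.
    pub fn get(&mut self, index: u64) -> Option<&[u8]> {
        if self.blocks.contains_key(&index) {
            self.touch(index);
        }
        self.blocks.get(&index).map(|b| b.as_slice())
    }

    /// Inserts a block, evicting the least recently used one when full.
    pub fn insert(&mut self, index: u64, data: Vec<u8>) {
        if self.blocks.insert(index, data).is_some() {
            self.touch(index); // replaced an existing block
            return;
        }
        if self.blocks.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.blocks.remove(&oldest);
            }
        }
        self.order.push_back(index);
    }

    /// Moves `index` to the most-recently-used end of the recency list.
    fn touch(&mut self, index: u64) {
        if let Some(pos) = self.order.iter().position(|&i| i == index) {
            self.order.remove(pos);
        }
        self.order.push_back(index);
    }
}

/// Block indices covering `len` bytes starting at `offset`.
pub fn blocks_for_range(offset: u64, len: u64) -> Range<u64> {
    let first = offset / BLOCK_SIZE;
    if len == 0 {
        return first..first;
    }
    first..(offset + len + BLOCK_SIZE - 1) / BLOCK_SIZE
}
```

With only 64 entries, the linear scan in `touch` is cheap; a larger cache would likely want a linked-list or generation-counter scheme instead.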
Acceptance criteria:
✅ 500-page PDF: extract pages 47-52 < 5 MB transferred
✅ Server without Range: fallback to temp-file download, emit warning
✅ Network failure: partial result + REMOTE_FETCH_INTERRUPTED, exit 5
✅ TLS failure: clear error with cert chain reason, exit 6
Closes pdftract-6096u
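The commit describes `--pages` as taking comma-separated 1-based ranges. A minimal parser sketch follows, assuming a `start-end` syntax such as `47-52` (the dash form is an assumption based on how the acceptance criteria write the range); `parse_pages` and `parse_page` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Parses a spec such as "1,47-52" into a sorted, de-duplicated page set.
/// The dash syntax for ranges is an assumption, not confirmed by the source.
pub fn parse_pages(spec: &str) -> Result<BTreeSet<u32>, String> {
    let mut pages = BTreeSet::new();
    for part in spec.split(',') {
        let part = part.trim();
        let (start, end) = match part.split_once('-') {
            Some((a, b)) => (parse_page(a)?, parse_page(b)?),
            None => {
                let p = parse_page(part)?;
                (p, p)
            }
        };
        if start > end {
            return Err(format!("descending range: {part}"));
        }
        pages.extend(start..=end);
    }
    Ok(pages)
}

/// Parses a single 1-based page number, rejecting 0 and non-numeric input.
fn parse_page(s: &str) -> Result<u32, String> {
    match s.trim().parse::<u32>() {
        Ok(0) => Err("page numbers are 1-based".to_string()),
        Ok(n) => Ok(n),
        Err(_) => Err(format!("invalid page number: {s:?}")),
    }
}
```

A `BTreeSet` keeps the pages ordered and collapses overlaps, so a remote source could fetch each page's byte ranges once and in ascending order.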
|
2026-06-02 21:41:19 -04:00 |
|