Commit graph

2 commits

Author SHA1 Message Date
jedarden
e10919018c docs(pdftract-6096u): Add Phase 1.8 Remote Source Adapter verification note
Phase 1.8 is complete and verified:
- All 7 child beads closed
- All 30 remote-related tests pass
- All acceptance criteria pass
- All critical tests pass

Components:
- PdfSource trait with Read+Seek+Send+Sync bounds
- MmapSource, FileSource, HttpRangeSource implementations
- HTTP Range requests with an LRU cache of 64 blocks × 64 KB (4 MB)
- --header and --pages CLI flags
- Fallback for servers without Range support
- Error classification for network failures
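A minimal sketch of a source abstraction with these bounds, assuming nothing beyond the standard library. The note only states the Read + Seek + Send + Sync bounds and the type names. The `byte_len` method, the `read_tail` helper, and this `MemorySource` body are hypothetical and are not pdftract's actual code.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical shape of the source abstraction. Only the
/// Read + Seek + Send + Sync bounds come from the note; `byte_len` is assumed.
pub trait PdfSource: Read + Seek + Send + Sync {
    /// Total size of the PDF in bytes (assumed method).
    fn byte_len(&self) -> u64;
}

/// In-memory source: a Cursor over an owned buffer already implements
/// Read + Seek, and Vec<u8> is Send + Sync.
pub struct MemorySource {
    inner: Cursor<Vec<u8>>,
}

impl MemorySource {
    pub fn new(bytes: Vec<u8>) -> Self {
        MemorySource { inner: Cursor::new(bytes) }
    }
}

impl Read for MemorySource {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inner.read(buf)
    }
}

impl Seek for MemorySource {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.inner.seek(pos)
    }
}

impl PdfSource for MemorySource {
    fn byte_len(&self) -> u64 {
        self.inner.get_ref().len() as u64
    }
}

/// Reads the last `n` bytes (hypothetical helper): the region a PDF parser
/// scans first for `startxref` and the trailer.
pub fn read_tail(src: &mut dyn PdfSource, n: u64) -> io::Result<Vec<u8>> {
    let start = src.byte_len().saturating_sub(n);
    src.seek(SeekFrom::Start(start))?;
    let mut buf = Vec::new();
    src.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut src = MemorySource::new(b"%PDF-1.7\nstartxref\n123\n%%EOF\n".to_vec());
    let tail = read_tail(&mut src, 6)?;
    println!("{:?}", String::from_utf8_lossy(&tail));
    Ok(())
}
```

Because the bounds include Send + Sync, a boxed source can be moved to another thread. Any backend (mmap, file, HTTP) can sit behind the same `&mut dyn PdfSource`.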

Closes pdftract-6096u
2026-06-02 22:09:22 -04:00
jedarden
6f107d1369 docs(pdftract-6096u): Add Phase 1.8 Remote Source Adapter verification note
Summary: Phase 1.8 (Remote Source Adapter) implementation is complete

Verification Summary:
- All 8 child beads closed
- Module structure: crates/pdftract-core/src/source/ (mmap.rs, file_source.rs, http_range.rs)
- Cargo feature "remote": adds ureq and rustls (~500 KB increase in binary size)

Critical tests (5/5 pass):
1. critical_1_range_support_bandwidth_efficient - transfers < 150 KB to extract page 5 of a 100-page PDF
2. critical_2_no_range_support_fallback - emits REMOTE_NO_RANGE_SUPPORT and downloads the full file
3. critical_3_416_retry_without_range - retries without the Range header on HTTP 416
4. critical_4_linearized_hint_stream_prefetch - uses the linearization hint stream to prefetch
5. critical_5_connection_drop_interrupted - emits REMOTE_FETCH_INTERRUPTED and returns a partial result
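The fallback and retry behavior checked by critical tests 2 and 3 can be sketched as a pure function over the HTTP status of a ranged request. This assumes standard RFC 9110 semantics: 206 when the Range is honored, 200 when the server ignores it, and 416 when it cannot be satisfied. The enum and function names are hypothetical, not pdftract's API.

```rust
/// Hypothetical outcome of a ranged GET, following RFC 9110 status semantics.
#[derive(Debug, PartialEq, Eq)]
enum RangeOutcome {
    /// 206 Partial Content: the server honored the Range header.
    Partial,
    /// 200 OK: the Range header was ignored; emit REMOTE_NO_RANGE_SUPPORT
    /// and fall back to downloading the full file.
    NoRangeSupport,
    /// 416 Range Not Satisfiable: retry the request without a Range header.
    RetryWithoutRange,
    /// Any other status is treated as an error in this sketch.
    Error(u16),
}

fn classify_range_response(status: u16) -> RangeOutcome {
    match status {
        206 => RangeOutcome::Partial,
        200 => RangeOutcome::NoRangeSupport,
        416 => RangeOutcome::RetryWithoutRange,
        other => RangeOutcome::Error(other),
    }
}

/// Range header value for an inclusive byte span, e.g. one 64 KB block.
fn range_header(start: u64, end_inclusive: u64) -> String {
    format!("bytes={}-{}", start, end_inclusive)
}

/// Suffix range requesting the last `n` bytes of the file (a tail fetch).
fn tail_range_header(n: u64) -> String {
    format!("bytes=-{}", n)
}

fn main() {
    for status in [206u16, 200, 416, 503] {
        println!("{} -> {:?}", status, classify_range_response(status));
    }
    println!("Range: {}", range_header(0, 65_535));
    println!("Range: {}", tail_range_header(65_536));
}
```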

Additional tests:
- 13/13 mock server tests pass
- 5/5 remote integration tests pass
- All unit tests pass (pages, mmap, file_source, http_range)

Implementation details:
- PdfSource trait with MmapSource, FileSource, HttpRangeSource, MemorySource
- HttpRangeSource: LRU cache of 64 blocks × 64 KB (4 MB total)
- HTTP fetch sequence: HEAD → Range fetch of the file tail → on-demand, page-by-page Range fetches
- Server fallback: for servers without Range support, downloads the full file to a temp file
- Authentication: basic auth via credentials in the URL, custom headers via --header
- CLI: --pages flag (comma-separated, 1-based page ranges)
- Linearized PDF hint stream parser for prefetch optimization
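The 4 MB block cache can be sketched with the standard library alone. Byte offsets map to 64 KB block indices, each miss triggers one fetch, and the least recently used block is evicted once 64 are held. `BlockCache`, `read_at`, and the `fetch` closure are hypothetical; the closure stands in for one HTTP Range request per block. The real HttpRangeSource may be organized differently.

```rust
use std::collections::{HashMap, VecDeque};

const BLOCK_SIZE: u64 = 64 * 1024; // 64 KB per block
const MAX_BLOCKS: usize = 64; // 64 blocks => 4 MB cache

/// Minimal LRU block cache. `fetch` is a hypothetical stand-in for one
/// HTTP Range request: block index -> bytes of that block.
struct BlockCache<F: FnMut(u64) -> Vec<u8>> {
    blocks: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // front = least recently used
    fetch: F,
    fetches: usize,
}

impl<F: FnMut(u64) -> Vec<u8>> BlockCache<F> {
    fn new(fetch: F) -> Self {
        BlockCache { blocks: HashMap::new(), order: VecDeque::new(), fetch, fetches: 0 }
    }

    /// Returns block `idx`, fetching it on a miss and evicting the LRU block if full.
    fn block(&mut self, idx: u64) -> &[u8] {
        if self.blocks.contains_key(&idx) {
            // Hit: drop the old position; it is re-pushed as most recent below.
            self.order.retain(|&i| i != idx);
        } else {
            if self.blocks.len() == MAX_BLOCKS {
                if let Some(lru) = self.order.pop_front() {
                    self.blocks.remove(&lru);
                }
            }
            let data = (self.fetch)(idx);
            self.fetches += 1;
            self.blocks.insert(idx, data);
        }
        self.order.push_back(idx);
        &self.blocks[&idx]
    }

    /// Copies bytes starting at `offset` into `buf`, spanning blocks as needed.
    /// Returns the number of bytes copied (fewer than `buf.len()` at end of file).
    fn read_at(&mut self, offset: u64, buf: &mut [u8]) -> usize {
        let mut done = 0;
        while done < buf.len() {
            let pos = offset + done as u64;
            let within = (pos % BLOCK_SIZE) as usize;
            let block = self.block(pos / BLOCK_SIZE);
            if within >= block.len() {
                break; // past the end of the file
            }
            let n = (block.len() - within).min(buf.len() - done);
            buf[done..done + n].copy_from_slice(&block[within..within + n]);
            done += n;
        }
        done
    }
}

fn main() {
    let file: Vec<u8> = (0..300_000u32).map(|i| (i % 251) as u8).collect();
    let mut cache = BlockCache::new(|idx: u64| {
        // A real source would send `Range: bytes=start-(end-1)` here.
        let start = (idx * BLOCK_SIZE) as usize;
        let end = (start + BLOCK_SIZE as usize).min(file.len());
        file.get(start..end).map(|s| s.to_vec()).unwrap_or_default()
    });
    let mut buf = [0u8; 100];
    let n = cache.read_at(65_500, &mut buf); // straddles blocks 0 and 1
    println!("read {} bytes with {} fetches", n, cache.fetches);
}
```

Keeping recency in a `VecDeque` keeps the sketch short. With only 64 entries the linear `retain` on a hit is cheap, though a linked hash map would make hits O(1).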

Acceptance criteria:
- 500-page PDF: extracting pages 47-52 transfers < 5 MB
- Server without Range support: falls back to a temp-file download and emits a warning
- Network failure: partial result + REMOTE_FETCH_INTERRUPTED, exit code 5
- TLS failure: clear error stating the certificate-chain reason, exit code 6
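The acceptance criteria map network failures to exit code 5 and TLS failures to exit code 6. One way to keep that mapping in a single place is an error enum with an `exit_code` method, sketched below. The enum, its fields, and the message wording are hypothetical. Only the exit codes and the REMOTE_FETCH_INTERRUPTED code come from the note.

```rust
use std::fmt;

/// Hypothetical remote-failure classification; only the exit codes and the
/// REMOTE_FETCH_INTERRUPTED code come from the acceptance criteria.
#[derive(Debug)]
enum RemoteError {
    /// Connection dropped mid-transfer; pages extracted so far are kept.
    FetchInterrupted { pages_done: usize },
    /// TLS handshake failed; `reason` explains why the certificate chain was rejected.
    Tls { reason: String },
}

impl RemoteError {
    /// Process exit code for each failure class.
    fn exit_code(&self) -> u8 {
        match self {
            RemoteError::FetchInterrupted { .. } => 5,
            RemoteError::Tls { .. } => 6,
        }
    }
}

impl fmt::Display for RemoteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RemoteError::FetchInterrupted { pages_done } => write!(
                f,
                "REMOTE_FETCH_INTERRUPTED: connection dropped, returning partial result ({} pages)",
                pages_done
            ),
            RemoteError::Tls { reason } => {
                write!(f, "TLS handshake failed: certificate chain rejected: {}", reason)
            }
        }
    }
}

fn main() {
    let errors = [
        RemoteError::FetchInterrupted { pages_done: 3 },
        RemoteError::Tls { reason: "unknown issuer".to_string() },
    ];
    for e in &errors {
        // A real CLI would print the message and then exit with e.exit_code().
        eprintln!("{} (exit {})", e, e.exit_code());
    }
}
```

In an actual binary, `main` could return `std::process::ExitCode::from(e.exit_code())`. The sketch only prints, so it exits cleanly.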

Closes pdftract-6096u
2026-06-02 21:41:19 -04:00