pdftract/crates
jedarden f106b5df02 feat(pdftract-1mmq9): add PdfSource trait with MmapSource and FileSource implementations
Define the PdfSource trait abstraction over PDF byte sources. This trait
provides a uniform API for reading PDF data from different sources:
local files (MmapSource, FileSource) and, eventually, remote HTTPS PDFs.

Trait features:
- Read + Seek + Send + Sync supertrait bounds (Send + Sync allow sharing across rayon page-parallel workers)
- len() returns total source length
- read_range() returns Bytes for zero-copy slicing
- prefetch() has a no-op default (MmapSource overrides it to apply MADV_SEQUENTIAL)
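
The trait features listed above could look roughly like the following. This is a minimal sketch, not the code from the commit: the method signatures are assumptions, `Vec<u8>` stands in for `Bytes` so the example needs only the standard library, and `MemSource` is a hypothetical in-memory source used purely for illustration.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Assumed shape of the trait; the real signatures may differ.
pub trait PdfSource: Read + Seek + Send + Sync {
    /// Total length of the source in bytes.
    fn len(&self) -> u64;

    /// Return `len` bytes starting at `offset`.
    /// The commit returns `Bytes`; `Vec<u8>` is a stand-in here.
    fn read_range(&self, offset: u64, len: usize) -> io::Result<Vec<u8>>;

    /// Hint that a range will be read soon. No-op unless overridden.
    fn prefetch(&self, _offset: u64, _len: usize) {}
}

/// Hypothetical in-memory source, for illustration only.
pub struct MemSource {
    cursor: Cursor<Vec<u8>>,
}

impl MemSource {
    pub fn new(data: Vec<u8>) -> Self {
        Self { cursor: Cursor::new(data) }
    }
}

// Read and Seek are delegated to the inner cursor.
impl Read for MemSource {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.cursor.read(buf)
    }
}

impl Seek for MemSource {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.cursor.seek(pos)
    }
}

impl PdfSource for MemSource {
    fn len(&self) -> u64 {
        self.cursor.get_ref().len() as u64
    }

    fn read_range(&self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let data = self.cursor.get_ref();
        let total = data.len() as u64;
        // Reject ranges that overflow or run past the end of the data.
        let end = offset
            .checked_add(len as u64)
            .filter(|&e| e <= total)
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::UnexpectedEof, "range past end of source")
            })?;
        // Both bounds are <= data.len(), so the casts cannot truncate.
        Ok(data[offset as usize..end as usize].to_vec())
    }
}
```

Because `read_range()` and `prefetch()` take `&self`, a source can be shared by reference across worker threads, which is what the Send + Sync bounds are for. Only the cursor-based `Read`/`Seek` methods need `&mut self`.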

MmapSource:
- Memory-mapped file access via memmap2
- Applies MADV_SEQUENTIAL advice via prefetch()
- read_range() builds Bytes via Bytes::copy_from_slice(), which copies the requested range once; later slices of the returned Bytes are zero-copy
- Fallback for platforms/filesystems where mmap fails

FileSource:
- Standard I/O implementation using std::fs::File
- Read+Seek delegation to underlying File
- read_range() uses try_clone() for thread-safe concurrent access
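
A range read built on `try_clone()` might be sketched as follows. This is an assumption about the approach, not the commit's code: the `open()` constructor, the cached `len` field, and the `Vec<u8>` return type (in place of `Bytes`) are all illustrative.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Illustrative file-backed source; field layout is an assumption.
pub struct FileSource {
    file: File,
    len: u64,
}

impl FileSource {
    /// Open a file and cache its length from metadata.
    pub fn open(path: impl AsRef<Path>) -> io::Result<Self> {
        let file = File::open(path)?;
        let len = file.metadata()?.len();
        Ok(Self { file, len })
    }

    /// Total length of the file in bytes at open time.
    pub fn len(&self) -> u64 {
        self.len
    }

    /// Read `len` bytes at `offset` through a cloned handle, so the
    /// method can take `&self` instead of `&mut self`.
    pub fn read_range(&self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let mut handle = self.file.try_clone()?;
        handle.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        // read_exact fails with UnexpectedEof if the range runs past EOF.
        handle.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```

One caveat worth checking against the real implementation: the standard library documents that a handle returned by `File::try_clone()` shares its cursor with the original, so seek-then-read sequences on clones of the same file can interleave across threads. Positional reads (for example `std::os::unix::fs::FileExt::read_at` on Unix) or a lock around the seek and read avoid that.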

Re-exports from pdftract-core::source::PdfSource.

Verification note: notes/pdftract-1mmq9.md documents completion status.
Parser module migration to use new PdfSource is deferred to follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 01:57:25 -04:00
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli fix(pdftract-63ka2): AES-128 test buffer allocation for PKCS#7 padding 2026-05-28 01:30:33 -04:00
pdftract-core feat(pdftract-1mmq9): add PdfSource trait with MmapSource and FileSource implementations 2026-05-28 01:57:25 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py fix(pdftract-63ka2): AES-128 test buffer allocation for PKCS#7 padding 2026-05-28 01:30:33 -04:00