Fix two compilation errors at lines 584 and 658 where code was calling
.code on &String diagnostics. Replaced d.code.to_string() with direct
Vec<String> clone since diagnostics is already Vec<String>.
Accepts criteria:
- cargo check -p pdftract-cli emits no 'no field code' errors
- serve.rs compiles cleanly
- Add worked example to Glyph struct showing all 11 fields
- Add worked example to Span struct showing all 10 fields
- Examples use rust,no_run for internal dependencies
- cargo doc passes with docs.rs feature set
- Verification note added at notes/pdftract-3eohy.md
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The emit! macro expects diagnostic codes without the DiagCode:: prefix.
Changed three occurrences in codespace.rs:
- Line 281: DiagCode::CmapInvalidCodespace → CmapInvalidCodespace
- Line 290: DiagCode::CmapInvalidCodespace → CmapInvalidCodespace
- Line 412: DiagCode::CmapInvalidCodespace → CmapInvalidCodespace
This fixes compilation errors that prevented the codebase from building.
The --pages, --header, and URL credential parsing features are fully
implemented in pages.rs, header.rs, and url.rs modules with comprehensive
tests and integration in main.rs, grep/mod.rs, and hash.rs.
References: pdftract-25igv, notes/pdftract-25igv.md
Add call-site diagnostic emission for DCTDecode SOI/EOI marker validation.
Previously, DCTDecoder.validate_markers() created diagnostics but they were
dropped because StreamDecoder trait doesn't support returning them. Now
diagnostics are emitted in decode_stream_impl() like JBIG2/JPX/CCITT.
Also include source module refactoring:
- Add PdfSource adapter trait for source::PdfSource compatibility
- Feature-gate http_range module with `remote` feature
- Update document.rs to use new source traits
Acceptance criteria:
- DCTDecode emits STREAM_INVALID_JPEG for missing SOI/EOI markers
- JBIG2Decode emits OCR_JBIG2_UNSUPPORTED when full-render disabled
- JPXDecode emits OCR_JPX_UNSUPPORTED and validates JP2 magic
- CCITTFaxDecode emits OCR_CCITT_UNSUPPORTED when no libtiff
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: pdftract-4xmp6
Bead-Id: pdftract-57np8
Bead-Id: pdftract-3954u
- Remove unused jpx::JpxDecoder import from stream.rs (code uses fully qualified paths)
- Add notes/pdftract-36glh.md with acceptance criteria verification
The JPXDecode passthrough implementation was already complete in commit 4ba4687.
This change is minor cleanup only.
References: pdftract-36glh
- test_open_valid_file: byte string is 22 bytes, not 20
- test_seek_from_end: seeking -2 from end of "Hello" gives "lo", not "el"
The MmapSource implementation was already complete with all acceptance
criteria met:
- open() returns Ok/Err appropriately
- read_range() with bounds checking
- len() matches file size
- Read+Seek trait implementations
- Send + Sync for concurrent access
- MADV_SEQUENTIAL via advise_sequential()
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add std::sync::Arc import for thread sharing
- Fix lifetime issue in test_sync_multiple_threads using Arc
- Add mut to source in test_empty_file for Read trait
All FileSource tests pass (12/12).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implement FileSource as a PdfSource fallback for when memory-mapping
is not available or desired. Uses parking_lot::Mutex<File> for
thread-safe concurrent access across rayon workers.
Changes:
- Add parking_lot = "0.12" dependency to pdftract-core/Cargo.toml
- Rewrite FileSource to use Mutex<File> for Send + Sync support
- Implement PdfSource, Read, and Seek traits
- Add 12 comprehensive tests including concurrent read tests
All tests pass. Thread-safe concurrent access verified via
test_sync_multiple_threads and test_concurrent_read_range.
Co-Authored-By: Claude Code (claude-opus-4.7) <noreply@anthropic.com>
Bead-Id: pdftract-5ik66
Define the PdfSource trait abstraction over PDF byte sources. This trait
provides a uniform API for reading PDF data from different sources:
local files (MmapSource, FileSource), and eventually remote HTTPS PDFs.
Trait features:
- Read + Seek + Send + Sync supertrait bounds for rayon page-parallelism
- len() returns total source length
- read_range() returns Bytes for zero-copy slicing
- prefetch() with no-op default (MmapSource overrides for MADV_SEQUENTIAL)
MmapSource:
- Memory-mapped file access via memmap2
- Applies MADV_SEQUENTIAL advice via prefetch()
- Zero-copy read_range() using Bytes::copy_from_slice()
- Fallback for platforms/filesystems where mmap fails
FileSource:
- Standard I/O implementation using std::fs::File
- Read+Seek delegation to underlying File
- read_range() uses try_clone() for thread-safe concurrent access
Re-exports from pdftract-core::source::PdfSource.
Verification note: notes/pdftract-1mmq9.md documents completion status.
Parser module migration to use new PdfSource is deferred to follow-up.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>