- Add std::sync::Arc import for thread sharing
- Fix lifetime issue in test_sync_multiple_threads using Arc
- Add mut to source in test_empty_file for Read trait

All FileSource tests pass (12/12).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pdftract-1psmn: FileSource Implementation
Summary
Implemented FileSource as a PdfSource fallback for when memory-mapping is not available or desired. This provides standard I/O-based access to PDF files using Read+Seek.
Changes Made
1. crates/pdftract-core/src/source/file_source.rs
- Rewrote FileSource to use parking_lot::Mutex<File> for thread-safe concurrent access
- Implemented proper Send + Sync traits via unsafe impl (backed by Mutex)
- Implemented PdfSource trait with len() and read_range() methods
- Implemented Read and Seek traits for standard I/O usage
- Added comprehensive tests:
  - test_open_valid_file: Opens a valid file
  - test_open_nonexistent_file: Returns Err for non-existent file
  - test_read_range: Reads byte ranges correctly
  - test_read_seek: Tests Read+Seek trait methods
  - test_read_range_bounds: Tests boundary conditions
  - test_send_sync: Verifies Send trait (move to thread)
  - test_sync_multiple_threads: Verifies Sync trait (concurrent reads from 4 threads)
  - test_concurrent_read_range: Verifies concurrent reads all succeed
  - test_read_range_past_eof_returns_err: Tests EOF handling
  - test_empty_file: Handles empty files
  - test_large_file: Handles 100KB file
  - test_read_mixed_with_seek: Tests mixed read/seek operations
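The Send + Sync check that test_send_sync performs can also be expressed as a compile-time assertion. The following is a minimal sketch, not the project's code: Wrapper is a hypothetical stand-in for FileSource, and it uses std::sync::Mutex rather than parking_lot so that it compiles with the standard library alone. With only these fields the compiler derives Send and Sync automatically. Whether the real FileSource needs its unsafe impl depends on fields not shown here.

```rust
use std::fs::File;
use std::sync::Mutex;

/// Hypothetical stand-in for FileSource (assumed layout, not the real struct).
#[allow(dead_code)]
struct Wrapper {
    file: Mutex<File>,
    len: u64,
}

/// Compiles only when T is both Send and Sync.
fn assert_send_sync<T: Send + Sync>() {}

/// Returns true if this function compiles at all; the check happens at compile time.
pub fn check() -> bool {
    assert_send_sync::<Wrapper>();
    true
}
```

If a field that is not thread-safe (for example an Rc) were added to Wrapper, the call inside check() would fail to compile rather than fail at run time.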
2. crates/pdftract-core/src/source/mmap.rs (test fixes)
- Added std::sync::Arc import for thread sharing
- Fixed lifetime issue in test_sync_multiple_threads by using Arc instead of &source
- Added mut to source in test_empty_file for Read trait compatibility
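The lifetime fix can be illustrated with a small sketch. std::thread::spawn requires a 'static closure, so a closure that borrows &source from the test's stack frame is rejected, while giving each thread its own Arc clone satisfies the bound. SharedSource, its read_range signature, and read_from_threads are hypothetical names used only for illustration. They are not the MmapSource API.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical in-memory source standing in for the type under test.
struct SharedSource {
    data: Vec<u8>,
}

impl SharedSource {
    /// Returns up to `len` bytes starting at `offset`, clamped to the data length.
    fn read_range(&self, offset: usize, len: usize) -> Vec<u8> {
        let end = offset.saturating_add(len).min(self.data.len());
        let start = offset.min(end);
        self.data[start..end].to_vec()
    }
}

/// Spawns `n` threads that each read two bytes through their own Arc handle.
pub fn read_from_threads(n: usize) -> Vec<Vec<u8>> {
    let source = Arc::new(SharedSource { data: b"abcdefgh".to_vec() });
    let handles: Vec<_> = (0..n)
        .map(|i| {
            // Each thread owns a clone of the Arc, so nothing borrows from this stack frame.
            let source = Arc::clone(&source);
            thread::spawn(move || source.read_range(i, 2))
        })
        .collect();
    // Join in spawn order so the results line up with the thread index.
    handles
        .into_iter()
        .map(|h| h.join().expect("reader thread panicked"))
        .collect()
}
```

An alternative that avoids Arc is std::thread::scope, which lets threads borrow from the enclosing frame. The sketch follows the Arc approach because that is the one the change describes.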
3. crates/pdftract-core/Cargo.toml
- Added parking_lot = "0.12" dependency
Acceptance Criteria
| Criterion | Status | Notes |
|---|---|---|
| FileSource::open(/path/to/file.pdf) returns Ok | PASS | test_open_valid_file |
| FileSource::open(/nonexistent) returns Err | PASS | test_open_nonexistent_file |
| read_range(0, 10) returns first 10 bytes | PASS | test_read_range |
| read_range past EOF returns Err | PASS | test_read_range_bounds |
| Send + Sync: FileSource can be sent across threads | PASS | test_send_sync, test_sync_multiple_threads |
| Concurrent read_range from 4 threads succeeds | PASS | test_concurrent_read_range |
| Test fixture for FUSE-mounted file | WARN | No FUSE fixture tested (environment limitation) |
Test Results
All 12 FileSource tests pass:
source::file_source::tests::test_open_valid_file PASS
source::file_source::tests::test_open_nonexistent_file PASS
source::file_source::tests::test_read_seek PASS
source::file_source::tests::test_send_sync PASS
source::file_source::tests::test_read_range_past_eof_ret.. PASS
source::file_source::tests::test_concurrent_read_range PASS
source::file_source::tests::test_read_range_bounds PASS
source::file_source::tests::test_empty_file PASS
source::file_source::tests::test_sync_multiple_threads PASS
source::file_source::tests::test_read_range PASS
source::file_source::tests::test_read_mixed_with_seek PASS
source::file_source::tests::test_large_file PASS
Key Implementation Details
- Thread Safety: Uses parking_lot::Mutex<File> so that a single file handle can be shared safely across threads. The Mutex serializes access, which is the cost of seek-based I/O compared to mmap's zero-copy reads.
- Zero-Copy Bytes: Uses Bytes::from(Vec<u8>), which takes ownership of the heap buffer without copying.
- Bounds Checking: read_range() validates offsets and truncates reads at EOF rather than returning errors for short reads.
- Read+Seek Traits: Implemented for compatibility with existing code that uses standard I/O patterns.
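The points above can be sketched as a mutex-guarded seek-then-read. This is an illustration under stated assumptions, not the FileSource implementation: SketchFileSource and demo are hypothetical names, std::sync::Mutex replaces parking_lot::Mutex, and Vec<u8> replaces Bytes so that the block needs only the standard library. The bounds policy shown (Err when the start offset lies beyond EOF, truncation when the range runs past EOF) is one assumed reading of the behavior described above.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;
use std::sync::Mutex;

/// Hypothetical stand-in for FileSource.
pub struct SketchFileSource {
    file: Mutex<File>,
    len: u64,
}

impl SketchFileSource {
    pub fn open<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        let file = File::open(path)?;
        let len = file.metadata()?.len();
        Ok(Self { file: Mutex::new(file), len })
    }

    pub fn len(&self) -> u64 {
        self.len
    }

    /// Reads up to `len` bytes starting at `offset`.
    /// Assumed policy: a start offset beyond EOF is an error; a range that
    /// runs past EOF is truncated to the bytes that exist.
    pub fn read_range(&self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        if offset > self.len {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "offset past end of file",
            ));
        }
        let to_read = (len as u64).min(self.len - offset) as usize;
        let mut buf = vec![0u8; to_read];
        // Hold the lock across seek and read so no other thread can move the
        // shared file cursor between the two calls.
        let mut file = self
            .file
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "mutex poisoned"))?;
        file.seek(SeekFrom::Start(offset))?;
        file.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Writes a small temporary file, reads from it, and cleans up.
pub fn demo() -> io::Result<(Vec<u8>, Vec<u8>, bool)> {
    let path = std::env::temp_dir()
        .join(format!("file_source_sketch_{}.bin", std::process::id()));
    File::create(&path)?.write_all(b"%PDF-1.7 sketch")?;
    let source = SketchFileSource::open(&path)?;
    let head = source.read_range(0, 4)?; // fully in range
    let tail = source.read_range(9, 100)?; // truncated at EOF
    let past_eof_is_err = source.read_range(100, 1).is_err();
    drop(source); // close the handle before deleting the file
    std::fs::remove_file(&path)?;
    Ok((head, tail, past_eof_is_err))
}
```

Because every read takes the same lock, concurrent callers are correct but not parallel, which is the trade-off against mmap noted in the Thread Safety point.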
WARN Items
- No FUSE-mounted file test fixture: This would require setting up sshfs or similar, which is an environmental limitation, not a code issue.
References
- Plan section: Phase 1.8 (FileSource description)
- Coordinator: pdftract-2cnmr (parent)
- Sibling implementations: PdfSource trait, MmapSource