pdftract/notes/pdftract-1psmn.md
jedarden 823712d65c fix(pdftract-1psmn): fix mmap test compilation errors
- Add std::sync::Arc import for thread sharing
- Fix lifetime issue in test_sync_multiple_threads using Arc
- Add mut to source in test_empty_file for Read trait

All FileSource tests pass (12/12).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 02:19:44 -04:00


pdftract-1psmn: FileSource Implementation

Summary

Implemented FileSource as a PdfSource fallback for cases where memory-mapping is unavailable or undesirable. It provides standard I/O-based access to PDF files through Read + Seek.

Changes Made

1. crates/pdftract-core/src/source/file_source.rs

  • Rewrote FileSource to use parking_lot::Mutex<File> for thread-safe concurrent access
  • Implemented proper Send + Sync traits via unsafe impl (backed by Mutex)
  • Implemented PdfSource trait with len() and read_range() methods
  • Implemented Read and Seek traits for standard I/O usage
  • Added comprehensive tests:
    • test_open_valid_file: Opens a valid file
    • test_open_nonexistent_file: Returns Err for non-existent file
    • test_read_range: Reads byte ranges correctly
    • test_read_seek: Tests Read+Seek trait methods
    • test_read_range_bounds: Tests boundary conditions
    • test_send_sync: Verifies Send trait (move to thread)
    • test_sync_multiple_threads: Verifies Sync trait (concurrent reads from 4 threads)
    • test_concurrent_read_range: Verifies concurrent reads all succeed
    • test_read_range_past_eof_returns_err: Tests EOF handling
    • test_empty_file: Handles empty files
    • test_large_file: Handles 100KB file
    • test_read_mixed_with_seek: Tests mixed read/seek operations
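The Send and Sync tests listed above exercise the traits at run time by moving and sharing a source across threads. A complementary compile-time check is sketched below. It is an illustration only, not code from the crate: `std::sync::Mutex` stands in for `parking_lot::Mutex`, and the helper names `assert_send_sync` and `mutex_file_is_send_sync` are hypothetical.

```rust
use std::fs::File;
use std::sync::Mutex;

// Compiles only when T implements both Send and Sync.
fn assert_send_sync<T: Send + Sync>() {}

// The trait check happens at compile time; at run time this just returns true.
// Assumption: std::sync::Mutex is used as a stand-in for parking_lot::Mutex.
pub fn mutex_file_is_send_sync() -> bool {
    assert_send_sync::<Mutex<File>>();
    true
}
```

If the wrapped type ever stopped being thread-safe, a check of this shape would fail the build instead of failing a test.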

2. crates/pdftract-core/src/source/mmap.rs (test fixes)

  • Added std::sync::Arc import for thread sharing
  • Fixed lifetime issue in test_sync_multiple_threads by using Arc instead of &source
  • Added mut to source in test_empty_file for Read trait compatibility
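The lifetime fix above follows from `std::thread::spawn` requiring a `'static` closure, so a thread cannot borrow `&source` from the test's stack frame. A minimal sketch of the Arc-based pattern is shown below. `DummySource` and `read_from_threads` are hypothetical stand-ins, not the crate's types or tests.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical in-memory source used only to illustrate the sharing pattern.
pub struct DummySource {
    data: Vec<u8>,
}

impl DummySource {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data }
    }

    // Returns the bytes in [offset, offset + len), clamped to the data length.
    pub fn read_range(&self, offset: usize, len: usize) -> Vec<u8> {
        let end = offset.saturating_add(len).min(self.data.len());
        let start = offset.min(end);
        self.data[start..end].to_vec()
    }
}

// Spawns `n` threads that each read the first four bytes through a shared Arc.
pub fn read_from_threads(source: Arc<DummySource>, n: usize) -> Vec<Vec<u8>> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            // Each thread owns its own Arc clone, which satisfies the
            // 'static bound on thread::spawn.
            let source = Arc::clone(&source);
            thread::spawn(move || source.read_range(0, 4))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The `mut` fix has a similar cause: `Read::read` takes `&mut self`, so a binding that is read from through the `Read` trait must be declared mutable.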

3. crates/pdftract-core/Cargo.toml

  • Added parking_lot = "0.12" dependency

Acceptance Criteria

| Criterion | Status | Notes |
| --- | --- | --- |
| FileSource::open(/path/to/file.pdf) returns Ok | PASS | test_open_valid_file |
| FileSource::open(/nonexistent) returns Err | PASS | test_open_nonexistent_file |
| read_range(0, 10) returns first 10 bytes | PASS | test_read_range |
| read_range past EOF returns Err | PASS | test_read_range_bounds |
| Send + Sync: FileSource can be sent across threads | PASS | test_send_sync, test_sync_multiple_threads |
| Concurrent read_range from 4 threads succeeds | PASS | test_concurrent_read_range |
| Test fixture for FUSE-mounted file | WARN | No FUSE fixture tested (environment limitation) |

Test Results

All 12 FileSource tests pass:

source::file_source::tests::test_open_valid_file           PASS
source::file_source::tests::test_open_nonexistent_file     PASS
source::file_source::tests::test_read_seek                 PASS
source::file_source::tests::test_send_sync                 PASS
source::file_source::tests::test_read_range_past_eof_ret.. PASS
source::file_source::tests::test_concurrent_read_range     PASS
source::file_source::tests::test_read_range_bounds         PASS
source::file_source::tests::test_empty_file                PASS
source::file_source::tests::test_sync_multiple_threads     PASS
source::file_source::tests::test_read_range                PASS
source::file_source::tests::test_read_mixed_with_seek      PASS
source::file_source::tests::test_large_file                PASS

Key Implementation Details

  1. Thread Safety: Uses parking_lot::Mutex<File> so that multiple threads can safely share one file handle. The Mutex serializes each seek-and-read pair, which is the cost of seek-based I/O compared to mmap's zero-copy reads.

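  2. Zero-Copy Bytes: Uses Bytes::from(Vec<u8>), which takes ownership of the heap buffer without copying. Reading from the file into the Vec is still a copy; only the conversion to Bytes avoids a second one.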

  3. Bounds Checking: read_range() validates offsets and truncates reads at EOF rather than returning errors for short reads.

  4. Read+Seek Traits: Implemented for compatibility with existing code that uses standard I/O patterns.
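The thread-safety and bounds-checking points above can be illustrated with a short sketch. It is not the crate's FileSource: `FileSourceSketch` is a hypothetical name, `std::sync::Mutex` stands in for `parking_lot::Mutex`, and `Vec<u8>` stands in for `Bytes`. The EOF behavior shown (an error when the start offset lies beyond the file, truncation when the range merely runs past the end) is an assumption about how the two statements in this note fit together.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;
use std::sync::Mutex;

// Hypothetical sketch of a Mutex-guarded file source.
pub struct FileSourceSketch {
    file: Mutex<File>,
    len: u64,
}

impl FileSourceSketch {
    pub fn open<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        let file = File::open(path)?;
        // Cache the length once so len() does not need the lock.
        let len = file.metadata()?.len();
        Ok(Self { file: Mutex::new(file), len })
    }

    pub fn len(&self) -> u64 {
        self.len
    }

    // Assumed semantics: an offset beyond EOF is an error; a range that runs
    // past EOF is truncated to the bytes that exist.
    pub fn read_range(&self, offset: u64, length: usize) -> io::Result<Vec<u8>> {
        if offset > self.len {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "offset past end of file",
            ));
        }
        let available = self.len - offset;
        let to_read = (length as u64).min(available) as usize;
        let mut buf = vec![0u8; to_read];
        // Hold the lock across seek + read so no other thread can move the
        // file cursor between the two calls.
        let mut file = self
            .file
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "lock poisoned"))?;
        file.seek(SeekFrom::Start(offset))?;
        file.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```

Because the lock covers both the seek and the read, concurrent callers never observe each other's cursor position, at the price of running one read at a time.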

WARN Items

  • No FUSE-mounted file test fixture: Testing this would require setting up sshfs or similar, which is an environmental limitation, not a code issue.

References

  • Plan section: Phase 1.8 (FileSource description)
  • Coordinator: pdftract-2cnmr (parent)
  • Sibling implementations: PdfSource trait, MmapSource