pdftract/crates/pdftract-core/__test__.pdf
jedarden 8c9a940159 feat(pdftract-15pz8): implement multi-process safe cache operations
Implements Phase 6.9.5: atomic file writes and concurrent access safety
for multiple pdftract processes sharing the same cache directory.

## Changes

- Add `multi_process.rs` module with atomic write/read primitives
- Atomic write protocol: temp file + fsync + rename
- Reader protocol with corruption handling (deletes corrupt entries)
- Startup cleanup of stale temp files (> 1 hour old)
- fsync control via PDFTRACT_CACHE_NO_FSYNC env var
- No distributed locks - tolerates duplicated work on first-miss races

## Module structure

- `Writer`: Atomic cache entry writes via temp + rename
- `Reader`: Safe reads with decompression and corruption detection
- `cleanup_stale_temp_files()`: Startup cleanup for crash-recovered temp files

## Acceptance criteria met

- [x] Concurrent extractors on same fingerprint: both succeed; no deadlock
- [x] Reader sees fully-decompressable entry always (never torn write)
- [x] 8 concurrent writers writing 8 different keys: all materialize correctly
- [x] Corrupt entry on disk: treated as miss; entry deleted
- [x] Stale temp file > 1 hour old: cleaned up at startup
- [x] Stress test: 4 processes × 100 iterations → no errors

## Tests

- 18 tests in `multi_process.rs`
- 92 total cache module tests pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:31:11 -04:00

374 B