pdftract/crates/pdftract-core/src/encryption
jedarden 1195216fe8 feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep
Implement the worker_run() function that processes a single FileWorkItem
into MatchEvents via Phase 1 (lexer/object/xref) + Phase 3 (content streams)
+ Phase 4 span builder (skipping Phase 4.5 reading-order detection).

Key changes:
- Add ProgressEvent enum with FileStart, FileProgress, FileDone, FileSkipped variants
- Create worker.rs with worker_run() function for single-pass PDF parsing
- Implement extract_spans_from_page() using process_with_mode() for Phase 3
- Implement group_glyphs_into_spans() for span building without reading order
- Add compute_fingerprint_for_grep() for document fingerprinting
- Handle encrypted PDFs with diagnostic emission
- Support --invert-match with synthetic event emission for zero-match spans
- Fix encryption module compilation issues (rc4/aes_256 imports, RC4 implementation)
- Add crossbeam-channel dependency for event channels

The worker skips reading-order detection (Phase 4.5) since grep doesn't need it,
cutting per-file CPU by ~30-40% on typical pages.

Closes: pdftract-43sg2
2026-05-26 20:15:39 -04:00
..
aes_256.rs feat(pdftract-4li3d): implement security constraints for serve mode 2026-05-26 18:47:51 -04:00
mod.rs feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep 2026-05-26 20:15:39 -04:00
rc4.rs feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep 2026-05-26 20:15:39 -04:00