pdftract


jedarden 1195216fe8 (2026-05-26 20:15:39 -04:00)
feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep

Implement the worker_run() function, which processes a single FileWorkItem into MatchEvents via Phase 1 (lexer/object/xref), Phase 3 (content streams), and the Phase 4 span builder, skipping Phase 4.5 reading-order detection.

Key changes:
- Add ProgressEvent enum with FileStart, FileProgress, FileDone, and FileSkipped variants
- Create worker.rs with the worker_run() function for single-pass PDF parsing
- Implement extract_spans_from_page() using process_with_mode() for Phase 3
- Implement group_glyphs_into_spans() for span building without reading order
- Add compute_fingerprint_for_grep() for document fingerprinting
- Handle encrypted PDFs with diagnostic emission
- Support --invert-match with synthetic event emission for zero-match spans
- Fix encryption module compilation issues (rc4/aes_256 imports, RC4 implementation)
- Add crossbeam-channel dependency for event channels

The worker skips reading-order detection (Phase 4.5) because grep does not need it, cutting per-file CPU by ~30-40% on typical pages.

Closes: pdftract-43sg2
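The commit describes a worker that turns one file into a stream of progress and match events. Below is a minimal sketch of that event flow in Rust, under stated assumptions: only the ProgressEvent variant names come from the commit, while every field, the Page type, and the pre-parsed `parsed` argument are hypothetical. It uses std::sync::mpsc in place of crossbeam-channel so it runs without external crates. It is not the pdftract implementation; it only illustrates how skip reporting and --invert-match counting could fit together.

```rust
use std::path::PathBuf;
use std::sync::mpsc::{channel, Sender};

// Variant names follow the commit message; every field is a hypothetical
// payload chosen for illustration.
#[derive(Debug, Clone, PartialEq)]
enum ProgressEvent {
    FileStart { path: PathBuf },
    FileProgress { path: PathBuf, pages_done: usize },
    FileDone { path: PathBuf, matched_spans: usize },
    FileSkipped { path: PathBuf, reason: String },
}

// Hypothetical stand-in for the output of Phases 1, 3 and 4: a page is just
// its text spans, already grouped but never reading-order sorted.
struct Page {
    spans: Vec<String>,
}

// Sketch of a per-file worker. `parsed` stands in for the real parse result;
// an Err models a file that cannot be searched (e.g. an encrypted PDF).
fn worker_run(
    path: PathBuf,
    parsed: Result<Vec<Page>, String>,
    needle: &str,
    invert_match: bool,
    tx: &Sender<ProgressEvent>,
) {
    let pages = match parsed {
        Ok(pages) => pages,
        Err(reason) => {
            // Report the skip instead of failing the whole grep run.
            let _ = tx.send(ProgressEvent::FileSkipped { path, reason });
            return;
        }
    };
    let _ = tx.send(ProgressEvent::FileStart { path: path.clone() });
    let mut matched_spans = 0;
    for (i, page) in pages.iter().enumerate() {
        // With invert_match, a span counts when it does NOT contain the needle.
        matched_spans += page
            .spans
            .iter()
            .filter(|s| s.contains(needle) != invert_match)
            .count();
        let _ = tx.send(ProgressEvent::FileProgress { path: path.clone(), pages_done: i + 1 });
    }
    let _ = tx.send(ProgressEvent::FileDone { path, matched_spans });
}

fn main() {
    let (tx, rx) = channel();
    let pages = vec![
        Page { spans: vec!["invoice total".into(), "page footer".into()] },
        Page { spans: vec!["total due".into()] },
    ];
    worker_run(PathBuf::from("a.pdf"), Ok(pages), "total", false, &tx);
    worker_run(PathBuf::from("b.pdf"), Err("encrypted".into()), "total", false, &tx);
    // Dropping the sender ends the receiver's iteration.
    drop(tx);
    for event in rx {
        println!("{:?}", event);
    }
}
```

In this sketch the match count depends only on which spans contain the needle, not on their order, which is consistent with the commit's point that grep can skip reading-order detection.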
doctor	feat(pdftract-3s2i): implement Phase 5.5.2 validation filter	2026-05-24 04:57:17 -04:00
grep	feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep	2026-05-26 20:15:39 -04:00
inspect	feat(pdftract-3h9xo): implement threads JSON output + schema integration	2026-05-25 13:40:15 -04:00
mcp	feat(pdftract-4li3d): implement security constraints for serve mode	2026-05-26 18:47:51 -04:00
middleware	feat(pdftract-3h9xo): implement threads JSON output + schema integration	2026-05-25 13:40:15 -04:00
cache_cmd.rs	feat(pdftract-3s2i): implement Phase 5.5.2 validation filter	2026-05-24 04:57:17 -04:00
classify.rs	fix: resolve compilation errors across codebase	2026-05-25 08:38:04 -04:00
codegen.rs	feat(pdftract-3s2i): implement Phase 5.5.2 validation filter	2026-05-24 04:57:17 -04:00
lib.rs	feat(pdftract-5boxq): implement audit-log FILE flag with NDJSON writer + middleware	2026-05-25 05:14:06 -04:00
main.rs	feat(pdftract-4li3d): implement security constraints for serve mode	2026-05-26 18:47:51 -04:00
password.rs	feat(pdftract-3s2i): implement Phase 5.5.2 validation filter	2026-05-24 04:57:17 -04:00
serve.rs	feat(pdftract-4li3d): implement security constraints for serve mode	2026-05-26 18:47:51 -04:00
verify_receipt.rs	feat(pdftract-3s2i): implement Phase 5.5.2 validation filter	2026-05-24 04:57:17 -04:00