pdftract

jedarden 1195216fe8 feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep	2026-05-26 20:15:39 -04:00

Implement the worker_run() function that processes a single FileWorkItem into MatchEvents via Phase 1 (lexer/object/xref) + Phase 3 (content streams) + Phase 4 span builder (skipping Phase 4.5 reading-order detection).

Key changes:
- Add ProgressEvent enum with FileStart, FileProgress, FileDone, FileSkipped variants
- Create worker.rs with worker_run() function for single-pass PDF parsing
- Implement extract_spans_from_page() using process_with_mode() for Phase 3
- Implement group_glyphs_into_spans() for span building without reading order
- Add compute_fingerprint_for_grep() for document fingerprinting
- Handle encrypted PDFs with diagnostic emission
- Support --invert-match with synthetic event emission for zero-match spans
- Fix encryption module compilation issues (rc4/aes_256 imports, RC4 implementation)
- Add crossbeam-channel dependency for event channels

The worker skips reading-order detection (Phase 4.5) since grep doesn't need it, cutting per-file CPU by ~30-40% on typical pages.

Closes: pdftract-43sg2
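
The commit message names a ProgressEvent enum with four variants and an event channel, but this listing does not show their definitions. The following is a minimal sketch of how such events might flow from a per-file worker to a consumer. It makes several assumptions: only the variant names come from the commit message; the payload fields, the `report_file` and `collect_events` helpers, and the `skip_reason` parameter are hypothetical; `std::sync::mpsc` stands in for crossbeam-channel so the example needs no external crate; and page text is passed in directly instead of being parsed from a PDF.

```rust
use std::sync::mpsc;
use std::thread;

/// Variant names follow the commit message; every payload field is a guess.
#[derive(Debug, Clone, PartialEq)]
enum ProgressEvent {
    FileStart { path: String },
    FileProgress { path: String, pages_done: usize, pages_total: usize },
    FileDone { path: String, matches: usize },
    FileSkipped { path: String, reason: String },
}

/// Hypothetical stand-in for a per-file worker. It counts substring matches
/// in already-extracted page text and reports progress over the channel.
fn report_file(
    path: &str,
    pages: &[&str],
    needle: &str,
    skip_reason: Option<&str>,
    tx: &mpsc::Sender<ProgressEvent>,
) {
    // A file that cannot be processed yields a single FileSkipped event.
    if let Some(reason) = skip_reason {
        let _ = tx.send(ProgressEvent::FileSkipped {
            path: path.to_string(),
            reason: reason.to_string(),
        });
        return;
    }

    let _ = tx.send(ProgressEvent::FileStart { path: path.to_string() });

    let mut matches = 0;
    for (index, page) in pages.iter().enumerate() {
        matches += page.matches(needle).count();
        let _ = tx.send(ProgressEvent::FileProgress {
            path: path.to_string(),
            pages_done: index + 1,
            pages_total: pages.len(),
        });
    }

    let _ = tx.send(ProgressEvent::FileDone { path: path.to_string(), matches });
}

/// Runs the worker on a scoped thread and gathers every event it emitted.
fn collect_events(
    path: &str,
    pages: &[&str],
    needle: &str,
    skip_reason: Option<&str>,
) -> Vec<ProgressEvent> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|scope| {
        // The sender moves into the worker and is dropped when it finishes,
        // which is what lets the receiver loop below terminate.
        scope.spawn(move || report_file(path, pages, needle, skip_reason, &tx));
    });
    rx.iter().collect()
}
```

Calling `collect_events("a.pdf", &["foo bar", "foo foo"], "foo", None)` in this sketch yields one FileStart, one FileProgress per page, and a final FileDone carrying the match count. Passing `Some("unreadable")` as the last argument yields a single FileSkipped event instead.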
benches	feat(pdftract-3h9xo): implement threads JSON output + schema integration	2026-05-25 13:40:15 -04:00
src	feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep	2026-05-26 20:15:39 -04:00
tests	feat(pdftract-3h9xo): implement threads JSON output + schema integration	2026-05-25 13:40:15 -04:00
build.rs	docs(pdftract-32y9): finalize SDK architecture note with workspace layout, cross-compile matrix, and KU-12 alignment	2026-05-24 06:38:23 -04:00
Cargo.toml	feat(pdftract-43sg2): implement single-pass per-file parse pipeline for grep	2026-05-26 20:15:39 -04:00
pdftract-cli.cdx.json	feat(pdftract-67tm8): implement MCP stdio transport with integration tests	2026-05-23 00:16:42 -04:00