pdftract

jedarden d161d109b3 docs(plan): revise plan to center accuracy/speed/weight as hard targets	2026-05-16 17:07:48 -04:00

- Add Primary Objectives section with CI-gated measurable targets: accuracy (CER <0.5%, WER <3%, readability >0.85), speed (100pp <3s, 10x vs pdfminer), weight (<4MB default binary, <20 default deps)
- Add feature-flag strategy: axum/tokio/pdfium/pyo3 are all optional; default build is core extraction + CLI only
- Add Phase 4.7: text readability validation and correction pipeline (ligature repair, hyphenation, mojibake detection, readability scoring)
- Make pdfium-render explicitly optional (full-render feature) vs. the always-present direct image compositing path
- Add Tier 4 competitive benchmark suite (vs. pdfminer.six, pypdf, pdfplumber)
- Remove jpeg-decoder and whichlang from dependency matrix (unnecessary)
- Rename implementation-plan.md → plan.md (matches CLAUDE.md reference)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

notes	Add SDK architecture notes covering top 10 languages	2026-05-16 14:51:25 -04:00
plan	docs(plan): revise plan to center accuracy/speed/weight as hard targets	2026-05-16 17:07:48 -04:00
research	Add parallel extraction research and comprehensive research index	2026-05-16 16:30:35 -04:00
research-index.md	Add parallel extraction research and comprehensive research index	2026-05-16 16:30:35 -04:00