pdftract/docs
jedarden d161d109b3 docs(plan): revise plan to center accuracy/speed/weight as hard targets
- Add Primary Objectives section with CI-gated measurable targets:
  accuracy (CER <0.5%, WER <3%, readability >0.85), speed (100pp <3s,
  10x vs pdfminer), weight (<4MB default binary, <20 default deps)
- Add feature-flag strategy: axum/tokio/pdfium/pyo3 are all optional;
  default build is core extraction + CLI only
- Add Phase 4.7: text readability validation and correction pipeline
  (ligature repair, hyphenation, mojibake detection, readability scoring)
- Make pdfium-render explicitly optional (full-render feature) vs. the
  always-present direct image compositing path
- Add Tier 4 competitive benchmark suite (vs. pdfminer.six, pypdf, pdfplumber)
- Remove jpeg-decoder and whichlang from dependency matrix (unnecessary)
- Rename implementation-plan.md → plan.md (matches CLAUDE.md reference)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 17:07:48 -04:00
Name               Last commit                                                              Updated
notes              Add SDK architecture notes covering top 10 languages                     2026-05-16 14:51:25 -04:00
plan               docs(plan): revise plan to center accuracy/speed/weight as hard targets  2026-05-16 17:07:48 -04:00
research           Add parallel extraction research and comprehensive research index        2026-05-16 16:30:35 -04:00
research-index.md  Add parallel extraction research and comprehensive research index        2026-05-16 16:30:35 -04:00