pdftract/docs/plan
jedarden 2d194a4b1b docs(plan): fix 15 gaps from Round 3 gap review
HIGH:
- Add fontdue crate for glyph rasterization (ttf-parser is a parser, not rasterizer)
- Remove num_cpus reference (rayon default pool sizing is sufficient)
- Update dep count target to < 30 direct crates (< 20 was violated by plan's own list)
- Fix watermark deferral: Phase 7 not Phase 6; no kind:'watermark' until Phase 7
- Add Phase 7.6 (Hyperlink/Annotation Extraction) and 7.7 (Article Thread Chains)

MEDIUM:
- Document header/footer streaming mode limitation: first 3 pages emit as paragraph
- Add conformance/XFA detection spec to Phase 1.4; move quick-xml to default feature
- Clarify pdftract-py-ci is Phase 0 stub, filled in during Phase 6.3
- Specify /Contents array concatenation in Phase 1.4 page tree
- Add page rotation un-rotation step after Phase 3 glyph bbox computation
- Add password delivery: ExtractionOptions.password, --password CLI, HTTP form, Python kwarg
- Fix glyph shape DB: phf::Map → sorted &'static [(u64,char)] slice for Hamming nearest-neighbor
- Add Python benchmark runner infrastructure (python:3.11-slim, requirements.txt, hyperfine)
- Add wordlist-bloom to Feature flags bullet list

LOW:
- Clarify extract_stream() yields page dicts only, not header/footer frames

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:18:33 -04:00
..
.gitkeep Initial repo scaffold with README and docs structure 2026-05-16 14:26:16 -04:00
plan.md docs(plan): fix 15 gaps from Round 3 gap review 2026-05-16 18:18:33 -04:00