pdftract/docs
jedarden cf8f04e3ec docs(pdftract-26r8): finalize glyph recognition research note v1.0
- Reorganize around the four-level Unicode recovery cascade from plan
- Document all cascade levels with confidence scores:
  - Level 1: ToUnicode CMap (1.0)
  - Level 2: Encoding + AGL (0.9)
  - Level 3: Font fingerprint cache (0.85)
  - Level 4: Glyph shape recognition (0.7)
- Add shape database design (pHash algorithm, query, format)
- Document pHash collision tie-break rules (frequency-based)
- Add Type 3 font handling section
- Cross-reference Phase 2.2, 2.4, 2.5 and OQ-02

File grows from 112 to 210 lines. Covers all acceptance criteria.

Closes: pdftract-26r8
2026-05-24 02:10:06 -04:00
..
adr feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance feat(pdftract-5omc): implement SDK conformance test runner pattern 2026-05-18 01:22:23 -04:00
notes feat(pdftract-3zhf): add unified TableDetector::detect entry point 2026-05-24 00:51:59 -04:00
plan feat(pdftract-3zhf): add unified TableDetector::detect entry point 2026-05-24 00:51:59 -04:00
research docs(pdftract-26r8): finalize glyph recognition research note v1.0 2026-05-24 02:10:06 -04:00
schema/v1.0 feat(pdftract-zy2jx): generate JSON Schema from Rust output types 2026-05-24 01:29:14 -04:00
security docs(pdftract-58kz): add security policy documentation 2026-05-20 19:39:24 -04:00
user-docs docs(pdftract-1g87): create mdBook scaffolding for user documentation 2026-05-18 00:38:51 -04:00
research-index.md Add parallel extraction research and comprehensive research index 2026-05-16 16:30:35 -04:00