pdftract/docs
jedarden 116db89c95 Add three research documents on routing and text reconstruction
- word-boundary-reconstruction: expected position formula with Tc/Tw/Tz,
  TJ kerning gap detection, Td/Tm jump analysis, four space-width threshold
  strategies including adaptive histogram, multi-column gap discrimination
- scanned-vs-vector-page-classification: four-category taxonomy, fast
  pre-checks, image coverage AABB computation, character density ratio,
  validity rate, glyph bbox plausibility, region routing map, confidence
  scoring with cost-aware OCR threshold
- pdfa-compliance-and-extraction: ISO 19005 part/level matrix, XMP
  pdfaid detection, Level B/U/A guarantee implications for extraction,
  font embedding requirements, artifact tagging, PDF/A-3 embedded files,
  PdfaLevel enum with per-level fast-path branching

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:22:08 -04:00
..
notes Add SDK architecture notes covering top 10 languages 2026-05-16 14:51:25 -04:00
plan Initial repo scaffold with README and docs structure 2026-05-16 14:26:16 -04:00
research Add three research documents on routing and text reconstruction 2026-05-16 15:22:08 -04:00