pdftract/crates
jedarden 21d6514ca8 feat(pdftract-qzjw): implement 4-level encoding resolver with per-font cache
Implements Phase 2.2 encoding fallback chain:
- L1: ToUnicode CMap (1.0 confidence)
- L2: Named encoding + AGL (0.9 confidence)
- L3: Font fingerprint cache (0.85 confidence)
- L4: Shape recognition stub (0.7 confidence, cfg-gated)
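A minimal sketch of how the four-level chain could short-circuit, written in Rust (the crate's language). The names `GlyphSource`, `FontTables`, `resolve` and `usable` are hypothetical, and plain `HashMap<u32, String>` tables stand in for the real ToUnicode CMap, named-encoding/AGL, and fingerprint data. It assumes one reading of the "short-circuit on L1 empty/U+FFFD" note: an empty or U+FFFD result counts as a miss and falls through to the next level. The commit only states this for L1, but this sketch applies it at every level. L4 is left as an always-missing stub.

```rust
use std::collections::HashMap;

/// Which level of the fallback chain produced a mapping (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GlyphSource {
    ToUnicode,
    NamedEncodingAgl,
    FingerprintCache,
    ShapeRecognition,
    Unmapped,
}

impl GlyphSource {
    /// Confidence values as listed in the commit message.
    pub fn confidence(self) -> f32 {
        match self {
            GlyphSource::ToUnicode => 1.0,
            GlyphSource::NamedEncodingAgl => 0.9,
            GlyphSource::FingerprintCache => 0.85,
            GlyphSource::ShapeRecognition => 0.7,
            GlyphSource::Unmapped => 0.0,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedGlyph {
    pub chars: String,
    pub source: GlyphSource,
    pub confidence: f32,
}

/// Hypothetical per-font tables standing in for CMap, encoding+AGL, and fingerprint data.
#[derive(Default)]
pub struct FontTables {
    pub to_unicode: HashMap<u32, String>,
    pub agl: HashMap<u32, String>,
    pub fingerprint: HashMap<u32, String>,
}

/// Assumption: empty and U+FFFD results are treated as "no answer".
fn usable(s: &str) -> bool {
    !s.is_empty() && s != "\u{FFFD}"
}

/// Tries L1..L3 in order and returns the first usable hit.
pub fn resolve(code: u32, t: &FontTables) -> ResolvedGlyph {
    let levels = [
        (t.to_unicode.get(&code), GlyphSource::ToUnicode),
        (t.agl.get(&code), GlyphSource::NamedEncodingAgl),
        (t.fingerprint.get(&code), GlyphSource::FingerprintCache),
    ];
    for (hit, source) in levels {
        if let Some(s) = hit.filter(|s| usable(s)) {
            return ResolvedGlyph { chars: s.clone(), source, confidence: source.confidence() };
        }
    }
    // L4 (shape recognition) is a cfg-gated stub in the commit; this sketch always misses it.
    ResolvedGlyph { chars: "\u{FFFD}".into(), source: GlyphSource::Unmapped, confidence: 0.0 }
}
```

A ToUnicode entry mapping one code to "ffi" would come back as a three-char string at confidence 1.0, which matches the ligature criterion further down.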

Features:
- DashMap-based per-font resolution cache
- Single GLYPH_UNMAPPED diagnostic per (font, code) miss
- FontId from Arc pointer for unique identification
- ResolvedGlyph with chars, source, and confidence
- Proper short-circuit on L1 empty/U+FFFD results
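The cache and diagnostic features can be sketched with only the standard library. This is a hedged illustration, not pdftract's code. `Mutex<HashMap>` substitutes for the DashMap the commit uses. `FontId::of`, `ResolverCache` and the diagnostic string format are hypothetical, and an all-level miss is recognized by confidence 0.0, following the acceptance criteria.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

/// Font identity derived from the address of its Arc allocation: clones of one Arc share an id.
/// Caveat: an address can be reused once the font is dropped, so ids are unique only among live fonts.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FontId(usize);

impl FontId {
    pub fn of<T>(font: &Arc<T>) -> Self {
        FontId(Arc::as_ptr(font) as usize)
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedGlyph {
    pub chars: String,
    pub confidence: f32,
}

/// Std-only stand-in for the DashMap cache: Mutex<HashMap> keyed by (FontId, code).
#[derive(Default)]
pub struct ResolverCache {
    cache: Mutex<HashMap<(FontId, u32), ResolvedGlyph>>,
    reported: Mutex<HashSet<(FontId, u32)>>,
    pub diagnostics: Mutex<Vec<String>>,
}

impl ResolverCache {
    /// Returns the cached result for (font, code), or runs `resolve_uncached` and caches it.
    pub fn resolve<F>(&self, font: FontId, code: u32, resolve_uncached: F) -> ResolvedGlyph
    where
        F: FnOnce() -> ResolvedGlyph,
    {
        if let Some(hit) = self.cache.lock().unwrap().get(&(font, code)) {
            return hit.clone();
        }
        let result = resolve_uncached();
        // Confidence 0.0 marks an all-level miss. The `reported` set keeps the diagnostic
        // single even if two threads race past the cache check for the same key.
        if result.confidence == 0.0 && self.reported.lock().unwrap().insert((font, code)) {
            self.diagnostics
                .lock()
                .unwrap()
                .push(format!("GLYPH_UNMAPPED font={:?} code={:#x}", font, code));
        }
        self.cache.lock().unwrap().insert((font, code), result.clone());
        result
    }
}
```

With DashMap itself, the `entry` API could replace the separate check-then-insert, because it holds the lock on the key's shard for the duration of the entry.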

Acceptance criteria:
- Ligature expansion → multi-char slice, confidence 1.0
- AGL lookup → confidence 0.9
- Fingerprint lookup → confidence 0.85
- All-level miss → U+FFFD, confidence 0.0, single diagnostic
- Cache hit returns identical result to miss
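The AGL lookup behind the 0.9 criterion maps a glyph name to text. The sketch below follows the glyph-name-to-Unicode procedure in the Adobe Glyph List Specification: drop any suffix after the first period, split on underscores, then map each component through the AGL, a `uni` prefix followed by groups of four uppercase hex digits, or a `u` prefix followed by 4 to 6 uppercase hex digits. It omits the spec's Zapf Dingbats special case. The function names and the six-entry table are hypothetical. A real resolver would load the full glyphlist.txt, and it is an assumption that pdftract's L2 works this way.

```rust
use std::collections::HashMap;

/// Hypothetical six-entry excerpt of the Adobe Glyph List (glyphlist.txt).
pub fn agl_excerpt() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("A", "A"),
        ("f", "f"),
        ("i", "i"),
        ("fi", "\u{FB01}"),
        ("space", " "),
        ("Euro", "\u{20AC}"),
    ])
}

/// True if `s` is non-empty and uses only uppercase hex digits, as the spec requires.
fn is_upper_hex(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit() || (b'A'..=b'F').contains(&b))
}

/// Maps one underscore-separated component; unmatched components map to "".
fn map_component(comp: &str, agl: &HashMap<&str, &str>) -> String {
    if let Some(text) = agl.get(comp) {
        return text.to_string();
    }
    // "uni" + groups of four hex digits; char::from_u32 rejects surrogates (D800-DFFF).
    if let Some(hex) = comp.strip_prefix("uni") {
        if is_upper_hex(hex) && hex.len() % 4 == 0 {
            let decoded: Option<String> = (0..hex.len())
                .step_by(4)
                .map(|i| u32::from_str_radix(&hex[i..i + 4], 16).ok().and_then(char::from_u32))
                .collect();
            if let Some(text) = decoded {
                return text;
            }
        }
    }
    // "u" + 4 to 6 hex digits; char::from_u32 also rejects values above 10FFFF.
    if let Some(hex) = comp.strip_prefix('u') {
        if is_upper_hex(hex) && (4..=6).contains(&hex.len()) {
            if let Some(c) = u32::from_str_radix(hex, 16).ok().and_then(char::from_u32) {
                return c.to_string();
            }
        }
    }
    String::new()
}

/// Glyph name -> Unicode text: drop the suffix after the first '.', split on '_', map and join.
pub fn glyph_name_to_unicode(name: &str, agl: &HashMap<&str, &str>) -> String {
    let base = name.split('.').next().unwrap_or("");
    base.split('_').map(|comp| map_component(comp, agl)).collect()
}
```

Under these rules `f_i` yields the two characters "fi", while `fi` yields the single ligature U+FB01. A name that matches nothing yields an empty string, which a resolver could treat as an L2 miss.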

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 22:09:26 -04:00
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-27n3): implement border padding, pipeline orchestration, and fixtures 2026-05-23 21:55:11 -04:00
pdftract-core feat(pdftract-qzjw): implement 4-level encoding resolver with per-font cache 2026-05-23 22:09:26 -04:00
pdftract-libpdftract feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
pdftract-py docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00