pdftract/crates/pdftract-core/build
jedarden f804887a86 feat(pdftract-43ry): implement predefined CMap registry
Implement a registry of the 10 named CMaps PDF readers MUST support
without an embedded CMap stream: Identity-H, Identity-V, and 8 UTF16
CMaps (UniJIS-UTF16-H/V, UniGB-UTF16-H/V, UniCNS-UTF16-H/V,
UniKS-UTF16-H/V).

- Added PredefinedCMap struct with name, is_vertical, collection fields
- from_name() resolves all 10 predefined CMap names
- decode_bytes() reads 2-byte big-endian codes as CIDs
- cid_to_unicode() maps CIDs to Unicode codepoints (None for Identity-H/V)
- Build-time generation of PHF maps from JSON files
- Feature flag 'cjk' controls ~1.2 MB UCS2 map inclusion (default off)
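
As a rough illustration of the name lookup and the 2-byte decoding listed above, here is a minimal sketch. The field types, the `PREDEFINED` table, and the free function `decode_identity` are assumptions made for this example and are not taken from the crate. Dropping a trailing odd byte is also an assumption.

```rust
/// Hypothetical sketch of a registry entry. Field names follow the commit
/// message; the field types are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PredefinedCMap {
    pub name: &'static str,
    pub is_vertical: bool,
    pub collection: &'static str,
}

/// The 10 predefined names paired with their Adobe character collections.
const PREDEFINED: [(&str, &str); 10] = [
    ("Identity-H", "Adobe-Identity"),
    ("Identity-V", "Adobe-Identity"),
    ("UniJIS-UTF16-H", "Adobe-Japan1"),
    ("UniJIS-UTF16-V", "Adobe-Japan1"),
    ("UniGB-UTF16-H", "Adobe-GB1"),
    ("UniGB-UTF16-V", "Adobe-GB1"),
    ("UniCNS-UTF16-H", "Adobe-CNS1"),
    ("UniCNS-UTF16-V", "Adobe-CNS1"),
    ("UniKS-UTF16-H", "Adobe-Korea1"),
    ("UniKS-UTF16-V", "Adobe-Korea1"),
];

impl PredefinedCMap {
    /// Resolve a predefined CMap by exact name; unknown names yield None.
    pub fn from_name(name: &str) -> Option<Self> {
        PREDEFINED
            .iter()
            .find(|(n, _)| *n == name)
            .map(|&(n, collection)| PredefinedCMap {
                name: n,
                is_vertical: n.ends_with("-V"),
                collection,
            })
    }
}

/// Read 2-byte big-endian codes as CIDs, as Identity-H/V does.
/// Assumption: a trailing odd byte is silently ignored.
pub fn decode_identity(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect()
}
```

Under these assumptions, `decode_identity(&[0x00, 0x41])` yields a single CID, 65, and `PredefinedCMap::from_name("NoSuchCMap")` returns `None`.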

Acceptance criteria:
- All 10 names resolve via from_name()
- Identity-H decodes [0x00, 0x41] to CID 65
- UniJIS-UTF16-H maps CID 236 to U+3042 (あ)
- Vertical (V) variant returns identical CID->Unicode as Horizontal (H)
- Unknown name returns None
- Feature flag 'cjk' controls UCS2 map inclusion
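
One way the 'cjk' gating and the H/V equivalence in these criteria could fit together is sketched below. The helper `japan1_lookup` and the free-function form of `cid_to_unicode` are hypothetical, and the one-entry `match` merely stands in for the generated PHF map. The single mapping shown is the one quoted in the criteria.

```rust
/// Stand-in for the generated Adobe-Japan1 table (hypothetical).
/// Compiled only when the 'cjk' feature is enabled.
#[cfg(feature = "cjk")]
fn japan1_lookup(cid: u16) -> Option<char> {
    match cid {
        236 => Some('\u{3042}'), // mapping quoted in the acceptance criteria
        _ => None,
    }
}

/// Without 'cjk' the table is not compiled in, so every lookup misses.
#[cfg(not(feature = "cjk"))]
fn japan1_lookup(_cid: u16) -> Option<char> {
    None
}

/// Map a CID to Unicode for a named predefined CMap.
/// The -H/-V suffix is stripped first, so both writing modes share one table.
pub fn cid_to_unicode(cmap_name: &str, cid: u16) -> Option<char> {
    let base = cmap_name
        .strip_suffix("-H")
        .or_else(|| cmap_name.strip_suffix("-V"))?;
    match base {
        "UniJIS-UTF16" => japan1_lookup(cid),
        // Identity-H/V carry no Unicode mapping; other collections are
        // omitted from this sketch.
        _ => None,
    }
}
```

Because the suffix is stripped before the lookup, the vertical and horizontal variants return the same result whether or not the feature is enabled.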

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 23:00:59 -04:00
..
predefined-cmaps feat(pdftract-43ry): implement predefined CMap registry 2026-05-23 23:00:59 -04:00
agl.json feat(pdftract-28m6): implement AGL compile-time phf::Map 2026-05-23 18:44:47 -04:00
aglfn.txt feat(pdftract-28m6): implement AGL compile-time phf::Map 2026-05-23 18:44:47 -04:00
fix_std14_weights.py feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
font-fingerprints.json feat(pdftract-njde): implement font fingerprint cache (Level 3) 2026-05-23 21:27:24 -04:00
generate_agl.py feat(pdftract-28m6): implement AGL compile-time phf::Map 2026-05-23 18:44:47 -04:00
generate_std14_metrics.py feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
glyphlist.txt feat(pdftract-28m6): implement AGL compile-time phf::Map 2026-05-23 18:44:47 -04:00
named-encodings.json feat(pdftract-3dwu): implement named encoding tables 2026-05-23 18:00:05 -04:00
std14-metrics.json feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00