pdftract/crates/pdftract-core/build/predefined-cmaps
jedarden f804887a86 feat(pdftract-43ry): implement predefined CMap registry
Implement a registry of the 10 predefined CMaps that PDF readers MUST
support without an embedded CMap stream: Identity-H, Identity-V, and
eight UTF-16 CMaps (UniJIS-UTF16-H/V, UniGB-UTF16-H/V, UniCNS-UTF16-H/V,
UniKS-UTF16-H/V).
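A minimal sketch of how the ten names could resolve, assuming a static lookup table scanned by name. Only the field names `name`, `is_vertical`, and `collection` come from the commit; the `Collection` enum, the `REGISTRY` table, and the field types are hypothetical.

```rust
/// Hypothetical: the character collection a UTF-16 CMap draws CIDs from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Collection {
    Japan1,
    Gb1,
    Cns1,
    Korea1,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PredefinedCMap {
    pub name: &'static str,
    pub is_vertical: bool,
    /// `None` for Identity-H/V, which imply no character collection.
    pub collection: Option<Collection>,
}

/// Hypothetical registry: every name a reader must handle without an
/// embedded CMap stream, paired with its collection.
static REGISTRY: [(&str, Option<Collection>); 10] = [
    ("Identity-H", None),
    ("Identity-V", None),
    ("UniJIS-UTF16-H", Some(Collection::Japan1)),
    ("UniJIS-UTF16-V", Some(Collection::Japan1)),
    ("UniGB-UTF16-H", Some(Collection::Gb1)),
    ("UniGB-UTF16-V", Some(Collection::Gb1)),
    ("UniCNS-UTF16-H", Some(Collection::Cns1)),
    ("UniCNS-UTF16-V", Some(Collection::Cns1)),
    ("UniKS-UTF16-H", Some(Collection::Korea1)),
    ("UniKS-UTF16-V", Some(Collection::Korea1)),
];

impl PredefinedCMap {
    /// Resolves a predefined CMap name; unknown names return `None`.
    pub fn from_name(name: &str) -> Option<PredefinedCMap> {
        REGISTRY
            .iter()
            .find(|(n, _)| *n == name)
            .map(|&(n, collection)| PredefinedCMap {
                name: n,
                // Writing mode is encoded in the "-V" suffix.
                is_vertical: n.ends_with("-V"),
                collection,
            })
    }
}
```

Deriving `is_vertical` from the name suffix keeps the table to one row per name; an explicit column would work equally well.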

- Added PredefinedCMap struct with name, is_vertical, and collection fields
- from_name() resolves all 10 predefined CMap names
- decode_bytes() reads 2-byte big-endian codes as CIDs
- cid_to_unicode() maps CIDs to Unicode codepoints (None for Identity-H/V)
- Build-time generation of PHF (perfect hash function) maps from
  per-collection JSON files
- Feature flag 'cjk' controls inclusion of the ~1.2 MB UCS2 maps
  (default off)
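The build-time step could look roughly like this. The sketch swaps in a CID-sorted static array instead of a PHF map, to stay dependency-free, and assumes the JSON has already been parsed into `(cid, codepoint)` pairs, since its schema isn't shown. `emit_table` and the table names are hypothetical.

```rust
use std::fmt::Write;

/// Hypothetical build-script helper: renders (CID, code point) pairs as
/// Rust source for a static, CID-sorted table. A sorted slice searched
/// with `binary_search_by_key` stands in for the real PHF map here.
fn emit_table(name: &str, pairs: &[(u16, u32)]) -> String {
    let mut sorted = pairs.to_vec();
    sorted.sort_unstable_by_key(|&(cid, _)| cid);
    // Keep only the first entry for a repeated CID.
    sorted.dedup_by_key(|&mut (cid, _)| cid);

    let mut out = String::new();
    // With the `cjk` feature the full table is compiled in...
    writeln!(out, "#[cfg(feature = \"cjk\")]").unwrap();
    writeln!(out, "pub static {}: &[(u16, u32)] = &[", name).unwrap();
    for (cid, cp) in &sorted {
        writeln!(out, "    ({}, 0x{:04X}),", cid, cp).unwrap();
    }
    writeln!(out, "];").unwrap();
    // ...without it, an empty table makes every lookup return None.
    writeln!(out, "#[cfg(not(feature = \"cjk\"))]").unwrap();
    writeln!(out, "pub static {}: &[(u16, u32)] = &[];", name).unwrap();
    out
}
```

In a `build.rs`, the string would typically be written under the `OUT_DIR` directory and pulled in with `include!(concat!(env!("OUT_DIR"), "/japan1.rs"))`. A build script can also skip generation entirely by checking the `CARGO_FEATURE_CJK` environment variable, which Cargo sets when the feature is enabled.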

Acceptance criteria:
- All 10 names resolve via from_name()
- Identity-H decodes [0x00, 0x41] to CID 65
- UniJIS-UTF16-H maps CID 236 to U+3042 (あ)
- Vertical (V) variants return the same CID->Unicode mapping as their
  horizontal (H) counterparts
- Unknown name returns None
- Feature flag 'cjk' controls UCS2 map inclusion
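The decoding and lookup criteria can be exercised with a small sketch. It assumes `decode_bytes` splits input into 2-byte big-endian units and that CID->Unicode lookup is a binary search over a CID-sorted table shared by the H and V variants of a collection. Free functions replace the real methods, and `JAPAN1_SAMPLE` and `table_for` are hypothetical; the sample table holds only the CID 236 -> U+3042 pair quoted in the criteria.

```rust
/// Reads 2-byte big-endian codes as CIDs. A trailing odd byte is
/// silently ignored in this sketch.
pub fn decode_bytes(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect()
}

/// Hypothetical stand-in for generated Adobe-Japan1 data, sorted by CID.
static JAPAN1_SAMPLE: &[(u16, u32)] = &[(236, 0x3042)];

/// Tables are keyed by collection, not writing mode, so H and V share
/// one table and return identical results by construction.
fn table_for(cmap_name: &str) -> Option<&'static [(u16, u32)]> {
    match cmap_name {
        "UniJIS-UTF16-H" | "UniJIS-UTF16-V" => Some(JAPAN1_SAMPLE),
        // Identity-H/V (and anything unknown) carry no Unicode mapping.
        _ => None,
    }
}

/// Maps a CID to a Unicode scalar value, or `None` if unmapped.
pub fn cid_to_unicode(cmap_name: &str, cid: u16) -> Option<char> {
    let table = table_for(cmap_name)?;
    let idx = table.binary_search_by_key(&cid, |&(c, _)| c).ok()?;
    char::from_u32(table[idx].1)
}
```

Sharing one table per collection is what makes the "V matches H" criterion hold without a separate test fixture per orientation.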

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 23:00:59 -04:00
Files (all last changed by "feat(pdftract-43ry): implement predefined CMap
registry", 2026-05-23 23:00:59 -04:00):
- adobe-cns1.json
- adobe-gb1.json
- adobe-japan1.json
- adobe-korea1.json