Implements CIDToGIDMap resolver with Identity and stream forms:
- Identity: zero-allocation short-circuit (GID == CID)
- Stream: parses 2-byte big-endian GID values into Box<[u16]>
- Emits CIDTOGIDMAP_TRUNCATED diagnostic on odd-byte-count input
- Out-of-range CID returns GID 0 (notdef glyph) without panic

Acceptance criteria:
- Identity form: lookup of any CID returns same value as u16
- Stream form: synthetic 3-CID array decodes correctly [0, 5, 10]
- Out-of-range CID returns GID 0 with no panic
- Diagnostic CIDTOGIDMAP_TRUNCATED emitted on odd-byte-count input

Refs: pdftract-5sh, Phase 2.1 line 1315
pdftract-5sh: CIDToGIDMap resolver (Identity and stream forms)
Summary
Implemented the CIDToGIDMap resolver for CIDFontType2 descendant fonts with:
- /Identity name detection (zero-allocation short-circuit)
- Stream form parsing into a Box<[u16]> array (2-byte big-endian GID values)
- CIDTOGIDMAP_TRUNCATED diagnostic for odd-byte-count input
- Out-of-range CID returns GID 0 (notdef glyph)
Changes Made
1. Added new diagnostic code (diagnostics.rs)
- DiagCode::FontCidtogidmapTruncated - emitted when the CIDToGIDMap stream has an odd byte count
- Added to category, name, severity (Warning), and catalog entries
2. Updated CIDToGIDMap enum (type0.rs)
Changed from Custom(Vec<u8>) to Array(Box<[u16]>):
- Pre-parsed u16 array instead of raw bytes
- Single heap allocation, not per-lookup
- get() method now uses arr.get(cid as usize).copied().or(Some(0))
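The enum change above can be sketched roughly as follows. This is a minimal illustration, not the code in type0.rs: the variant names and the body of the Array arm come from this note, while the u32 CID parameter type and the derive list are assumptions.

```rust
/// Sketch of the two-form map described in this note.
/// Assumption: CIDs arrive as u32; the real signature may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum CIDToGIDMap {
    /// /Identity: GID == CID, so no table is stored or allocated.
    Identity,
    /// Stream form: GIDs pre-parsed once from 2-byte big-endian values.
    Array(Box<[u16]>),
}

impl CIDToGIDMap {
    /// Returns the glyph ID for `cid`.
    /// In the Array form, an out-of-range CID falls back to GID 0 (notdef)
    /// instead of panicking, so the result is always `Some`.
    pub fn get(&self, cid: u32) -> Option<u16> {
        match self {
            // The cast keeps only the low 16 bits of the CID.
            CIDToGIDMap::Identity => Some(cid as u16),
            CIDToGIDMap::Array(arr) => arr.get(cid as usize).copied().or(Some(0)),
        }
    }
}
```

Because the table is parsed once into a boxed slice, each lookup is a bounds-checked index with no per-lookup allocation or byte decoding.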
3. Updated load_cid_to_gid_map() function
- Now parses decoded bytes into a Box<[u16]> array
- Emits CIDTOGIDMAP_TRUNCATED diagnostic on odd-length input
- Truncates trailing byte instead of failing
- Takes a diagnostics: &mut Vec<Diagnostic> parameter
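The byte-to-GID decoding step could look something like the sketch below. The helper name parse_cid_to_gid_bytes and the Diagnostic struct shown here are hypothetical stand-ins for illustration; the actual load_cid_to_gid_map() also handles the /Identity name and stream decoding, which are omitted.

```rust
/// Hypothetical stand-in for the crate's diagnostic type.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub code: &'static str,
}

/// Decodes an already-decompressed CIDToGIDMap stream body into GIDs.
/// Assumption: this helper and its signature are illustrative only.
pub fn parse_cid_to_gid_bytes(
    bytes: &[u8],
    diagnostics: &mut Vec<Diagnostic>,
) -> Box<[u16]> {
    // An odd byte count means the last GID is incomplete: report it,
    // then continue with the complete pairs rather than failing.
    if bytes.len() % 2 != 0 {
        diagnostics.push(Diagnostic { code: "CIDTOGIDMAP_TRUNCATED" });
    }
    // chunks_exact(2) yields only full pairs, dropping a trailing byte.
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect()
}
```

With the six bytes 00 00 00 05 00 0A this yields the GIDs [0, 5, 10] from the acceptance criteria and pushes no diagnostic; appending one extra byte yields the same GIDs plus a single truncation diagnostic.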
4. Updated tests
- test_cid_to_gid_map_array - tests Array variant with [0, 1, 2, 3]
- test_cid_to_gid_map_array_big_endian - tests big-endian parsing
- test_cid_to_gid_map_out_of_range - tests GID 0 return for out-of-range CID
- test_cid_to_gid_map_from_stream - tests stream loading with [0, 5, 10] per acceptance criteria
- test_cid_to_gid_map_truncated - tests odd-byte-count diagnostic emission
Acceptance Criteria - PASS
- [PASS] Identity form: lookup of any CID returns same value as u16
- [PASS] Stream form: synthetic 3-CID array decodes correctly [0, 5, 10]
- [PASS] Out-of-range CID returns GID 0 with no panic
- [PASS] Diagnostic CIDTOGIDMAP_TRUNCATED emitted on odd-byte-count input
Test Results
test font::type0::tests::test_cid_to_gid_map_array ... ok
test font::type0::tests::test_cid_to_gid_map_array_big_endian ... ok
test font::type0::tests::test_cid_to_gid_map_identity ... ok
test font::type0::tests::test_cid_to_gid_map_out_of_range ... ok
test font::type0::tests::test_cid_to_gid_map_truncated ... ok
test font::type0::tests::test_cid_to_gid_map_from_stream ... ok
test result: ok. 6 passed; 0 failed; 0 ignored
All 25 type0 tests pass.
Files Modified
- crates/pdftract-core/src/diagnostics.rs - added FontCidtogidmapTruncated diagnostic
- crates/pdftract-core/src/font/type0.rs - updated CIDToGIDMap enum and implementation