pdftract/notes/pdftract-5sh.md
jedarden 77304153fc feat(pdftract-5sh): CIDToGIDMap resolver for CIDFontType2
Implements CIDToGIDMap resolver with Identity and stream forms:
- Identity: zero-allocation short-circuit (GID == CID)
- Stream: parses 2-byte big-endian GID values into Box<[u16]>
- Emits CIDTOGIDMAP_TRUNCATED diagnostic on odd-byte-count input
- Out-of-range CID returns GID 0 (notdef glyph) without panic

Acceptance criteria:
- Identity form: lookup of any CID returns same value as u16
- Stream form: synthetic 3-CID array decodes correctly [0, 5, 10]
- Out-of-range CID returns GID 0 with no panic
- Diagnostic CIDTOGIDMAP_TRUNCATED emitted on odd-byte-count input

Refs: pdftract-5sh, Phase 2.1 line 1315
2026-05-23 15:23:27 -04:00

pdftract-5sh: CIDToGIDMap resolver (Identity and stream forms)

Summary

Implemented the CIDToGIDMap resolver for CIDFontType2 descendant fonts with:

  • /Identity name detection (zero-allocation short-circuit)
  • Stream form parsing into Box<[u16]> array (2-byte big-endian GID values)
  • CIDTOGIDMAP_TRUNCATED diagnostic for odd-byte-count input
  • Out-of-range CID returns GID 0 (the .notdef glyph) rather than panicking

Changes Made

1. Added new diagnostic code (diagnostics.rs)

  • DiagCode::FontCidtogidmapTruncated - emitted when CIDToGIDMap stream has odd byte count
  • Added to category, name, severity (Warning), and catalog entries
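A minimal sketch of what wiring a new code into category, name, and severity lookups could look like. Only `DiagCode::FontCidtogidmapTruncated`, the `CIDTOGIDMAP_TRUNCATED` name, and the Warning severity come from this note; the `Severity` enum, its other variants, the method signatures, and the `"font"` category string are assumptions made for illustration, and the real diagnostics.rs may be organized differently.

```rust
/// Hypothetical severity levels; only `Warning` is confirmed by the note.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warning,
    Error,
}

/// Trimmed-down stand-in for the crate's diagnostic code enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiagCode {
    /// CIDToGIDMap stream has an odd byte count.
    FontCidtogidmapTruncated,
}

impl DiagCode {
    /// Category string is an assumption inferred from the `Font` prefix.
    pub fn category(self) -> &'static str {
        match self {
            DiagCode::FontCidtogidmapTruncated => "font",
        }
    }

    /// Stable name, as it appears in the acceptance criteria.
    pub fn name(self) -> &'static str {
        match self {
            DiagCode::FontCidtogidmapTruncated => "CIDTOGIDMAP_TRUNCATED",
        }
    }

    /// Truncation is recoverable, so it is reported as a warning.
    pub fn severity(self) -> Severity {
        match self {
            DiagCode::FontCidtogidmapTruncated => Severity::Warning,
        }
    }
}
```

Exhaustive `match` arms mean that adding a variant without updating every lookup is a compile error, which is one reason to prefer this shape over a string-keyed table.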

2. Updated CIDToGIDMap enum (type0.rs)

Replaced the Custom(Vec<u8>) variant with Array(Box<[u16]>):

  • Stores a pre-parsed u16 array instead of raw bytes
  • One heap allocation when the map is loaded; lookups allocate nothing
  • get() method now uses arr.get(cid as usize).copied().or(Some(0)), so an out-of-range CID yields GID 0
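The lookup behaviour described above can be sketched roughly as follows. The `Array(Box<[u16]>)` variant and the `arr.get(cid as usize).copied().or(Some(0))` expression are taken from this note; the `Identity` variant name, the `u32` CID parameter, and the `Option<u16>` return type are assumptions, so treat this as an illustration rather than the code in type0.rs.

```rust
/// Sketch of the two CIDToGIDMap forms.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CIDToGIDMap {
    /// `/Identity`: GID equals CID, so nothing is stored or allocated.
    Identity,
    /// Stream form: GIDs already decoded from 2-byte big-endian pairs.
    Array(Box<[u16]>),
}

impl CIDToGIDMap {
    /// Maps a CID to a glyph ID.
    /// The `u32` CID type is an assumption; the note only shows `cid as usize`.
    pub fn get(&self, cid: u32) -> Option<u16> {
        match self {
            // Identity short-circuit: reinterpret the CID as a u16 GID.
            CIDToGIDMap::Identity => Some(cid as u16),
            // Out-of-range CIDs fall back to GID 0 (.notdef) instead of panicking.
            CIDToGIDMap::Array(arr) => arr.get(cid as usize).copied().or(Some(0)),
        }
    }
}
```

With a map built from `vec![0u16, 5, 10].into_boxed_slice()`, `get(1)` gives `Some(5)` and `get(99)` gives `Some(0)`. Because slice `get` is bounds-checked, no index can cause a panic.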

3. Updated load_cid_to_gid_map() function

  • Now parses decoded bytes into Box<[u16]> array
  • Emits CIDTOGIDMAP_TRUNCATED diagnostic on odd-length input
  • Drops the unpaired trailing byte and keeps the GIDs decoded so far, instead of failing
  • Takes diagnostics: &mut Vec<Diagnostic> parameter
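The parsing step could look something like the sketch below. The `&mut Vec<Diagnostic>` parameter, the big-endian 2-byte layout, and the drop-the-trailing-byte behaviour come from this note; the function name `parse_cid_to_gid_stream`, the shape of `Diagnostic`, and its `byte_len` field are hypothetical stand-ins, and the real load_cid_to_gid_map() also has to decode the stream first.

```rust
/// Hypothetical stand-in for the crate's diagnostic type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Diagnostic {
    /// Corresponds to CIDTOGIDMAP_TRUNCATED; `byte_len` is an illustrative field.
    CidToGidMapTruncated { byte_len: usize },
}

/// Decodes already-decompressed CIDToGIDMap stream bytes into GIDs.
pub fn parse_cid_to_gid_stream(
    bytes: &[u8],
    diagnostics: &mut Vec<Diagnostic>,
) -> Box<[u16]> {
    // An odd length means the last GID is incomplete: warn, then carry on.
    if bytes.len() % 2 != 0 {
        diagnostics.push(Diagnostic::CidToGidMapTruncated {
            byte_len: bytes.len(),
        });
    }

    // `chunks_exact(2)` skips the unpaired trailing byte, if any.
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect()
}
```

For the acceptance-criteria input, the bytes `00 00 00 05 00 0A` decode to `[0, 5, 10]` with no diagnostics, while `01 02 FF` decodes to `[0x0102]` and pushes one truncation diagnostic.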

4. Updated tests

  • test_cid_to_gid_map_array - tests Array variant with [0, 1, 2, 3]
  • test_cid_to_gid_map_array_big_endian - tests big-endian parsing
  • test_cid_to_gid_map_out_of_range - tests GID 0 return for out-of-range CID
  • test_cid_to_gid_map_from_stream - tests stream loading with [0, 5, 10] per acceptance criteria
  • test_cid_to_gid_map_truncated - tests odd-byte-count diagnostic emission

Acceptance Criteria - PASS

  • [PASS] Identity form: lookup of any CID returns same value as u16
  • [PASS] Stream form: synthetic 3-CID array decodes correctly [0, 5, 10]
  • [PASS] Out-of-range CID returns GID 0 with no panic
  • [PASS] Diagnostic CIDTOGIDMAP_TRUNCATED emitted on odd-byte-count input

Test Results

test font::type0::tests::test_cid_to_gid_map_array ... ok
test font::type0::tests::test_cid_to_gid_map_array_big_endian ... ok
test font::type0::tests::test_cid_to_gid_map_identity ... ok
test font::type0::tests::test_cid_to_gid_map_out_of_range ... ok
test font::type0::tests::test_cid_to_gid_map_truncated ... ok
test font::type0::tests::test_cid_to_gid_map_from_stream ... ok
test result: ok. 6 passed; 0 failed; 0 ignored

All 25 type0 tests pass, including the 6 CIDToGIDMap tests listed above.

Files Modified

  • crates/pdftract-core/src/diagnostics.rs - added FontCidtogidmapTruncated diagnostic
  • crates/pdftract-core/src/font/type0.rs - updated CIDToGIDMap enum and implementation