pdftract


jedarden 1dfaf73aa4 2026-05-28 05:47:07 -04:00

feat(pdftract-3g6ne): implement CMap codespace range parser

This commit adds the codespace range parser for CMap streams. The parser extracts the begincodespacerange / endcodespacerange blocks that define the legal byte widths of character codes in a CMap.

## Implementation
- CodespaceRange: a single range with lo/hi bounds (stored as [u8; 4]) and a width of 1-4 bytes
- CodespaceRanges: a collection backed by SmallVec<[CodespaceRange; 8]>
- CodespaceParser: a PostScript-style tokenizer for begincodespacerange blocks

## Acceptance Criteria (all PASS)
- Parse <00> <7F> → 1 range, width=1 ✅
- Parse <00> <7F> <8000> <FFFF> in one block → 2 ranges ✅
- Width inference: 2-char hex → width=1; 4-char hex → width=2 ✅
- Case-insensitive hex (<C0> and <c0> are equivalent) ✅
- Malformed range (width mismatch) → diagnostic + skipped ✅
- Empty CMap → empty ranges ✅
- JIS range <8140> <FEFE> → 2-byte CJK ✅
- 3-byte and 4-byte range support ✅

Also adds encrypted fixture provenance entries to PROVENANCE.md.

Co-Authored-By: Claude Code <noreply@anthropic.com>
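The commit message describes the data model but not the code. The Rust sketch below shows one way a parser meeting those acceptance criteria could look; it is not the pdftract implementation. It assumes a plain Vec instead of SmallVec (to stay dependency-free), a deliberately simplified tokenizer that ignores dictionaries and literal strings, and hypothetical names (ParseResult, parse_codespace_ranges, tokenize, parse_hex, match_code_len). Bounds are compared byte by byte, following how the PDF specification defines codespace ranges.

```rust
// Hypothetical sketch: type names echo the commit message, but the field
// layout, Vec (instead of SmallVec) and the tokenizer are assumptions.

/// One codespace range: inclusive lo/hi bounds, `width` bytes long (1..=4).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CodespaceRange {
    pub lo: [u8; 4],
    pub hi: [u8; 4],
    pub width: u8,
}

impl CodespaceRange {
    /// A code matches if it is exactly `width` bytes long and each byte lies
    /// between the corresponding lo and hi byte (per-byte comparison).
    pub fn contains(&self, code: &[u8]) -> bool {
        let w = self.width as usize;
        code.len() == w && (0..w).all(|i| self.lo[i] <= code[i] && code[i] <= self.hi[i])
    }
}

/// Parsed ranges plus one diagnostic per skipped, malformed entry.
#[derive(Debug, Default)]
pub struct ParseResult {
    pub ranges: Vec<CodespaceRange>,
    pub diagnostics: Vec<String>,
}

/// Minimal tokenizer: `<hex>` strings (inner whitespace dropped), bare words,
/// and `%` comments. Dictionaries and `(...)` strings are not handled.
fn tokenize(src: &str) -> Vec<String> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '%' {
            // Skip a comment up to the end of the line.
            while let Some(c) = chars.next() {
                if c == '\n' || c == '\r' { break; }
            }
        } else if c == '<' {
            let mut t = String::new();
            while let Some(c) = chars.next() {
                if !c.is_whitespace() { t.push(c); }
                if c == '>' { break; }
            }
            toks.push(t);
        } else {
            let mut t = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_whitespace() || c == '<' || c == '%' { break; }
                t.push(c);
                chars.next();
            }
            toks.push(t);
        }
    }
    toks
}

/// Decodes `<hex>` into 1..=4 bytes; hex digits are case-insensitive.
fn parse_hex(tok: &str) -> Option<Vec<u8>> {
    let inner = tok.strip_prefix('<')?.strip_suffix('>')?;
    let n = inner.len();
    if n == 0 || n % 2 != 0 || n > 8 || !inner.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..n).step_by(2).map(|i| u8::from_str_radix(&inner[i..i + 2], 16).ok()).collect()
}

/// Collects lo/hi pairs from every begincodespacerange ... endcodespacerange block.
pub fn parse_codespace_ranges(src: &str) -> ParseResult {
    let toks = tokenize(src);
    let mut out = ParseResult::default();
    let mut i = 0;
    while i < toks.len() {
        if toks[i] != "begincodespacerange" { i += 1; continue; }
        i += 1;
        while i < toks.len() && toks[i] != "endcodespacerange" {
            let hi = match toks.get(i + 1) {
                Some(t) if t != "endcodespacerange" => t,
                _ => { out.diagnostics.push(format!("unpaired bound {}", toks[i])); i += 1; break; }
            };
            match (parse_hex(&toks[i]), parse_hex(hi)) {
                // Width is inferred from the byte count; lo and hi must agree.
                (Some(lo), Some(hb)) if lo.len() == hb.len() => {
                    let mut r = CodespaceRange { lo: [0; 4], hi: [0; 4], width: lo.len() as u8 };
                    r.lo[..lo.len()].copy_from_slice(&lo);
                    r.hi[..hb.len()].copy_from_slice(&hb);
                    out.ranges.push(r);
                }
                _ => out.diagnostics.push(format!("skipping malformed range {} {}", toks[i], hi)),
            }
            i += 2;
        }
    }
    out
}

/// Length of the next character code in `bytes`, trying widths 1..=4 in order.
pub fn match_code_len(ranges: &[CodespaceRange], bytes: &[u8]) -> Option<usize> {
    (1..=4usize.min(bytes.len())).find(|&w| ranges.iter().any(|r| r.contains(&bytes[..w])))
}
```

For example, parsing `begincodespacerange <00> <80> <8140> <9FFC> endcodespacerange` yields one 1-byte and one 2-byte range; match_code_len then returns 1 for the byte 41 and 2 for the bytes 81 40. Trying the shortest width first mirrors how the PDF specification describes consuming character codes against a codespace.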
pdftract-cer-diff	docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files	2026-05-23 10:36:28 -04:00
pdftract-cli	chore(pdftract-36glh): remove unused JpxDecoder import and add verification note	2026-05-28 05:23:13 -04:00
pdftract-core	feat(pdftract-3g6ne): implement CMap codespace range parser	2026-05-28 05:47:07 -04:00
pdftract-libpdftract	feat(pdftract-3s2i): implement Phase 5.5.2 validation filter	2026-05-24 04:57:17 -04:00
pdftract-py	fix(pdftract-63ka2): AES-128 test buffer allocation for PKCS#7 padding	2026-05-28 01:30:33 -04:00