All 5 child beads completed:
- pdftract-3uq: Font subtype classifier and BaseFont prefix stripper
- pdftract-juc: Standard 14 font registry with hardcoded metrics
- pdftract-6ah: Embedded font program loader (ttf-parser/owned_ttf_parser)
- pdftract-cv4: Type 0 composite font + descendant CIDFont loader
- pdftract-5sh: CIDToGIDMap resolver (Identity and stream forms)

77 font module tests pass.

Acceptance criteria:
- PASS: All children closed
- PASS: Classifier returns all 8 FontKind variants
- PASS: Subset prefix stripping works correctly
- PASS: CIDToGIDMap Identity and stream forms verified
- PASS: No unwrap/expect on resource dict access

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pdftract-3qz: Phase 2.1 Font Type Detection (coordinator)
Summary
Coordinator for sub-phase 2.1: Font Type Detection. All 5 child beads completed successfully, delivering a comprehensive font module that can classify, load, and provide metrics for all PDF font types.
Children Completed
| Bead ID | Title | Commit | Verification Note |
|---|---|---|---|
| pdftract-3uq | Font subtype classifier and BaseFont prefix stripper | 46c515e | notes/pdftract-3uq.md |
| pdftract-juc | Standard 14 font registry with hardcoded metrics | 7429a67 | (included below) |
| pdftract-6ah | Embedded font program loader (ttf-parser/owned_ttf_parser) | ffaaf69 | notes/pdftract-6ah.md |
| pdftract-cv4 | Type 0 composite font + descendant CIDFont loader | 5e2390f | notes/pdftract-cv4.md |
| pdftract-5sh | CIDToGIDMap resolver (Identity and stream forms) | 03aa4da | notes/pdftract-5sh.md |
Acceptance Criteria Status
| Criterion | Status |
|---|---|
| All children closed | PASS - All 5 child beads closed |
| Classifier returns one of {Type1, Type1Std14, TrueType, Type0, CIDFontType0, CIDFontType2, Type3, OpenTypeCFF} | PASS |
| Subset prefix ABCDEF+Times-Roman strips to Times-Roman for Std-14 lookup | PASS |
| CIDFontType2 with /CIDToGIDMap /Identity: GID == CID | PASS |
| CIDFontType2 with stream CIDToGIDMap: 2-byte big-endian decode verified | PASS |
| Module unit tests in crates/pdftract-core/src/font/ pass | PASS - 77 tests |
| No unwrap/expect on resource dict access | PASS - uses .and_then() and defaults |
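The last criterion can be illustrated with a minimal sketch of panic-free dictionary access. The Obj enum, the font_subtype() helper, and the "Type1" default below are hypothetical stand-ins, not the crate's actual object model or fallback; only the .and_then() chaining pattern is taken from the note.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down PDF object model (illustration only).
pub enum Obj {
    Name(String),
    Dict(HashMap<String, Obj>),
}

impl Obj {
    fn as_dict(&self) -> Option<&HashMap<String, Obj>> {
        match self {
            Obj::Dict(d) => Some(d),
            _ => None,
        }
    }

    fn as_name(&self) -> Option<&str> {
        match self {
            Obj::Name(n) => Some(n.as_str()),
            _ => None,
        }
    }
}

/// Looks up /Resources -> /Font -> <font_key> -> /Subtype without
/// unwrap/expect. Any missing or mistyped step falls through to a
/// default (the "Type1" default here is an assumption).
pub fn font_subtype<'a>(resources: &'a Obj, font_key: &str) -> &'a str {
    resources
        .as_dict()
        .and_then(|r| r.get("Font"))
        .and_then(Obj::as_dict)
        .and_then(|fonts| fonts.get(font_key))
        .and_then(Obj::as_dict)
        .and_then(|font| font.get("Subtype"))
        .and_then(Obj::as_name)
        .unwrap_or("Type1")
}
```

Each step yields an Option, so a malformed resource dictionary degrades to the default instead of panicking.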
Module Structure
crates/pdftract-core/src/font/
├── mod.rs # FontKind enum, classify_font(), strip_subset_prefix()
├── std14.rs # Standard 14 font metrics registry (build.rs generated)
├── embedded.rs # EmbeddedFont, FontMetrics, OpenTypeMetrics, EmptyFontMetrics
└── type0.rs # Type0Font, DescendantCIDFont, CIDToGIDMap, /W array parsing
Test Results
test result: ok. 77 passed; 0 failed; 0 ignored
All font module tests pass, covering:
- Font classification (Type1, Type1Std14, TrueType, Type0, CIDFontType0, CIDFontType2, Type3, OpenTypeCFF)
- Subset prefix stripping (valid, invalid, edge cases)
- Standard 14 font detection
- Type0 composite font loading
- CIDToGIDMap resolution (Identity and stream forms)
- /W array parsing (per-CID and range forms)
- Embedded font program loading (TrueType, OpenType CFF)
Child Bead Summaries
pdftract-3uq: Font subtype classifier and BaseFont prefix stripper
- Implemented FontKind enum with all 8 PDF font types
- strip_subset_prefix() - validates exactly 6 ASCII uppercase letters followed by +
- classify_font() - reads /Subtype, /BaseFont, descendant CIDFont, FontDescriptor
- 21 unit tests covering all branches
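The prefix rule can be sketched as follows. This is an illustrative version, not the code from commit 46c515e; the signature and the choice to leave a bare tag such as "ABCDEF+" untouched are assumptions.

```rust
/// Strips a subset tag such as "ABCDEF+" from a BaseFont name.
/// The name is returned unchanged unless it starts with exactly six
/// ASCII uppercase letters followed by '+' and at least one more byte
/// (the "at least one more byte" rule is an assumption of this sketch).
pub fn strip_subset_prefix(base_font: &str) -> &str {
    let bytes = base_font.as_bytes();
    let has_tag = bytes.len() > 7
        && bytes[..6].iter().all(|b| b.is_ascii_uppercase())
        && bytes[6] == b'+';
    if has_tag {
        // Slicing at byte 7 is safe: the first seven bytes are ASCII,
        // so index 7 is always a char boundary.
        &base_font[7..]
    } else {
        base_font
    }
}
```

With this rule, ABCDEF+Times-Roman yields Times-Roman, while a five-letter or lowercase tag is left in place so the Std-14 lookup simply misses.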
pdftract-juc: Standard 14 font registry with hardcoded metrics
- build.rs generates compile-time metrics from AFM-derived JSON
- Std14Metrics struct with widths, ascent, descent, italic_angle, font_bbox
- get_std14_metrics() lookup by canonical name (post-prefix-strip)
- Symbol/ZapfDingbats use distinct encodings (SymbolEncoding, ZapfDingbatsEncoding)
- Binary footprint: ~20 KB generated source, ~8 KB data (well under 60 KB limit)
pdftract-6ah: Embedded font program loader
- EmbeddedFont wrapping owned_ttf_parser::OwnedFace
- FontMetrics trait with glyph_id_for(), advance(), bbox()
- EmptyFontMetrics fallback for corrupt/missing font programs
- Graceful handling of subset fonts (unmapped chars return None)
- Diagnostic FONT_PARSE_FAILED for corrupt programs
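The fallback design can be sketched as a trait with a null-object implementation. The method names come from the list above, but every signature, the return types, and the metrics_or_empty() helper are assumptions made for illustration; the real EmbeddedFont wraps owned_ttf_parser and is not reproduced here.

```rust
/// Hypothetical shape of the metrics trait (signatures are assumed).
pub trait FontMetrics {
    /// Glyph ID for a character, or None if the font has no mapping.
    fn glyph_id_for(&self, ch: char) -> Option<u16>;
    /// Horizontal advance for a glyph, in font units.
    fn advance(&self, gid: u16) -> Option<u16>;
    /// Glyph bounding box as [x_min, y_min, x_max, y_max].
    fn bbox(&self, gid: u16) -> Option<[i16; 4]>;
}

/// Null-object fallback used when the font program is corrupt or missing.
pub struct EmptyFontMetrics;

impl FontMetrics for EmptyFontMetrics {
    fn glyph_id_for(&self, _ch: char) -> Option<u16> {
        None
    }
    fn advance(&self, _gid: u16) -> Option<u16> {
        None
    }
    fn bbox(&self, _gid: u16) -> Option<[i16; 4]> {
        None
    }
}

/// Hypothetical helper: callers always receive usable metrics, even
/// when loading failed (a FONT_PARSE_FAILED diagnostic would be
/// recorded at the point of failure, not here).
pub fn metrics_or_empty(loaded: Option<Box<dyn FontMetrics>>) -> Box<dyn FontMetrics> {
    match loaded {
        Some(metrics) => metrics,
        None => Box::new(EmptyFontMetrics),
    }
}
```

Returning None from every method lets downstream code treat a corrupt font the same way as an unmapped character in a subset font.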
pdftract-cv4: Type 0 composite font + descendant CIDFont loader
- Type0Font with descendant DescendantCIDFont
- /DW default width parsing (default 1000)
- /W array parsing (per-CID [c [w1 w2 ...]] and range [cfirst clast w])
- Sparse BTreeMap<u32, u16> storage for CID widths
- CIDFontType0 (CFF) vs CIDFontType2 (TrueType) detection
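The two /W forms and the /DW fallback can be sketched as below. The WItem enum is a hypothetical, pre-decoded stand-in for the crate's PDF objects, and the error policy (stop at the first malformed group, clamp ranges to CID 65535) is an assumption of this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical pre-decoded /W element: a number or a nested width list.
pub enum WItem {
    Num(u32),
    List(Vec<u16>),
}

/// Guard against runaway ranges (assumed cap for this sketch).
const MAX_CID: u32 = 0xFFFF;

/// Parses both /W forms into a sparse CID -> width map:
///   c [w1 w2 ...]    widths for consecutive CIDs starting at c
///   cfirst clast w   one width for every CID in cfirst..=clast
pub fn parse_w_array(items: &[WItem]) -> BTreeMap<u32, u16> {
    let mut widths: BTreeMap<u32, u16> = BTreeMap::new();
    let mut i = 0;
    while i < items.len() {
        match (&items[i], items.get(i + 1), items.get(i + 2)) {
            (WItem::Num(first), Some(WItem::List(ws)), _) => {
                for (offset, w) in ws.iter().enumerate() {
                    widths.insert(first.saturating_add(offset as u32), *w);
                }
                i += 2;
            }
            (WItem::Num(first), Some(WItem::Num(last)), Some(WItem::Num(w))) => {
                let width = u16::try_from(*w).unwrap_or(u16::MAX);
                for cid in *first..=(*last).min(MAX_CID) {
                    widths.insert(cid, width);
                }
                i += 3;
            }
            // Malformed group: keep what was parsed so far.
            _ => break,
        }
    }
    widths
}

/// Width lookup with the /DW fallback (1000 when /DW is absent).
pub fn width_for(widths: &BTreeMap<u32, u16>, cid: u32, dw: u16) -> u16 {
    widths.get(&cid).copied().unwrap_or(dw)
}
```

A sparse map suits CJK fonts, where most CIDs share the default width and only a few ranges override it.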
pdftract-5sh: CIDToGIDMap resolver
- CidToGidMap::{Identity, Array(Box<[u16]>)} enum
- Identity short-circuit (zero allocation, GID == CID)
- Stream form: 2-byte big-endian u16 array indexed by CID
- Diagnostic CIDTOGIDMAP_TRUNCATED for odd-byte-count input
- Out-of-range CID returns GID 0 (notdef glyph)
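The resolver behaviour listed above can be sketched as follows. The enum shape matches the note, but from_stream(), gid_for(), and the boolean truncation flag are hypothetical names chosen for illustration; in the real module the truncation case is reported through the CIDTOGIDMAP_TRUNCATED diagnostic.

```rust
#[derive(Debug, PartialEq)]
pub enum CidToGidMap {
    Identity,
    Array(Box<[u16]>),
}

impl CidToGidMap {
    /// Decodes the stream form: consecutive 2-byte big-endian GIDs,
    /// indexed by CID. The returned flag is true when the input has an
    /// odd byte count; the trailing byte is ignored in that case.
    pub fn from_stream(data: &[u8]) -> (Self, bool) {
        let truncated = data.len() % 2 != 0;
        let gids: Vec<u16> = data
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        (CidToGidMap::Array(gids.into_boxed_slice()), truncated)
    }

    /// Resolves a CID to a GID; anything out of range maps to GID 0 (notdef).
    pub fn gid_for(&self, cid: u32) -> u16 {
        match self {
            // Identity needs no table: GID == CID. Treating CIDs above
            // u16::MAX as notdef is an assumption of this sketch.
            CidToGidMap::Identity => u16::try_from(cid).unwrap_or(0),
            CidToGidMap::Array(table) => table.get(cid as usize).copied().unwrap_or(0),
        }
    }
}
```

Keeping Identity as a unit variant avoids allocating a 128 KiB table for the common case.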
Integration Points
This module delivers the Font value needed by:
- Phase 2.2: Encoding resolution (ToUnicode, differences, AGL fallback)
- Phase 2.3: CJK CMap parsing and CID-to-Unicode mapping
- Phase 2.4: Type3 font content stream execution
- Phase 3: Content stream execution (Tj, TJ, BT/ET operators)
Files Modified/Created
Created:
- crates/pdftract-core/src/font/mod.rs
- crates/pdftract-core/src/font/std14.rs
- crates/pdftract-core/src/font/embedded.rs
- crates/pdftract-core/src/font/type0.rs
- crates/pdftract-core/build.rs
- crates/pdftract-core/build/std14-metrics.json
- crates/pdftract-core/build/generate_std14_metrics.py
- crates/pdftract-core/build/fix_std14_weights.py
Modified:
- crates/pdftract-core/src/lib.rs - added pub mod font;
- crates/pdftract-core/src/diagnostics.rs - added FONT_PARSE_FAILED, CIDTOGIDMAP_TRUNCATED
- .gitignore - added !/crates/pdftract-core/build/ exceptions
Commits Referenced
- 46c515e feat(pdftract-3uq): add font type classifier and subset prefix stripper
- 7429a67 feat(pdftract-juc): implement Standard 14 font metrics registry
- ffaaf69 feat(pdftract-6ah): implement embedded font program loader
- 5e2390f feat(pdftract-cv4): Type 0 composite font + descendant CIDFont loader
- 03aa4da feat(pdftract-5sh): CIDToGIDMap resolver for CIDFontType2
- 075de55 docs(pdftract-cv4): add verification note
- b7392f1 docs(pdftract-6ah): add verification note
Notes
- All child beads have verification notes in notes/ directory
- Type3 font /CharProcs execution deferred to Phase 2.4 (as planned)
- OpenType CFF uses same owned_ttf_parser entrypoint as TrueType (CFF support via opentype-layout feature)
- The classifier handles indirect references gracefully (returns default, does not crash)
- Standard 14 fonts may have embedded font programs; registry serves as fallback
Ready for Next Phase
Phase 2.1 is complete. The font module is ready for:
- Phase 2.2: Encoding resolution (ToUnicode, differences, AGL)
- Phase 2.3: CJK CMap parsing
- Phase 2.4: Type3 content stream execution