pdftract

jedarden b535638104 feat(pdftract-2bsfc): implement document catalog parser with PageLabels number tree	2026-05-17 23:45:45 -04:00

Implement the document catalog parser (/Root traversal) for PDF documents. The catalog parser extracts all key entries from the document catalog including Pages, Outlines, MarkInfo, StructTreeRoot, AcroForm, Names, Metadata, PageLabels, OCProperties, OpenAction, AA, and Version.

Key structures:
- MarkInfo: parses /MarkInfo dictionary with is_tagged, user_properties, suspects
- PageLabelStyle: enum for all label styles (D, R, r, A, a)
- PageLabel: single page label with style, prefix, and start value
- PageLabelsTree: number tree parser for /PageLabels with /Nums and /Kids support
- OcProperties: stub for OCG implementation (delegated to dedicated bead)
- Catalog: main catalog struct with all required and optional fields

Number tree implementation:
- Parses /Nums arrays (leaf nodes with alternating key-value pairs)
- Supports /Kids arrays (internal nodes for recursive tree traversal)
- Provides get_label_with_start() and get_label() methods for lookup
- Correctly formats roman numerals (uppercase/lowercase) and letter sequences

All 27 tests pass including proptests for fuzzing robustness (INV-8).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
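
The page-label lookup and formatting described in the commit message above can be illustrated with a minimal sketch. It is not the pdftract implementation: `LabelStyle`, `LabelRange`, `roman`, `alpha`, and `label_for_page` are hypothetical names chosen for this example, and a sorted slice stands in for an already flattened /Nums array (no /Kids recursion). It assumes the PDF convention that letter styles run A to Z, then AA to ZZ, and so on.

```rust
/// Numbering styles from the /S entry of a page-label dictionary.
/// (Hypothetical type for this sketch, not pdftract's PageLabelStyle.)
#[derive(Clone, Copy)]
enum LabelStyle {
    Decimal,    // /D
    UpperRoman, // /R
    LowerRoman, // /r
    UpperAlpha, // /A
    LowerAlpha, // /a
}

/// One key-value pair of a flattened /Nums array.
/// The label applies from `first_page` (0-based page index) onward.
struct LabelRange {
    first_page: u32,
    style: Option<LabelStyle>, // None: the label is the prefix only
    prefix: String,            // /P entry
    start: u32,                // /St entry, 1 when absent
}

/// Lowercase roman numeral for `n` (empty string for 0).
fn roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Lowercase letter label: a..z, then aa..zz, then aaa..zzz, and so on.
fn alpha(n: u32) -> String {
    if n == 0 {
        return String::new();
    }
    let letter = (b'a' + ((n - 1) % 26) as u8) as char;
    let repeats = ((n - 1) / 26 + 1) as usize;
    letter.to_string().repeat(repeats)
}

/// Finds the range with the largest `first_page` <= `page` and formats the label.
/// `ranges` must be sorted ascending by `first_page`.
fn label_for_page(ranges: &[LabelRange], page: u32) -> Option<String> {
    let range = ranges.iter().rev().find(|r| r.first_page <= page)?;
    // The numeric part counts up from /St at the first page of the range.
    let n = range.start.saturating_add(page - range.first_page);
    let numeral = match range.style {
        None => String::new(),
        Some(LabelStyle::Decimal) => n.to_string(),
        Some(LabelStyle::UpperRoman) => roman(n).to_uppercase(),
        Some(LabelStyle::LowerRoman) => roman(n),
        Some(LabelStyle::UpperAlpha) => alpha(n).to_uppercase(),
        Some(LabelStyle::LowerAlpha) => alpha(n),
    };
    Some(format!("{}{}", range.prefix, numeral))
}
```

With a lowercase-roman range starting at page index 0 and a decimal range starting at page index 4 (both with start value 1), page index 3 is labelled `iv` and page index 4 is labelled `1`. A real number tree would first descend through /Kids nodes to reach the leaf whose key range contains the page index.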
pdftract-1wqec.md	docs(pdftract-1wqec): verify CI scaffolding acceptance criteria	2026-05-17 07:12:16 -04:00
pdftract-2bsfc.md	feat(pdftract-2bsfc): implement document catalog parser with PageLabels number tree	2026-05-17 23:45:45 -04:00
pdftract-4hn1.md	feat(pdftract-4hn1): use Cow<'static, str> for diagnostic messages	2026-05-17 23:23:38 -04:00
pdftract-4iier.md	docs(pdftract-4iier): add per-profile README documentation for all 9 built-in profiles	2026-05-17 23:19:00 -04:00
pdftract-5z5d8.md	fix(pdftract-5z5d8): fix provenance validation script	2026-05-17 23:43:37 -04:00
pdftract-147a.md	docs(pdftract-147a): author SDK contract specification	2026-05-17 23:13:55 -04:00