jedarden/pdftract

Commit graph

Author

SHA1

Message

Date

jedarden

b535638104

feat(pdftract-2bsfc): implement document catalog parser with PageLabels number tree

Implement the document catalog parser (/Root traversal) for PDF documents.
The catalog parser extracts all key entries from the document catalog
including Pages, Outlines, MarkInfo, StructTreeRoot, AcroForm, Names,
Metadata, PageLabels, OCProperties, OpenAction, AA, and Version.

Key structures:
- MarkInfo: parses /MarkInfo dictionary with is_tagged, user_properties, suspects
- PageLabelStyle: enum for all label styles (D, R, r, A, a)
- PageLabel: single page label with style, prefix, and start value
- PageLabelsTree: number tree parser for /PageLabels with /Nums and /Kids support
- OcProperties: stub for OCG implementation (delegated to dedicated bead)
- Catalog: main catalog struct with all required and optional fields

Number tree implementation:
- Parses /Nums arrays (leaf nodes with alternating key-value pairs)
- Supports /Kids arrays (internal nodes for recursive tree traversal)
- Provides get_label_with_start() and get_label() methods for lookup
- Correctly formats roman numerals (uppercase/lowercase) and letter sequences
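
The label lookup and numeral formatting described above can be sketched as follows. This is a minimal illustration, not the pdftract code: `LabelStyle`, `LabelRange`, `roman`, `alpha`, and `label_for` are hypothetical names, and the number tree is assumed to have already been flattened into a slice of ranges sorted by starting page index. The letter style follows the PDF specification's rule (A to Z, then AA to ZZ, and so on), which repeats one letter rather than counting in base 26.

```rust
// Hypothetical types for illustration; these are not the pdftract API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LabelStyle {
    Decimal,    // /D
    UpperRoman, // /R
    LowerRoman, // /r
    UpperAlpha, // /A
    LowerAlpha, // /a
}

#[derive(Debug, Clone)]
struct LabelRange {
    first_page: u32,           // number tree key: 0-based page index
    style: Option<LabelStyle>, // /S; None means the label is the prefix only
    prefix: String,            // /P
    start: u32,                // /St, defaults to 1 in the PDF spec
}

/// Uppercase roman numeral; returns an empty string for 0.
fn roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Letter label: 1 -> A, 26 -> Z, 27 -> AA, 28 -> BB, 53 -> AAA.
fn alpha(n: u32) -> String {
    if n == 0 {
        return String::new();
    }
    let letter = (b'A' + ((n - 1) % 26) as u8) as char;
    let repeats = ((n - 1) / 26 + 1) as usize;
    std::iter::repeat(letter).take(repeats).collect()
}

/// Finds the range with the greatest key <= `page` and formats the label.
/// Assumes `ranges` is sorted by `first_page` in ascending order.
fn label_for(ranges: &[LabelRange], page: u32) -> Option<String> {
    let range = ranges.iter().rev().find(|r| r.first_page <= page)?;
    let n = range.start.saturating_add(page - range.first_page);
    let numeral = match range.style {
        None => String::new(),
        Some(LabelStyle::Decimal) => n.to_string(),
        Some(LabelStyle::UpperRoman) => roman(n),
        Some(LabelStyle::LowerRoman) => roman(n).to_lowercase(),
        Some(LabelStyle::UpperAlpha) => alpha(n),
        Some(LabelStyle::LowerAlpha) => alpha(n).to_lowercase(),
    };
    Some(format!("{}{}", range.prefix, numeral))
}
```

With ranges starting at page indices 0 (lowercase roman), 4 (decimal), and 10 (decimal, prefix `A-`, start 8), page index 2 would be labelled `iii` and page index 11 would be labelled `A-9`.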

All 27 tests pass including proptests for fuzzing robustness (INV-8).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-17 23:45:45 -04:00

jedarden

88278c362f

feat(pdftract-4hn1): use Cow<'static, str> for diagnostic messages

Changed Diagnostic::msg from String to Cow<'static, str> to avoid
allocations for static error messages. Static messages now use
Cow::Borrowed, while dynamic formatted messages use Cow::Owned.
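
A minimal sketch of the borrowed-versus-owned split, assuming a diagnostic that carries a byte offset alongside its message. `SketchDiagnostic`, its `offset` field, and the constructor names are hypothetical and may not match the real `Diagnostic` type.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for the real Diagnostic type.
#[derive(Debug)]
struct SketchDiagnostic {
    offset: usize,
    msg: Cow<'static, str>,
}

impl SketchDiagnostic {
    /// Static message: stores the &'static str without allocating.
    fn fixed(offset: usize, msg: &'static str) -> Self {
        SketchDiagnostic { offset, msg: Cow::Borrowed(msg) }
    }

    /// Dynamic message: takes ownership of an already formatted String.
    fn formatted(offset: usize, msg: String) -> Self {
        SketchDiagnostic { offset, msg: Cow::Owned(msg) }
    }
}

/// Example of a message that needs runtime formatting.
fn unexpected_byte(offset: usize, byte: u8) -> SketchDiagnostic {
    SketchDiagnostic::formatted(offset, format!("unexpected byte 0x{:02X}", byte))
}
```

Both variants dereference to `&str`, so code that only reads `msg` does not need to distinguish them.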

Also fixed a lifetime issue in peek_token: it was returning a reference
to a local variable and now returns a reference into the token cache.
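
One common way to make a peek method return a reference that outlives the call is to store the scanned token in the lexer and borrow from that field. The sketch below illustrates the pattern under that assumption; `SketchToken`, `SketchLexer`, the `peeked` field, and the toy `scan` helper are all hypothetical and much simpler than a real PDF lexer.

```rust
// Hypothetical toy token set; the real Token enum has more variants.
#[derive(Debug, Clone, PartialEq)]
enum SketchToken {
    Integer(i64),
    Other(u8),
    Eof,
}

struct SketchLexer<'a> {
    input: &'a [u8],
    pos: usize,
    peeked: Option<SketchToken>, // one-token lookahead cache
}

impl<'a> SketchLexer<'a> {
    fn new(input: &'a [u8]) -> Self {
        SketchLexer { input, pos: 0, peeked: None }
    }

    /// Scans one token directly from the input, ignoring the cache.
    fn scan(&mut self) -> SketchToken {
        while self.pos < self.input.len() && self.input[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        let Some(&first) = self.input.get(self.pos) else {
            return SketchToken::Eof;
        };
        if !first.is_ascii_digit() {
            self.pos += 1;
            return SketchToken::Other(first);
        }
        let mut value: i64 = 0;
        while let Some(&b) = self.input.get(self.pos) {
            if !b.is_ascii_digit() {
                break;
            }
            value = value.saturating_mul(10).saturating_add((b - b'0') as i64);
            self.pos += 1;
        }
        SketchToken::Integer(value)
    }

    /// Returns the cached token if one exists, otherwise scans a new one.
    fn next_token(&mut self) -> SketchToken {
        match self.peeked.take() {
            Some(token) => token,
            None => self.scan(),
        }
    }

    /// Fills the cache if needed and returns a reference into it, so the
    /// reference is tied to the lexer rather than to a local variable.
    fn peek_token(&mut self) -> &SketchToken {
        if self.peeked.is_none() {
            let token = self.scan();
            self.peeked = Some(token);
        }
        self.peeked.as_ref().expect("cache was just filled")
    }
}
```

Peeking repeatedly returns the same token without advancing; the next call to `next_token` drains the cache before scanning further.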

Acceptance criteria:
- Token enum with all required variants
- Lexer struct with position tracking and diagnostics
- Diagnostic uses Cow<'static, str> for zero-allocation static messages
- All public methods implemented: new, next_token, peek_token, position, take_diagnostics
- All internal helpers implemented

Refs: pdftract-4hn1

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: pdftract-4hn1

2026-05-17 23:23:38 -04:00

2 commits