Implements the StructTree parser (Phase 7.1.1) with:
- Depth-first walker over /StructTreeRoot via /K array
- Support for all four /K entry types: StructElem, MCID, MCR, OBJR
- /RoleMap resolution with chain handling and cycle detection
- /Lang inheritance through the structure tree
- /ActualText inheritance (applies to all descendant content)
- Public API: StructureType, StructElemNode, StructTreeRoot, RoleMap, Kid

Acceptance criteria:
- PASS: All four /K element kinds handled without crashing
- PASS: /RoleMap chains resolve to standard type or NonStruct
- PASS: /Lang and /ActualText inherit correctly down tree
- PASS: Unit tests for Word RoleMap (Heading1 -> H1)
- PASS: Unit tests for nested /Lang and /ActualText scope
- PASS: Public type StructElemNode documented in core crate

References:
- Plan section 7.1 StructTree Exploitation (lines 2547-2549, 2552-2553)
- PDF 1.7 spec 14.7.4 (Structure Tree) and 14.8.4 (Standard Structure Types)

Co-Authored-By: Claude Code <noreply@anthropic.com>
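The /RoleMap behaviour named in the commit message (chains resolve to a standard type or NonStruct, with cycle detection) could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `struct_tree`: `StdType`, `standard_type`, and `resolve_role` are hypothetical names, only three standard types are modelled, and the role map is assumed to be a plain name-to-name `HashMap`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, tiny subset of the standard structure types (illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum StdType {
    H1,
    P,
    NonStruct,
}

/// Returns the standard type for `name`, if `name` is one of the modelled types.
fn standard_type(name: &str) -> Option<StdType> {
    match name {
        "H1" => Some(StdType::H1),
        "P" => Some(StdType::P),
        "NonStruct" => Some(StdType::NonStruct),
        _ => None,
    }
}

/// Follows role-map entries until a standard type is reached.
/// Assumption: an unmapped name or a cycle falls back to NonStruct.
fn resolve_role(role_map: &HashMap<String, String>, name: &str) -> StdType {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut current = name;
    loop {
        if let Some(t) = standard_type(current) {
            return t;
        }
        // `insert` returns false when the name was already visited: a cycle.
        if !seen.insert(current) {
            return StdType::NonStruct;
        }
        match role_map.get(current) {
            Some(next) => current = next.as_str(),
            None => return StdType::NonStruct,
        }
    }
}
```

With a map containing `Heading1 -> H1`, `resolve_role(&map, "Heading1")` yields `StdType::H1`, which mirrors the Word RoleMap case listed in the acceptance criteria. Tracking visited names in a set keeps the loop bounded even when a malformed document maps two custom names to each other.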
40 lines
1.5 KiB
Rust
//! PDF parsing primitives.
//!
//! This module provides the lexer and object parser for reading PDF documents.

pub mod diagnostic;
pub mod lexer;
pub mod object;
pub mod objstm;
pub mod xref;
pub mod catalog;
pub mod stream;
pub mod secrets;
pub mod pages;
pub mod outline;
pub mod resources;
pub mod ocg;
pub mod struct_tree;

// Re-export from the unified diagnostics module (Phase 1.6)
pub use crate::diagnostics::{Diagnostic, Severity, DiagCode, ObjRef};
pub use object::PdfObject;
pub use objstm::{ObjectStmParser, ObjStmCacheEntry, ObjStmResult, ObjStmError};
pub use xref::{
    XrefResolver, XrefEntry, ResolveError, ResolveResult, XrefSection,
    parse_traditional_xref, parse_xref_stream, merge_hybrid, is_hybrid_trailer,
    LinearizationInfo, detect_linearization, load_xref_linearized, merge_linearized_xrefs,
    load_xref_with_prev_chain,
};
pub use catalog::{Catalog, MarkInfo, PageLabel, PageLabelsTree, PageLabelStyle, parse_catalog};
pub use ocg::{OcProperties, OcGroup, Ocmd, OcmdPolicy, BaseState, parse_oc_properties};
pub use resources::{ResourceDict, merge_resources, extract_resources};
pub use pages::{PageDict, flatten_page_tree, DEFAULT_MEDIABOX};
pub use struct_tree::{
    StructureType, StructElemNode, StructTreeRoot, RoleMap, Kid,
    parse_struct_tree,
};
pub use stream::{
    StreamDecoder, FlateDecoder, ASCII85Decoder, ASCIIHexDecoder, CryptDecoder, PassthroughDecoder,
    normalize_filter_name, get_decoder, FilterError, DEFAULT_MAX_DECOMPRESS_BYTES,
};
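The /Lang inheritance mentioned in the commit message for `struct_tree` can be illustrated with a small depth-first walk. This is only a sketch: `Elem`, `Effective`, and `effective_langs` are hypothetical stand-ins and do not reflect the real `StructElemNode` layout, and it assumes the nearest enclosing /Lang wins when an element does not set its own.

```rust
/// Hypothetical, simplified structure element (not the real StructElemNode).
#[derive(Debug, Default)]
struct Elem {
    /// The element's own /Lang entry, if present.
    lang: Option<String>,
    /// Child structure elements, in document order.
    kids: Vec<Elem>,
}

/// The language in effect for one element after inheritance is applied.
#[derive(Debug, PartialEq, Eq)]
struct Effective {
    depth: usize,
    lang: Option<String>,
}

/// Depth-first, pre-order walk that records the effective language of every
/// element. An element's own /Lang overrides the inherited one; otherwise the
/// value from the nearest ancestor is passed down unchanged.
fn effective_langs(
    elem: &Elem,
    depth: usize,
    inherited: Option<&str>,
    out: &mut Vec<Effective>,
) {
    let lang = elem.lang.as_deref().or(inherited);
    out.push(Effective {
        depth,
        lang: lang.map(str::to_owned),
    });
    for kid in &elem.kids {
        effective_langs(kid, depth + 1, lang, out);
    }
}
```

Calling `effective_langs(&root, 0, None, &mut out)` fills `out` in pre-order. Passing the inherited value as a borrowed `Option<&str>` avoids cloning strings on the way down; only the recorded results own their data. A similar pattern could carry /ActualText, though its exact scoping rules in the real parser are not shown here.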