Implement the Receipt struct and lite-mode JSON serialization for visual citation receipts. This provides cryptographic proof of provenance for extracted text.

Changes:
- Add Receipt struct with 6 fields (pdf_fingerprint, page_index, bbox, content_hash, extraction_version, svg_clip)
- Implement Receipt::lite() constructor with NFC normalization
- Integrate Receipt into SpanJson and BlockJson schemas
- Add unicode-normalization and serde_json dependencies

Acceptance criteria:
- Receipt::lite() produces valid receipts with svg_clip=None
- Lite mode JSON omits svg_clip key via skip_serializing_if
- Content hash uses NFC normalization for cross-platform stability
- Receipt wired into SpanJson and BlockJson types

Note: 100 receipts aggregate size is ~27 KB (not 15 KB as planned). The 15 KB target is not achievable with required field sizes.

Refs: pdftract-5zm86, Phase 6.8 Visual Citation Receipts (lines 2351-2417)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
11 lines
340 B
Rust
//! pdftract-core — Core PDF parsing and text extraction primitives.
//!
//! This crate provides the foundational data structures and parsers for
//! processing PDF documents, including the lexer, object parser, and
//! text extraction engines.

pub mod diagnostics;
pub mod fingerprint;
pub mod parser;
pub mod receipts;
pub mod schema;
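
The commit message describes a Receipt with six fields whose lite-mode JSON leaves out the svg_clip key. The following is a minimal, std-only sketch of that behavior, not the code in the `receipts` module. The field types, the `lite` signature, and the hand-written `to_json` and `escape` helpers are all assumptions made for illustration. The commit itself relies on serde's `skip_serializing_if` and on the unicode-normalization crate; here the content hash is taken as an already-computed string, so NFC normalization and hashing are out of scope.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for the commit's `Receipt`; all field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Receipt {
    pub pdf_fingerprint: String,
    pub page_index: u32,
    pub bbox: [f32; 4],
    pub content_hash: String,
    pub extraction_version: String,
    pub svg_clip: Option<String>,
}

impl Receipt {
    /// Lite-mode constructor: `svg_clip` is always left as `None`.
    /// Assumes `content_hash` was computed elsewhere over NFC-normalized text.
    pub fn lite(
        pdf_fingerprint: &str,
        page_index: u32,
        bbox: [f32; 4],
        content_hash: &str,
        extraction_version: &str,
    ) -> Self {
        Receipt {
            pdf_fingerprint: pdf_fingerprint.to_string(),
            page_index,
            bbox,
            content_hash: content_hash.to_string(),
            extraction_version: extraction_version.to_string(),
            svg_clip: None,
        }
    }

    /// Hand-rolled JSON writer that mimics the effect of serde's
    /// `skip_serializing_if = "Option::is_none"` on `svg_clip`.
    /// Assumes the bbox values are finite (NaN and infinity are not valid JSON).
    pub fn to_json(&self) -> String {
        let mut out = String::from("{");
        write!(out, "\"pdf_fingerprint\":\"{}\"", escape(&self.pdf_fingerprint)).unwrap();
        write!(out, ",\"page_index\":{}", self.page_index).unwrap();
        write!(
            out,
            ",\"bbox\":[{},{},{},{}]",
            self.bbox[0], self.bbox[1], self.bbox[2], self.bbox[3]
        )
        .unwrap();
        write!(out, ",\"content_hash\":\"{}\"", escape(&self.content_hash)).unwrap();
        write!(out, ",\"extraction_version\":\"{}\"", escape(&self.extraction_version)).unwrap();
        // The key is written only when a clip is present, so lite receipts omit it entirely.
        if let Some(svg) = &self.svg_clip {
            write!(out, ",\"svg_clip\":\"{}\"", escape(svg)).unwrap();
        }
        out.push('}');
        out
    }
}

/// Escapes quotes, backslashes, and control characters for a JSON string value.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out
}
```

Calling `Receipt::lite(...)` and then `to_json()` on the result yields an object with five keys. Setting `svg_clip` to `Some(...)` on a clone adds the sixth. Omitting the key rather than writing `"svg_clip":null` saves a few bytes per receipt, which matters given the aggregate size noted in the commit message.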