pdftract/crates
jedarden bd91f7d842 feat(pdftract-3lir): implement Filespec dict + EF stream decoder
Implements 7.5.2: Filespec dictionary and EF stream decoder for PDF
embedded file attachments. Extracts filename (/UF preferred over /F),
description, MIME type, size, dates, and MD5 checksum from Filespec
dictionaries and decodes the embedded stream data.

Key additions:
- AttachmentBuilder struct with all attachment metadata fields
- extract_one() function for resolving Filespec and decoding EF stream
- PDF string decoding (UTF-16BE BOM, UTF-16BE without BOM, PDFDocEncoding)
- PDF date to ISO 8601 parsing (reused from signature module)
- 50 MB size limit enforcement with truncation flag
- Support for all Phase 1 stream filters (FlateDecode, LZWDecode, etc.)

Closes: pdftract-3lir
2026-05-24 13:54:27 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-64p5): implement classify CLI subcommand structure 2026-05-24 13:45:44 -04:00
pdftract-core feat(pdftract-3lir): implement Filespec dict + EF stream decoder 2026-05-24 13:54:27 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00