- Add attachments field to ExtractionResult struct
- Implement extract_attachments helper function to walk /AF array
- Add base64 encoding for attachment content in AttachmentBuilder::into_json
- Update result_to_json to include attachments in output
- Add PyO3 bindings for attachments with base64 data decoded to bytes
- Export AttachmentJson from pdftract-core root
- Add base64 dependency to pdftract-core and pdftract-py

Per plan 7.5.3:
- Attachments > 50 MB are truncated (metadata only, data: null, truncated: true)
- Base64 encoding uses RFC 4648 standard alphabet with padding
- CLI --text mode excludes attachments (existing behavior maintained)
- JSON sink includes attachments array

Closes: pdftract-3j2u
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
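
The truncation and encoding rules from plan 7.5.3 can be illustrated with a minimal Python sketch. This is not the pdftract implementation: the helper names and the `name` and `size` keys are hypothetical, and reading "50 MB" as 50 * 1024 * 1024 bytes is an assumption. Only the `data` and `truncated` fields come from the commit message.

```python
import base64
import json

# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes; the commit does not say.
MAX_ATTACHMENT_BYTES = 50 * 1024 * 1024


def attachment_to_json(name, content, limit=MAX_ATTACHMENT_BYTES):
    """Hypothetical helper: build a JSON-ready dict for one attachment.

    Content strictly larger than `limit` keeps metadata only
    (data: null, truncated: true), as the commit message describes.
    """
    if len(content) > limit:
        return {"name": name, "size": len(content), "data": None, "truncated": True}
    # b64encode uses the RFC 4648 standard alphabet with '=' padding.
    encoded = base64.b64encode(content).decode("ascii")
    return {"name": name, "size": len(content), "data": encoded, "truncated": False}


def attachment_bytes(entry):
    """Hypothetical helper: decode `data` back to bytes, as a binding might."""
    if entry["data"] is None:
        return None  # truncated attachment, nothing to decode
    # validate=True rejects characters outside the standard alphabet.
    return base64.b64decode(entry["data"], validate=True)


# A tiny limit keeps the demonstration cheap; the real threshold is far larger.
small = attachment_to_json("note.txt", b"hi", limit=4)
large = attachment_to_json("big.bin", b"0123456789", limit=4)
print(json.dumps([small, large], indent=2))
```

With the limit lowered to 4 bytes, `small` carries `"data": "aGk="` and `large` carries `"data": null` with `"truncated": true`; `attachment_bytes(small)` returns `b"hi"`.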
||
|---|---|---|
| python/pdftract | ||
| src | ||
| tests | ||
| Cargo.toml | ||
| pdftract-py.cdx.json | ||
| pyproject.toml | ||