pdftract/crates/pdftract-py
jedarden bf9a19f652 feat(pdftract-3j2u): implement 50 MB size limit + base64 encoding for attachments
- Add attachments field to ExtractionResult struct
- Implement extract_attachments helper function to walk /AF array
- Add base64 encoding for attachment content in AttachmentBuilder::into_json
- Update result_to_json to include attachments in output
- Add PyO3 bindings for attachments with base64 data decoded to bytes
- Export AttachmentJson from pdftract-core root
- Add base64 dependency to pdftract-core and pdftract-py

Per plan 7.5.3:
- Attachments larger than 50 MB are truncated: only metadata is emitted (data: null, truncated: true)
- Base64 encoding uses RFC 4648 standard alphabet with padding
- CLI --text mode excludes attachments (existing behavior maintained)
- JSON sink includes attachments array
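
The truncation and encoding rules above can be sketched in Python as follows. This is an illustrative sketch, not the crate's actual code: the function names, the `name` and `size` fields, and the reading of "50 MB" as 50 * 1024 * 1024 bytes are assumptions. Only the `data` and `truncated` keys and the RFC 4648 standard alphabet with padding come from the commit message.

```python
import base64

# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes; the commit
# message does not say whether MB or MiB is meant.
MAX_ATTACHMENT_BYTES = 50 * 1024 * 1024


def attachment_to_json(name, data, limit=MAX_ATTACHMENT_BYTES):
    """Build a JSON-ready dict for one attachment (hypothetical helper).

    Oversized attachments keep their metadata but carry no payload.
    """
    if len(data) > limit:
        return {"name": name, "size": len(data), "data": None, "truncated": True}
    # base64.b64encode uses the RFC 4648 standard alphabet with '=' padding.
    encoded = base64.b64encode(data).decode("ascii")
    return {"name": name, "size": len(data), "data": encoded, "truncated": False}


def attachment_bytes(entry):
    """Decode the payload back to bytes, or return None if it was truncated."""
    if entry["data"] is None:
        return None
    # validate=True rejects characters outside the standard alphabet.
    return base64.b64decode(entry["data"], validate=True)
```

A consumer of the JSON sink would check `truncated` (or `data is None`) before decoding, which mirrors what the commit describes for the PyO3 bindings decoding base64 data to bytes.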

Closes: pdftract-3j2u

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 11:42:28 -04:00
python/pdftract feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00
src feat(pdftract-3j2u): implement 50 MB size limit + base64 encoding for attachments 2026-05-25 11:42:28 -04:00
tests feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00
Cargo.toml feat(pdftract-3j2u): implement 50 MB size limit + base64 encoding for attachments 2026-05-25 11:42:28 -04:00
pdftract-py.cdx.json feat(pdftract-67tm8): implement MCP stdio transport with integration tests 2026-05-23 00:16:42 -04:00
pyproject.toml feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00