pdftract/crates
jedarden 9a3e4ce514 feat(pdftract-axcri): record inline images as ImageXObject entries
Add structures and functions to record inline images (BI/ID/EI sequences)
as ImageXObject entries in a page's image list. This enables Phase 4.4
figure detection to correctly classify blocks containing only images.

Changes:
- Add InlineImageHeader struct for inline image metadata
- Add ImageBytesRef enum for image byte references
- Add ImageXObject struct unifying XObject and inline images
- Add collect_image_xobjects() to collect all images with bboxes
- Add parse_inline_image() to parse BI/ID/EI sequences
- Add compute_unit_square_bbox() for bbox computation from CTM
- Add comprehensive unit tests for all acceptance criteria

Acceptance criteria:
- Inline image with no CTM: bbox == [0,0,1,1] 
- Inline image with CTM 100 0 0 50 200 300: bbox == [200,300,300,350] 
- Page with 3 images: page_image_list has 3 entries with correct bboxes 
- Image mask: recorded with is_mask flag 
- Rotation normalization: handled via CTM 

Closes: pdftract-axcri
2026-05-24 07:41:50 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-dtpwa): implement contract profile per Phase 7.10 schema 2026-05-24 07:10:32 -04:00
pdftract-core feat(pdftract-axcri): record inline images as ImageXObject entries 2026-05-24 07:41:50 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py feat(pdftract-bnba5): implement PyO3 extract_stream entry point with StreamIterator 2026-05-24 07:35:03 -04:00