pdftract/crates
jedarden 450e2f2df5 feat(pdftract-5u7h): implement Phase 3 position-hint mode
Add ProcessingMode enum and process_with_mode function to Phase 3
content stream processor:

- ProcessingMode::Normal: Extract text with full Unicode resolution
- ProcessingMode::PositionHint: Emit U+FFFD with confidence=0.0 for every
  glyph, but still compute bboxes correctly for use by the 5.5.2
  validation filter

PositionHint mode skips the ToUnicode CMap lookup, making it ~10% faster
than Normal mode. The text matrix advances identically in both modes.
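
A minimal sketch of the two modes, for illustration only. ProcessingMode and
process_with_mode are the names from this commit; the Glyph and ShownGlyph
types, the function signature, the to_unicode closure, the confidence of 1.0
for a resolved glyph, and the translation-only text matrix are all
assumptions and may not match the code in pdftract-core.

```rust
/// The two modes named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessingMode {
    Normal,
    PositionHint,
}

/// Hypothetical output record; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Glyph {
    pub unicode: char,
    pub confidence: f32,
    /// [x0, y0, x1, y1] in text space.
    pub bbox: [f32; 4],
}

/// Hypothetical input: a character code and its advance width
/// as a fraction of the font size.
pub struct ShownGlyph {
    pub code: u8,
    pub advance: f32,
}

/// Assumed signature. `to_unicode` stands in for the ToUnicode CMap lookup.
pub fn process_with_mode(
    glyphs: &[ShownGlyph],
    origin: (f32, f32),
    font_size: f32,
    mode: ProcessingMode,
    to_unicode: impl Fn(u8) -> Option<char>,
) -> Vec<Glyph> {
    let (mut x, y) = origin;
    let mut out = Vec::with_capacity(glyphs.len());
    for g in glyphs {
        // The bbox comes from the current position and the advance only,
        // so it does not depend on the mode.
        let width = g.advance * font_size;
        let bbox = [x, y, x + width, y + font_size];
        let (unicode, confidence) = match mode {
            // Assumption: a resolved code gets confidence 1.0.
            ProcessingMode::Normal => match to_unicode(g.code) {
                Some(c) => (c, 1.0),
                None => ('\u{FFFD}', 0.0),
            },
            // The lookup is never called in this mode.
            ProcessingMode::PositionHint => ('\u{FFFD}', 0.0),
        };
        out.push(Glyph { unicode, confidence, bbox });
        // Advance the (simplified) text position identically in both modes.
        x += width;
    }
    out
}
```

Running the same glyph slice through both modes in this sketch gives equal
bbox values at every index, while unicode and confidence differ. That is the
property the unit tests listed below check.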

Unit tests verify:
- Same input PDF in Normal and PositionHint modes -> identical bboxes,
  differing Unicode
- All PositionHint glyphs have unicode=U+FFFD and confidence=0.0
- Text positioning operators (Tm, Td, TD, T*) work correctly

Closes: pdftract-5u7h
2026-05-24 04:49:36 -04:00
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-kdp6): implement profile loader secret key hardening 2026-05-24 04:41:04 -04:00
pdftract-core feat(pdftract-5u7h): implement Phase 3 position-hint mode 2026-05-24 04:49:36 -04:00
pdftract-libpdftract feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
pdftract-py docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00