pdftract/crates
jedarden b9b4f50ff8 feat(pdftract-2etcd): implement map_confidence_source function
Implement the map_confidence_source(unicode_source: UnicodeSource,
corrected_in_4_7: bool) -> ConfidenceSource function that collapses the
6 internal UnicodeSource variants down to the 3 schema-exposed
ConfidenceSource variants.

- Mapping follows INV-9 stable taxonomy
- Phase 4.7 correction override: when corrected_in_4_7 is true, a source
  that would map to Native is downgraded to Heuristic
- OCR is never affected by corrections (corrections apply to vector
  text, not raster OCR output)
- Exhaustive match on UnicodeSource ensures compiler-enforced
  completeness

Acceptance criteria:
- Unit tests for all (UnicodeSource, corrected) combinations PASS
- ToUnicode + corrected=true → Heuristic (override applies)
- Ocr + corrected=true → Ocr (override does NOT apply)
- INV-9 mapping table documented in code comments
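
A minimal sketch of the mapping described above, not the code from this commit. Only the ToUnicode and Ocr variants of UnicodeSource and the Native, Heuristic, and Ocr variants of ConfidenceSource are named in this message. The other four UnicodeSource variants (EncodingTable, GlyphName, ShapeMatch, Unknown) and their assignments are hypothetical placeholders, as is the assumption that ToUnicode maps to Native when uncorrected.

```rust
/// Internal provenance of a decoded code point.
/// ASSUMPTION: only `ToUnicode` and `Ocr` appear in the commit message;
/// the remaining four variants are hypothetical stand-ins to reach six.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnicodeSource {
    ToUnicode,
    Ocr,
    EncodingTable, // hypothetical
    GlyphName,     // hypothetical
    ShapeMatch,    // hypothetical
    Unknown,       // hypothetical
}

/// The three schema-exposed variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfidenceSource {
    Native,
    Heuristic,
    Ocr,
}

pub fn map_confidence_source(
    unicode_source: UnicodeSource,
    corrected_in_4_7: bool,
) -> ConfidenceSource {
    // Exhaustive match with no wildcard arm: adding a seventh
    // UnicodeSource variant becomes a compile error here.
    // ASSUMPTION: this table is illustrative, not the INV-9 table itself.
    let base = match unicode_source {
        UnicodeSource::ToUnicode | UnicodeSource::EncodingTable => ConfidenceSource::Native,
        UnicodeSource::GlyphName | UnicodeSource::ShapeMatch | UnicodeSource::Unknown => {
            ConfidenceSource::Heuristic
        }
        UnicodeSource::Ocr => ConfidenceSource::Ocr,
    };

    // Correction override: only Native is downgraded. Ocr and Heuristic
    // pass through unchanged whether or not a correction was applied.
    match (base, corrected_in_4_7) {
        (ConfidenceSource::Native, true) => ConfidenceSource::Heuristic,
        (other, _) => other,
    }
}
```

Applying the override after the base mapping keeps the INV-9 table in one place and leaves the Ocr case untouched without a special branch.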

Also fixed pre-existing compilation errors in encryption module:
- detection.rs: syntax error in PdfObject::Array construction
- mod.rs: removed duplicate EncryptionInfo struct definition

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 00:46:19 -04:00
pdftract-cer-diff | docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files | 2026-05-23 10:36:28 -04:00
pdftract-cli | feat(pdftract-2825c): add comparison mode support to inspector frontend | 2026-05-27 22:52:15 -04:00
pdftract-core | feat(pdftract-2etcd): implement map_confidence_source function | 2026-05-28 00:46:19 -04:00
pdftract-libpdftract | feat(pdftract-3s2i): implement Phase 5.5.2 validation filter | 2026-05-24 04:57:17 -04:00
pdftract-py | feat(pdftract-1tswa): implement GIL release with py.allow_threads on extraction entry points | 2026-05-26 21:23:00 -04:00