pdftract/crates
jedarden a639794133 feat(pdftract-29gu): implement Phase 5.5.3 region-level confidence policy
- Add OcrFallback variant to SpanSource enum for fallback spans
- Add page_seg_mode field to TessOpts for PSM_SPARSE_TEXT support
- Add ASSISTED_OCR_KEEP_THRESH (0.7) and ASSISTED_OCR_FALLBACK_THRESH (0.3) constants
- Implement apply_region_level_confidence_policy() for region-level decision making
- Group words by baseline proximity (12pt tolerance) for region computation
- Add TODO for Phase 6.1 confidence_source enum to include "ocr-fallback"

Closes: pdftract-29gu
2026-05-24 05:15:46 -04:00
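
The commit message above names two thresholds and a 12pt baseline grouping but does not show the logic. The following is a minimal sketch of how such a region-level policy could look, not the code in pdftract-core. Only `ASSISTED_OCR_KEEP_THRESH`, `ASSISTED_OCR_FALLBACK_THRESH`, and the 12pt tolerance come from the commit message. The `Word` struct, the `RegionDecision` enum, the function names, the use of mean confidence per region, and the treatment of the middle band as `Undecided` are all assumptions made for illustration.

```rust
// Values taken from the commit message.
const ASSISTED_OCR_KEEP_THRESH: f32 = 0.7;
const ASSISTED_OCR_FALLBACK_THRESH: f32 = 0.3;
const BASELINE_TOLERANCE_PT: f32 = 12.0;

// Hypothetical word record; the real type in pdftract-core is not shown in the source.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    baseline_y: f32,
    confidence: f32,
}

// Hypothetical outcome type. What happens between the two thresholds is
// not stated in the commit message, so it is modelled as `Undecided`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RegionDecision {
    Keep,
    Fallback,
    Undecided,
}

// Groups words into regions: after sorting by baseline, a word joins the
// current region if its baseline is within the tolerance of the previous
// word's baseline; otherwise it starts a new region.
fn group_by_baseline(words: &[Word]) -> Vec<Vec<Word>> {
    let mut sorted = words.to_vec();
    sorted.sort_by(|a, b| a.baseline_y.total_cmp(&b.baseline_y));

    let mut regions: Vec<Vec<Word>> = Vec::new();
    for word in sorted {
        let start_new = match regions.last().and_then(|r| r.last()) {
            Some(prev) => word.baseline_y - prev.baseline_y > BASELINE_TOLERANCE_PT,
            None => true,
        };
        if start_new {
            regions.push(Vec::new());
        }
        regions.last_mut().expect("a region exists").push(word);
    }
    regions
}

// Decides one region from its mean word confidence (an assumed aggregate).
fn decide_region(region: &[Word]) -> RegionDecision {
    if region.is_empty() {
        return RegionDecision::Undecided;
    }
    let mean = region.iter().map(|w| w.confidence).sum::<f32>() / region.len() as f32;
    if mean >= ASSISTED_OCR_KEEP_THRESH {
        RegionDecision::Keep
    } else if mean < ASSISTED_OCR_FALLBACK_THRESH {
        RegionDecision::Fallback
    } else {
        RegionDecision::Undecided
    }
}
```

Under these assumptions, a caller would pass the words of a page to `group_by_baseline` and then map `decide_region` over the result. Because each word is compared with the previous word in sorted order, a long run of closely spaced baselines can chain into a single region; a stricter variant would compare against the first baseline of the region instead.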
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-core feat(pdftract-29gu): implement Phase 5.5.3 region-level confidence policy 2026-05-24 05:15:46 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00