pdftract/crates
jedarden 2018d684ce feat(pdftract-22p): implement signal evaluators for page classification
Implement five signal evaluators that feed PageClassifier::classify:
- text_operator_presence: 0 text ops + has images -> Scanned 0.95
- all_tr3_with_full_page_image: all Tr=3 + image >= 95% -> BrokenVector 0.99 (EC-12)
- image_coverage_fraction > 0.85 -> Scanned 0.85
- char_validity_rate < 0.4 -> BrokenVector 0.80
- char_validity_rate > 0.85 -> Vector 0.90
- char_density_ratio < 0.03 chars/in^2 -> Scanned 0.65

All thresholds centralized in SignalsConfig struct.
PageContext includes all required fields for evaluation.
Short-circuit classification at strength >= 0.95.
Comprehensive unit tests for each evaluator.

Closes: pdftract-22p
2026-05-31 23:56:17 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-3mdb7): fix tooltip implementation with correct selectors and events 2026-05-31 23:56:17 -04:00
pdftract-core feat(pdftract-22p): implement signal evaluators for page classification 2026-05-31 23:56:17 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py fix(pdftract-2uk9z): wrap native module results in typed Python objects 2026-05-28 21:18:38 -04:00