pdftract/crates
jedarden 377c907898 feat(pdftract-33g): implement PageClassifier engine
Implement the PageClassifier engine (Phase 5.1.4) that wires signal
evaluators + Hybrid evaluator together, applies the short-circuit rule,
resolves conflicting signals into a final PageClass and confidence,
and exports the classify_page() entry point.

Changes:
- Add PageContext struct with all classification metrics
- Implement SignalEvaluator trait and 6 signal evaluators
- Implement PageClassifier with short-circuit pipeline
- Fix short-circuit threshold: > 0.95 → >= 0.95
- Fix LowDensitySignal: strength 0.75 → 0.95 for short-circuit
- Fix signal order: LowDensitySignal before HighCharValiditySignal

Acceptance criteria:
-  All four critical-test fixtures classified correctly
-  Edge cases: blank page, image-only page
-  Determinism: BTreeSet + Vec for reproducible output
- ⚠️  Micro-benchmark: requires real fixture suite

All 53 classify module tests pass.

Closes: pdftract-33g
2026-05-23 14:15:52 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-4sky1): implement doctor exit code policy 2026-05-23 07:27:09 -04:00
pdftract-core feat(pdftract-33g): implement PageClassifier engine 2026-05-23 14:15:52 -04:00
pdftract-libpdftract feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
pdftract-py docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00