jedarden
377c907898
feat(pdftract-33g): implement PageClassifier engine
Implement the PageClassifier engine (Phase 5.1.4), which wires the
signal evaluators and the Hybrid evaluator together, applies the
short-circuit rule, resolves conflicting signals into a final PageClass
and confidence, and exports the classify_page() entry point.
Changes:
- Add PageContext struct with all classification metrics
- Implement SignalEvaluator trait and 6 signal evaluators
- Implement PageClassifier with short-circuit pipeline
- Fix short-circuit threshold: > 0.95 → >= 0.95
- Fix LowDensitySignal: strength 0.75 → 0.95 for short-circuit
- Fix signal order: LowDensitySignal before HighCharValiditySignal
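The three fixes above interact: a signal only short-circuits if its strength reaches the threshold and it is evaluated before any competing signal. The sketch below illustrates that interaction under stated assumptions; it is not the code from this commit. The PageContext fields, the PageClass variants, the Signal struct, the numeric cutoffs, and the return type of classify_page() are hypothetical, and only two of the six evaluators are shown.

```rust
use std::collections::BTreeSet;

/// Hypothetical page classes; the real enum may have different variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageClass {
    Text,
    Scanned,
    Unknown,
}

/// Hypothetical subset of the metrics carried by PageContext.
pub struct PageContext {
    pub char_count: usize,
    pub valid_char_ratio: f64,
}

/// One evaluator's vote: a class, a strength in [0, 1], and a label.
pub struct Signal {
    pub class: PageClass,
    pub strength: f64,
    pub name: &'static str,
}

pub trait SignalEvaluator {
    fn evaluate(&self, ctx: &PageContext) -> Option<Signal>;
}

pub struct LowDensitySignal;
pub struct HighCharValiditySignal;

impl SignalEvaluator for LowDensitySignal {
    fn evaluate(&self, ctx: &PageContext) -> Option<Signal> {
        // The cutoff of 10 characters is illustrative only.
        // Strength 0.95 lets this signal reach the short-circuit threshold.
        (ctx.char_count < 10).then(|| Signal {
            class: PageClass::Scanned,
            strength: 0.95,
            name: "low_density",
        })
    }
}

impl SignalEvaluator for HighCharValiditySignal {
    fn evaluate(&self, ctx: &PageContext) -> Option<Signal> {
        // The ratio cutoff and the strength are illustrative only.
        (ctx.valid_char_ratio > 0.9).then(|| Signal {
            class: PageClass::Text,
            strength: 0.8,
            name: "high_char_validity",
        })
    }
}

const SHORT_CIRCUIT: f64 = 0.95;

/// Returns the class, its confidence, and the names of the signals that fired.
pub fn classify_page(ctx: &PageContext) -> (PageClass, f64, BTreeSet<&'static str>) {
    // A Vec fixes the evaluation order: LowDensitySignal runs first.
    let evaluators: Vec<Box<dyn SignalEvaluator>> =
        vec![Box::new(LowDensitySignal), Box::new(HighCharValiditySignal)];
    // A BTreeSet iterates in sorted order, so the output is reproducible.
    let mut fired = BTreeSet::new();
    let mut best: Option<Signal> = None;

    for evaluator in &evaluators {
        if let Some(signal) = evaluator.evaluate(ctx) {
            fired.insert(signal.name);
            // Inclusive comparison: a strength of exactly 0.95 short-circuits.
            if signal.strength >= SHORT_CIRCUIT {
                return (signal.class, signal.strength, fired);
            }
            // Otherwise keep the strongest signal seen so far.
            if best.as_ref().map_or(true, |b| signal.strength > b.strength) {
                best = Some(signal);
            }
        }
    }

    match best {
        Some(signal) => (signal.class, signal.strength, fired),
        None => (PageClass::Unknown, 0.0, fired),
    }
}
```

In this sketch, a page with three characters and a high validity ratio is classified by LowDensitySignal alone. With a strict > 0.95 comparison, a strength of 0.75, or the opposite evaluator order, HighCharValiditySignal would also have been consulted and could have decided the result.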
Acceptance criteria:
- ✅ All four critical-test fixtures classified correctly
- ✅ Edge cases: blank page, image-only page
- ✅ Determinism: BTreeSet + Vec for reproducible output
- ⚠️ Micro-benchmark: requires real fixture suite
All 53 classify module tests pass.
Closes: pdftract-33g
2026-05-23 14:15:52 -04:00