pdftract/crates
jedarden 865429d5f6 feat(pdftract-2iyk): implement classifier engine
Implements the Phase 5.6.2 classifier engine, which evaluates
document-type profiles against extracted feature signals.

- ClassifierEngine: evaluates profiles, computes normalized scores,
  returns highest-scoring profile above threshold
- FeatureSignals: struct containing all metrics for predicate matching
- ClassificationResult: document_type, confidence, reasons, runner_up
- Score normalization: matched_weight / total_weight, giving a score in [0, 1]
- Predicate evaluation: all MatchPredicate variants supported
- Regex caching: OnceLock-based cache for TextMatchesRegex
- Unit tests: 28 tests covering invoice, scientific_paper, unknown
  classification, score normalization, tie-breaking, determinism
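
The scoring and selection steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in pdftract-core: the `Profile` layout, the `(weight, matched)` pairs standing in for evaluated predicates, the name-based tie-break, and the "unknown" fallback with zero confidence are all assumptions. Only the names `ClassificationResult`, `document_type`, `confidence` and `runner_up` come from the commit message.

```rust
use std::cmp::Ordering;

// Hypothetical profile: each predicate is reduced to (weight, matched).
#[derive(Debug, Clone, PartialEq)]
struct Profile {
    name: &'static str,
    predicates: Vec<(f64, bool)>,
}

// Field names follow the commit message; `reasons` is omitted for brevity.
#[derive(Debug, PartialEq)]
struct ClassificationResult {
    document_type: String,
    confidence: f64,
    runner_up: Option<String>,
}

// matched_weight / total_weight, which lies in [0, 1] for non-negative weights.
fn normalized_score(profile: &Profile) -> f64 {
    let total: f64 = profile.predicates.iter().map(|(w, _)| w).sum();
    if total <= 0.0 {
        return 0.0; // assumed: a profile with no weight never matches
    }
    let matched: f64 = profile
        .predicates
        .iter()
        .filter(|(_, m)| *m)
        .map(|(w, _)| w)
        .sum();
    matched / total
}

// Returns the highest-scoring profile at or above `threshold`, else "unknown".
fn classify(profiles: &[Profile], threshold: f64) -> ClassificationResult {
    let mut scored: Vec<(&str, f64)> = profiles
        .iter()
        .map(|p| (p.name, normalized_score(p)))
        .collect();
    // Score descending, then name ascending: an assumed deterministic tie-break.
    scored.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap_or(Ordering::Equal)
            .then_with(|| a.0.cmp(b.0))
    });
    match scored.first() {
        Some(&(name, score)) if score >= threshold => ClassificationResult {
            document_type: name.to_string(),
            confidence: score,
            runner_up: scored.get(1).map(|(n, _)| n.to_string()),
        },
        _ => ClassificationResult {
            document_type: "unknown".to_string(),
            confidence: 0.0,
            runner_up: None,
        },
    }
}
```

Sorting on the pair (score, name) keeps the result independent of the order in which profiles are supplied, which is one simple way to get the determinism the tests mention.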

Closes: pdftract-2iyk
2026-05-24 10:23:58 -04:00
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-1s2uj): add xref test fixture corpus and integration test runner 2026-05-24 08:20:04 -04:00
pdftract-core feat(pdftract-2iyk): implement classifier engine 2026-05-24 10:23:58 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py feat(pdftract-2nu0s): implement Python SDK contract conformance 2026-05-24 08:55:11 -04:00