feat(pdftract-2iyk): implement classifier engine

Implements the Phase 5.6.2 classifier engine, which evaluates document type
profiles against extracted feature signals.

- ClassifierEngine: evaluates profiles, computes normalized scores, and
  returns the highest-scoring profile above the threshold
- FeatureSignals: struct containing all metrics for predicate matching
- ClassificationResult: document_type, confidence, reasons, runner_up
- Score normalization: matched_weight / total_weight, yielding a score in [0, 1]
- Predicate evaluation: all MatchPredicate variants supported
- Regex caching: OnceLock-based cache for TextMatchesRegex
- Unit tests: 28 tests covering invoice, scientific_paper, unknown
  classification, score normalization, tie-breaking, determinism
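
For readers without access to the suppressed engine diff, the score
normalization and threshold selection described above can be sketched
roughly as follows. This is an illustration only, not the code in this
commit: `WeightedProfile`, `normalized_score`, and `pick_best` are
hypothetical names, predicates are reduced to (weight, matched) pairs, and
both the inclusive threshold comparison and the name-based tie-break are
assumptions.

```rust
/// Hypothetical, simplified stand-in for a profile: a name plus
/// (weight, matched) pairs, one per predicate. The real Profile and
/// MatchPredicate types are not shown in this commit view.
struct WeightedProfile {
    name: &'static str,
    predicates: Vec<(f64, bool)>,
}

/// matched_weight / total_weight, which lies in [0, 1] for non-negative
/// weights. A profile with no positive weight scores 0.0 (assumption).
fn normalized_score(predicates: &[(f64, bool)]) -> f64 {
    let total: f64 = predicates.iter().map(|(w, _)| *w).sum();
    if total <= 0.0 {
        return 0.0;
    }
    let matched: f64 = predicates
        .iter()
        .filter(|(_, is_match)| *is_match)
        .map(|(w, _)| *w)
        .sum();
    matched / total
}

/// Returns the highest-scoring profile whose score reaches `threshold`.
/// Ties are broken by name so the result is deterministic (assumption;
/// the actual tie-breaking rule may differ).
fn pick_best(profiles: &[WeightedProfile], threshold: f64) -> Option<(&'static str, f64)> {
    let mut scored: Vec<(&'static str, f64)> = profiles
        .iter()
        .map(|p| (p.name, normalized_score(&p.predicates)))
        .collect();
    // Highest score first; equal scores fall back to alphabetical order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    scored.into_iter().next().filter(|(_, score)| *score >= threshold)
}
```

Under these assumptions, a profile with weights 2, 1, 1 whose first two
predicates match scores 3 / 4 = 0.75. When no profile reaches the
threshold, `pick_best` returns `None`, which would correspond to the
"unknown" classification mentioned in the test list.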

Closes: pdftract-2iyk
jedarden 2026-05-24 10:23:58 -04:00
parent a049924317
commit 865429d5f6
2 changed files with 1281 additions and 0 deletions

File diff suppressed because it is too large

@@ -17,9 +17,13 @@
//! are the shared vocabulary between the rule engine, built-in profile definitions,
//! and user-authored YAML profiles.
mod engine;
mod loader;
mod types;
pub use engine::{
classify, has_currency_pattern, ClassificationResult, ClassifierEngine, FeatureSignals,
};
pub use loader::{check_forbidden_keys, ForbiddenKeyError, ProfileLoadError};
pub use types::{MatchPredicate, Profile, ProfileType};