feat(pdftract-2iyk): implement classifier engine
Implements the Phase 5.6.2 classifier engine, which evaluates document type profiles against extracted feature signals.

- ClassifierEngine: evaluates profiles, computes normalized scores, returns highest-scoring profile above threshold
- FeatureSignals: struct containing all metrics for predicate matching
- ClassificationResult: document_type, confidence, reasons, runner_up
- Score normalization: matched_weight / total_weight to [0, 1]
- Predicate evaluation: all MatchPredicate variants supported
- Regex caching: OnceLock-based cache for TextMatchesRegex
- Unit tests: 28 tests covering invoice, scientific_paper, unknown classification, score normalization, tie-breaking, determinism

Closes: pdftract-2iyk
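The engine.rs diff is suppressed on this page, so as a rough illustration of the scoring described in the message, here is a minimal sketch in Rust. Everything in it is hypothetical: `Signals`, `Predicate`, `ProfileSketch`, `ResultSketch`, `score`, and `classify_sketch` are simplified stand-ins, not the real `FeatureSignals`, `MatchPredicate`, `Profile`, `ClassificationResult`, or `classify`. It assumes that "above threshold" means `>=`, that ties are broken by profile name, and that a failed classification reports "unknown" with confidence 0.0; the actual commit may differ on all three.

```rust
// Hypothetical, simplified stand-ins for the types in this commit.
struct Signals {
    page_count: u32,
    has_currency: bool,
}

// The real MatchPredicate has more variants (e.g. TextMatchesRegex).
enum Predicate {
    HasCurrency,
    MinPages(u32),
    MaxPages(u32),
}

impl Predicate {
    fn matches(&self, s: &Signals) -> bool {
        match self {
            Predicate::HasCurrency => s.has_currency,
            Predicate::MinPages(n) => s.page_count >= *n,
            Predicate::MaxPages(n) => s.page_count <= *n,
        }
    }
}

struct ProfileSketch {
    name: &'static str,
    rules: Vec<(Predicate, f64)>, // (predicate, weight)
}

#[derive(Debug)]
struct ResultSketch {
    document_type: String,
    confidence: f64,
    runner_up: Option<String>,
}

/// Normalized score: matched_weight / total_weight, in [0, 1].
/// A profile with no positive total weight scores 0.0 (assumption).
fn score(profile: &ProfileSketch, s: &Signals) -> f64 {
    let total: f64 = profile.rules.iter().map(|(_, w)| *w).sum();
    if total <= 0.0 {
        return 0.0;
    }
    let matched: f64 = profile
        .rules
        .iter()
        .filter(|(p, _)| p.matches(s))
        .map(|(_, w)| *w)
        .sum();
    matched / total
}

/// Picks the highest-scoring profile at or above `threshold`.
/// Ties are broken by profile name (assumption) so results are deterministic.
fn classify_sketch(profiles: &[ProfileSketch], s: &Signals, threshold: f64) -> ResultSketch {
    let mut scored: Vec<(&str, f64)> = profiles.iter().map(|p| (p.name, score(p, s))).collect();
    // Score descending, then name ascending.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    match scored.first() {
        Some(&(name, conf)) if conf >= threshold => ResultSketch {
            document_type: name.to_string(),
            confidence: conf,
            runner_up: scored.get(1).map(|(n, _)| n.to_string()),
        },
        _ => ResultSketch {
            document_type: "unknown".to_string(),
            confidence: 0.0,
            runner_up: None,
        },
    }
}
```

Sorting on (score, name) rather than taking a running maximum keeps the runner-up and the tie-break in one place, which is one simple way to get the determinism the unit tests are said to cover.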
parent a049924317
commit 865429d5f6
2 changed files with 1281 additions and 0 deletions
1277
crates/pdftract-core/src/profiles/engine.rs
Normal file
File diff suppressed because it is too large
@@ -17,9 +17,13 @@
 //! are the shared vocabulary between the rule engine, built-in profile definitions,
 //! and user-authored YAML profiles.
 
+mod engine;
 mod loader;
 mod types;
 
+pub use engine::{
+    classify, has_currency_pattern, ClassificationResult, ClassifierEngine, FeatureSignals,
+};
 pub use loader::{check_forbidden_keys, ForbiddenKeyError, ProfileLoadError};
 pub use types::{MatchPredicate, Profile, ProfileType};
 