pdftract/crates
jedarden 51cb277535 feat(pdftract-49cn): implement feature signal extraction for classifier
Implements Phase 5.6.3: FeatureSignals extraction computed during Phase 4 assembly.

- Added profiles/signals.rs module with PageSignalAccumulator and extract_feature_signals()
- Predefined text patterns: currency symbols, ISO dates, INVOICE, WHEREAS, Abstract, References, page numbers, bullets, math operators
- Per-page signal extraction: text content, fonts, table count, heading depth, glyph density
- Document-level aggregation: page count, font diversity, presence flags (signature field, form field, math operators, bullet lists, footer page numbers)
- All regex patterns compiled once via OnceLock for performance
- 23 unit tests covering all functionality
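
A minimal sketch of the compile-once pattern table and the page-to-document aggregation described above. It is illustrative only: `Pattern`, `patterns()`, `DocSignals`, and `collect_signals()` are hypothetical names, not the contents of profiles/signals.rs, and plain substring matching stands in for the compiled regexes so the example needs only the standard library.

```rust
use std::collections::BTreeSet;
use std::sync::OnceLock;

/// Hypothetical stand-in for a compiled pattern. The real module
/// presumably stores compiled regexes here instead of a literal needle.
struct Pattern {
    name: &'static str,
    needle: &'static str,
}

/// Builds the pattern table on first use; every later call returns
/// the same already-initialized slice.
fn patterns() -> &'static [Pattern] {
    static PATTERNS: OnceLock<Vec<Pattern>> = OnceLock::new();
    PATTERNS.get_or_init(|| {
        vec![
            Pattern { name: "invoice", needle: "INVOICE" },
            Pattern { name: "whereas", needle: "WHEREAS" },
            Pattern { name: "abstract", needle: "Abstract" },
            Pattern { name: "references", needle: "References" },
        ]
    })
}

/// Hypothetical document-level summary (not the real FeatureSignals).
#[derive(Debug, Default, PartialEq)]
struct DocSignals {
    page_count: usize,
    matched: BTreeSet<&'static str>,
}

/// Scans each page's text and folds the per-page hits into one summary.
fn collect_signals(pages: &[&str]) -> DocSignals {
    let mut doc = DocSignals::default();
    for text in pages {
        doc.page_count += 1;
        for p in patterns() {
            if text.contains(p.needle) {
                doc.matched.insert(p.name);
            }
        }
    }
    doc
}
```

Calling `collect_signals(&["INVOICE #12", "WHEREAS the parties"])` would report two pages with the `invoice` and `whereas` flags set. The table is built only on the first `patterns()` call, which is the cost `OnceLock` is meant to avoid repeating.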

Closes: pdftract-49cn
2026-05-24 11:01:18 -04:00
pdftract-cer-diff     docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files                  2026-05-23 10:36:28 -04:00
pdftract-cli          feat(pdftract-1s2uj): add xref test fixture corpus and integration test runner  2026-05-24 08:20:04 -04:00
pdftract-core         feat(pdftract-49cn): implement feature signal extraction for classifier         2026-05-24 11:01:18 -04:00
pdftract-libpdftract  feat(pdftract-3s2i): implement Phase 5.5.2 validation filter                    2026-05-24 04:57:17 -04:00
pdftract-py           feat(pdftract-2nu0s): implement Python SDK contract conformance                 2026-05-24 08:55:11 -04:00