jedarden
|
7c5206f08e
|
feat(pdftract-347): implement hybrid grid-cell evaluator
Add 8x8 grid decomposition for mixed-content page detection.
Implements Phase 5.1.3 hybrid detection:
- GridClassifier: 8x8 grid (64 cells) per page
- Cell classification: vector (text+validity), scanned (image,no-text), mixed
- Hybrid trigger: >=10 vector cells AND >=10 scanned cells (>=15% each)
- Returns scanned cell indexes for downstream OCR-only-on-cells routing
Acceptance criteria:
- PASS: Critical test (text header + scanned body) -> Hybrid with correct cells
- PASS: Below threshold (9+9 cells) -> NOT Hybrid
- PASS: Determinism (BTreeSet for stable serialization)
- PASS: Cells exposed for Phase 5.2 OCR routing
Refs: bead pdftract-347, plan line 1838
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
2026-05-23 13:49:14 -04:00 |
|