jedarden
|
e96a791dcf
|
feat(pdftract-4y9l): implement hybrid page routing with bbox merge rule
Implement Phase 5.2.4 Hybrid page handling:
- OcrCallback trait for OCR abstraction
- process_hybrid_page() main entry point
- Cell rendering: render once, crop per cell
- Merge rule: IoU > 0.5 + vector_conf >= 0.5 -> vector wins
Tests:
- OCR runs only on scanned cells (48 not 64)
- IoU 0.6 -> vector kept
- IoU 0.3 -> both kept
- IoU 0.6 + low vector conf -> OCR kept
- No duplicate text from overlap
All 40 hybrid tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
2026-05-23 17:48:00 -04:00 |
|