pdftract/.needle-predispatch-sha
jedarden e96a791dcf feat(pdftract-4y9l): implement hybrid page routing with bbox merge rule
Implement Phase 5.2.4 Hybrid page handling:
- OcrCallback trait for OCR abstraction
- process_hybrid_page() main entry point
- Cell rendering: render once, crop per cell
- Merge rule: IoU > 0.5 + vector_conf >= 0.5 -> vector wins

Tests:
- OCR runs only on scanned cells (48 not 64)
- IoU 0.6 -> vector kept
- IoU 0.3 -> both kept
- IoU 0.6 + low vector conf -> OCR kept
- No duplicate text from overlap

All 40 hybrid tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 17:48:00 -04:00

1 line
41 B
Text

e3a149fbf8f56a4e05881a92d45663b9c9bd3878