Implement per-word validation filter for assisted-OCR BrokenVector path.
Changes:
- Add SpanSource::OcrAssisted variant to hybrid.rs
- Add Span::ocr_assisted() helper method
- Implement validate_ocr_with_position_hints() in ocr.rs
- 5pt distance threshold for position validation
- 0.4 confidence cap for rejected words
- Linear scan for nearest-neighbor lookup
- Add unit tests for validation filter
Closes: pdftract-3s2i
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds test_reproducibility_gate_with_perturbation which verifies that the
reproducibility check correctly detects when classification results differ.
This test intentionally perturbs a confidence value and asserts that the
reproducibility gate fails with a clear diff message.
Acceptance criteria for pdftract-2zw:
- Reproducibility gate fails on intentional perturbation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>