- Add worked example to Glyph struct showing all 11 fields
- Add worked example to Span struct showing all 10 fields
- Examples use `rust,no_run` because they rely on internal dependencies
- `cargo doc` passes with the docs.rs feature set
- Verification note added at notes/pdftract-3eohy.md
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implement per-word validation filter for assisted-OCR BrokenVector path.
Changes:
- Add SpanSource::OcrAssisted variant to hybrid.rs
- Add Span::ocr_assisted() helper method
- Implement validate_ocr_with_position_hints() in ocr.rs
- 5pt distance threshold for position validation
- 0.4 confidence cap for rejected words
- Linear scan for nearest-neighbor lookup
- Add unit tests for validation filter
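A minimal sketch of the validation filter described above. It assumes hypothetical `OcrWord` and `PositionHint` types, with positions in PDF points, and uses Euclidean distance from each word to its nearest hint. The real signature of `validate_ocr_with_position_hints()` in ocr.rs may differ. Words with no hint within 5pt have their confidence capped at 0.4: it is lowered to 0.4 if higher and left alone otherwise. The nearest hint is found by a linear scan.

```rust
/// Hypothetical OCR word: text, position in PDF points, confidence in [0, 1].
#[derive(Debug, Clone, PartialEq)]
pub struct OcrWord {
    pub text: String,
    pub x: f32,
    pub y: f32,
    pub confidence: f32,
}

/// Hypothetical position hint recovered from the broken vector text layer.
#[derive(Debug, Clone, Copy)]
pub struct PositionHint {
    pub x: f32,
    pub y: f32,
}

/// Maximum distance (points) between an OCR word and its nearest hint.
const MAX_HINT_DISTANCE_PT: f32 = 5.0;
/// Confidence ceiling applied to words that fail validation.
const REJECTED_CONFIDENCE_CAP: f32 = 0.4;

/// Linear scan over all hints; returns None when there are no hints.
fn nearest_hint_distance(x: f32, y: f32, hints: &[PositionHint]) -> Option<f32> {
    hints
        .iter()
        .map(|h| (h.x - x).hypot(h.y - y))
        .reduce(f32::min)
}

/// Cap the confidence of every word with no hint within MAX_HINT_DISTANCE_PT.
pub fn validate_ocr_with_position_hints(words: &mut [OcrWord], hints: &[PositionHint]) {
    for word in words.iter_mut() {
        let validated = matches!(
            nearest_hint_distance(word.x, word.y, hints),
            Some(d) if d <= MAX_HINT_DISTANCE_PT
        );
        if !validated {
            // min() keeps already-low confidences unchanged.
            word.confidence = word.confidence.min(REJECTED_CONFIDENCE_CAP);
        }
    }
}
```

The linear scan costs O(words × hints) per page. That is simple and presumably adequate for typical hint counts; a spatial index could replace it if pages carry many hints.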
Closes: pdftract-3s2i
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds `test_reproducibility_gate_with_perturbation`, which verifies that the
reproducibility check correctly detects when classification results differ.
This test intentionally perturbs a confidence value and asserts that the
reproducibility gate fails with a clear diff message.
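A minimal sketch of the gate-plus-perturbation pattern this test exercises. The `Classification` record and the `reproducibility_gate` function are hypothetical stand-ins, not the crate's actual API. The gate compares two runs record by record with exact equality, since a reproducible pipeline should match bit for bit. It returns an error listing every differing record.

```rust
/// Hypothetical classification record: block id, label, confidence.
#[derive(Debug, Clone, PartialEq)]
pub struct Classification {
    pub block_id: u32,
    pub label: String,
    pub confidence: f32,
}

/// Compare two runs; Err carries one diff line per mismatching record.
/// Uses exact f32 equality (a NaN confidence would always count as a diff).
pub fn reproducibility_gate(
    first: &[Classification],
    second: &[Classification],
) -> Result<(), String> {
    if first.len() != second.len() {
        return Err(format!(
            "record count differs: {} vs {}",
            first.len(),
            second.len()
        ));
    }
    let diffs: Vec<String> = first
        .iter()
        .zip(second)
        .filter(|(a, b)| a != b)
        .map(|(a, b)| format!("block {}: {:?} != {:?}", a.block_id, a, b))
        .collect();
    if diffs.is_empty() {
        Ok(())
    } else {
        Err(diffs.join("\n"))
    }
}
```

A perturbation test then nudges one confidence in a copy of the baseline and asserts that the gate fails and names only the perturbed block.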
Acceptance criteria for pdftract-2zw:
- Reproducibility gate fails on intentional perturbation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>