Commit graph

1 commit

Author SHA1 Message Date
jedarden
df21126d99 docs(bf-2he4t): add verification note for scanned fixtures corpus
Assembled and verified ground-truth corpus for scanned PDF fixtures:
- All 4 fixtures present (receipt, invoice, form, 10-page doc)
- All at 300 DPI with paired ground truth transcripts
- Files verified present and valid
- WER verification blocked by pdftract compilation errors
- Baseline Tesseract testing shows high WER due to layout handling limitations

Corpus is complete; WER <3% verification pending pdftract build fixes.
2026-06-01 09:25:53 -04:00
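
The commit defers a WER (word error rate) check against a 3% target. As a rough illustration of what that check computes, here is a minimal sketch. It assumes whitespace tokenization and no text normalization, neither of which the commit specifies. The names word_error_rate and meets_threshold are hypothetical and are not part of pdftract or Tesseract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace, with no case folding or
    punctuation stripping. A real harness may normalize differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def meets_threshold(reference: str, hypothesis: str, threshold: float = 0.03) -> bool:
    """True when WER is strictly below the threshold (3% by default)."""
    return word_error_rate(reference, hypothesis) < threshold
```

Each fixture's OCR output would be passed as the hypothesis and its paired ground-truth transcript as the reference. Whether the 3% target applies per fixture or across the whole corpus is not stated in the commit.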