Commit graph

1 commit

Author SHA1 Message Date
jedarden
58e4348289 docs(pdftract-32x4): add verification note for language pack management
Implement OCR language-pack management infrastructure resolving OQ-04.

Components implemented:
- detect_available_languages() - scans tessdata for .traineddata files
- validate_ocr_languages() - validates requested languages, emits diagnostics
- ExtractionOptions.ocr_language field with default vec!["eng"]
- OCR_LANGUAGE_UNAVAILABLE diagnostic code
- Doctor check for language verification
- docs/notes/ocr-language-packs.md with distribution strategy

OQ-04 Resolution: Bundled in Docker images with tiered strategy
- pdftract:ocr (~150 MB) - eng + 13 common languages
- pdftract:full (~600 MB) - All 100+ languages

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 23:59:23 -04:00