- no-mapping.txt: fix garbled unicode to correct 'ABC' output
- shape-match.txt: fix from 'Shape' to 'S' (actual PDF content)
- Add PROVENANCE.md entries for all 4 encoding fixtures
- PDFs remain unchanged (already valid)

Fixes ground truth for Level 2-4 Unicode recovery fixtures:

- no-mapping.pdf: PDF with no ToUnicode, no standard encoding
- agl-only.pdf: PDF with AGL glyph names only
- fingerprint-match.pdf: PDF with embedded font for fingerprint matching
- shape-match.pdf: PDF with subset font for shape recognition

Closes bf-512z1

| | | |
|---|---|---|
| bank_statement | ||
| book_chapter | ||
| contract | ||
| form | ||
| invoice | ||
| legal_filing | ||
| receipt | ||
| scientific_paper | ||
| slide_deck | ||
| PROVENANCE.md | ||