pdftract/tests/fixtures
jedarden b115b5a677 fix(bf-512z1): fix encoding fixture ground truth and add provenance
- no-mapping.txt: fix garbled unicode to correct 'ABC' output
- shape-match.txt: fix from 'Shape' to 'S' (actual PDF content)
- Add PROVENANCE.md entries for all 4 encoding fixtures
- PDFs remain unchanged (already valid)

Fixes ground truth for Level 2-4 Unicode recovery fixtures:
- no-mapping.pdf: PDF with no ToUnicode, no standard encoding
- agl-only.pdf: PDF with AGL glyph names only
- fingerprint-match.pdf: PDF with embedded font for fingerprint matching
- shape-match.pdf: PDF with subset font for shape recognition

Closes bf-512z1
2026-06-09 01:13:51 -04:00
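The commit describes `agl-only.pdf` as a PDF whose fonts carry only AGL glyph names. Recovering text from such a file means mapping names like `A`, `uni0041`, or `f_i` to Unicode. The sketch below follows the algorithm in Adobe's AGL specification: drop any suffix from the first period, split ligatures on underscores, then look up each component in the glyph list or parse the `uniXXXX` and `uXXXX[XX]` forms. It is not pdftract's implementation. `glyph_name_to_unicode`, `component_to_unicode`, and `agl_table` are hypothetical names, and the table is a tiny excerpt of the real list.

```rust
use std::collections::HashMap;

/// Hypothetical: a tiny excerpt of the Adobe Glyph List (the real list is far larger).
fn agl_table() -> HashMap<&'static str, char> {
    [("A", 'A'), ("B", 'B'), ("C", 'C'), ("space", ' '), ("eacute", 'é'), ("f", 'f'), ("i", 'i')]
        .into_iter()
        .collect()
}

/// AGL names use uppercase hex digits only.
fn is_upper_hex(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit() || (b'A'..=b'F').contains(&b))
}

/// Map one underscore-separated component; unknown components map to "".
fn component_to_unicode(comp: &str, agl: &HashMap<&str, char>) -> String {
    // 1. Direct glyph-list lookup takes priority.
    if let Some(&c) = agl.get(comp) {
        return c.to_string();
    }
    // 2. "uni" + one or more groups of 4 hex digits, each a non-surrogate BMP value.
    if let Some(hex) = comp.strip_prefix("uni") {
        if hex.len() % 4 == 0 && is_upper_hex(hex) {
            let mut out = String::new();
            for i in (0..hex.len()).step_by(4) {
                let v = u32::from_str_radix(&hex[i..i + 4], 16).unwrap();
                match char::from_u32(v) {
                    Some(c) => out.push(c),
                    None => return String::new(), // surrogate group: no mapping
                }
            }
            return out;
        }
    }
    // 3. "u" + 4 to 6 hex digits naming a single scalar value.
    if let Some(hex) = comp.strip_prefix('u') {
        if (4..=6).contains(&hex.len()) && is_upper_hex(hex) {
            if let Some(c) = u32::from_str_radix(hex, 16).ok().and_then(char::from_u32) {
                return c.to_string();
            }
        }
    }
    String::new()
}

/// Hypothetical entry point: glyph name -> Unicode string ("" if unmappable).
fn glyph_name_to_unicode(name: &str, agl: &HashMap<&str, char>) -> String {
    // Drop everything from the first period ("A.sc" -> "A", ".notdef" -> "").
    let base = name.split('.').next().unwrap_or("");
    // Ligature components are joined with '_' ("f_i" -> "f" + "i").
    base.split('_').map(|c| component_to_unicode(c, agl)).collect()
}
```

A glyph name that falls through every rule maps to the empty string, so a caller can treat that as the signal to try the next recovery level rather than emit a guess.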
cjk fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
classifier feat(pdftract-59zz): implement MCP bearer token ingress channels and TH-03 enforcement 2026-05-18 02:47:54 -04:00
encoding fix(bf-512z1): fix encoding fixture ground truth and add provenance 2026-06-09 01:13:51 -04:00
encrypted fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
fonts feat(pdftract-5u8bp): implement SVG clip generator 2026-05-23 03:43:19 -04:00
forms fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
grep-corpus feat(pdftract-5bzpg): implement pdftract-grep-1000 CI benchmark skeleton 2026-05-25 08:53:23 -04:00
json_schema fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
malformed fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
ocr feat(pdftract-48ea): implement BrokenVector fixtures + WER delta CI gate 2026-05-24 10:52:41 -04:00
page_class feat(pdftract-2zw): page classification fixtures + integration tests + reproducibility gate 2026-05-23 15:04:05 -04:00
perf feat(bf-1g1fd): implement CI memory-ceiling gate with cgroup MemoryMax enforcement 2026-05-23 13:22:55 -04:00
preprocess feat(pdftract-27n3): implement border padding, pipeline orchestration, and fixtures 2026-05-23 21:55:11 -04:00
profiles fix(bf-512z1): fix encoding fixture ground truth and add provenance 2026-06-09 01:13:51 -04:00
scanned docs(pdftract-25k4x): add verification note for figure/caption detection 2026-06-01 09:35:02 -04:00
security wip: intermediate state from previous work 2026-05-29 08:25:23 -04:00
vector feat(pdftract-47e42): implement URL fragment routing for shareable links 2026-06-01 08:23:59 -04:00
gen_fixtures feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_ocr_fixtures feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
gen_suspects feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_simple feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_simple.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_simple_local feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_simple_local.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v2.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v3 feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v3.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v4.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v6 feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v6.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v7 feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v7.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v8 feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
gen_suspects_v8.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_book_chapter_fixtures.rs fix(pdftract-2f7oi): fix test fixture compilation bug and verify error handling 2026-05-27 22:12:25 -04:00
generate_cjk_fixtures.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
generate_cjk_fixtures_fixed.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
generate_encoding_fixtures.py fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
generate_encrypted_fixtures.py chore(pdftract-36glh): remove unused JpxDecoder import and add verification note 2026-05-28 05:23:13 -04:00
generate_encrypted_fixtures.rs fix(pdftract-39gey): fix indent trigger to not split drop-cap paragraphs 2026-06-07 13:43:19 -04:00
generate_large_remote_fixture.rs wip: AcroForm improvements, debug tooling, test corpus, and fixture updates 2026-05-30 09:48:14 -04:00
generate_legal_filing_fixtures.rs feat(pdftract-260a3): implement legal_filing profile with fixtures and tests 2026-05-27 21:44:49 -04:00
generate_lzw_fixtures.rs.disabled fix(pdftract-2uk9z): wrap native module results in typed Python objects 2026-05-28 21:18:38 -04:00
generate_lzw_fixtures_main.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
generate_ocr_fixtures.rs feat(pdftract-315s): implement WER CI gate and OCR CLI flags 2026-05-24 02:07:27 -04:00
generate_page_class_fixtures.rs feat(pdftract-2zw): page classification fixtures + integration tests + reproducibility gate 2026-05-23 15:04:05 -04:00
generate_scientific_paper_fixtures.rs feat(pdftract-2vajs): implement slide_deck profile with fixtures and tests 2026-05-27 21:12:24 -04:00
generate_slide_deck_fixtures.rs feat(pdftract-2vajs): implement slide_deck profile with fixtures and tests 2026-05-27 21:12:24 -04:00
generate_suspects_fixture feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixture.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixtures feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixtures.py feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixtures.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
generate_suspects_fixtures_v5.rs feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
lzw_incremental_early.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_incremental_late.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_incremental_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_mixed_early.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_mixed_late.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_mixed_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_predictor_encoded.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_predictor_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_repeated_early.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_repeated_late.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_repeated_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_simple_early.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_simple_late.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_simple_orig.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
lzw_truncated.bin feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter 2026-05-22 22:38:31 -04:00
PROVENANCE.md fix(bf-512z1): fix encoding fixture ground truth and add provenance 2026-06-09 01:13:51 -04:00
remote_100page.pdf wip: AcroForm improvements, debug tooling, test corpus, and fixture updates 2026-05-30 09:48:14 -04:00
sample.pdf docs(pdftract-145s8): update SDK docs with correct API 2026-05-31 23:43:05 -04:00
tagged-suspects-false.pdf feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
tagged-suspects-true-high-coverage.pdf feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
tagged-suspects-true.pdf feat(pdftract-2w3r): implement StructTree coverage check and XY-cut fallback 2026-05-23 20:53:25 -04:00
test-minimal.pdf feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
valid-minimal.pdf test(pdftract-1eaxm): add distribution templates and C conformance tests 2026-05-23 09:20:22 -04:00
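The `lzw_*.bin` fixtures come from the commit that implemented LZWDecode with the `/EarlyChange` parameter. The `_early`/`_late` naming suggests streams encoded with `/EarlyChange 1` (the PDF default) and `/EarlyChange 0`, though the listing does not say so. The minimal sketch below shows what that parameter controls: codes start at 9 bits and widen up to 12, and `/EarlyChange 1` widens them one table entry sooner. It assumes no `/Predictor` and is not pdftract's decoder; `lzw_decode` is a hypothetical name.

```rust
/// Hypothetical helper (not pdftract's API): decode a PDF LZWDecode stream
/// with no /Predictor. `early_change` is the /EarlyChange value (PDF default 1).
fn lzw_decode(input: &[u8], early_change: u32) -> Result<Vec<u8>, String> {
    // Codes 0-255 are literal bytes; 256 = clear-table, 257 = end-of-data.
    fn fresh_table() -> Vec<Vec<u8>> {
        let mut t: Vec<Vec<u8>> = (0..=255u8).map(|b| vec![b]).collect();
        t.push(Vec::new()); // 256: clear-table
        t.push(Vec::new()); // 257: end-of-data
        t
    }

    let mut table = fresh_table();
    let mut width: u32 = 9; // code width starts at 9 bits, capped at 12
    let mut prev: Option<Vec<u8>> = None;
    let mut out = Vec::new();
    let (mut acc, mut nbits) = (0u32, 0u32);
    let mut bytes = input.iter();

    loop {
        // Pull bytes MSB-first until a whole code is buffered.
        while nbits < width {
            match bytes.next() {
                Some(&b) => {
                    acc = (acc << 8) | u32::from(b);
                    nbits += 8;
                }
                None => return Ok(out), // tolerate a missing EOD marker
            }
        }
        let code = (acc >> (nbits - width)) & ((1 << width) - 1);
        nbits -= width;
        acc &= (1 << nbits) - 1;

        match code {
            256 => {
                table = fresh_table();
                width = 9;
                prev = None;
            }
            257 => return Ok(out),
            _ => {
                let code = code as usize;
                let entry = match prev.take() {
                    None => table.get(code).cloned().ok_or("first code is not a literal")?,
                    Some(p) => {
                        let entry = if code < table.len() {
                            table[code].clone()
                        } else if code == table.len() {
                            // KwKwK case: the code names the entry being built.
                            let mut e = p.clone();
                            e.push(p[0]);
                            e
                        } else {
                            return Err(format!("code {code} beyond table size {}", table.len()));
                        };
                        if table.len() < 4096 {
                            let mut new_entry = p;
                            new_entry.push(entry[0]);
                            table.push(new_entry);
                        }
                        entry
                    }
                };
                // EarlyChange=1 widens codes one entry early: 10 bits once the
                // table holds 511 entries instead of 512, and so on up to 12.
                if table.len() as u32 + early_change >= (1 << width) && width < 12 {
                    width += 1;
                }
                out.extend_from_slice(&entry);
                prev = Some(entry);
            }
        }
    }
}
```

The 9-byte worked example in the PDF specification's LZWDecode section (`80 0B 60 50 22 0C 0C 85 01`) decodes to `-----A---B` under either setting, because it never needs codes wider than 9 bits. Only streams that grow the table past 511 entries tell the two modes apart; decoding such a stream with the wrong `/EarlyChange` misaligns every code after the width switch.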