pdftract/tests/fixtures/PROVENANCE.md
jedarden 4dddd81bcd docs(pdftract-5o3zv): verify footnotes, inline links, and page breaks implementation
Phase 6.5.5 functionality already implemented and tested:
- Footnote emission infrastructure (PageFootnotes, emit_footnote_ref/def)
- Inline link emission (emit_page_links_from_json, emit_inline_link)
- Page breaks (--md-no-page-breaks CLI flag, MarkdownOptions)

All acceptance criteria tests pass. Ready for Phase 7 integration.

Also adds missing provenance entry for json_schema/simple-text.pdf fixture.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:00:12 -04:00

EC-04-rc4-encrypted.pdf

Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, RC4 encryption (V=1, R=2), 40-bit key, user password: "user40"
Generated: 2026-05-28

EC-05-aes128-encrypted.pdf

Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, AES-128 encryption (V=2, R=3), 128-bit key, user password: "user128"
Generated: 2026-05-28

EC-06-aes256-encrypted.pdf

Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 2.0, AES-256 encryption (V=5, R=5), 256-bit key, user password: "user256"
Generated: 2026-05-28

EC-empty-password.pdf

Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, no encryption (control fixture)
Generated: 2026-05-28

sample.pdf

Copied from valid-minimal.pdf to serve as the default path in SDK examples
Minimal valid PDF v1.4 fixture for contract method examples
Generated: 2026-05-31

json_schema/simple_invoice.pdf

Simple invoice PDF for JSON schema validation tests
Generated: 2026-06-01

json_schema/EC-04-rc4-encrypted.pdf

Copied from fixtures/EC-04-rc4-encrypted.pdf for JSON schema validation
PDF 1.7, RC4 encryption (V=1, R=2), 40-bit key, user password: "user40"
Generated: 2026-06-01

json_schema/EC-05-aes128-encrypted.pdf

Copied from fixtures/EC-05-aes128-encrypted.pdf for JSON schema validation
PDF 1.7, AES-128 encryption (V=2, R=3), 128-bit key, user password: "user128"
Generated: 2026-06-01

json_schema/valid-minimal.pdf

Minimal valid PDF v1.4 fixture for JSON schema validation tests
Generated: 2026-05-28

json_schema/sample.pdf

Copied from valid-minimal.pdf to serve as the default path in SDK examples
Minimal valid PDF v1.4 fixture for contract method examples
Generated: 2026-05-31

vector/academic-paper/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Academic paper on machine learning: Abstract, Introduction, Methods, Results, Conclusion
Generated: 2026-06-01

vector/technical-documentation/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
API documentation with Getting Started, Authentication, Endpoints, Rate Limits
Generated: 2026-06-01

vector/legal-contract/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Service Agreement with Services, Term, Compensation, Confidentiality, Termination, Governing Law
Generated: 2026-06-01

vector/scientific-report/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Climate Research Report with Executive Summary, Data Collection, Analysis, Findings, Recommendations
Generated: 2026-06-01

vector/user-manual/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Product User Manual with Quick Start Guide, Unboxing, Setup, Features, Troubleshooting, Support
Generated: 2026-06-01

vector/financial-report/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Q1 Financial Report with Revenue, Expenses, Net Income, Outlook, Risk Factors
Generated: 2026-06-01

vector/conference-proceedings/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Conference Proceedings with Keynote Address, Paper Session, Panel Discussion, Workshop
Generated: 2026-06-01

vector/medical-research/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Clinical Trial Results with Background, Methodology, Results, Discussion, Conclusion
Generated: 2026-06-01

vector/multi-page-academic/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Multi-page academic paper (3 pages): Abstract, Introduction, Conclusion
Generated: 2026-06-01

vector/code-documentation/source.pdf

Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Code library documentation with Installation, Quick Example, API Reference, Supported Formats, Limitations, License
Generated: 2026-06-01

scanned/receipt/receipt-300dpi.pdf

Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Supermarket receipt with items, prices, totals (Helvetica 10pt, Letter, 14pt line spacing)
Generated: 2026-06-01

scanned/receipt/receipt-300dpi-scanned.pdf

Generated by pdftoppm + img2pdf from receipt-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01

scanned/documents/invoice-300dpi.pdf

Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Service invoice with line items, totals, payment terms (Helvetica 11pt, Letter, 16pt line spacing)
Generated: 2026-06-01

scanned/documents/invoice-300dpi-scanned.pdf

Generated by pdftoppm + img2pdf from invoice-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01

scanned/documents/form-300dpi.pdf

Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Employment application form with fields and checkboxes (Helvetica 11pt, Letter, 18pt line spacing)
Generated: 2026-06-01

scanned/documents/form-300dpi-scanned.pdf

Generated by pdftoppm + img2pdf from form-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01

scanned/multi-page/doc-10page-300dpi.pdf

Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI (10 pages with diverse content)
Times-Roman 12pt, Letter, 18pt line spacing, "Page N:" markers
Generated: 2026-06-01

scanned/multi-page/doc-10page-300dpi-scanned.pdf

Generated by pdftoppm + img2pdf from doc-10page-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF, 10 pages)
Generated: 2026-06-01
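
The "pdftoppm + img2pdf" step named in the scanned entries above could be reproduced along the following lines. This is a minimal sketch, not the script that produced these fixtures: the helper names (`rasterize_command`, `assemble_command`, `scanned_path`, `simulate_scan`), the PNG intermediate format, and the temporary `page` prefix are all assumptions made for illustration. Only the two tools, the 300 DPI setting, and the `-scanned.pdf` naming come from this file.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_command(source_pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page, named <prefix>-<page>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(source_pdf), str(prefix)]


def assemble_command(images: list[Path], output_pdf: Path) -> list[str]:
    # img2pdf packs the page images into an image-only PDF
    return ["img2pdf", *map(str, images), "-o", str(output_pdf)]


def scanned_path(source_pdf: Path) -> Path:
    # receipt-300dpi.pdf -> receipt-300dpi-scanned.pdf (naming used in this file)
    return source_pdf.with_name(f"{source_pdf.stem}-scanned.pdf")


def simulate_scan(source_pdf: Path, dpi: int = 300) -> Path:
    # Hypothetical driver: rasterize every page, then rebuild a PDF from the images.
    output_pdf = scanned_path(source_pdf)
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"  # assumed scratch prefix
        subprocess.run(rasterize_command(source_pdf, prefix, dpi), check=True)
        # Sort by name so pages keep their order in the rebuilt PDF.
        images = sorted(Path(tmp).glob("page-*.png"))
        subprocess.run(assemble_command(images, output_pdf), check=True)
    return output_pdf
```

Calling `simulate_scan(Path("scanned/receipt/receipt-300dpi.pdf"))` would need both tools on `PATH`. Keeping the command builders separate from the driver lets the argument lists be checked without running either tool.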

scanned/receipt/receipt-300dpi.pdf

Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Simple sales receipt with itemized list and totals (Helvetica 11pt, 6.5" x 4", 14pt line spacing)
Generated: 2026-06-01

scanned/documents/invoice-300dpi.pdf

Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Business invoice with line items, subtotal, tax, and total (Helvetica 11pt, Letter, 16pt line spacing)
Generated: 2026-06-01

json_schema/simple-text.pdf

Minimal text-only PDF for JSON schema validation tests
Generated: 2026-06-01