pdftract/tests/fixtures/PROVENANCE.md
jedarden 4dddd81bcd docs(pdftract-5o3zv): verify footnotes, inline links, and page breaks implementation
Phase 6.5.5 functionality already implemented and tested:
- Footnote emission infrastructure (PageFootnotes, emit_footnote_ref/def)
- Inline link emission (emit_page_links_from_json, emit_inline_link)
- Page breaks (--md-no-page-breaks CLI flag, MarkdownOptions)

All acceptance criteria tests pass. Ready for Phase 7 integration.

Also adds missing provenance entry for json_schema/simple-text.pdf fixture.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 16:00:12 -04:00


# EC-04-rc4-encrypted.pdf
Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, RC4 encryption (V=1, R=2), 40-bit key, user password: "user40"
Generated: 2026-05-28
# EC-05-aes128-encrypted.pdf
Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, AES-128 encryption (V=2, R=3), 128-bit key, user password: "user128"
Generated: 2026-05-28
# EC-06-aes256-encrypted.pdf
Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 2.0, AES-256 encryption (V=5, R=5), 256-bit key, user password: "user256"
Generated: 2026-05-28
# EC-empty-password.pdf
Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, no encryption (control fixture)
Generated: 2026-05-28
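
The V and R values recorded above can be checked against a fixture's raw bytes. The following is a minimal sketch, not part of the generator script: `read_encrypt_params` is a hypothetical helper, and it assumes the encryption dictionary is written as uncompressed text with `/V` and `/R` within 1024 bytes of the `/Filter /Standard` entry.

```python
import re
from typing import Optional, Tuple


def read_encrypt_params(pdf_bytes: bytes, window: int = 1024) -> Optional[Tuple[int, int]]:
    """Return (V, R) from a standard-security encryption dictionary, or None.

    Hypothetical helper for illustration. It scans raw bytes instead of
    parsing the PDF, so it is a heuristic and not a validator.
    """
    # Unencrypted files (such as the control fixture) carry no such marker.
    marker = re.search(rb"/Filter\s*/Standard\b", pdf_bytes)
    if marker is None:
        return None
    # Key order inside the dictionary varies, so search on both sides.
    start = max(0, marker.start() - window)
    chunk = pdf_bytes[start:marker.end() + window]
    v = re.search(rb"/V\s+(\d+)", chunk)
    r = re.search(rb"/R\s+(\d+)", chunk)
    if v is None or r is None:
        return None
    return int(v.group(1)), int(r.group(1))
```

Under these assumptions, a test could compare the result for EC-04-rc4-encrypted.pdf with `(1, 2)` and expect `None` for the unencrypted control fixture.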
# sample.pdf
Copied from valid-minimal.pdf to serve as the default path in SDK examples
Minimal valid PDF v1.4 fixture for contract method examples
Generated: 2026-05-31
# json_schema/simple_invoice.pdf
Simple invoice PDF for JSON schema validation tests
Generated: 2026-06-01
# json_schema/EC-04-rc4-encrypted.pdf
Copied from fixtures/EC-04-rc4-encrypted.pdf for JSON schema validation
PDF 1.7, RC4 encryption (V=1, R=2), 40-bit key, user password: "user40"
Generated: 2026-06-01
# json_schema/EC-05-aes128-encrypted.pdf
Copied from fixtures/EC-05-aes128-encrypted.pdf for JSON schema validation
PDF 1.7, AES-128 encryption (V=2, R=3), 128-bit key, user password: "user128"
Generated: 2026-06-01
# json_schema/valid-minimal.pdf
Minimal valid PDF v1.4 fixture for JSON schema validation tests
Generated: 2026-05-28
# json_schema/sample.pdf
Copied from valid-minimal.pdf to serve as the default path in SDK examples
Minimal valid PDF v1.4 fixture for contract method examples
Generated: 2026-05-31
# vector/academic-paper/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Academic paper on machine learning - Abstract, Introduction, Methods, Results, Conclusion
Generated: 2026-06-01
# vector/technical-documentation/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
API documentation with Getting Started, Authentication, Endpoints, Rate Limits
Generated: 2026-06-01
# vector/legal-contract/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Service Agreement with Services, Term, Compensation, Confidentiality, Termination, Governing Law
Generated: 2026-06-01
# vector/scientific-report/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Climate Research Report with Executive Summary, Data Collection, Analysis, Findings, Recommendations
Generated: 2026-06-01
# vector/user-manual/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Product User Manual with Quick Start Guide, Unboxing, Setup, Features, Troubleshooting, Support
Generated: 2026-06-01
# vector/financial-report/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Q1 Financial Report with Revenue, Expenses, Net Income, Outlook, Risk Factors
Generated: 2026-06-01
# vector/conference-proceedings/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Conference Proceedings with Keynote Address, Paper Session, Panel Discussion, Workshop
Generated: 2026-06-01
# vector/medical-research/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Clinical Trial Results with Background, Methodology, Results, Discussion, Conclusion
Generated: 2026-06-01
# vector/multi-page-academic/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Multi-page academic paper (3 pages) - Abstract, Introduction, Conclusion
Generated: 2026-06-01
# vector/code-documentation/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Code library documentation with Installation, Quick Example, API Reference, Supported Formats, Limitations, License
Generated: 2026-06-01
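
The vector fixtures above are described as a corpus for CER (character error rate) testing. As a rough sketch of that metric, assuming the usual definition of Levenshtein edit distance divided by reference length, the calculation could look like this. The function names are illustrative and do not come from the test suite.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis measured against reference."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

A value of 0.0 means the extracted text matches the reference exactly. Values can exceed 1.0 when the hypothesis is much longer than the reference.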
# scanned/receipt/receipt-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Supermarket receipt with items, prices, totals (Helvetica 10pt, Letter, 14pt line spacing)
Generated: 2026-06-01
# scanned/receipt/receipt-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from receipt-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01
# scanned/documents/invoice-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Service invoice with line items, totals, payment terms (Helvetica 11pt, Letter, 16pt line spacing)
Generated: 2026-06-01
# scanned/documents/invoice-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from invoice-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01
# scanned/documents/form-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Employment application form with fields and checkboxes (Helvetica 11pt, Letter, 18pt line spacing)
Generated: 2026-06-01
# scanned/documents/form-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from form-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01
# scanned/multi-page/doc-10page-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI (10 pages with diverse content)
Times-Roman 12pt, Letter, 18pt line spacing, "Page N:" markers
Generated: 2026-06-01
# scanned/multi-page/doc-10page-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from doc-10page-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF, 10 pages)
Generated: 2026-06-01
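
The scan simulation entries above name pdftoppm and img2pdf but not the exact invocation. The sketch below shows one plausible pipeline, assuming PNG intermediates and pdftoppm's zero-padded page numbering. The helper names and the choice of PNG are assumptions, not taken from generate_scanned_fixtures.py.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def pdftoppm_command(source: Path, image_prefix: Path, dpi: int = 300) -> List[str]:
    """Build the rasterization command: one PNG per page at the given DPI."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(source), str(image_prefix)]


def img2pdf_command(images: List[Path], output: Path) -> List[str]:
    """Build the command that wraps page images into an image-only PDF."""
    return ["img2pdf", *[str(p) for p in images], "-o", str(output)]


def simulate_scan(source: Path, output: Path, dpi: int = 300) -> None:
    """Rasterize source and rebuild it as an image-only PDF (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(pdftoppm_command(source, prefix, dpi), check=True)
        # Assumes zero-padded page numbers, so a plain sort gives page order.
        images = sorted(Path(tmp).glob("page-*.png"))
        subprocess.run(img2pdf_command(images, output), check=True)
```

Calling `simulate_scan(Path("receipt-300dpi.pdf"), Path("receipt-300dpi-scanned.pdf"))` would require both tools on the PATH. The command builders are kept separate so they can be checked without running either tool.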
# scanned/receipt/receipt-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Simple sales receipt with itemized list and totals (Helvetica 11pt, 6.5" x 4", 14pt line spacing)
Generated: 2026-06-01
# scanned/receipt/receipt-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from receipt-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01
# scanned/documents/invoice-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Business invoice with line items, subtotal, tax, and total (Helvetica 11pt, Letter, 16pt line spacing)
Generated: 2026-06-01
# scanned/documents/invoice-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from invoice-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01
# json_schema/simple-text.pdf
Minimal text-only PDF for JSON schema validation tests
Generated: 2026-06-01
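
Entries in this file follow a loose convention: a `# <path>` heading, one or more description lines, and a `Generated: <date>` line. A small checker along the following lines could flag entries that lack a date or that repeat a path. It is a sketch based only on that observed layout, and `ProvenanceEntry`, `parse_provenance`, `missing_dates`, and `repeated_paths` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceEntry:
    path: str
    notes: List[str] = field(default_factory=list)
    generated: Optional[str] = None


def parse_provenance(text: str) -> List[ProvenanceEntry]:
    """Split the log into entries keyed by their '# <path>' heading."""
    entries: List[ProvenanceEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            entries.append(ProvenanceEntry(path=line[2:].strip()))
        elif entries:
            # "Generated by ..." is a description line and is not matched here.
            if line.startswith("Generated: "):
                entries[-1].generated = line[len("Generated: "):].strip()
            else:
                entries[-1].notes.append(line)
    return entries


def missing_dates(entries: List[ProvenanceEntry]) -> List[str]:
    """Paths of entries that have no 'Generated:' line."""
    return [e.path for e in entries if e.generated is None]


def repeated_paths(entries: List[ProvenanceEntry]) -> List[str]:
    """Paths that appear in more than one entry, in first-seen order."""
    counts = Counter(e.path for e in entries)
    return [path for path, n in counts.items() if n > 1]
```

Whether repeated paths count as an error depends on whether the log is meant to be append-only, so the checker reports them and does not reject them.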