Phase 6.5.5 functionality already implemented and tested:
- Footnote emission infrastructure (PageFootnotes, emit_footnote_ref/def)
- Inline link emission (emit_page_links_from_json, emit_inline_link)
- Page breaks (--md-no-page-breaks CLI flag, MarkdownOptions)

All acceptance criteria tests pass. Ready for Phase 7 integration.

Also adds missing provenance entry for json_schema/simple-text.pdf fixture.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# EC-04-rc4-encrypted.pdf
Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, RC4 encryption (V=1, R=2), 40-bit key, user password: "user40"
Generated: 2026-05-28

# EC-05-aes128-encrypted.pdf
Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, AES-128 encryption (V=2, R=3), 128-bit key, user password: "user128"
Generated: 2026-05-28

# EC-06-aes256-encrypted.pdf
Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 2.0, AES-256 encryption (V=5, R=5), 256-bit key, user password: "user256"
Generated: 2026-05-28

# EC-empty-password.pdf
Generated by tests/fixtures/generate_encrypted_fixtures.py
PDF 1.7, no encryption (control fixture)
Generated: 2026-05-28

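The encryption description lines above follow a regular shape, so a test helper could read the parameters straight from this file instead of hard-coding them. The following is a minimal sketch under that assumption; `ENCRYPTION_LINE`, `parse_encryption_line`, and the field names are hypothetical and not part of the repository.

```python
import re
from typing import Optional

# Mirrors the description lines in this file, e.g.
#   PDF 1.7, RC4 encryption (V=1, R=2), 40-bit key, user password: "user40"
# The group names are illustrative, not an established schema.
ENCRYPTION_LINE = re.compile(
    r'PDF (?P<pdf_version>\d+\.\d+), (?P<cipher>[A-Z0-9-]+) encryption '
    r'\(V=(?P<v>\d+), R=(?P<r>\d+)\), (?P<key_bits>\d+)-bit key, '
    r'user password: "(?P<password>[^"]*)"'
)


def parse_encryption_line(line: str) -> Optional[dict]:
    """Return the encryption parameters of a provenance line, or None.

    Lines that do not describe an encrypted fixture (for example the
    "no encryption (control fixture)" entry) yield None.
    """
    match = ENCRYPTION_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # V, R and the key length are numeric; keep the version and password as text.
    for key in ("v", "r", "key_bits"):
        fields[key] = int(fields[key])
    return fields
```

A test could then use `parse_encryption_line(...)["password"]` to open the fixture, which keeps the password in one place.
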
# sample.pdf
Copied from valid-minimal.pdf for SDK examples default path
Minimal valid PDF v1.4 fixture for contract method examples
Generated: 2026-05-31

# json_schema/simple_invoice.pdf
Simple invoice PDF for JSON schema validation tests
Generated: 2026-06-01

# json_schema/EC-04-rc4-encrypted.pdf
Copied from fixtures/EC-04-rc4-encrypted.pdf for JSON schema validation
PDF 1.7, RC4 encryption (V=1, R=2), 40-bit key, user password: "user40"
Generated: 2026-06-01

# json_schema/EC-05-aes128-encrypted.pdf
Copied from fixtures/EC-05-aes128-encrypted.pdf for JSON schema validation
PDF 1.7, AES-128 encryption (V=2, R=3), 128-bit key, user password: "user128"
Generated: 2026-06-01

# json_schema/valid-minimal.pdf
Minimal valid PDF v1.4 fixture for JSON schema validation tests
Generated: 2026-05-28

# json_schema/sample.pdf
Copied from valid-minimal.pdf for SDK examples default path
Minimal valid PDF v1.4 fixture for contract method examples
Generated: 2026-05-31

# vector/academic-paper/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Academic paper on machine learning - Abstract, Introduction, Methods, Results, Conclusion
Generated: 2026-06-01

# vector/technical-documentation/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
API documentation with Getting Started, Authentication, Endpoints, Rate Limits
Generated: 2026-06-01

# vector/legal-contract/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Service Agreement with Services, Term, Compensation, Confidentiality, Termination, Governing Law
Generated: 2026-06-01

# vector/scientific-report/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Climate Research Report with Executive Summary, Data Collection, Analysis, Findings, Recommendations
Generated: 2026-06-01

# vector/user-manual/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Product User Manual with Quick Start Guide, Unboxing, Setup, Features, Troubleshooting, Support
Generated: 2026-06-01

# vector/financial-report/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Q1 Financial Report with Revenue, Expenses, Net Income, Outlook, Risk Factors
Generated: 2026-06-01

# vector/conference-proceedings/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Conference Proceedings with Keynote Address, Paper Session, Panel Discussion, Workshop
Generated: 2026-06-01

# vector/medical-research/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Clinical Trial Results with Background, Methodology, Results, Discussion, Conclusion
Generated: 2026-06-01

# vector/multi-page-academic/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Multi-page academic paper (3 pages) - Abstract, Introduction, Conclusion
Generated: 2026-06-01

# vector/code-documentation/source.pdf
Generated by tests/fixtures/vector/generate_vector_cer_corpus.py
Clean vector PDF with embedded text for CER testing (PDF 1.4, Type1 Helvetica, WinAnsiEncoding)
Code library documentation with Installation, Quick Example, API Reference, Supported Formats, Limitations, License
Generated: 2026-06-01

# scanned/receipt/receipt-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Supermarket receipt with items, prices, totals (Helvetica 10pt, Letter, 14pt line spacing)
Generated: 2026-06-01

# scanned/receipt/receipt-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from receipt-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01

# scanned/documents/invoice-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Service invoice with line items, totals, payment terms (Helvetica 11pt, Letter, 16pt line spacing)
Generated: 2026-06-01

# scanned/documents/invoice-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from invoice-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01

# scanned/documents/form-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Employment application form with fields and checkboxes (Helvetica 11pt, Letter, 18pt line spacing)
Generated: 2026-06-01

# scanned/documents/form-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from form-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01

# scanned/multi-page/doc-10page-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI (10 pages with diverse content)
Times-Roman 12pt, Letter, 18pt line spacing, "Page N:" markers
Generated: 2026-06-01

# scanned/multi-page/doc-10page-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from doc-10page-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF, 10 pages)
Generated: 2026-06-01

# scanned/receipt/receipt-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Simple sales receipt with itemized list and totals (Helvetica 11pt, 6.5" x 4", 14pt line spacing)
Generated: 2026-06-01

# scanned/receipt/receipt-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from receipt-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)
Generated: 2026-06-01

# scanned/documents/invoice-300dpi.pdf
Generated by tests/fixtures/scanned/generate_scanned_fixtures.py
Source PDF for scan simulation at 300 DPI
Business invoice with line items, subtotal, tax, and total (Helvetica 11pt, Letter, 16pt line spacing)
Generated: 2026-06-01

# scanned/documents/invoice-300dpi-scanned.pdf
Generated by pdftoppm + img2pdf from invoice-300dpi.pdf at 300 DPI
Scan simulation for OCR testing (rasterized image-only PDF)

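The `-scanned.pdf` entries above are recorded as produced by pdftoppm + img2pdf at 300 DPI. The sketch below shows one way that two-step pipeline could be scripted. It is an illustration only, not the contents of generate_scanned_fixtures.py; the function names and the temporary-directory layout are assumptions.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_command(source_pdf: str, image_root: str, dpi: int = 300) -> list[str]:
    """Build the pdftoppm call that renders each page to a PNG at the given DPI."""
    return ["pdftoppm", "-r", str(dpi), "-png", source_pdf, image_root]


def assemble_command(page_images: list[str], scanned_pdf: str) -> list[str]:
    """Build the img2pdf call that wraps the page images in an image-only PDF."""
    return ["img2pdf", *page_images, "-o", scanned_pdf]


def simulate_scan(source_pdf: str, scanned_pdf: str, dpi: int = 300) -> None:
    """Run both steps; needs pdftoppm (poppler-utils) and img2pdf on PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        image_root = str(Path(tmp) / "page")
        subprocess.run(rasterize_command(source_pdf, image_root, dpi), check=True)
        # pdftoppm writes page-<n>.png with zero-padded page numbers,
        # so a lexicographic sort keeps the pages in document order.
        pages = sorted(str(p) for p in Path(tmp).glob("page-*.png"))
        subprocess.run(assemble_command(pages, scanned_pdf), check=True)
```

Under these assumptions, `simulate_scan("receipt-300dpi.pdf", "receipt-300dpi-scanned.pdf")` would yield a PDF with no text layer, which is what the OCR tests need.
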
# json_schema/simple-text.pdf
Minimal text-only PDF for JSON schema validation tests
Generated: 2026-06-01