pdftract/tests/document_model/fixtures/README.md
Document Model Test Fixtures

This directory contains curated PDF fixtures for testing the document model integration.

Fixture Passwords

IMPORTANT: The passwords for the encrypted fixtures are NOT secret. They exist only for testing and are listed here:

  • encrypted_rc4_test.pdf: RC4-40, password "test"
  • encrypted_aes128_test.pdf: AES-128, password "test"
  • encrypted_aes256_test.pdf: AES-256 (PDF 2.0), password "test"
  • encrypted_empty_password.pdf: RC4-40, empty password
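
A test harness may want these passwords in one place instead of repeating string literals. The lookup below is a minimal sketch of that idea. `fixture_password` is a hypothetical helper name, not an existing pdftract function; only the file names and passwords come from the list above.

```rust
/// Password needed to open an encrypted fixture, per the list above.
/// Returns `None` for file names that are not in that list.
/// NOTE: `fixture_password` is a hypothetical helper, not part of pdftract.
pub fn fixture_password(file_name: &str) -> Option<&'static str> {
    match file_name {
        "encrypted_rc4_test.pdf"
        | "encrypted_aes128_test.pdf"
        | "encrypted_aes256_test.pdf" => Some("test"),
        // The empty password is a real value, distinct from "not listed".
        "encrypted_empty_password.pdf" => Some(""),
        _ => None,
    }
}
```

Returning `Some("")` for the empty-password fixture keeps it distinguishable from fixtures that have no listed password at all.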

Fixture List

Encrypted Files (EC-04, EC-05, EC-06)

  • encrypted_rc4_test.pdf — RC4-encrypted, user password "test" (EC-04)
  • encrypted_aes128_test.pdf — AES-128, password "test" (EC-05)
  • encrypted_aes256_test.pdf — AES-256 (PDF 2.0), password "test" (EC-06)
  • encrypted_empty_password.pdf — RC4-encrypted, empty owner password
  • encrypted_unknown_handler.pdf — Custom handler (Adobe Public Key, /Filter /Adobe.PubSec)

Tagged PDFs

  • tagged_3_level_outline.pdf — 3 levels of bookmarks with mixed UTF-16BE/PDFDocEncoded titles
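
The mixed encodings in this fixture matter because a PDF text string is read as UTF-16BE when it starts with the byte order mark `FE FF`, and as PDFDocEncoding otherwise. The sketch below shows that dispatch under a stated simplification: the PDFDocEncoding branch maps each byte to the Latin-1 code point of the same value, which is correct for ASCII but not for every PDFDocEncoding byte. `decode_text_string` is a hypothetical name, not the pdftract API.

```rust
/// Decode a PDF text string such as an outline title.
/// NOTE: hypothetical helper for illustration; the non-BOM branch is a
/// Latin-1 approximation of PDFDocEncoding, not a full mapping table.
pub fn decode_text_string(bytes: &[u8]) -> String {
    if bytes.starts_with(&[0xFE, 0xFF]) {
        // UTF-16BE: skip the BOM and combine big-endian byte pairs.
        // A trailing odd byte, if any, is ignored by `chunks_exact`.
        let units: Vec<u16> = bytes[2..]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        // Unpaired surrogates become U+FFFD instead of causing an error.
        String::from_utf16_lossy(&units)
    } else {
        // Approximation: byte value reused as a Unicode code point.
        bytes.iter().map(|&b| char::from(b)).collect()
    }
}
```

A test against the outline fixture would expect titles from both branches to come out as ordinary Rust strings.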

Optional Content (EC-16)

  • ocg_default_off.pdf — Single OCG with /D /BaseState /OFF (EC-16)

Multi-Revision

  • multi_revision_3.pdf — 3 incremental revisions, page count differs across revisions

Page Tree Inheritance (EC-09)

  • inheritance_grandparent_mediabox.pdf — page 0 has no MediaBox; inherits from grandparent /Pages node
  • missing_mediabox.pdf — page with no MediaBox anywhere (EC-09)
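
Both fixtures exercise the same lookup: start at the page, follow parent links upward, and take the first MediaBox found. The sketch below models the page tree as a flat slice with parent indices. The `Node` type and `inherited_media_box` function are illustrative assumptions, not the crate's actual types.

```rust
/// Toy page-tree node: an optional /MediaBox and an optional parent index.
/// NOTE: hypothetical type for illustration only.
pub struct Node {
    pub media_box: Option<[f64; 4]>,
    pub parent: Option<usize>,
}

/// Walk from `start` toward the root and return the first /MediaBox found.
/// Returns `None` when no node on the path defines one, which is the
/// situation missing_mediabox.pdf is described as covering.
pub fn inherited_media_box(nodes: &[Node], start: usize) -> Option<[f64; 4]> {
    let mut current = Some(start);
    // The hop limit guards against cyclic parent links in malformed files.
    for _ in 0..=nodes.len() {
        let node = nodes.get(current?)?;
        if let Some(media_box) = node.media_box {
            return Some(media_box);
        }
        current = node.parent;
    }
    None
}
```

What the caller does with `None` (a default box, an error, a warning) is a policy decision this sketch leaves open.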

Resource Merging

  • partial_resource_override.pdf — page overrides /Resources /Font partially; merged result expected
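
One way to read "merged result expected" is that the page's /Font entries are layered over the inherited ones, with the page winning on name collisions. The sketch below shows that reading only. `FontMap` and `merge_fonts` are hypothetical names, and the actual merge rules in the document model may differ.

```rust
use std::collections::BTreeMap;

/// Font resource name (for example "F1") mapped to a font label.
/// NOTE: hypothetical alias for illustration only.
pub type FontMap = BTreeMap<String, String>;

/// Layer the page's fonts over the inherited fonts.
/// Inherited entries survive unless the page defines the same name.
pub fn merge_fonts(inherited: &FontMap, page: &FontMap) -> FontMap {
    let mut merged = inherited.clone();
    // `extend` replaces the value for any key that already exists.
    merged.extend(page.iter().map(|(name, font)| (name.clone(), font.clone())));
    merged
}
```

With inherited fonts F1 and F2 and a page that redefines only F2, the merged map keeps the inherited F1 and takes F2 from the page.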

JavaScript Detection

  • js_in_openaction.pdf — /OpenAction /S /JavaScript

XFA Forms

  • xfa_form.pdf — /AcroForm /XFA present

Conformance Detection

  • pdfa_1b_conformance.pdf — XMP metadata declaring PDF/A-1B conformance

Page Labels

  • page_labels_roman_arabic.pdf — pages 0..3 roman, pages 4..end arabic
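
The expected labels for this fixture can be sketched as a small function. Several details are assumptions, since the description above does not state them: page indices 0 through 3 inclusive form the roman range, the numerals are lowercase, and each range starts counting at 1. `to_roman` and `page_label` are hypothetical helper names.

```rust
/// Lowercase roman numeral for `n >= 1`; returns an empty string for 0.
pub fn to_roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    // Greedily subtract the largest value that still fits.
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Label for a zero-based page index under the assumed layout:
/// indices 0..=3 are roman (i to iv), index 4 onward is arabic from 1.
pub fn page_label(page_index: u32) -> String {
    if page_index <= 3 {
        to_roman(page_index + 1)
    } else {
        (page_index - 3).to_string()
    }
}
```

If the fixture uses uppercase numerals or a different starting number, only the constants in `page_label` and `TABLE` need to change.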

Fixture Generation

Fixtures are generated using qpdf and hand-crafted PDF construction.

See scripts/generate_document_model_fixtures.sh for the commands used to generate them.