Implement orchestration layer connecting HttpRangeSource to Phase 1.3 xref resolver and Phase 1.4 document model for remote PDF access:
- Document::open_remote() public API for remote PDF loading
- Progressive tail fetch (16 KB → 1 MB) for startxref location
- Xref forward-scan disabled for remote sources (via is_remote check)
- Page-by-page on-demand fetch via HttpRangeSource caching
- Resource lazy load through XrefResolver cache
- HEAD probe with 405 fallback, no Content-Length handling

Acceptance criteria:
✅ open_remote(url) returns Document with correct page count
✅ HEAD failure modes (405, no Content-Length, 401) handled
✅ xref forward-scan disabled for remote (is_remote check)
✅ Page-by-page on-demand fetch (HttpRangeSource LRU cache)
✅ INV-8 maintained (all errors return Result)

Files modified:
- crates/pdftract-core/src/document.rs (Document::open_remote, from_source)
- crates/pdftract-core/src/remote.rs (progressive tail fetch)
- crates/pdftract-core/src/lib.rs (re-exports)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Document Model Test Fixtures
This directory contains curated PDF fixtures for testing the document model integration.
Fixture Passwords
IMPORTANT: The passwords for the encrypted fixtures are NOT secret. The files exist only for testing:
encrypted_rc4_test.pdf: RC4-40, password "test"
encrypted_aes128_test.pdf: AES-128, password "test"
encrypted_aes256_test.pdf: AES-256 (PDF 2.0), password "test"
encrypted_empty_password.pdf: RC4-40, empty password
Fixture List
Encrypted Files (EC-04, EC-05, EC-06)
encrypted_rc4_test.pdf — RC4-encrypted, user password "test" (EC-04)
encrypted_aes128_test.pdf — AES-128, password "test" (EC-05)
encrypted_aes256_test.pdf — AES-256 (PDF 2.0), password "test" (EC-06)
encrypted_empty_password.pdf — RC4-encrypted, empty owner password
encrypted_unknown_handler.pdf — Custom handler (Adobe Public Key, /Filter /Adobe.PubSec)
Tagged PDFs
tagged_3_level_outline.pdf — 3 levels of bookmarks with mixed UTF-16BE/PDFDocEncoded titles
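A minimal sketch of how mixed-encoding titles like these might be decoded: a PDF text string that begins with the byte order mark FE FF is UTF-16BE, and anything else is treated as PDFDocEncoding. The function name `decode_text_string` is hypothetical and not taken from the crate. As a labeled simplification, the PDFDocEncoding branch maps each byte to the Unicode code point of the same value, which is only correct for ASCII and the 0xA1..0xFF range.

```rust
/// Decode a PDF text string (for example an outline /Title) into a String.
/// Hypothetical helper for illustration; not the crate's actual API.
fn decode_text_string(bytes: &[u8]) -> String {
    if let [0xFE, 0xFF, rest @ ..] = bytes {
        // UTF-16BE: pair the remaining bytes into big-endian code units.
        // A trailing odd byte is silently dropped by chunks_exact.
        let units: Vec<u16> = rest
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        // Unpaired surrogates become U+FFFD instead of producing an error.
        String::from_utf16_lossy(&units)
    } else {
        // ASSUMPTION: approximate PDFDocEncoding by mapping each byte to the
        // code point of the same value. A full decoder needs a lookup table
        // for the bytes where PDFDocEncoding differs from Latin-1.
        bytes.iter().map(|&b| b as char).collect()
    }
}
```

A fixture with both kinds of title exercises both branches, which is presumably why the outline mixes them.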
Optional Content (EC-16)
ocg_default_off.pdf — Single OCG with /D /BaseState /OFF (EC-16)
Multi-Revision
multi_revision_3.pdf — 3 incremental revisions, page count differs across revisions
Page Tree Inheritance (EC-09)
inheritance_grandparent_mediabox.pdf — page 0 has no MediaBox; inherits from grandparent /Pages node
missing_mediabox.pdf — page with no MediaBox anywhere (EC-09)
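The lookup these two fixtures exercise can be sketched as a walk up the /Parent chain that stops at the first node carrying a /MediaBox. The `Node` struct and `inherited_media_box` function below are hypothetical stand-ins, not the crate's types, and the sketch leaves the "no MediaBox anywhere" policy (default box, error, or diagnostic) to the caller.

```rust
/// Hypothetical page-tree node: a /Pages or /Page dictionary reduced to the
/// two fields this sketch needs.
struct Node {
    /// Index of the parent node in the slice, or None for the tree root.
    parent: Option<usize>,
    /// The /MediaBox rectangle, if this node sets one itself.
    media_box: Option<[f64; 4]>,
}

/// Walk from `start` toward the root and return the first /MediaBox found.
/// Returns None when no ancestor defines one (the missing_mediabox.pdf case).
fn inherited_media_box(nodes: &[Node], start: usize) -> Option<[f64; 4]> {
    let mut current = Some(start);
    // The hop limit guards against cyclic /Parent references in malformed
    // files: a well-formed chain can never be longer than the node count.
    for _ in 0..=nodes.len() {
        let node = nodes.get(current?)?;
        if let Some(media_box) = node.media_box {
            return Some(media_box);
        }
        current = node.parent;
    }
    None
}
```

For the grandparent fixture, a three-node chain (root /Pages with a MediaBox, intermediate /Pages without one, page without one) resolves to the root's rectangle after two hops.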
Resource Merging
partial_resource_override.pdf — page overrides /Resources /Font partially; merged result expected
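One way to picture the "merged result" is an entry-by-entry overlay: inherited resources form the base, and each name the page defines replaces the inherited entry of the same category and name. The sketch below assumes that granularity. The `Resources` alias and `merge_resources` function are hypothetical, and resource values are reduced to bare object numbers.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified model: resource category (e.g. "Font") maps to
/// resource name (e.g. "F1"), which maps to an object number.
type Resources = BTreeMap<String, BTreeMap<String, u32>>;

/// Overlay page-level resources on inherited ones, entry by entry.
/// ASSUMPTION: the merge is per resource name within each category; the
/// fixture description only says that a merged result is expected.
fn merge_resources(inherited: &Resources, page: &Resources) -> Resources {
    let mut merged = inherited.clone();
    for (category, entries) in page {
        // Create the category if only the page defines it.
        let slot = merged.entry(category.clone()).or_default();
        for (name, object_number) in entries {
            // Page-level entries win over inherited ones with the same name.
            slot.insert(name.clone(), *object_number);
        }
    }
    merged
}
```

With inherited fonts F1 and F2 and a page that redefines only F2, the merged /Font dictionary keeps the inherited F1 and takes F2 from the page.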
JavaScript Detection
js_in_openaction.pdf — /OpenAction /S /JavaScript
XFA Forms
xfa_form.pdf — /AcroForm /XFA present
Conformance Detection
pdfa_1b_conformance.pdf — XMP metadata declaring PDF/A-1B conformance
Page Labels
page_labels_roman_arabic.pdf — pages 0..3 roman, pages 4..end arabic
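A rough sketch of how labels for this layout could be computed, assuming each label range records its starting page index, a numbering style, and the number assigned to its first page. All names here (`Style`, `LabelRange`, `page_label`) are hypothetical. The sketch also assumes lowercase roman numerals and an arabic range that restarts at 1, neither of which the fixture description states.

```rust
/// Numbering styles used by this sketch (a subset of what PDF allows).
#[derive(Clone, Copy)]
enum Style {
    LowerRoman,
    Arabic,
}

/// One label range: applies from `start` (0-based page index) until the
/// next range begins. `first` is the number given to page `start`.
struct LabelRange {
    start: usize,
    style: Style,
    first: u32,
}

/// Convert a positive number to lowercase roman numerals (0 yields "").
fn to_lower_roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
        (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
        (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        // Greedily emit the largest symbol that still fits.
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Label for `page`, given ranges sorted by ascending `start`.
/// Returns None when no range covers the page.
fn page_label(ranges: &[LabelRange], page: usize) -> Option<String> {
    // The governing range is the last one that starts at or before `page`.
    let range = ranges.iter().rev().find(|r| r.start <= page)?;
    let number = range.first + (page - range.start) as u32;
    Some(match range.style {
        Style::LowerRoman => to_lower_roman(number),
        Style::Arabic => number.to_string(),
    })
}
```

Under those assumptions, two ranges starting at pages 0 and 4 label the first four pages i through iv and the fifth page 1.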
Fixture Generation
Fixtures are generated using qpdf and hand-crafted PDF construction.
See scripts/generate_document_model_fixtures.sh for the generation script.