jedarden
9215892f95
feat(pdftract-2zw): page classification fixtures + integration tests + reproducibility gate
Implement page classification test fixtures, integration tests, and a
reproducibility CI gate for Phase 5.1.5.
Fixtures (4 total, 3.6 KB):
- vector_pure: Pure text PDF (born-digital)
- scanned_single: Image-only PDF (scanned)
- brokenvector_pdfa: Invisible text + image
- hybrid_header_body: Text header + scanned body
Integration tests (crates/pdftract-core/tests/page_classification.rs):
- test_page_classification_fixtures: Validates classification correctness
- test_page_classification_reproducibility: CI gate for byte-identical JSON
- test_fixture_files_exist_and_size: Infrastructure validation
- test_expected_json_validity: JSON schema validation
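A minimal, std-only sketch of the idea behind the reproducibility gate: produce the classification report twice and require byte-identical output. Everything named here (`PageClass`, `classify_fixture`, `report_json`, and the class labels) is a hypothetical stand-in for illustration, not the actual pdftract-core API; the real test presumably runs the classifier on the fixture PDFs and compares against the checked-in expected JSON.

```rust
use std::collections::BTreeMap;

/// Hypothetical page classes, loosely named after the four fixtures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageClass {
    Vector,
    Scanned,
    BrokenVector,
    Hybrid,
}

impl PageClass {
    fn as_str(self) -> &'static str {
        match self {
            PageClass::Vector => "vector",
            PageClass::Scanned => "scanned",
            PageClass::BrokenVector => "broken_vector",
            PageClass::Hybrid => "hybrid",
        }
    }
}

/// Stand-in for the real classifier (assumption): maps a fixture name to a class.
fn classify_fixture(name: &str) -> Option<PageClass> {
    match name {
        "vector_pure" => Some(PageClass::Vector),
        "scanned_single" => Some(PageClass::Scanned),
        "brokenvector_pdfa" => Some(PageClass::BrokenVector),
        "hybrid_header_body" => Some(PageClass::Hybrid),
        _ => None,
    }
}

/// Builds a JSON object string with keys in sorted order.
/// A BTreeMap iterates in key order, so the output does not depend on
/// the order in which fixtures were passed in.
fn report_json(fixtures: &[&str]) -> String {
    let mut map = BTreeMap::new();
    for name in fixtures {
        if let Some(class) = classify_fixture(name) {
            map.insert(*name, class.as_str());
        }
    }
    let body: Vec<String> = map
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", k, v))
        .collect();
    format!("{{{}}}", body.join(","))
}

fn main() {
    let fixtures = [
        "vector_pure",
        "scanned_single",
        "brokenvector_pdfa",
        "hybrid_header_body",
    ];

    // Gate: two independent runs must agree byte for byte.
    let first = report_json(&fixtures);
    let second = report_json(&fixtures);
    assert_eq!(first.as_bytes(), second.as_bytes(), "output is not reproducible");

    // Perturbation check: flipping a single bit must be detected.
    let mut perturbed = first.clone().into_bytes();
    perturbed[0] ^= 1;
    assert_ne!(first.as_bytes(), perturbed.as_slice());

    println!("{first}");
}
```

Sorting keys before serialization is one common way to keep the JSON stable across runs; a hash map with randomized iteration order would make a byte-level comparison flaky.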
Acceptance criteria:
- ✅ 4 fixtures present in tests/fixtures/page_class/
- ✅ cargo test page_classification passes (4/4 tests)
- ✅ Reproducibility gate fails on perturbation
- ✅ Fixtures total < 1 MB (3.6 KB)
Refs: pdftract-2zw, plan.md lines 1840-1844
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>