jedarden
1e10692fd3
feat(pdftract-2zw): page classification fixtures + integration tests + reproducibility gate
This commit completes bead pdftract-2zw by adding:
- 4 page classification fixtures in tests/fixtures/page_class/:
  - vector_pure: pure-text PDF (born-digital)
  - scanned_single: image-only PDF (scanned)
  - brokenvector_pdfa: PDF/A with invisible text over an image
  - hybrid_header_body: text header + scanned body (hybrid)
- Expected classification JSON files for each fixture
- Integration tests in crates/pdftract-core/tests/page_classification.rs:
  - test_page_classification_fixtures: validates classification correctness
  - test_page_classification_reproducibility: byte-identical JSON on re-classification
  - test_fixture_files_exist_and_size: validates fixture size < 1 MB
  - test_expected_json_validity: validates the expected-JSON schema
- Fixture generator: tests/fixtures/generate_page_class_fixtures.rs
- Updated PROVENANCE.md with new SHA256 hashes
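The reproducibility gate requires two classification runs to produce byte-identical JSON. Below is a minimal sketch of that check, assuming a classifier that returns a page-index-to-class map. `PageClass`, `to_canonical_json` and `reproducibility_gate` are hypothetical names for illustration, not the pdftract-core API.

```rust
use std::collections::BTreeMap;

/// Hypothetical page classes mirroring the four fixture kinds (names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageClass {
    Vector,
    Scanned,
    BrokenVector,
    Hybrid,
}

impl PageClass {
    fn as_str(self) -> &'static str {
        match self {
            PageClass::Vector => "vector",
            PageClass::Scanned => "scanned",
            PageClass::BrokenVector => "broken_vector",
            PageClass::Hybrid => "hybrid",
        }
    }
}

/// Serialize page index -> class as compact JSON.
/// BTreeMap iterates in sorted key order, so the output does not depend on
/// insertion order or hash seeds (a HashMap's order can differ between runs).
fn to_canonical_json(pages: &BTreeMap<u32, PageClass>) -> String {
    let entries: Vec<String> = pages
        .iter()
        .map(|(page, class)| format!("\"{}\":\"{}\"", page, class.as_str()))
        .collect();
    format!("{{{}}}", entries.join(","))
}

/// Run the classifier twice and require byte-identical JSON.
/// Returns the JSON on success, or both outputs on a mismatch.
fn reproducibility_gate<F>(classify: F) -> Result<String, String>
where
    F: Fn() -> BTreeMap<u32, PageClass>,
{
    let first = to_canonical_json(&classify());
    let second = to_canonical_json(&classify());
    if first.as_bytes() == second.as_bytes() {
        Ok(first)
    } else {
        Err(format!("non-deterministic output:\n{}\n{}", first, second))
    }
}

fn main() {
    // Stand-in classifier (hypothetical): inserts pages out of order on purpose.
    let classify = || {
        let mut pages = BTreeMap::new();
        pages.insert(1, PageClass::Scanned);
        pages.insert(0, PageClass::Vector);
        pages
    };
    match reproducibility_gate(classify) {
        Ok(json) => println!("{}", json), // {"0":"vector","1":"scanned"}
        Err(e) => panic!("{}", e),
    }
}
```

Canonicalizing the serialization (sorted keys, no whitespace) is what makes a plain byte comparison a meaningful gate: any remaining difference points at non-determinism in the classifier itself rather than in the JSON writer.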
Acceptance criteria PASS:
- 4 fixtures present ✅
- cargo test page_classification passes ✅ (4/4 tests)
- Fixtures total 2927 bytes (< 1 MB) ✅
- Reproducibility gate implemented ✅
Co-Authored-By: Claude Code <noreply@anthropic.com>