Add 7 adversarial PDF fixtures exercising Phase 1 error-recovery paths:

- xref_30pct_bad_offsets.pdf: 100 objects, 30 bad xref offsets
- missing_mediabox_all_pages.pdf: 10 pages, no /MediaBox at any level
- missing_endobj.pdf: object 5 missing endobj marker
- truncated_mid_stream.pdf: FlateDecode stream truncated mid-decompression
- int_overflow_bbox.pdf: /BBox value 99999999999999999 (i32 overflow)
- nested_failure.pdf: every page has at least one diagnostic
- combined_failures.pdf: combines multiple failure modes (keystone INV-8 test)

Each fixture has a sibling .expected_diagnostics.json file with threshold
counts (>= not == per EC-07/EC-09 to tolerate drift).

Integration test harness (error_recovery_integration.rs):

- assert_diagnostic_count_at_least() helper for threshold checking
- assert_no_panic() helper using std::panic::catch_unwind for INV-8
- Individual test functions for each fixture
- Cumulative test_inv_8_no_panics_across_all_fixtures()

All 8 tests pass. INV-8 verified: zero panics across all fixtures.

Closes: pdftract-4w0v4
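
The panic check described above can be sketched roughly as follows. This is a minimal illustration, not the code from error_recovery_integration.rs: the signature of `assert_no_panic` is an assumption (the commit only names the helper and says it uses `std::panic::catch_unwind`), and `extract_stub` is a hypothetical stand-in for the real extraction entry point, which is not shown here.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for the real extraction entry point.
/// It rejects input without a `%PDF-` header by returning `Err`, never by panicking.
fn extract_stub(bytes: &[u8]) -> Result<usize, String> {
    if bytes.starts_with(b"%PDF-") {
        Ok(bytes.len())
    } else {
        Err("missing %PDF- header".to_string())
    }
}

/// Runs `f` and converts an unwinding panic into `Err(label: panicked)`.
/// A value returned by `f` (including an `Err` it produces itself) is passed
/// through untouched: recoverable errors are acceptable, panics are not.
/// Assumed signature; the real helper may differ.
fn assert_no_panic<F, T>(label: &str, f: F) -> Result<T, String>
where
    F: FnOnce() -> T,
{
    // AssertUnwindSafe is needed because arbitrary closures are not
    // automatically UnwindSafe; the closure is not reused after a panic.
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|_| format!("{label}: panicked"))
}
```

A test per fixture would then wrap the extraction call, for example `assert_no_panic("missing_endobj.pdf", || extract_stub(&bytes))`, and fail only when the outer result is `Err`. Note that `catch_unwind` only catches unwinding panics, so this approach assumes the test profile is not built with `panic = "abort"`.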
22 lines
555 B
JSON
{
  "description": "Every page has at least one diagnostic",
  "expected_diagnostics": [
    {
      "code": "STRUCT_MISSING_KEY",
      "min_count": 1,
      "description": "Page 1 missing MediaBox"
    },
    {
      "code": "STRUCT_INVALID_NAME",
      "min_count": 1,
      "description": "Page 2 has invalid name in resources"
    },
    {
      "code": "CIRCULAR_REFERENCE",
      "min_count": 1,
      "description": "Page 3 has circular reference"
    }
  ],
  "expected_pages": "3",
  "expected_behavior": "all pages extracted, ~3 diagnostics"
}
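
To illustrate how the `min_count` thresholds in this file might be applied, here is a small sketch of a `>=` check against the diagnostic codes emitted for a fixture. The struct, both function names, and the representation of emitted diagnostics as plain code strings are all assumptions made for illustration; only the `code` and `min_count` field names and the three entries come from the JSON above. JSON parsing is omitted, so the expectations are built by hand.

```rust
use std::collections::HashMap;

/// Mirrors one entry of `expected_diagnostics` (field names taken from the JSON).
struct ExpectedDiagnostic {
    code: &'static str,
    min_count: usize,
}

/// Hand-built copy of the three expectations listed for nested_failure.pdf.
fn nested_failure_expectations() -> Vec<ExpectedDiagnostic> {
    vec![
        ExpectedDiagnostic { code: "STRUCT_MISSING_KEY", min_count: 1 },
        ExpectedDiagnostic { code: "STRUCT_INVALID_NAME", min_count: 1 },
        ExpectedDiagnostic { code: "CIRCULAR_REFERENCE", min_count: 1 },
    ]
}

/// Returns the expected codes whose observed count is below `min_count`.
/// Extra or unexpected diagnostics never cause a failure, which is what
/// lets a threshold (>=) check tolerate drift where an exact (==) check would not.
fn diagnostics_below_threshold<'a>(
    expected: &'a [ExpectedDiagnostic],
    emitted: &[&str],
) -> Vec<&'a str> {
    // Count how often each diagnostic code was emitted.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for code in emitted {
        *counts.entry(*code).or_insert(0) += 1;
    }
    // Keep only the expectations that were not met, in file order.
    expected
        .iter()
        .filter(|e| counts.get(e.code).copied().unwrap_or(0) < e.min_count)
        .map(|e| e.code)
        .collect()
}
```

A fixture test would pass when the returned list is empty. Returning the unmet codes rather than a bare boolean keeps the failure message specific about which diagnostic went missing.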