Commit graph

2 commits

Author SHA1 Message Date
jedarden
39ca6a3552 feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator
Add image_coverage_fraction signal evaluator that computes the union
image coverage fraction from individual image XObject areas.

- Computes total image coverage as sum of image_xobject_areas
- Divides by page area (width * height) to get coverage fraction
- Clamps to [0.0, 1.0] to handle overlapping images (defensive)
- Returns Some(Vote::scanned(0.85)) if fraction > 0.85

Implementation uses sum for simplicity (overestimates coverage when
images overlap), which is acceptable for the 0.85 threshold as it's
a conservative signal. Can be revisited with Klee's algorithm for
greater accuracy if needed.

Acceptance criteria PASS:
✓ Page with one image covering 90% area → Some(Vote { 0.85, Scanned })
✓ Page with multiple small images totaling 50% → None (below threshold)
✓ Page with no images → None
✓ Coverage clamped to 1.0 on overlapping images

Also includes pre-existing infrastructure:
- tr3_op_count field in PageContext
- image_xobject_areas field in PageContext
- all_tr3_with_full_page_image function
- CharDensityRatioSignal evaluator

These were necessary dependencies for the new evaluator to function.

Refs: Plan section Phase 5.1.2, coordinator pdftract-22p
2026-05-31 23:42:38 -04:00
jedarden
778d9e4c13 feat(pdftract-69iwi): implement remote source mock server test corpus
Add wiremock-based integration test infrastructure for HttpRangeSource with
bandwidth tracking and all 5 critical test scenarios from plan Section 1.8.

## Files added
- tests/remote/fixtures/generate_linearized.rs: Linearized PDF fixture generator
- tests/remote/fixtures/linearized-10.pdf: 10-page linearized PDF with hint stream
- tests/remote/integration.rs: Complete test suite with 12+ test scenarios
- notes/pdftract-69iwi.md: Verification note with PASS/WARN/FAIL status

## Test infrastructure
- BandwidthTracker utility for bandwidth and request counting
- Mock server factories: create_range_server(), create_no_range_server(),
  create_416_server()
- Verification helpers: assert_bytes_transferred(), assert_range_request_count()

## Critical tests implemented (Plan 1.8)
1. test_range_support_page_5_of_100: Bandwidth verification (<100KB)
2. test_no_range_fallback: Full download fallback with REMOTE_NO_RANGE_SUPPORT
3. test_416_retry_without_range: 416 response handling infrastructure
4. test_linearized_hint_stream_prefetch: Linearized PDF with hint stream
5. test_connection_drop_interrupted: REMOTE_FETCH_INTERRUPTED handling
6. test_tls_handshake_failure: Self-signed cert rejection (rcgen)

## INV-8 compliance
All tests verify no panic occurs on network errors, connection drops, or TLS
failures. Errors return Result<> types with appropriate ErrorKind.

## Dependencies
- wiremock 0.6 (mock HTTP server)
- rcgen 0.13 (self-signed TLS certificate generation)
- tokio 1.x (async runtime)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 08:25:23 -04:00