pdftract/docs
jedarden 39ca6a3552 feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator
Add image_coverage_fraction signal evaluator that computes the union
image coverage fraction from individual image XObject areas.

- Computes total image coverage as sum of image_xobject_areas
- Divides by page area (width * height) to get coverage fraction
- Clamps to [0.0, 1.0] to handle overlapping images (defensive)
- Returns Some(Vote::scanned(0.85)) if fraction > 0.85

Implementation uses sum for simplicity (overestimates coverage when
images overlap), which is acceptable for the 0.85 threshold as it's
a conservative signal. Can be revisited with Klee's algorithm for
greater accuracy if needed.

Acceptance criteria PASS:
✓ Page with one image covering 90% area → Some(Vote { 0.85, Scanned })
✓ Page with multiple small images totaling 50% → None (below threshold)
✓ Page with no images → None
✓ Coverage clamped to 1.0 on overlapping images

Also includes pre-existing infrastructure:
- tr3_op_count field in PageContext
- image_xobject_areas field in PageContext
- all_tr3_with_full_page_image function
- CharDensityRatioSignal evaluator

These were necessary dependencies for the new evaluator to function.

Refs: Plan section Phase 5.1.2, coordinator pdftract-22p
2026-05-31 23:42:38 -04:00
..
adr feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance feat(pdftract-5omc): implement SDK conformance test runner pattern 2026-05-18 01:22:23 -04:00
integrations feat(pdftract-2u6q2): implement diagnostic infrastructure 2026-05-25 13:16:38 -04:00
notes docs(pdftract-19oy): add verification note for codespace parser + tokenizer 2026-05-28 12:26:25 -04:00
operations feat(pdftract-30ahi): configure maturin for 5-target wheel builds 2026-05-28 08:04:32 -04:00
plan feat(pdftract-3zhf): add unified TableDetector::detect entry point 2026-05-24 00:51:59 -04:00
research docs(pdftract-1tjn): finalize OpenType MATH and formula extraction research note v1.0 2026-05-24 10:41:39 -04:00
schema/v1.0 feat(pdftract-2qw5j): add explicit enum constraints to JSON Schema 2026-05-28 02:47:54 -04:00
security docs(pdftract-58kz): add security policy documentation 2026-05-20 19:39:24 -04:00
user-docs feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator 2026-05-31 23:42:38 -04:00
research-index.md Add parallel extraction research and comprehensive research index 2026-05-16 16:30:35 -04:00