pdftract/crates/pdftract-core/src/layout
jedarden 8a5d9e9ff5 test(pdftract-1q4ku): add acceptance criteria tests for score_span_readability
The score_span_readability function was already fully implemented
in readability.rs. This commit adds comprehensive tests for the
acceptance criteria of bead pdftract-1q4ku:

- AC1: All-printable English high coverage -> > 0.9
- AC2: All-U+FFFD -> significantly reduced (< 0.7)
- AC3: All-whitespace -> whitespace_score=0 (binary penalty)
- AC4: Low confidence -> scaled by confidence_floor
- AC5: Non-English -> dict_coverage forced to 1.0
- AC6: Ligature split -> integrity 0 lowers score

Also adds tests verifying:
- Empty span returns 0.0
- Confidence threshold (conf >= 0.6 -> confidence_floor 1.0)
- Whitespace bounds [0.05, 0.40]
- Printable fraction calculation
- Dict coverage enabled/disabled behavior
- Non-English lang tag handling (en, en-US, zh, None)

All tests pass. The implementation computes the score as the weighted sum of:
- 0.35 * printable_fraction
- 0.30 * dict_coverage (disabled for non-English)
- 0.15 * whitespace_score (binary in/out bounds)
- 0.10 * ligature_integrity (binary split detection)
- 0.10 * confidence_floor (min(1.0, conf/0.6))
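
The weighted terms above add up to 1.0, so they read as a single weighted sum. A minimal sketch of that combination step follows, assuming each component is already normalized to [0.0, 1.0]. The struct `ReadabilityComponents` and the helpers `whitespace_score`, `confidence_floor`, and `combine` are hypothetical names for illustration, not the actual signatures in readability.rs, and treating the whitespace bounds as inclusive is an assumption.

```rust
/// Hypothetical container for the five component scores, each in [0.0, 1.0].
/// The real code in readability.rs may structure this differently.
#[derive(Debug, Clone, Copy)]
pub struct ReadabilityComponents {
    pub printable_fraction: f64,
    pub dict_coverage: f64,
    pub whitespace_score: f64,
    pub ligature_integrity: f64,
    pub confidence_floor: f64,
}

/// Binary whitespace score: 1.0 when the whitespace ratio lies within
/// [0.05, 0.40], otherwise 0.0. Inclusive bounds are an assumption.
pub fn whitespace_score(ratio: f64) -> f64 {
    if (0.05..=0.40).contains(&ratio) {
        1.0
    } else {
        0.0
    }
}

/// confidence_floor = min(1.0, conf / 0.6), so any confidence at or
/// above 0.6 saturates at 1.0.
pub fn confidence_floor(conf: f64) -> f64 {
    (conf / 0.6).min(1.0)
}

/// Weighted sum using the weights listed in the commit message.
pub fn combine(c: &ReadabilityComponents) -> f64 {
    0.35 * c.printable_fraction
        + 0.30 * c.dict_coverage
        + 0.15 * c.whitespace_score
        + 0.10 * c.ligature_integrity
        + 0.10 * c.confidence_floor
}
```

Under these assumptions, a span that is perfect on every component except whitespace (for example, a ratio outside the bounds) loses exactly the 0.15 whitespace weight.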

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 00:21:46 -04:00
caption.rs feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
code.rs feat(pdftract-8n270): implement code block detection 2026-05-24 10:04:22 -04:00
columns.rs feat(pdftract-2rkc1): implement column confirmation with >= 3 line threshold 2026-05-27 23:09:01 -04:00
correction.rs feat(pdftract-1vrxg): implement word-break normalization 2026-05-27 22:55:57 -04:00
header_footer.rs fix(pdftract-2j4zl): fix header/footer duplicate counting bug 2026-05-28 00:04:13 -04:00
line.rs feat(pdftract-6bwq4): implement baseline clustering algorithm 2026-05-24 10:39:01 -04:00
mod.rs feat(pdftract-3jekw): implement watermark and formula detection stubs 2026-05-27 23:32:22 -04:00
readability.rs test(pdftract-1q4ku): add acceptance criteria tests for score_span_readability 2026-05-28 00:21:46 -04:00
reading_order.rs feat(pdftract-4md5z): implement XY-cut recursive reading order algorithm 2026-05-26 18:37:31 -04:00
watermark_formula.rs fix(pdftract-3jekw): fix watermark_formula test type annotations 2026-05-27 23:37:15 -04:00
wordlist.rs fix: resolve compilation errors across codebase 2026-05-25 08:38:04 -04:00