pdftract/crates
jedarden 99709354f5 feat(pdftract-oh30a): implement per-page readability aggregation
Implement char-weighted median aggregation of per-span readability
scores into a page-level score stored in extraction_quality.readability.

Algorithm:
- Collect (score, char_count) pairs from spans
- Sort by score ascending
- Walk sorted list accumulating character counts
- Return the score at which the running count first reaches half of the total characters
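
The steps above can be sketched in Rust as follows. This is an illustration, not the code from this commit: the function name `char_weighted_median` and the `(f64, usize)` pair representation are hypothetical, and two behaviours are assumptions the commit message does not specify: an exact tie at the half-way point resolves to the lower score, and zero-length spans are ignored.

```rust
/// Hypothetical helper: char-weighted median of per-span readability scores.
/// Each input pair is (score, char_count).
fn char_weighted_median(spans: &[(f64, usize)]) -> f64 {
    // Assumption: spans with no characters carry no weight, so drop them.
    let mut pairs: Vec<(f64, usize)> = spans
        .iter()
        .copied()
        .filter(|&(_, chars)| chars > 0)
        .collect();

    let total: usize = pairs.iter().map(|&(_, chars)| chars).sum();
    if total == 0 {
        // Empty page (or only empty spans): return 0.0 per the acceptance criteria.
        return 0.0;
    }

    // Sort by score ascending; total_cmp gives a total order over f64.
    pairs.sort_by(|a, b| a.0.total_cmp(&b.0));

    // Walk the sorted list, accumulating character counts, and stop at the
    // first span whose cumulative count reaches half of the total.
    let mut seen = 0usize;
    for &(score, chars) in &pairs {
        seen += chars;
        if seen * 2 >= total {
            return score;
        }
    }

    // Not reached when total > 0; kept so the function is total.
    0.0
}
```

With this sketch, a 10-character span scored 0.2 next to a 100-character span scored 0.9 yields 0.9, because the longer span covers the half-way position.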

Acceptance criteria:
- Single span: returns its score
- Multiple spans: char-weighted median (longer spans count more)
- Empty page: returns 0.0
- All-perfect: returns 1.0

Closes: pdftract-oh30a

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 03:28:41 -04:00
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-p4vzu): implement inspector render_spans layer 2026-05-24 03:11:34 -04:00
pdftract-core feat(pdftract-oh30a): implement per-page readability aggregation 2026-05-24 03:28:41 -04:00
pdftract-libpdftract feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
pdftract-py docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00