pdftract

jedarden 99709354f5 feat(pdftract-oh30a): implement per-page readability aggregation		2026-05-24 03:28:41 -04:00

Implement char-weighted median aggregation of per-span readability scores into a page-level score stored in extraction_quality.readability.

Algorithm:
- Collect (score, char_count) pairs from spans
- Sort by score ascending
- Walk sorted list accumulating character counts
- Return score at half-total-char position

Acceptance criteria:
- Single span: returns its score
- Multiple spans: char-weighted median (longer spans count more)
- Empty page: returns 0.0
- All-perfect: returns 1.0

Closes: pdftract-oh30a
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
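
The aggregation described in this commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the code in pdftract-core: the function name `page_readability`, the tuple-based input, the skipping of zero-length spans, and the tie rule (return the first score whose cumulative character count reaches half the total) are all assumptions made for the example.

```python
from typing import Iterable, Tuple


def page_readability(spans: Iterable[Tuple[float, int]]) -> float:
    """Char-weighted median of per-span readability scores.

    `spans` holds (score, char_count) pairs. The name and signature are
    hypothetical; only the four algorithm steps come from the commit message.
    """
    # Collect pairs and sort by score ascending.
    # Assumption: spans with no characters carry no weight and are dropped.
    pairs = sorted((score, count) for score, count in spans if count > 0)

    total_chars = sum(count for _, count in pairs)
    if total_chars == 0:
        # Empty page: the acceptance criteria specify 0.0.
        return 0.0

    # Walk the sorted list, accumulating character counts.
    half = total_chars / 2
    accumulated = 0
    for score, count in pairs:
        accumulated += count
        # Assumption: ">=" picks the lower median when the split is exact.
        if accumulated >= half:
            return score

    # Not reachable when total_chars > 0; kept as a defensive fallback.
    return pairs[-1][0]
```

With this sketch, a page holding one short low-scoring span and one long high-scoring span, for example `[(0.2, 1), (0.9, 100)]`, takes the long span's score, which is the "longer spans count more" behaviour the acceptance criteria ask for.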
pdftract-cer-diff	docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files	2026-05-23 10:36:28 -04:00
pdftract-cli	feat(pdftract-p4vzu): implement inspector render_spans layer	2026-05-24 03:11:34 -04:00
pdftract-core	feat(pdftract-oh30a): implement per-page readability aggregation	2026-05-24 03:28:41 -04:00
pdftract-libpdftract	feat(pdftract-juc): implement Standard 14 font metrics registry	2026-05-23 14:04:02 -04:00
pdftract-py	docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files	2026-05-23 10:36:28 -04:00