pdftract/crates/pdftract-core/build
jedarden b96c3bfd37 feat(pdftract-9wevc): implement 20k English wordlist for readability scoring
Implement compile-time phf::Set of 20,000 common English words for
dictionary coverage scoring in readability analysis (Phase 4.7).
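The commit does not define "dictionary coverage". The sketch below is a minimal, hedged illustration that assumes coverage means the share of alphabetic tokens, lowercased, that appear in the wordlist. It uses a runtime `std::collections::HashSet` as a stand-in for the compile-time `phf::Set`. `Wordlist`, `from_lines`, and `dictionary_coverage` are hypothetical names. Only `is_english_word` is named in the commit, and its real signature may differ.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the crate's wordlist: a runtime HashSet
/// rather than the compile-time phf::Set described in the commit.
struct Wordlist {
    words: HashSet<&'static str>,
}

impl Wordlist {
    /// Assumes one word per line, as in wordlist-en-20k.txt.
    fn from_lines(text: &'static str) -> Self {
        let words = text.lines().map(str::trim).filter(|w| !w.is_empty()).collect();
        Wordlist { words }
    }

    /// Exact-match membership test (case-sensitive).
    fn is_english_word(&self, word: &str) -> bool {
        self.words.contains(word)
    }
}

/// Assumed definition: fraction of alphabetic tokens (lowercased before lookup)
/// found in the wordlist. Returns 0.0 when the text has no alphabetic tokens.
fn dictionary_coverage(list: &Wordlist, text: &str) -> f64 {
    let mut total = 0usize;
    let mut hits = 0usize;
    for token in text.split(|c: char| !c.is_alphabetic()).filter(|t| !t.is_empty()) {
        total += 1;
        if list.is_english_word(&token.to_lowercase()) {
            hits += 1;
        }
    }
    if total == 0 { 0.0 } else { hits as f64 / total as f64 }
}

fn main() {
    let list = Wordlist::from_lines("the\nquick\nbrown\nfox\n");
    let score = dictionary_coverage(&list, "The quick brown fox, xqzt!");
    println!("coverage = {score:.2}"); // 4 of 5 tokens match, prints "coverage = 0.80"
}
```

A low coverage score would suggest garbled or non-English extracted text. How the real readability analysis weights this signal is not stated in the commit.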

Key changes:
- Added wordlist-en-20k.txt (20k frequency-sorted English words)
- Extended build.rs to generate phf::Set from wordlist
- Added layout/wordlist.rs module with is_english_word() API
- Added wordlist benchmarks (< 100 ns lookup achieved)
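The actual build.rs generates a `phf::Set`, which needs the `phf`/`phf_codegen` crates. As a dependency-free sketch of the same build-time code-generation idea, the example below uses a different technique. It emits a sorted, de-duplicated static array and looks words up with `binary_search`, which is O(log n) rather than a perfect-hash probe. `render_wordlist_module` and this `is_english_word` signature are hypothetical.

```rust
use std::collections::BTreeSet;
use std::fmt::Write;

/// Hypothetical build-script helper: turns a one-word-per-line list into Rust
/// source. Emits a sorted slice instead of the phf::Set the commit describes.
fn render_wordlist_module(wordlist: &str) -> String {
    // BTreeSet sorts and de-duplicates, which binary_search relies on.
    let words: BTreeSet<&str> = wordlist
        .lines()
        .map(str::trim)
        .filter(|w| !w.is_empty())
        .collect();

    let mut out = String::new();
    writeln!(out, "pub static WORDS: [&str; {}] = [", words.len()).unwrap();
    for w in &words {
        // {:?} renders a quoted, escaped Rust string literal.
        writeln!(out, "    {:?},", w).unwrap();
    }
    out.push_str("];\n");
    out
}

/// Runtime side: membership test over the generated, sorted slice.
fn is_english_word(words: &[&str], word: &str) -> bool {
    words.binary_search(&word).is_ok()
}

fn main() {
    // In a real build.rs this string would be written to $OUT_DIR/wordlist.rs
    // and pulled in with include!(concat!(env!("OUT_DIR"), "/wordlist.rs")).
    print!("{}", render_wordlist_module("the\nof\nand\nthe\n"));
    println!("{}", is_english_word(&["and", "of", "the"], "of"));
}
```

Sorting discards the frequency order of the source file. That is harmless for membership checks. A perfect-hash set trades a slightly more complex build step for constant-time lookups.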

Test results:
- All 9 unit tests pass
- Benchmarks: 13-62 ns per lookup (well under 100 ns requirement)
- Binary size: Estimated ~200-220 KB (within 250 KB limit)

Closes: pdftract-9wevc

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 09:29:13 -04:00
predefined-cmaps feat(pdftract-43ry): implement predefined CMap registry 2026-05-23 23:00:59 -04:00
agl.json feat(pdftract-28m6): implement AGL compile-time phf::Map 2026-05-23 18:44:47 -04:00
aglfn.txt feat(pdftract-28m6): implement AGL compile-time phf::Map 2026-05-23 18:44:47 -04:00
fix_std14_weights.py feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
font-fingerprints.json feat(pdftract-njde): implement font fingerprint cache (Level 3) 2026-05-23 21:27:24 -04:00
generate_agl.py feat(pdftract-28m6): implement AGL compile-time phf::Map 2026-05-23 18:44:47 -04:00
generate_std14_metrics.py feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
glyphlist.txt feat(pdftract-28m6): implement AGL compile-time phf::Map 2026-05-23 18:44:47 -04:00
named-encodings.json feat(pdftract-3dwu): implement named encoding tables 2026-05-23 18:00:05 -04:00
std14-metrics.json feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
wordlist-en-20k.txt feat(pdftract-9wevc): implement 20k English wordlist for readability scoring 2026-05-24 09:29:13 -04:00