Implement compile-time phf::Set of 20,000 common English words for dictionary coverage scoring in readability analysis (Phase 4.7).

Key changes:
- Added wordlist-en-20k.txt (20k frequency-sorted English words)
- Extended build.rs to generate phf::Set from wordlist
- Added layout/wordlist.rs module with is_english_word() API
- Added wordlist benchmarks (< 100 ns lookup achieved)

Test results:
- All 9 unit tests pass
- Benchmarks: 13-62 ns per lookup (well under 100 ns requirement)
- Binary size: Estimated ~200-220 KB (within 250 KB limit)

Closes: pdftract-9wevc
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

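
The dictionary coverage scoring mentioned in the commit message can be illustrated with a minimal sketch. This is not the code from layout/wordlist.rs: the function signatures, the case-insensitive matching, and the tokenization rule are all assumptions, and a `std::collections::HashSet` built at runtime stands in for the compile-time `phf::Set` so the example has no external dependencies.

```rust
use std::collections::HashSet;

/// Stand-in for the generated word set. The commit describes a compile-time
/// phf::Set produced by build.rs; a runtime HashSet is used here only to keep
/// the sketch dependency-free.
fn build_wordlist(raw: &str) -> HashSet<String> {
    raw.lines()
        .map(|line| line.trim().to_ascii_lowercase())
        .filter(|word| !word.is_empty())
        .collect()
}

/// Hypothetical signature: the real is_english_word() may take only the token.
/// Case-insensitive lookup is an assumption.
fn is_english_word(words: &HashSet<String>, token: &str) -> bool {
    words.contains(&token.to_ascii_lowercase())
}

/// Fraction of alphabetic tokens found in the word list, in 0.0..=1.0.
/// Returns 0.0 when the text contains no alphabetic tokens.
fn dictionary_coverage(words: &HashSet<String>, text: &str) -> f64 {
    let mut total = 0usize;
    let mut hits = 0usize;
    // Split on anything that is not a letter and skip the empty fragments.
    for token in text
        .split(|c: char| !c.is_alphabetic())
        .filter(|t| !t.is_empty())
    {
        total += 1;
        if is_english_word(words, token) {
            hits += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        hits as f64 / total as f64
    }
}
```

Under these assumptions, a word list containing `the`, `quick`, and `fox` gives the text `The quick xqzv fox` a coverage of 3 out of 4 tokens. A low score would suggest garbled or non-English extracted text.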
||
|---|---|---|
| predefined-cmaps | ||
| agl.json | ||
| aglfn.txt | ||
| fix_std14_weights.py | ||
| font-fingerprints.json | ||
| generate_agl.py | ||
| generate_std14_metrics.py | ||
| glyphlist.txt | ||
| named-encodings.json | ||
| std14-metrics.json | ||
| wordlist-en-20k.txt | ||