jedarden
|
dd2d3502c6
|
feat(glyph-shape): implement font corpus fetch script and shape DB generation
Implemented scripts/fetch-shape-corpus.sh to download an open-licensed
font corpus and generate the glyph shape database used for L4 recognition.
- Script downloads fonts from build/shape-corpus-manifest.txt
- Copies LICENSE files to build/font-licenses/ for compliance
- Idempotent: skips already-present fonts
- Fixed an overflow bug in xtask center_bitmap_32x32 when width or height exceeds 32
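
For reference, one way a center_bitmap_32x32 helper can avoid overflow when
width or height exceeds 32 is to center-crop oversized input instead of
computing a negative padding offset. This is only a sketch, not the actual
xtask code: the signature, the row-major u8 layout, and the crop-on-oversize
policy are all assumptions made for illustration.

```rust
/// Side length of the target canvas.
const SIZE: usize = 32;

/// Hypothetical sketch: center a row-major `width x height` bitmap on a
/// 32x32 canvas. Inputs larger than 32 on an axis are center-cropped, so
/// no offset computation can underflow.
fn center_bitmap_32x32(src: &[u8], width: usize, height: usize) -> [u8; SIZE * SIZE] {
    assert_eq!(src.len(), width * height, "src must hold width * height bytes");
    let mut out = [0u8; SIZE * SIZE];

    // Number of source pixels actually copied along each axis.
    let copy_w = width.min(SIZE);
    let copy_h = height.min(SIZE);

    // Source offsets are non-zero only when cropping; copy_w <= width and
    // copy_h <= height, so these subtractions cannot underflow.
    let src_x = (width - copy_w) / 2;
    let src_y = (height - copy_h) / 2;

    // Destination offsets are non-zero only when padding; copy_w and
    // copy_h are at most SIZE, so these cannot underflow either.
    let dst_x = (SIZE - copy_w) / 2;
    let dst_y = (SIZE - copy_h) / 2;

    for row in 0..copy_h {
        let s = (src_y + row) * width + src_x;
        let d = (dst_y + row) * SIZE + dst_x;
        out[d..d + copy_w].copy_from_slice(&src[s..s + copy_w]);
    }
    out
}
```

A naive `(32 - width) / 2` on unsigned integers panics in debug builds (and
wraps in release builds) once width is greater than 32; clamping the copy
size first keeps every subtraction non-negative.
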
Generated build/glyph-shapes.json with 9,141 glyphs (above the 4,500 target):
- DejaVu Sans: 4,459 glyphs (Latin Extended, Greek, Cyrillic)
- Roboto: 2,392 glyphs (Latin Basic, extended)
- JetBrains Mono: 1,176 glyphs (monospace)
- Source Code Pro: 1,124 glyphs (monospace)
build/font-licenses/COMPLIANCE.md documents the OFL derivative-work analysis
for redistributing the pHash data.
Closes: pdftract-1i8n
|
2026-05-24 09:48:29 -04:00 |
|