pdftract/build/shape-corpus-manifest.txt
jedarden dd2d3502c6 feat(glyph-shape): implement font corpus fetch script and shape DB generation
Implemented scripts/fetch-shape-corpus.sh for downloading an open-licensed
font corpus and generating the glyph shape database for L4 recognition.

- Script downloads fonts from build/shape-corpus-manifest.txt
- Copies LICENSE files to build/font-licenses/ for compliance
- Idempotent: skips already-present fonts
- Fixed xtask center_bitmap_32x32 overflow bug (width/height > 32)
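
The fetch behaviour in the list above (read the manifest, fetch each entry, skip fonts already on disk) can be sketched in Python. This is not the shell script itself. The names `parse_manifest`, `plan_downloads`, and `extract_target` are hypothetical. Deduplicating by `target_file` is an assumption. Extracting a `.ttf` from a `.zip` URL, as for the DejaVu archive, is inferred from the manifest rather than stated in it.

```python
import io
import zipfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FontEntry:
    """One manifest row: family_name|url|license_short_id|target_file."""
    family: str
    url: str
    license_id: str
    target_file: str


def parse_manifest(text: str) -> list[FontEntry]:
    """Parse manifest text, skipping blank lines and '#' comment lines."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        entries.append(FontEntry(*fields))
    return entries


def plan_downloads(entries: list[FontEntry], corpus_dir: Path) -> list[FontEntry]:
    """Entries still to fetch: the first entry per target_file, and only if
    that file is not already in corpus_dir (so re-runs are idempotent)."""
    seen: set[str] = set()
    todo = []
    for entry in entries:
        if entry.target_file in seen:
            continue  # same font listed under more than one heading
        seen.add(entry.target_file)
        if not (corpus_dir / entry.target_file).exists():
            todo.append(entry)
    return todo


def extract_target(archive: bytes, target_file: str) -> bytes:
    """Return the archive member whose basename equals target_file,
    for entries whose URL points at a .zip rather than a bare font file."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in zf.namelist():
            if name.rsplit("/", 1)[-1] == target_file:
                return zf.read(name)
    raise FileNotFoundError(f"{target_file} not found in archive")
```

Keying on `target_file` means a font listed under several headings is fetched once, and matching the exact basename keeps a sibling such as a Bold variant from being picked up by mistake.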

Generated build/glyph-shapes.json with 9,141 glyphs (target: > 4,500):
  - DejaVu Sans: 4,459 glyphs (Latin Extended, Greek, Cyrillic)
  - Roboto: 2,392 glyphs (Latin Basic and Extended)
  - JetBrains Mono: 1,176 glyphs (monospace)
  - Source Code Pro: 1,124 glyphs (monospace)

build/font-licenses/COMPLIANCE.md documents the OFL derivative-work analysis
for redistributing the pHash data.
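
The pHash data mentioned above and the center_bitmap_32x32 overflow fix from the list can be illustrated with a minimal Python sketch of one plausible pipeline. It first centers a glyph bitmap on a 32x32 canvas and then computes a 64-bit DCT-based perceptual hash. The source does not describe the xtask internals, so several points are assumptions:

- Bitmaps are rectangular lists of grayscale rows.
- Oversized glyphs are center-cropped. The real fix might scale or clamp instead.
- The hash uses the common formulation that compares the 8x8 lowest DCT frequencies with their median.

`dct_low`, `phash64`, and `hamming` are hypothetical names. `center_bitmap_32x32` here only borrows the name of the xtask function; its real signature is unknown.

```python
import math

SIZE = 32  # canvas edge, matching the "32x32" in center_bitmap_32x32
LOW = 8    # keep the 8x8 lowest DCT frequencies -> 64-bit hash


def center_bitmap_32x32(bitmap: list[list[float]]) -> list[list[float]]:
    """Center a rectangular grayscale glyph bitmap on a 32x32 canvas.

    An axis longer than 32 is center-cropped instead of computing
    (32 - size) // 2, which goes negative here and would underflow with
    unsigned integers (assumed to be the kind of overflow the fix addresses).
    """
    h = len(bitmap)
    w = len(bitmap[0]) if h else 0

    def offsets(size: int) -> tuple[int, int]:
        # (destination offset on canvas, source offset in bitmap) for one axis
        if size <= SIZE:
            return (SIZE - size) // 2, 0
        return 0, (size - SIZE) // 2

    dy, sy = offsets(h)
    dx, sx = offsets(w)
    canvas = [[0.0] * SIZE for _ in range(SIZE)]
    for y in range(min(h, SIZE)):
        for x in range(min(w, SIZE)):
            canvas[dy + y][dx + x] = float(bitmap[sy + y][sx + x])
    return canvas


def dct_low(img: list[list[float]]) -> list[float]:
    """Unnormalised 2-D DCT-II of a 32x32 image, returning only the
    LOW x LOW lowest-frequency coefficients in row-major order."""
    cos_table = [[math.cos(math.pi * (2 * n + 1) * k / (2 * SIZE)) for n in range(SIZE)]
                 for k in range(LOW)]
    coeffs = []
    for u in range(LOW):          # vertical frequency
        for v in range(LOW):      # horizontal frequency
            cv = cos_table[v]
            total = 0.0
            for y in range(SIZE):
                cy = cos_table[u][y]
                row = img[y]
                for x in range(SIZE):
                    total += row[x] * cy * cv[x]
            coeffs.append(total)
    return coeffs


def phash64(img: list[list[float]]) -> int:
    """Bit i is set when low-frequency coefficient i exceeds the median of all of them."""
    coeffs = dct_low(img)
    ordered = sorted(coeffs)
    n = len(ordered)
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    bits = 0
    for i, c in enumerate(coeffs):
        if c > median:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest similar glyph shapes."""
    return bin(a ^ b).count("1")
```

Identical shapes hash to the same value, so a recognizer could match a rendered glyph against the database by nearest Hamming distance. That lookup strategy is also an assumption; the commit does not state it.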

Closes: pdftract-1i8n
2026-05-24 09:48:29 -04:00

# Shape Corpus Font Manifest
# Format: family_name|url|license_short_id|target_file
# The script downloads fonts to build/shape-corpus/ and copies licenses to build/font-licenses/
# Latin Basic + Extended
DejaVu Sans|https://sourceforge.net/projects/dejavu/files/dejavu/2.37/dejavu-fonts-ttf-2.37.zip|SIL-OFL-1.0|DejaVuSans.ttf
Roboto|https://github.com/googlefonts/roboto/raw/main/src/hinted/Roboto-Regular.ttf|Apache-2.0|Roboto-Regular.ttf
# Monospace
Source Code Pro|https://github.com/adobe-fonts/source-code-pro/raw/release/OTF/SourceCodePro-Regular.otf|SIL-OFL-1.1|SourceCodePro-Regular.otf
JetBrains Mono|https://github.com/JetBrains/JetBrainsMono/raw/master/fonts/ttf/JetBrainsMono-Regular.ttf|SIL-OFL-1.1|JetBrainsMono-Regular.ttf
# Greek / Cyrillic Support: provided by the DejaVu Sans entry above