pdftract/xtask
jedarden f08369bbf0 feat(xtask): implement gen-shape-db subcommand for glyph pHash database
Add cargo xtask gen-shape-db command that walks font directories,
rasterizes glyphs at 32x32 via fontdue, computes pHash, and outputs
build/glyph-shapes.json.

Implementation details:
- Fontdue integration for TrueType/OpenType font loading
- 32x32 bitmap rasterization with centering
- DCT-based pHash computation (32x32 DCT → 8x8 low-freq → median threshold)
- Character frequency data for collision resolution
- Deduplication by (phash, char) pairs
- Cross-character collision handling (keep higher-frequency char)
- Sorted output by pHash ascending

Artifacts:
- build/frequency.json: Character frequency rankings
- build/README.md: Command documentation and usage

Acceptance criteria:
-  cargo xtask gen-shape-db --fonts <dir> produces valid JSON
-  Deterministic output (byte-identical on same inputs)
-  Fontdue integration and 32x32 rasterization
-  pHash computation via DCT
- ⚠️ No system fonts for full integration test (documented)

Closes: pdftract-2aq0
2026-05-24 05:40:44 -04:00
..
src feat(xtask): implement gen-shape-db subcommand for glyph pHash database 2026-05-24 05:40:44 -04:00
Cargo.lock feat(xtask): implement gen-shape-db subcommand for glyph pHash database 2026-05-24 05:40:44 -04:00
Cargo.toml feat(xtask): implement gen-shape-db subcommand for glyph pHash database 2026-05-24 05:40:44 -04:00