Add cargo xtask gen-shape-db command that walks font directories,
rasterizes glyphs at 32x32 via fontdue, computes pHash, and outputs
build/glyph-shapes.json.
Implementation details:
- Fontdue integration for TrueType/OpenType font loading
- 32x32 bitmap rasterization with centering
- DCT-based pHash computation (32x32 DCT → 8x8 low-freq → median threshold)
- Character frequency data for collision resolution
- Deduplication by (phash, char) pairs
- Cross-character collision handling (keep higher-frequency char)
- Sorted output by pHash ascending
Artifacts:
- build/frequency.json: Character frequency rankings
- build/README.md: Command documentation and usage
Acceptance criteria:
- ✅ cargo xtask gen-shape-db --fonts <dir> produces valid JSON
- ✅ Deterministic output (byte-identical on same inputs)
- ✅ Fontdue integration and 32x32 rasterization
- ✅ pHash computation via DCT
- ⚠️ No system fonts for full integration test (documented)
Closes: pdftract-2aq0