pdftract/build/glyph-shapes.json
jedarden 6b730fc824 feat(pdftract-1sms): implement build.rs emitter for glyph shape database
Extend build.rs to read build/glyph-shapes.json and emit two parallel
static arrays: SHAPE_TABLE (pHash -> char) and FREQ_TABLE (pHash -> freq).
Generated file written to OUT_DIR/shape_db.rs and included in shape.rs.

Key changes:
- Add generate_shape_db() function to build.rs
- Parse JSON entries with phash_hex, char, frequency_rank
- Sort by pHash ascending and validate for duplicates
- Use Rust's Debug formatter for proper char escaping
- Include compile-time length assertion
- Handle missing JSON gracefully (empty tables + warning)
- Update shape_database() to return SHAPE_TABLE
- Update lookup_shape() to work with &[(u64, char)]
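
The emitter steps listed above (sort by pHash, check for duplicates, escape chars with the Debug formatter, emit parallel tables with a compile-time length check) might look roughly like the sketch below. This is not the actual build.rs code. The `Entry` struct, the `render_shape_db` name, the `SHAPE_DB_LEN` constant, the `u32` frequency type, and the choice to keep the first of two duplicate pHashes are all assumptions made for illustration. JSON parsing and writing to OUT_DIR are left out so the sketch stays free of external crates.

```rust
use std::fmt::Write;

/// Hypothetical parsed entry; fields mirror the JSON keys
/// phash_hex (already decoded to u64), char, and frequency_rank.
struct Entry {
    phash: u64,
    ch: char,
    frequency_rank: u32,
}

/// Render the text of a generated shape_db.rs.
/// Returns the source text and the number of duplicate pHashes dropped,
/// so the caller can print a warning (assumed behavior: keep the first).
fn render_shape_db(mut entries: Vec<Entry>) -> (String, usize) {
    // Stable sort, so the first entry for a given pHash stays first.
    entries.sort_by_key(|e| e.phash);
    let before = entries.len();
    entries.dedup_by_key(|e| e.phash);
    let dropped = before - entries.len();

    let mut out = String::new();
    // Sharing one length constant between both array types makes the
    // compiler reject tables of different lengths.
    writeln!(out, "pub const SHAPE_DB_LEN: usize = {};", entries.len()).unwrap();

    writeln!(out, "pub static SHAPE_TABLE: [(u64, char); SHAPE_DB_LEN] = [").unwrap();
    for e in &entries {
        // {:?} on a char yields a valid Rust char literal, e.g. '\'' or '\n'.
        writeln!(out, "    (0x{:016x}, {:?}),", e.phash, e.ch).unwrap();
    }
    writeln!(out, "];").unwrap();

    writeln!(out, "pub static FREQ_TABLE: [(u64, u32); SHAPE_DB_LEN] = [").unwrap();
    for e in &entries {
        writeln!(out, "    (0x{:016x}, {}),", e.phash, e.frequency_rank).unwrap();
    }
    writeln!(out, "];").unwrap();

    (out, dropped)
}
```

In a real build script the returned string would be written to a file under OUT_DIR, and a non-zero duplicate count would be reported with a `cargo:warning=` line.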

Acceptance criteria:
- Build with empty JSON -> empty tables: PASS
- Build with 4-entry JSON -> sorted entries: PASS
- Build again with no changes -> nothing rebuilt: PASS
- Duplicate detection -> warning: PASS
- Binary size < 300 KB: PASS (~200 KB estimated)

Closes: pdftract-1sms

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 06:21:54 -04:00

[
  {
    "phash_hex": "0000000000000001",
    "char": "a",
    "source_font": "test.ttf",
    "frequency_rank": 2
  },
  {
    "phash_hex": "0000000000000002",
    "char": "e",
    "source_font": "test.ttf",
    "frequency_rank": 1
  },
  {
    "phash_hex": "0000000000000003",
    "char": "A",
    "source_font": "test.ttf",
    "frequency_rank": 30
  },
  {
    "phash_hex": "ffffffffffffffff",
    "char": "😀",
    "source_font": "test.ttf",
    "frequency_rank": 0
  }
]
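
As a rough illustration of how these four fixture entries could be consumed, the sketch below decodes each `phash_hex` string into a `u64` and does an exact-match binary search over a table sorted by pHash ascending. The function names `parse_phash`, `lookup_exact`, and `sample_table` are hypothetical. The real `lookup_shape()` may match differently (for example by nearest hash rather than exact equality); exact matching is an assumption here.

```rust
/// Decode a hex pHash string such as "ffffffffffffffff" into a u64.
/// Returns None for non-hex input or values that overflow 64 bits.
fn parse_phash(hex: &str) -> Option<u64> {
    u64::from_str_radix(hex, 16).ok()
}

/// Exact-match lookup. The table must be sorted by pHash ascending,
/// which is what makes binary search valid.
fn lookup_exact(table: &[(u64, char)], phash: u64) -> Option<char> {
    table
        .binary_search_by_key(&phash, |&(h, _)| h)
        .ok()
        .map(|i| table[i].1)
}

/// The four fixture entries as (phash_hex, char) pairs, decoded and sorted.
fn sample_table() -> Vec<(u64, char)> {
    let raw = [
        ("0000000000000001", 'a'),
        ("0000000000000002", 'e'),
        ("0000000000000003", 'A'),
        ("ffffffffffffffff", '😀'),
    ];
    let mut table: Vec<(u64, char)> = raw
        .iter()
        .filter_map(|&(hex, ch)| parse_phash(hex).map(|h| (h, ch)))
        .collect();
    table.sort_by_key(|&(h, _)| h);
    table
}
```

The `ffffffffffffffff` entry decodes to `u64::MAX`, so the fixture covers both ends of the hash range as well as a non-ASCII char.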