pdftract/crates
jedarden ca1582a839 feat(pdftract-47vu): implement pHash for glyph shape recognition
Implement phash_glyph(bitmap: &[u8; 1024]) -> u64 that computes
a 64-bit perceptual hash for 32×32 grayscale glyph bitmaps.

Algorithm:
1. Normalize pixel values to [-1.0, +1.0]
2. Apply 32×32 2D DCT-II (hand-rolled, precomputed basis)
3. Extract 64 low-frequency AC coefficients (8×8 block, DC excluded)
4. Threshold against median to produce 64-bit hash

Key features:
- Special case for uniform bitmaps (returns 0 deterministically)
- Deterministic across platforms (no NaN, stable float ordering)
- hamming_distance helper for hash comparison

Closes: pdftract-47vu
2026-05-24 04:20:55 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-p7yll): implement cm operator diagnostics 2026-05-24 04:13:16 -04:00
pdftract-core feat(pdftract-47vu): implement pHash for glyph shape recognition 2026-05-24 04:20:55 -04:00
pdftract-libpdftract feat(pdftract-juc): implement Standard 14 font metrics registry 2026-05-23 14:04:02 -04:00
pdftract-py docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00