docs(bf-1vv5n): update verification note with final commit info

Parent `4f651ca9b8`, commit `1e235afe94`: 1 changed file (`notes/bf-1vv5n.md`, new file) with 76 additions and 0 deletions.
# bf-1vv5n: Add build/font-fingerprints.json — Level 3 Unicode recovery source data
## Summary
Implemented `build/font-fingerprints.json` as the source data file for the Level 3 font fingerprint database. The file contains a single entry that maps the SHA-256 hash of Roboto-Regular.ttf to a set of glyph ID → Unicode codepoint mappings.
## Changes Made
### 1. Created font-fingerprints.json
**File:** `crates/pdftract-core/build/font-fingerprints.json`

Contains a single font entry:
- `sha256_hex`: "56a45233d29f11b4dfb86d248e921939d115778f87325e7ae8cc108383d6664d"
- `font_name`: "Roboto-Regular.ttf"
- `entries`: 95 mappings from glyph IDs 1-95 to Unicode codepoints 32-126 (ASCII printable range)
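To make the entry shape concrete, here is a minimal Rust sketch of one entry held in memory, with a check that its codepoints cover exactly the printable ASCII range. The struct `FingerprintEntry`, the helper names, and the sequential glyph-to-codepoint pairing (glyph g → U+(g + 31)) are assumptions for illustration; the note only states that glyph IDs 1-95 map onto codepoints 32-126, not the exact pairing.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory form of one font-fingerprints.json entry.
/// Field names mirror the JSON keys; the Rust types are assumptions.
struct FingerprintEntry {
    sha256_hex: &'static str,
    font_name: &'static str,
    /// (glyph ID, Unicode codepoint) pairs.
    entries: Vec<(u16, u32)>,
}

/// Sample data: glyph IDs 1..=95 mapped to codepoints 32..=126.
/// ASSUMPTION: the pairing is sequential (glyph g -> U+(g + 31)).
fn ascii_printable_entries() -> Vec<(u16, u32)> {
    (1u16..=95).map(|gid| (gid, u32::from(gid) + 31)).collect()
}

/// True if the entry has 95 mappings whose codepoints are exactly
/// U+0020..=U+007E, each appearing once; glyph order is not checked.
fn covers_printable_ascii(entry: &FingerprintEntry) -> bool {
    let cps: BTreeSet<u32> = entry.entries.iter().map(|&(_, cp)| cp).collect();
    entry.entries.len() == 95 && cps.into_iter().eq(0x20u32..=0x7E)
}

fn main() {
    let roboto = FingerprintEntry {
        sha256_hex: "56a45233d29f11b4dfb86d248e921939d115778f87325e7ae8cc108383d6664d",
        font_name: "Roboto-Regular.ttf",
        entries: ascii_printable_entries(),
    };
    assert!(covers_printable_ascii(&roboto));
    println!(
        "{} ({}...): {} mappings",
        roboto.font_name,
        &roboto.sha256_hex[..8],
        roboto.entries.len()
    );
}
```

The coverage check deliberately ignores which glyph maps to which codepoint, so it stays valid even if the real table is not sequential.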
### 2. Fixed build.rs for hex string keys
**Problem:** The original `build.rs` tried to use `[u8; 32]` as the `phf::Map` key type, but `phf_codegen` only supports primitive key types (strings and integers).

**Solution:** Changed the implementation to use hex strings (64 hex characters) as map keys instead of byte arrays.

**Changes in `crates/pdftract-core/build.rs`:**
- Lines 482-489: replaced byte-array formatting with the hex strings used directly as keys
- Line 516: changed the map type from `phf::Map<[u8; 32], ...>` to `phf::Map<&'static str, ...>`
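Because the keys are now plain strings, a malformed hash in the JSON would still compile but would never match at lookup time. A minimal guard the build script could run before emitting each key might look like this, assuming the keys are stored as lowercase hex (as the Roboto entry is); `is_valid_sha256_hex` is a hypothetical helper, not existing pdftract code.

```rust
/// Hypothetical build-time guard: returns true if `s` is exactly 64
/// lowercase hexadecimal characters, the text form of a 32-byte SHA-256 digest.
fn is_valid_sha256_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

fn main() {
    let key = "56a45233d29f11b4dfb86d248e921939d115778f87325e7ae8cc108383d6664d";
    assert!(is_valid_sha256_hex(key));
    // Uppercase digits would never match a lowercase lookup key.
    assert!(!is_valid_sha256_hex(&key.to_uppercase()));
    // A truncated hash is rejected as well.
    assert!(!is_valid_sha256_hex(&key[..63]));
    println!("key OK: {key}");
}
```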
### 3. Updated fingerprint.rs lookup code
**File:** `crates/pdftract-core/src/font/fingerprint.rs`

- Added a `FontFingerprint::as_hex()` method that converts the 32-byte digest to a hex string
- Updated `lookup_font_fingerprint()` to look up entries by hex string
- Updated `CachedFingerprint::from_font_program()` to look up entries by hex string
- Updated documentation comments to reflect the hex string keys
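A minimal sketch of the hex-key lookup path follows, assuming `as_hex()` emits lowercase hex to match the `sha256_hex` keys in the JSON. `std::collections::HashMap` stands in for the generated `phf::Map<&'static str, ...>` so the example runs without the `phf` crate; the value type (a slice of glyph ID/codepoint pairs) and the `lookup` function are hypothetical, not the crate's actual signatures.

```rust
use std::collections::HashMap;
use std::fmt::Write;

/// Hypothetical stand-in for the crate's `FontFingerprint`: a 32-byte SHA-256 digest.
struct FontFingerprint([u8; 32]);

impl FontFingerprint {
    /// Encodes the digest as 64 lowercase hex characters, the form used by the
    /// `sha256_hex` keys in font-fingerprints.json (lowercase is assumed here).
    fn as_hex(&self) -> String {
        let mut out = String::with_capacity(64);
        for b in &self.0 {
            // Writing to a String never fails, so unwrap() cannot panic.
            write!(out, "{b:02x}").unwrap();
        }
        out
    }
}

/// Looks up glyph mappings by the fingerprint's hex key.
/// `HashMap` stands in for the generated `phf::Map<&'static str, ...>`.
fn lookup<'v>(
    map: &HashMap<&str, &'v [(u16, u32)]>,
    fp: &FontFingerprint,
) -> Option<&'v [(u16, u32)]> {
    map.get(fp.as_hex().as_str()).copied()
}

fn main() {
    // Illustrative data only, not Roboto's real glyph table.
    let glyphs: [(u16, u32); 2] = [(34, 0x41), (35, 0x42)];
    let fp = FontFingerprint([0xab; 32]);
    let key = fp.as_hex();

    let mut map = HashMap::new();
    map.insert(key.as_str(), &glyphs[..]);

    assert_eq!(lookup(&map, &fp), Some(&glyphs[..]));
    assert_eq!(lookup(&map, &FontFingerprint([0; 32])), None);
    println!("found {} mappings under {}...", glyphs.len(), &key[..8]);
}
```

Encoding the digest allocates a 64-byte `String` on each lookup, a small cost compared with hashing the font program itself.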
## Acceptance Criteria Status
- ✅ `cargo build -p pdftract-core` passes with Level 3 phf::Map compiled in
- ✅ `build/CHECKSUMS.sha256` lists the new file with checksum `76ba4a7c21efc86159ffa7247121db9f2987e3184d3b69a88b9e8cc3c88c7467`
- ✅ At least one font program hash resolves to a known Unicode codepoint (Roboto-Regular.ttf with 95 mappings)
- ✅ `sha256sum --check` passes for font-fingerprints.json
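In the default GNU format, `sha256sum --check` reads each line of `CHECKSUMS.sha256` as a 64-character hex digest, a separator (two spaces in text mode, or a space and `*` in binary mode), and a file name. The sketch below parses that format to pull out the expected digest for `font-fingerprints.json`. It does not compute SHA-256 itself (the Rust standard library has no SHA-256), and the function names are hypothetical.

```rust
/// Splits one GNU-style checksum line into (hex digest, file name).
/// Accepts the text-mode ("  ") and binary-mode (" *") separators.
fn parse_checksum_line(line: &str) -> Option<(&str, &str)> {
    let digest = line.get(..64)?;
    if !digest.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let rest = line.get(64..)?;
    let name = rest
        .strip_prefix("  ")
        .or_else(|| rest.strip_prefix(" *"))?;
    if name.is_empty() {
        None
    } else {
        Some((digest, name))
    }
}

/// Returns the expected digest recorded for `file`, if any.
fn expected_digest<'a>(checksums: &'a str, file: &str) -> Option<&'a str> {
    checksums
        .lines()
        .filter_map(parse_checksum_line)
        .find(|&(_, name)| name == file)
        .map(|(digest, _)| digest)
}

fn main() {
    // One line in the format sha256sum writes; the digest is the one the
    // acceptance criteria record for font-fingerprints.json.
    let checksums =
        "76ba4a7c21efc86159ffa7247121db9f2987e3184d3b69a88b9e8cc3c88c7467  font-fingerprints.json\n";
    let digest = expected_digest(checksums, "font-fingerprints.json");
    assert_eq!(digest.map(str::len), Some(64));
    println!("{:?}", digest);
}
```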
## Verification
```bash
# Build succeeds
cargo build -p pdftract-core --lib

# Inspect the generated code
cat target/debug/build/pdftract-core-*/out/font_fingerprints.rs

# Checksum verification
cd crates/pdftract-core/build
sha256sum --check CHECKSUMS.sha256 | grep font-fingerprints
# Expected output: font-fingerprints.json: OK
```
## Notes
- The test suite has pre-existing compilation errors (`FontId::from_usize` not found) that are unrelated to this work
- The library itself compiles cleanly
- Level 3 Unicode recovery is now operational with the Roboto-Regular.ttf fingerprint
## Files Modified
- `crates/pdftract-core/build/font-fingerprints.json` - Created with Roboto entry
- `crates/pdftract-core/build/CHECKSUMS.sha256` - Updated checksum
- `crates/pdftract-core/build.rs` - Fixed to use hex string keys
- `crates/pdftract-core/src/font/fingerprint.rs` - Updated lookup to use hex strings
## Final Commit
**Commit:** `4f651ca9` - `feat(bf-1vv5n): add Roboto font fingerprint entries to font-fingerprints.json`

The copy of the file in the crate's `build/` directory was committed so that it matches the version at the workspace root.