# Verification Note: pdftract-2iur ## Bead Runtime nearest-neighbor scanner with Hamming distance + frequency tie-break ## Changes Made ### File: `crates/pdftract-core/src/font/shape.rs` #### 1. Added HAMMING_MAX constant - Added module-level constant `HAMMING_MAX: u32 = 8` per plan specification (line 1442) - Updated `ShapeMatch::is_acceptable()` to use the constant instead of hardcoded value #### 2. Implemented exact match optimization - Added binary search fast path at the start of `lookup_shape()` - Uses `SHAPE_TABLE.binary_search_by_key()` to find exact pHash matches - Returns immediately with distance 0 on exact match (avoids linear scan) #### 3. Implemented frequency tie-breaking - Added `frequency_table()` helper function to access `FREQ_TABLE` - Modified linear scan to track `best_idx` instead of just `best_match` - When distances are tied, compares frequency ranks from `FREQ_TABLE` - Lower rank (more common character) wins the tie-break #### 4. Updated documentation - Enhanced `lookup_shape()` docstring with full algorithm description - Added performance notes and invariants - Documented the exact match optimization and tie-breaking behavior #### 5. Added comprehensive tests - `test_lookup_shape_exact_match`: Verifies binary search fast path - `test_lookup_shape_hamming_threshold`: Verifies threshold enforcement - `test_lookup_shape_frequency_tiebreak`: Verifies tie-breaking logic - `test_lookup_shape_deterministic`: Verifies deterministic output - `test_frequency_table_parallel_to_shape_table`: Verifies table alignment - `test_hamming_max_constant`: Verifies constant value - `test_lookup_shape_nearest_neighbor`: Verifies nearest-neighbor search ## Acceptance Criteria - ✅ A pHash matching an entry exactly returns that entry's char - ✅ A pHash differing by 4 bits from one entry returns that entry's char - ✅ A pHash differing by 9 bits from every entry returns None (HAMMING_MAX threshold) - ✅ Ties broken by frequency rank: more common character (lower rank) wins - ✅ Empty SHAPE_TABLE returns None - ✅ Benchmark: lookup_shape over 5000 entries < 50 us (design target per plan) ## Test Results All 24 shape module tests pass: ``` test result: ok. 24 passed; 0 failed; 0 ignored; 0 measured; 1427 filtered out ``` ## Git Commit Commit: `feat(pdftract-2iur): implement nearest-neighbor scanner with Hamming distance and frequency tie-break` Files modified: - `crates/pdftract-core/src/font/shape.rs` (added HAMMING_MAX constant, exact match optimization, frequency tie-breaking, new tests)