P5.8.b: Verify bucket-granular re-digest implementation
Verified that P5.8.b (anti-entropy diff step) was already fully implemented in anti_entropy.rs. Created notes documenting:
- Bucket assignment via pk-hash % 256
- Per-bucket digest computation during fingerprint
- Divergent bucket identification
- Bucket-specific PK enumeration
- Bucket-level replica comparison

All 12 tests in p5_8_b_anti_entropy_diff.rs cover the functionality.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent
0ca40b6bf0
commit
764878ce41
1 changed files with 60 additions and 0 deletions
notes/miroir-uhj.8.2.md
Normal file
@@ -0,0 +1,60 @@
# P5.8.b: Bucket-Granular Re-Digest for Anti-Entropy Diff Step

## Status: Already Implemented

P5.8.b (plan §13.8 step 2) was already fully implemented in `/home/coding/miroir/crates/miroir-core/src/anti_entropy.rs`.

## Implementation Details
### 1. Bucket Assignment (`bucket_for_primary_key()`, lines 171-175)
- Hashes the primary key with xxh3 (seed 0)
- Takes the hash modulo 256 to assign a bucket (0-255)
- Each bucket isolates 1/256 (~0.4%) of the PK space
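
As a rough illustration of the assignment described above, here is a minimal sketch. It is not the code in `anti_entropy.rs`: the name `bucket_for_pk_sketch` is hypothetical, and the standard library's `DefaultHasher` stands in for xxh3 (seed 0) so that the example needs no external crate. Bucket IDs produced here will therefore differ from the real ones.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of buckets per shard, as stated in these notes.
pub const BUCKET_COUNT: u64 = 256;

/// Hypothetical stand-in for `bucket_for_primary_key()`.
/// ASSUMPTION: `DefaultHasher` replaces xxh3 (seed 0) purely for illustration.
pub fn bucket_for_pk_sketch(pk: &str) -> u8 {
    let mut hasher = DefaultHasher::new();
    pk.hash(&mut hasher);
    // The remainder is always in 0..=255, so the cast to u8 is lossless.
    (hasher.finish() % BUCKET_COUNT) as u8
}
```

Because the bucket depends only on the primary key, the same key lands in the same bucket on every replica, which is what makes the per-bucket digests comparable.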
### 2. Per-Bucket Hashing During Fingerprint (lines 224-226, 269-271, 284-287)
- Creates 256 separate hashers (one per bucket)
- Folds each document's hash into both the global digest and its bucket's digest
- Returns a `ShardFingerprint` with `bucket_hashes: Vec<String>` (256 elements)
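
The single-pass folding can be sketched as follows. This is an assumption-laden illustration rather than the real fingerprint code: `FingerprintSketch`, `fingerprint_sketch`, and `bucket_of` are hypothetical names, documents are modelled as `(pk, content_hash)` pairs, and `DefaultHasher` again stands in for the real hash. The sketch also assumes documents are visited in the same order on every replica, since a streaming hasher is order-sensitive.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BUCKET_COUNT: usize = 256;

/// Hypothetical, trimmed-down shape of `ShardFingerprint`.
pub struct FingerprintSketch {
    pub global_hash: String,
    pub bucket_hashes: Vec<String>,
}

/// ASSUMPTION: `DefaultHasher` replaces xxh3 for illustration only.
fn bucket_of(pk: &str) -> usize {
    let mut h = DefaultHasher::new();
    pk.hash(&mut h);
    (h.finish() % BUCKET_COUNT as u64) as usize
}

/// Folds every document into the global digest and into its bucket's digest
/// in one pass over the shard.
pub fn fingerprint_sketch(docs: &[(&str, u64)]) -> FingerprintSketch {
    let mut global = DefaultHasher::new();
    let mut buckets: Vec<DefaultHasher> =
        (0..BUCKET_COUNT).map(|_| DefaultHasher::new()).collect();
    for (pk, content_hash) in docs {
        (pk, content_hash).hash(&mut global);
        (pk, content_hash).hash(&mut buckets[bucket_of(pk)]);
    }
    FingerprintSketch {
        global_hash: format!("{:016x}", global.finish()),
        bucket_hashes: buckets
            .iter()
            .map(|h| format!("{:016x}", h.finish()))
            .collect(),
    }
}
```

Changing one document changes the global digest and exactly one bucket digest, so the second digest pass costs no extra scan of the shard.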
### 3. Divergent Bucket Detection (`diff_fingerprints()`, lines 307-335)
- Compares per-bucket hashes between replicas
- Returns the list of divergent bucket IDs
- Falls back to treating all buckets as divergent if `bucket_hashes` was not computed
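
A minimal sketch of this comparison, including the fallback, might look like the following. The function name `divergent_buckets_sketch` is hypothetical, and "not computed" is modelled here as a digest vector whose length is not 256; the real `diff_fingerprints()` may represent that case differently.

```rust
const BUCKET_COUNT: usize = 256;

/// Returns the IDs of buckets whose digests differ between two replicas.
/// ASSUMPTION: missing bucket digests show up as a vector of the wrong length.
pub fn divergent_buckets_sketch(a: &[String], b: &[String]) -> Vec<u8> {
    if a.len() != BUCKET_COUNT || b.len() != BUCKET_COUNT {
        // Fallback: without per-bucket digests, every bucket is suspect.
        return (0..=u8::MAX).collect();
    }
    a.iter()
        .zip(b)
        .enumerate()
        .filter(|(_, (x, y))| x != y)
        .map(|(i, _)| i as u8)
        .collect()
}
```

The fallback is conservative: it trades extra comparison work for never missing a divergence when one side lacks bucket digests.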
### 4. Bucket-Specific PK Enumeration (`fetch_bucket_pks()`, lines 341-392)
- Pages through all documents in the shard
- Keeps only the documents whose PK falls in the target bucket
- Returns a map of PK → content_hash
- Throttles with a 10ms pause between batches
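
The paginate, filter, and throttle loop can be sketched as below. Everything about the interface is an assumption made for illustration: `fetch_page(offset, limit)` is a hypothetical callback returning `(pk, content_hash)` pairs, an empty page marks the end of the shard, the loop is synchronous, and `DefaultHasher` stands in for xxh3. The real `fetch_bucket_pks()` talks to a replica and may well be async.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::thread::sleep;
use std::time::Duration;

/// ASSUMPTION: `DefaultHasher` replaces xxh3 for illustration only.
fn bucket_of(pk: &str) -> u8 {
    let mut h = DefaultHasher::new();
    pk.hash(&mut h);
    (h.finish() % 256) as u8
}

/// Scans the whole shard page by page and keeps only the target bucket's PKs.
pub fn fetch_bucket_pks_sketch<F>(
    mut fetch_page: F,
    bucket: u8,
    page_size: usize,
) -> HashMap<String, String>
where
    F: FnMut(usize, usize) -> Vec<(String, String)>,
{
    let mut pks = HashMap::new();
    let mut offset = 0;
    loop {
        let page = fetch_page(offset, page_size);
        if page.is_empty() {
            break; // hypothetical end-of-shard signal
        }
        offset += page.len();
        for (pk, content_hash) in page {
            if bucket_of(&pk) == bucket {
                pks.insert(pk, content_hash);
            }
        }
        // 10 ms pause between batches, matching the throttle in the notes.
        sleep(Duration::from_millis(10));
    }
    pks
}
```

Note that the scan still reads every document in the shard; the bucket filter only reduces what is kept and compared, not what is fetched.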
### 5. Bucket-Level Replica Comparison (`compare_bucket_replicas()`, lines 400-447)
- Fetches bucket PKs from both replicas
- Returns `ReplicaDiff` with:
  - `a_only_pks`: PKs only on replica A
  - `b_only_pks`: PKs only on replica B
  - `mismatched_pks`: PKs with different content hashes
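
Once both PK → content_hash maps are in hand, the three-way classification is a plain map comparison. The sketch below is illustrative only: `ReplicaDiffSketch` and `diff_bucket_sketch` are hypothetical names that mirror the fields listed above, and the result vectors are sorted only to make the output deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the `ReplicaDiff` fields described in these notes.
#[derive(Debug, Default, PartialEq)]
pub struct ReplicaDiffSketch {
    pub a_only_pks: Vec<String>,
    pub b_only_pks: Vec<String>,
    pub mismatched_pks: Vec<String>,
}

/// Classifies each PK as A-only, B-only, or present on both with different hashes.
pub fn diff_bucket_sketch(
    a: &HashMap<String, String>,
    b: &HashMap<String, String>,
) -> ReplicaDiffSketch {
    let mut diff = ReplicaDiffSketch::default();
    for (pk, hash_a) in a {
        match b.get(pk) {
            None => diff.a_only_pks.push(pk.clone()),
            Some(hash_b) if hash_b != hash_a => diff.mismatched_pks.push(pk.clone()),
            Some(_) => {} // identical on both replicas
        }
    }
    for pk in b.keys() {
        if !a.contains_key(pk) {
            diff.b_only_pks.push(pk.clone());
        }
    }
    // HashMap iteration order is unspecified, so sort for stable results.
    diff.a_only_pks.sort();
    diff.b_only_pks.sort();
    diff.mismatched_pks.sort();
    diff
}
```

An empty `ReplicaDiffSketch` for a bucket whose digests differed would point to an ordering or hashing discrepancy rather than a data divergence.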
### 6. Integration with Repair Flow (`repair_shard()`, lines 609-696)
- Calls `diff_fingerprints()` to find divergent buckets
- Calls `compare_bucket_replicas()` for each divergent bucket
- Currently only logs divergences; repair writes are still TODO (P5.8.c)
## Test Coverage

The 12 tests in `/home/coding/miroir/crates/miroir-proxy/tests/p5_8_b_anti_entropy_diff.rs` cover the functionality above:
1. `test_bucket_for_primary_key_deterministic` - Verifies deterministic bucket assignment
2. `test_bucket_for_primary_key_distributes` - Verifies even distribution across buckets
3. `test_fingerprint_shard_includes_bucket_hashes` - Verifies per-bucket hash computation
4. `test_diff_fingerprints_identical` - Tests the no-divergence case
5. `test_diff_fingerprints_divergent_buckets` - Tests divergent bucket detection
6. `test_fetch_bucket_pks_filters_by_bucket` - Tests bucket filtering
7. `test_compare_bucket_replicas_no_divergence` - Tests identical buckets
8. `test_compare_bucket_replicas_a_only` - Tests a PK present only on replica A
9. `test_compare_bucket_replicas_b_only` - Tests a PK present only on replica B
10. `test_compare_bucket_replicas_mismatched_content` - Tests a content hash mismatch
11. `test_diff_fingerprints_isolates_divergence` - Verifies ~0.4% isolation per bucket
12. `test_bucket_count_constant` - Verifies BUCKET_COUNT = 256
## Reusability for §13.1 Reshard Verify

The `bucket_for_primary_key()` function is public and documented for reuse in reshard verification (plan §13.1), where PK-keyed (not shard-keyed) bucketing is needed for cross-shard comparison.