Add comprehensive test suite for the bucket-granular re-digest step (plan §13.8 step 2).

All 18 tests pass. Tests verify:
- Deterministic bucket assignment (pk-hash % 256)
- Even distribution across buckets
- Per-bucket hash computation during fingerprint
- Divergent bucket identification
- Bucket-specific PK enumeration
- Replica comparison within divergent buckets
- Cross-index comparison for reshard verification (plan §13.1)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P5.8.b: Bucket-Granular Re-Digest for Anti-Entropy Diff Step
Status: Already Implemented
P5.8.b (plan §13.8 step 2) was already fully implemented in /home/coding/miroir/crates/miroir-core/src/anti_entropy.rs.
Implementation Details
1. Bucket Assignment (bucket_for_primary_key(), lines 171-175)
- Uses xxh3 hash of primary key with seed 0
- Modulo 256 to assign bucket (0-255)
- Each bucket isolates ~0.4% of PK space
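Below is a minimal sketch of the bucket assignment step. It assumes a 64-bit hash reduced modulo 256.

The real bucket_for_primary_key() uses xxh3 with seed 0, which lives in an external crate. To keep the sketch dependency-free, it swaps in 64-bit FNV-1a. The bucket IDs it produces will therefore not match the real function's. The usize return type and the fnv1a_64 helper are illustrative assumptions.

```rust
/// Number of buckets per shard fingerprint (BUCKET_COUNT = 256 in the text).
pub const BUCKET_COUNT: u64 = 256;

/// 64-bit FNV-1a. ASSUMPTION: stand-in for xxh3 (seed 0) so the sketch has
/// no dependencies; bucket IDs will differ from the real implementation.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical sketch of bucket_for_primary_key(): hash the PK, then take
/// the hash modulo 256 so every PK lands in a fixed bucket 0..=255.
pub fn bucket_for_primary_key(pk: &str) -> usize {
    (fnv1a_64(pk.as_bytes()) % BUCKET_COUNT) as usize
}
```

With 256 buckets, one bucket covers 1/256 ≈ 0.39% of the PK space. That is where the ~0.4% figure comes from.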
2. Per-Bucket Hashing During Fingerprint (lines 224-226, 269-271, 284-287)
- Creates 256 separate hashers (one per bucket)
- Each document's hash is folded into both global digest AND its bucket digest
- Returns ShardFingerprint with bucket_hashes: Vec<String> (256 elements)
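The dual folding described above can be sketched as follows. Each document goes into one global hasher and into the hasher for its bucket.

The sketch makes several assumptions:
- Each document is reduced to a (PK, content hash) pair held in a BTreeMap, so iteration is PK-ordered and replicas with identical content hash identically.
- FNV-1a stands in for xxh3.
- The field name global_hash, the Fnv1a type and the fold_doc helper are hypothetical. Only bucket_hashes is named in the text.

```rust
use std::collections::BTreeMap;

const BUCKET_COUNT: usize = 256;

/// Incremental 64-bit FNV-1a. ASSUMPTION: stand-in for the real xxh3 hashers.
#[derive(Clone)]
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV offset basis
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    fn hex(&self) -> String {
        format!("{:016x}", self.0)
    }
}

fn bucket_for_primary_key(pk: &str) -> usize {
    let mut h = Fnv1a::new();
    h.write(pk.as_bytes());
    (h.0 % BUCKET_COUNT as u64) as usize
}

/// Only `bucket_hashes` is named in the text; `global_hash` is hypothetical.
struct ShardFingerprint {
    global_hash: String,
    bucket_hashes: Vec<String>, // one hex digest per bucket
}

/// Feeds one document (PK and content hash, NUL-separated) into a hasher.
fn fold_doc(h: &mut Fnv1a, pk: &str, content_hash: &str) {
    h.write(pk.as_bytes());
    h.write(&[0]);
    h.write(content_hash.as_bytes());
    h.write(&[0]);
}

/// Folds every document into the global digest AND its bucket's digest.
fn fingerprint_shard(docs: &BTreeMap<String, String>) -> ShardFingerprint {
    let mut global = Fnv1a::new();
    let mut buckets = vec![Fnv1a::new(); BUCKET_COUNT];
    for (pk, content_hash) in docs {
        fold_doc(&mut global, pk, content_hash);
        fold_doc(&mut buckets[bucket_for_primary_key(pk)], pk, content_hash);
    }
    ShardFingerprint {
        global_hash: global.hex(),
        bucket_hashes: buckets.iter().map(Fnv1a::hex).collect(),
    }
}
```

Because every bucket digest is computed in the same pass as the global one, a single scan yields both the cheap "anything different?" check and the 256-way breakdown.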
3. Divergent Bucket Detection (diff_fingerprints(), lines 307-335)
- Compares per-bucket hashes between replicas
- Returns list of divergent bucket IDs
- Falls back to treating all buckets as divergent if bucket_hashes were not computed
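A minimal sketch of the divergence check follows. It assumes that a bucket_hashes vector shorter than 256 entries (for example, empty) means the hashes were "not computed". The struct is trimmed to that single field. The exact fallback condition in anti_entropy.rs may differ.

```rust
const BUCKET_COUNT: usize = 256;

/// Trimmed, hypothetical shape: only the per-bucket digests matter here.
struct ShardFingerprint {
    bucket_hashes: Vec<String>,
}

/// Returns the IDs of buckets whose digests differ between two replicas.
/// If either side lacks per-bucket hashes, every bucket is reported.
fn diff_fingerprints(a: &ShardFingerprint, b: &ShardFingerprint) -> Vec<usize> {
    let complete = |f: &ShardFingerprint| f.bucket_hashes.len() == BUCKET_COUNT;
    if !complete(a) || !complete(b) {
        // Fallback: without per-bucket data, all buckets must be re-checked.
        return (0..BUCKET_COUNT).collect();
    }
    a.bucket_hashes
        .iter()
        .zip(&b.bucket_hashes)
        .enumerate()
        .filter(|(_, (x, y))| x != y)
        .map(|(i, _)| i)
        .collect()
}
```

The fallback keeps the step correct against older fingerprints at the cost of a full-shard comparison.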
4. Bucket-Specific PK Enumeration (fetch_bucket_pks(), lines 341-392)
- Fetches all documents in shard with pagination
- Filters to only documents in target bucket
- Returns map of PK → content_hash
- Uses 10ms throttling between batches
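The enumeration loop can be sketched as shown below. It pages through the whole shard, keeps only PKs in the target bucket, and sleeps between batches.

The sketch makes these assumptions:
- The shard is reached through a hypothetical fetch_page(offset, limit) callback returning (PK, content hash) pairs.
- page_size and the throttle value are passed as parameters.
- FNV-1a stands in for xxh3 in the bucket function.

None of these reflect the real client API.

```rust
use std::collections::BTreeMap;
use std::thread::sleep;
use std::time::Duration;

/// ASSUMPTION: FNV-1a stand-in for the xxh3-based bucket_for_primary_key().
fn bucket_for_primary_key(pk: &str) -> usize {
    let hash = pk.bytes().fold(0xcbf2_9ce4_8422_2325_u64, |h, b| {
        (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3)
    });
    (hash % 256) as usize
}

/// Pages through every document in the shard, keeps only PKs in `bucket`,
/// and returns PK -> content hash. Sleeps `throttle` between full batches.
fn fetch_bucket_pks<F>(
    mut fetch_page: F, // hypothetical: (offset, limit) -> page of (pk, content_hash)
    bucket: usize,
    page_size: usize,
    throttle: Duration,
) -> BTreeMap<String, String>
where
    F: FnMut(usize, usize) -> Vec<(String, String)>,
{
    assert!(page_size > 0, "page_size must be positive");
    let mut out = BTreeMap::new();
    let mut offset = 0;
    loop {
        let page = fetch_page(offset, page_size);
        let n = page.len();
        for (pk, content_hash) in page {
            if bucket_for_primary_key(&pk) == bucket {
                out.insert(pk, content_hash);
            }
        }
        if n < page_size {
            break; // short (or empty) page: end of shard
        }
        offset += n;
        sleep(throttle); // e.g. Duration::from_millis(10), as in the text
    }
    out
}
```

Filtering client-side means the whole shard is still read once per divergent bucket. The savings come from the much smaller set of PKs that then have to be compared.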
5. Bucket-Level Replica Comparison (compare_bucket_replicas(), lines 400-447)
- Fetches bucket PKs from both replicas
- Returns ReplicaDiff with:
  - a_only_pks: PKs only on replica A
  - b_only_pks: PKs only on replica B
  - mismatched_pks: PKs with different content hashes
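Below is a minimal sketch of the bucket-level comparison as a three-way set difference over two PK → content-hash maps. The field names come from the text; the Vec<String> field types are an assumption. Unlike the real compare_bucket_replicas(), which fetches the bucket PKs itself, this version takes both maps already fetched.

```rust
use std::collections::BTreeMap;

/// Field names from the text; Vec<String> types are an assumption.
#[derive(Debug, Default, PartialEq)]
struct ReplicaDiff {
    a_only_pks: Vec<String>,     // present only on replica A
    b_only_pks: Vec<String>,     // present only on replica B
    mismatched_pks: Vec<String>, // present on both, content hashes differ
}

/// Compares one bucket's PK -> content-hash maps from two replicas.
fn compare_bucket_replicas(
    a: &BTreeMap<String, String>,
    b: &BTreeMap<String, String>,
) -> ReplicaDiff {
    let mut diff = ReplicaDiff::default();
    for (pk, hash_a) in a {
        match b.get(pk) {
            None => diff.a_only_pks.push(pk.clone()),
            Some(hash_b) if hash_b != hash_a => diff.mismatched_pks.push(pk.clone()),
            Some(_) => {} // identical on both replicas
        }
    }
    diff.b_only_pks = b.keys().filter(|pk| !a.contains_key(*pk)).cloned().collect();
    diff
}
```

Because BTreeMap iterates in key order, each PK list comes out sorted, which keeps logs and test assertions stable.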
6. Integration with Repair Flow (repair_shard(), lines 609-696)
- Uses diff_fingerprints() to find divergent buckets
- For each divergent bucket, calls compare_bucket_replicas()
- Currently only logs divergences (repair writes are a TODO for P5.8.c)
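The repair flow above can be sketched end to end with in-memory replicas. The steps are: compute per-bucket digests, pick the buckets that differ, drill into each one, and only log the result. Writes are left out, matching the P5.8.c TODO.

The sketch makes these assumptions:
- Each replica is a PK → content-hash BTreeMap.
- FNV-1a stands in for xxh3.
- The bucket_hashes helper and the (bucket, ReplicaDiff) return value are hypothetical. They are not repair_shard()'s real signature.

```rust
use std::collections::BTreeMap;

const BUCKET_COUNT: usize = 256;
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// PK -> content hash for one replica of a shard (assumed representation).
type Docs = BTreeMap<String, String>;

/// ASSUMPTION: FNV-1a stand-in for xxh3, continuing from `seed`.
fn fnv1a(seed: u64, bytes: &[u8]) -> u64 {
    bytes.iter().fold(seed, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
}

fn bucket_for_primary_key(pk: &str) -> usize {
    (fnv1a(FNV_OFFSET, pk.as_bytes()) % BUCKET_COUNT as u64) as usize
}

/// Hypothetical per-bucket digests (PK order, NUL-separated fields).
fn bucket_hashes(docs: &Docs) -> Vec<u64> {
    let mut hashes = vec![FNV_OFFSET; BUCKET_COUNT];
    for (pk, content_hash) in docs {
        let b = bucket_for_primary_key(pk);
        let h = fnv1a(fnv1a(hashes[b], pk.as_bytes()), &[0]);
        hashes[b] = fnv1a(fnv1a(h, content_hash.as_bytes()), &[0]);
    }
    hashes
}

#[derive(Debug, Default)]
struct ReplicaDiff {
    a_only_pks: Vec<String>,
    b_only_pks: Vec<String>,
    mismatched_pks: Vec<String>,
}

/// Compares only the PKs that fall in `bucket`.
fn compare_bucket_replicas(a: &Docs, b: &Docs, bucket: usize) -> ReplicaDiff {
    let mut diff = ReplicaDiff::default();
    for (pk, hash_a) in a {
        if bucket_for_primary_key(pk) != bucket {
            continue;
        }
        match b.get(pk) {
            None => diff.a_only_pks.push(pk.clone()),
            Some(hash_b) if hash_b != hash_a => diff.mismatched_pks.push(pk.clone()),
            Some(_) => {}
        }
    }
    diff.b_only_pks = b
        .keys()
        .filter(|pk| bucket_for_primary_key(pk) == bucket && !a.contains_key(*pk))
        .cloned()
        .collect();
    diff
}

/// Diff bucket digests, drill into each divergent bucket, and only log.
fn repair_shard(a: &Docs, b: &Docs) -> Vec<(usize, ReplicaDiff)> {
    let (hashes_a, hashes_b) = (bucket_hashes(a), bucket_hashes(b));
    let mut found = Vec::new();
    for bucket in (0..BUCKET_COUNT).filter(|&i| hashes_a[i] != hashes_b[i]) {
        let diff = compare_bucket_replicas(a, b, bucket);
        eprintln!("divergent bucket {bucket}: {diff:?}"); // log only, no writes
        found.push((bucket, diff));
    }
    found
}
```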
Test Coverage
Comprehensive tests in /home/coding/miroir/crates/miroir-proxy/tests/p5_8_b_anti_entropy_diff.rs:
- test_bucket_for_primary_key_deterministic: Verifies deterministic bucket assignment
- test_bucket_for_primary_key_distributes: Verifies even distribution
- test_fingerprint_shard_includes_bucket_hashes: Verifies per-bucket hash computation
- test_diff_fingerprints_identical: Tests no divergence case
- test_diff_fingerprints_divergent_buckets: Tests divergent bucket detection
- test_fetch_bucket_pks_filters_by_bucket: Tests bucket filtering
- test_compare_bucket_replicas_no_divergence: Tests identical buckets
- test_compare_bucket_replicas_a_only: Tests PK only on replica A
- test_compare_bucket_replicas_b_only: Tests PK only on replica B
- test_compare_bucket_replicas_mismatched_content: Tests content hash mismatch
- test_diff_fingerprints_isolates_divergence: Verifies ~0.4% isolation per bucket
- test_bucket_count_constant: Verifies BUCKET_COUNT = 256
Reusability for §13.1 Reshard Verify
The bucket_for_primary_key() function is public and documented for reuse in reshard verification (plan §13.1), where PK-keyed (not shard-keyed) bucketing is needed for cross-shard comparison.
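To see why PK-keyed bucketing survives a reshard, consider the sketch below. It lays out the same documents over 2 shards and over 3 shards, regroups every document by PK bucket, sorts by PK within each bucket, and hashes. Because shard boundaries play no part in the digest, both layouts produce identical bucket digests, and a single changed document shows up in exactly one bucket.

The sketch makes these assumptions:
- shard_for, split_into_shards and index_bucket_digests are hypothetical names.
- FNV-1a stands in for xxh3.
- This illustrates the property that test_compare_index_buckets_across_different_shard_counts checks. It is not the real cross-index comparison code.

```rust
use std::collections::BTreeMap;

const BUCKET_COUNT: usize = 256;
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;

/// ASSUMPTION: FNV-1a stand-in for xxh3, continuing from `seed`.
fn fnv1a(seed: u64, bytes: &[u8]) -> u64 {
    bytes.iter().fold(seed, |h, &b| (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3))
}

fn bucket_for_primary_key(pk: &str) -> usize {
    (fnv1a(FNV_OFFSET, pk.as_bytes()) % BUCKET_COUNT as u64) as usize
}

/// Hypothetical shard router; any PK -> shard mapping works here.
fn shard_for(pk: &str, shard_count: usize) -> usize {
    (fnv1a(FNV_OFFSET, pk.as_bytes()) >> 32) as usize % shard_count
}

/// Splits (PK, content hash) pairs across `shard_count` shards.
fn split_into_shards(docs: &[(String, String)], shard_count: usize) -> Vec<Vec<(String, String)>> {
    let mut shards = vec![Vec::new(); shard_count];
    for (pk, content_hash) in docs {
        shards[shard_for(pk, shard_count)].push((pk.clone(), content_hash.clone()));
    }
    shards
}

/// Per-bucket digests for a whole index: regroup all shards' documents by
/// PK bucket, sort by PK inside each bucket, then hash. Shard layout drops out.
fn index_bucket_digests(shards: &[Vec<(String, String)>]) -> Vec<u64> {
    let mut grouped: Vec<BTreeMap<&str, &str>> = vec![BTreeMap::new(); BUCKET_COUNT];
    for (pk, content_hash) in shards.iter().flatten() {
        grouped[bucket_for_primary_key(pk)].insert(pk.as_str(), content_hash.as_str());
    }
    grouped
        .iter()
        .map(|bucket| {
            bucket.iter().fold(FNV_OFFSET, |h, (pk, content_hash)| {
                let h = fnv1a(fnv1a(h, pk.as_bytes()), &[0]);
                fnv1a(fnv1a(h, content_hash.as_bytes()), &[0])
            })
        })
        .collect()
}
```

By contrast, a per-shard bucketing scheme would tie each digest to a particular shard layout, so digests from indexes with different shard counts could not be compared bucket by bucket.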
Verification (2026-05-23)
All 18 tests in p5_8_b_anti_entropy_diff.rs passed:
- test_bucket_count_constant: Verifies BUCKET_COUNT = 256
- test_bucket_for_primary_key_deterministic: Verifies deterministic bucket assignment
- test_bucket_for_primary_key_distributes: Verifies even distribution across buckets
- test_fingerprint_shard_includes_bucket_hashes: Verifies per-bucket hash computation
- test_diff_fingerprints_identical: Tests no divergence case
- test_diff_fingerprints_divergent_buckets: Tests divergent bucket detection
- test_diff_fingerprints_isolates_divergence: Verifies ~0.4% isolation per bucket
- test_fetch_bucket_pks_filters_by_bucket: Tests bucket filtering
- test_compare_bucket_replicas_no_divergence: Tests identical buckets
- test_compare_bucket_replicas_a_only: Tests PK only on replica A
- test_compare_bucket_replicas_b_only: Tests PK only on replica B
- test_compare_bucket_replicas_mismatched_content: Tests content hash mismatch
- test_compare_index_buckets_identical: Cross-index comparison with identical content
- test_compare_index_buckets_a_only: Cross-index comparison with documents only in A
- test_compare_index_buckets_b_only: Cross-index comparison with documents only in B
- test_compare_index_buckets_mismatched_content: Cross-index comparison with mismatched content
- test_compare_index_buckets_across_different_shard_counts: PK-keyed bucketing works across different shard counts (reshard verification)
- test_compare_index_buckets_multiple_divergent_buckets: Divergence isolation to specific buckets
The bucket-granular re-digest implementation for P5.8.b is verified complete.