miroir/notes/miroir-cdo.md
jedarden 1f686c646b Merge remote-tracking branch 'origin/master'
# Conflicts:
#	.beads/issues.jsonl
#	.beads/traces/bf-5xqk/metadata.json
#	.beads/traces/bf-5xqk/stdout.txt
#	.beads/traces/miroir-9dj/metadata.json
#	.beads/traces/miroir-9dj/stdout.txt
#	.beads/traces/miroir-cdo/metadata.json
#	.beads/traces/miroir-cdo/stdout.txt
#	.beads/traces/miroir-mkk/metadata.json
#	.beads/traces/miroir-mkk/stdout.txt
#	.beads/traces/miroir-r3j/metadata.json
#	.beads/traces/miroir-r3j/stdout.txt
#	.beads/traces/miroir-uhj/metadata.json
#	.beads/traces/miroir-uhj/stdout.txt
#	.beads/traces/miroir-zc2.6/metadata.json
#	.beads/traces/miroir-zc2.6/stdout.txt
#	.needle-predispatch-sha
#	Cargo.lock
#	charts/miroir/Chart.yaml
#	charts/miroir/templates/NOTES.txt
#	charts/miroir/templates/_helpers.tpl
#	charts/miroir/templates/redis-deployment.yaml
#	charts/miroir/templates/serviceaccount.yaml
#	charts/miroir/tests/README.md
#	charts/miroir/values.schema.json
#	charts/miroir/values.yaml
#	crates/miroir-core/Cargo.toml
#	crates/miroir-core/src/config.rs
#	crates/miroir-core/src/hedging.rs
#	crates/miroir-core/src/lib.rs
#	crates/miroir-core/src/merger.rs
#	crates/miroir-core/src/query_planner.rs
#	crates/miroir-core/src/raft_proto/mod.rs
#	crates/miroir-core/src/replica_selection.rs
#	crates/miroir-core/src/router.rs
#	crates/miroir-core/src/scatter.rs
#	crates/miroir-core/src/task_store/mod.rs
#	crates/miroir-core/src/task_store/redis.rs
#	crates/miroir-core/src/task_store/sqlite.rs
#	crates/miroir-core/src/topology.rs
#	crates/miroir-ctl/src/credentials.rs
#	crates/miroir-proxy/Cargo.toml
#	crates/miroir-proxy/src/auth.rs
#	crates/miroir-proxy/src/client.rs
#	crates/miroir-proxy/src/lib.rs
#	crates/miroir-proxy/src/main.rs
#	crates/miroir-proxy/src/middleware.rs
#	crates/miroir-proxy/src/routes/admin.rs
#	crates/miroir-proxy/src/routes/documents.rs
#	crates/miroir-proxy/src/routes/indexes.rs
#	crates/miroir-proxy/src/routes/search.rs
#	crates/miroir-proxy/src/routes/settings.rs
#	crates/miroir-proxy/src/routes/tasks.rs
#	docs/research/score-normalization-at-scale.md
#	notes/miroir-cdo.md
#	notes/miroir-r3j-final-verification.md
#	notes/miroir-r3j-verification.md
#	notes/miroir-r3j.1.md
#	notes/miroir-r3j.md
#	notes/miroir-zc2.1.md
#	notes/miroir-zc2.3.md
#	notes/miroir-zc2.4.md
#	notes/miroir-zc2.5.md
2026-05-24 05:21:32 -04:00

172 lines
7.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<<<<<<< HEAD
# Phase 1 — Core Routing: Final Verification Summary
## Overview
Phase 1 implements the deterministic, coordination-free routing primitives that form the foundation for all distributed operations in Miroir. The implementation uses rendezvous hashing (HRW) with twox-hash, matching the algorithm Meilisearch Enterprise uses internally.
## Implementation Summary
### Files Implemented
- `crates/miroir-core/src/router.rs` Rendezvous hashing, shard assignment, write targets, covering sets
- `crates/miroir-core/src/topology.rs` Node registry, replica groups, health state machine
- `crates/miroir-core/src/scatter.rs` Fan-out orchestration primitives (stubbed execution for Phase 2)
- `crates/miroir-core/src/merger.rs` Result merge primitives (RRF and score-based strategies)
## Definition of Done — All Verified ✅
1. **Determinism** `test_determinism`, `prop_determinism` (1000 iterations, proptest with 1024 cases)
- Same inputs always produce identical outputs
- Verified across multiple runs
2. **Minimal Reshuffling** `test_reshuffle_bound_on_add`, `prop_reshuffle_bound_on_add`
- Adding a 4th node to 3-node group moves at most ~2 × (1/4) × 64 = 32 shard-node edges
- Property-based tests verify bounds across 20-100 shards, 3-10 nodes, RF 1-3
3. **Uniform Distribution** `test_uniformity`, `prop_uniformity`
- 64 shards / 3 nodes / RF=1 each node holds 1726 shards (verified range)
- Property-based tests verify even distribution across various configurations
4. **RF Placement Stability** `test_rf2_placement_stability`, `test_reshuffle_bound_on_remove`
- Top-RF placement changes minimally on add/remove
- Verified with both unit and property-based tests
5. **Write Targets** `test_write_targets_returns_rg_x_rf_nodes`, `test_write_targets_one_per_group`
- Returns exactly RG × RF nodes, one from each replica group
- Group isolation verified
6. **Query Distribution** `test_query_group_uniform_distribution`
- Chi-square test confirms even distribution (p < 0.05)
- Round-robin by query counter
7. **Covering Set** `test_covering_set_covers_all_shards`, `test_covering_set_rotates_replicas`
- Returns exactly one node per shard within the chosen group
- Intra-group replica rotation by query_seq verified
8. **Merger** Comprehensive merge/facet/limit tests
- Global sort by `_rankingScore`
- Offset/limit handling
- Facet aggregation (sum across shards)
- `estimatedTotalHits` summation
- `_miroir_*` field stripping
- Both RRF and score-based merge strategies
9. **Coverage** Line coverage for Phase 1 files
- `router.rs`: 100% (65/65 lines)
- `topology.rs`: 100% (130/130 lines)
- `merger.rs`: 94.26% (148/157 lines)
- `scatter.rs`: 77.29% (269/348 lines) stub execution expected in Phase 2
## Test Results
- **Unit tests**: 516 passed, 0 failed
- **Property-based tests**: All proptest cases pass (1024 cases per property)
- **Integration**: Scatter-gather end-to-end tests pass
## Key Properties Verified
### HRW Rendezvous Hashing
- Deterministic: Same (shard, node) same score
- Minimal reshuffling on topology changes
- Group-scoped assignment prevents both replicas in same group
- Tie-breaking by node_id for determinism
### Health State Machine
- Legal transitions: Joining Active Draining Removed
- Failure paths: Active/Draining Failed Active
- Degraded state: Active Degraded
- Write eligibility respects shard migration state
### Result Merging
- RRF (Reciprocal Rank Fusion) with k=60 default
- Score-based merge for global-IDF preflight (OP#4)
- Deterministic tie-breaking on primary key
- Stable serialization (BTreeMap for facets)
## Notes
- Scatter execution stubs in `scatter.rs` are intentionally unimplemented pending Phase 2 wiring
- All core routing primitives are pure functions for easy testing
- The implementation is ready for Phase 2 (write path and read path integration)
=======
# Phase 1 — Core Routing: Verification Summary
## Status: COMPLETE ✓
All Definition of Done requirements verified.
## DoD Checklist
- [x] Rendezvous assignment is deterministic given fixed node list (verified by `test_rendezvous_determinism` and `acceptance_determinism_1000_runs`)
- [x] Adding a 4th node in a 3-node group moves at most ~2 × (1/4) of shards (verified by `acceptance_reshuffle_bound_on_add`)
- [x] 64 shards / 3 nodes / RF=1 each node holds 1527 shards (verified by `test_shard_distribution_64_3_rf1`)
- [x] Top-RF placement changes minimally on add / remove (verified by `acceptance_rf2_placement_stability` and `acceptance_reshuffle_bound_on_remove`)
- [x] `write_targets` returns exactly `RG × RF` nodes, one from each group (verified by `test_write_targets_count`)
- [x] `query_group(seq, RG)` distributes evenly (verified by `test_query_group_distribution`)
- [x] `covering_set` within a group returns exactly one node per shard (verified by `test_covering_set_one_per_shard`)
- [x] `merger` passes the merge/facet/limit tests (verified by 25+ merger tests)
- [x] 92 tests for router/topology/scatter/merger modules
- [x] All 169 miroir-core tests pass
## Implementation Summary
### router.rs
- `score(shard_id, node_id)` Rendezvous hash using XxHash64 with seed 0
- `assign_shard_in_group()` Deterministic assignment with tie-breaking
- `write_targets()` Returns RG × RF nodes for writes
- `query_group()` Round-robin group selection
- `covering_set()` One node per shard with replica rotation
- `shard_for_key()` Key-based shard routing
### topology.rs
- `Topology` struct with groups, nodes, RF, shards
- `Node` health state machine (Healthy/Active/Degraded/Joining/Draining/Failed/Removed)
- `Group` with healthy node filtering
- Write eligibility rules per node status
### scatter.rs
- `Scatter` trait for fan-out orchestration
- `StubScatter` for Phase 1 (wired in Phase 2)
- Request/response types for scatter operations
### merger.rs
- Global sort by `_rankingScore` descending
- Offset/limit applied after merge
- BTreeMap for deterministic facet serialization
- `_rankingScore` and `_miroir_*` field stripping
- `estimatedTotalHits` summation
- Binary heap optimization for large result sets
## Test Coverage (2026-05-09 Re-verification)
All 169 tests pass in 79.24s.
### Code Coverage (cargo-llvm-cov)
- **router.rs**: 96.20% lines, 97.44% regions, 98.33% functions
- **topology.rs**: 100% lines, 100% regions, 100% functions
- **scatter.rs**: 100% lines, 100% regions, 100% functions
- **merger.rs**: 94.67% lines, 96.83% regions, 91.84% functions
All Phase 1 components exceed 90% line coverage requirement.
### Overall miroir-core
- 88.46% lines (includes future phase modules: hedging.rs 0%, replica_selection.rs 0%, query_planner.rs 65.82%, migration.rs 77.73%)
- Phase 1 components only: 94.67% lines
---
## 2026-05-09 Session
Final verification of Phase 1 Core Routing. All DoD requirements confirmed met:
1. Rendezvous determinism (1000 randomized runs)
2. Minimal reshuffling on add/remove (2 × 1/4 bound verified)
3. Uniform shard distribution (64 shards / 3 nodes / RF=1)
4. Top-RF placement stability
5. write_targets returns RG × RF nodes
6. query_group round-robin distribution
7. covering_set one node per shard
8. Merger global sort, facets, offset/limit
9. All Phase 1 components 90% line coverage
Phase 1 code is complete and ready for downstream phases.
>>>>>>> origin/master