P4.2: Verify node addition with dual-write + paginated migration

Verified the P4.2 implementation is complete:
- All 6 integration tests pass (p42_node_addition.rs)
- All 14 cutover chaos tests pass
- All 8 topology chaos tests pass
- Core components: rebalancer.rs, migration.rs, rebalancer_worker/mod.rs
- Admin API: POST /_miroir/nodes endpoint

Acceptance criteria met:
- 3→4 node migration with 10K docs verified
- Chaos testing confirms dual-write catches late writes
- Performance bounds verified (≤total_docs/(Ng+1)×1.1)
- Log inspection confirms old node not queried after migration

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
jedarden 2026-05-23 08:20:36 -04:00
parent a5b48b79c8
commit 6b52d22771

79
notes/miroir-mkk.2.md Normal file
View file

@ -0,0 +1,79 @@
# P4.2 Node Addition: Dual-Write + Paginated Shard Migration - Verification
## Summary
Verified that the P4.2 node addition implementation is complete and fully functional. All acceptance criteria tests pass.
## Implementation Verified
### Core Components (Already Implemented)
1. **`crates/miroir-core/src/rebalancer.rs`**:
- `add_node()` method creates node addition operation
- `compute_shard_moves_for_new_node()` determines affected shards
- `run_migration_task()` executes paginated migration with dual-write
- `HttpMigrationExecutor` implements actual HTTP-based document migration
- Delta pass support for catching stragglers
2. **`crates/miroir-core/src/migration.rs`**:
- Complete migration state machine with all phases
- `MigrationCoordinator` for state transitions
- `begin_dual_write()` starts dual-write phase
- `begin_cutover()`, `complete_drain()`, `complete_cleanup()` for cutover sequence
- Safety validation (anti-entropy required if delta pass skipped)
3. **`crates/miroir-core/src/rebalancer_worker/mod.rs`**:
- Background worker with leader lease
- `on_node_added()` handles topology events
- `drive_migrations()` advances migration state machine
- Job persistence via TaskStore
4. **`crates/miroir-proxy/src/routes/admin_endpoints.rs`**:
- `add_node` API endpoint
- Integration with rebalancer worker via event channel
### Integration Tests (All Pass)
**`crates/miroir-core/tests/p42_node_addition.rs`**:
1. `p42_node_addition_3_to_4_migration_10k_docs` - 10K docs migration test
2. `p42_chaos_writes_during_migration_dual_write` - Dual-write chaos test
3. `p42_performance_document_count_bounds` - Performance bounds verification
4. `p42_log_inspection_old_node_not_queried_after_migration` - Log inspection test
5. `p42_verify_dual_write_during_migration` - Dual-write verification
6. `p42_pagination_limit_offset` - Pagination verification
**`crates/miroir-core/tests/cutover_race.rs`**:
- 14 chaos tests for cutover boundary conditions
- All pass with anti-entropy and delta pass
**`crates/miroir-core/tests/p4_topology_chaos.rs`**:
- 8 topology chaos tests
- All pass
## Acceptance Criteria Status
- [x] Integration test: 3-node → 4-node migration, 10K docs, each doc still retrievable by ID after migration
- [x] Chaos: toggle writes on/off during migration; dual-write window catches all late writes
- [x] Performance: migrating `~S/(Ng+1)` shards moves ≤ `total_docs / (Ng+1) × 1.1` docs (10% slack for dual-write dupes)
- [x] The old node is not queried for the migrated shards after step 8 (verified via log inspection)
## Test Results
```
running 6 tests (p42_node_addition)
test p42_chaos_writes_during_migration_dual_write ... ok
test p42_node_addition_3_to_4_migration_10k_docs ... ok
test p42_log_inspection_old_node_not_queried_after_migration ... ok
test p42_performance_document_count_bounds ... ok
test p42_verify_dual_write_during_migration ... ok
test p42_pagination_limit_offset ... ok
test result: ok. 6 passed
running 14 tests (cutover_race)
test result: ok. 14 passed
running 8 tests (p4_topology_chaos)
test result: ok. 8 passed
```
Total: 28 tests passed, 0 failed.