miroir/crates/miroir-proxy
jedarden b0f89e1f6d Phase 4 — Topology Operations: Complete rebalancer and failure handling
Implements plan §2 topology changes and §4 rebalancer with full elastic
cluster operations: node addition/removal, replica group management, and
unplanned failure handling.

Core changes:
- topology.rs: Add GroupState::Draining for group removal flow
- router.rs: query_group_active() excludes draining groups via is_routing()
- scatter.rs: Health filtering with cross-group fallback for failed nodes
- rebalancer.rs: Add handle_node_recovery() for RF restore after recovery
- main.rs: Unplanned node failure detection with consecutive failure/success
  tracking, automatic Degraded/Failed transitions, and recovery event triggers
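The consecutive failure/success tracking described for main.rs could look roughly like the sketch below. It is an illustration, not the code in this commit: the names HealthTracker, HealthEvent, and record(), the Healthy state, and the three thresholds are all assumptions; only the Degraded and Failed states and the idea of a recovery event come from the message above.

```rust
/// Node health as seen by the proxy. `Degraded` and `Failed` are named in the
/// commit message; `Healthy` and the ordering are assumptions of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum NodeHealth {
    Healthy,
    Degraded,
    Failed,
}

/// Hypothetical events emitted when the tracked state changes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthEvent {
    Degraded,
    Failed,
    Recovered,
}

/// Hypothetical per-node tracker driven by health-check results.
pub struct HealthTracker {
    state: NodeHealth,
    consecutive_failures: u32,
    consecutive_successes: u32,
    degraded_after: u32, // assumed threshold: failures before Degraded
    failed_after: u32,   // assumed threshold: failures before Failed
    recover_after: u32,  // assumed threshold: successes before recovery
}

impl HealthTracker {
    pub fn new(degraded_after: u32, failed_after: u32, recover_after: u32) -> Self {
        Self {
            state: NodeHealth::Healthy,
            consecutive_failures: 0,
            consecutive_successes: 0,
            degraded_after,
            failed_after,
            recover_after,
        }
    }

    pub fn state(&self) -> NodeHealth {
        self.state
    }

    /// Records one health-check result and returns an event on a transition.
    pub fn record(&mut self, ok: bool) -> Option<HealthEvent> {
        if ok {
            self.consecutive_failures = 0;
            self.consecutive_successes = self.consecutive_successes.saturating_add(1);
            if self.state != NodeHealth::Healthy
                && self.consecutive_successes >= self.recover_after
            {
                self.state = NodeHealth::Healthy;
                return Some(HealthEvent::Recovered);
            }
            return None;
        }

        self.consecutive_successes = 0;
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        let target = if self.consecutive_failures >= self.failed_after {
            NodeHealth::Failed
        } else if self.consecutive_failures >= self.degraded_after {
            NodeHealth::Degraded
        } else {
            NodeHealth::Healthy
        };
        // Failures only escalate; a Failed node never drops back to Degraded.
        if target > self.state {
            self.state = target;
            return Some(match target {
                NodeHealth::Failed => HealthEvent::Failed,
                _ => HealthEvent::Degraded,
            });
        }
        None
    }
}
```

In this sketch a single success resets the failure streak but does not by itself restore a Degraded or Failed node; recovery requires recover_after successes in a row, which avoids flapping on an unstable node.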

Admin API:
- POST /_miroir/nodes/{id}/recover - Mark failed node as recovered
- DELETE /_miroir/nodes/{id} - Remove node (after drain)
- POST /_miroir/nodes/{id}/drain - Start node drain for removal
- POST /_miroir/nodes/{id}/fail - Mark node as failed
- POST /_miroir/replica_groups - Add replica group
- GET /_miroir/replica_groups/{id}/status - Group sync progress
- POST /_miroir/replica_groups/{id}/activate - Mark group active
- DELETE /_miroir/replica_groups/{id} - Remove replica group
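As a rough illustration of how the eight admin endpoints above map onto distinct operations, the sketch below matches a method and path against them using only the standard library. The AdminRoute enum and match_admin_route() are hypothetical names; the commit does not say which HTTP framework or router the proxy uses, so this shows the shape of the API rather than the implementation.

```rust
/// Hypothetical enum with one variant per admin endpoint listed above.
#[derive(Debug, PartialEq, Eq)]
pub enum AdminRoute<'a> {
    RecoverNode(&'a str),
    RemoveNode(&'a str),
    DrainNode(&'a str),
    FailNode(&'a str),
    AddReplicaGroup,
    ReplicaGroupStatus(&'a str),
    ActivateReplicaGroup(&'a str),
    RemoveReplicaGroup(&'a str),
}

/// Maps an HTTP method and path to an admin route, or None if nothing matches.
pub fn match_admin_route<'a>(method: &str, path: &'a str) -> Option<AdminRoute<'a>> {
    let rest = path.strip_prefix("/_miroir/")?;
    let segments: Vec<&'a str> = rest.split('/').collect();
    // Reject empty segments such as "/_miroir/nodes//drain".
    if segments.iter().any(|s| s.is_empty()) {
        return None;
    }
    match (method, segments.as_slice()) {
        ("POST", ["nodes", id, "recover"]) => Some(AdminRoute::RecoverNode(*id)),
        ("DELETE", ["nodes", id]) => Some(AdminRoute::RemoveNode(*id)),
        ("POST", ["nodes", id, "drain"]) => Some(AdminRoute::DrainNode(*id)),
        ("POST", ["nodes", id, "fail"]) => Some(AdminRoute::FailNode(*id)),
        ("POST", ["replica_groups"]) => Some(AdminRoute::AddReplicaGroup),
        ("GET", ["replica_groups", id, "status"]) => Some(AdminRoute::ReplicaGroupStatus(*id)),
        ("POST", ["replica_groups", id, "activate"]) => {
            Some(AdminRoute::ActivateReplicaGroup(*id))
        }
        ("DELETE", ["replica_groups", id]) => Some(AdminRoute::RemoveReplicaGroup(*id)),
        _ => None,
    }
}
```

For example, match_admin_route("POST", "/_miroir/nodes/n1/drain") yields AdminRoute::DrainNode("n1"), while an unknown path or a wrong method yields None.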

Tests:
- p4_topology_chaos.rs: All 5 chaos tests pass
  * Add node mid-indexing: docs readable, no duplicates
  * Drain node while querying: zero client-visible failures
  * Add replica group while querying: existing groups unaffected
  * Rebalance moves ≤ 2×(1/4) of docs (optimal)
  * Restart node mid-rebalance: pauses + resumes, no data loss
- p25_task_reconciliation.rs: Task ID reconciliation acceptance tests

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 23:57:53 -04:00
Name        Last commit                                                              Date
src         Phase 4 — Topology Operations: Complete rebalancer and failure handling  2026-05-23 23:57:53 -04:00
static      Phase 5: Add Advanced Capabilities verification and UI static assets     2026-05-03 19:01:22 -04:00
tests       Phase 4 — Topology Operations: Complete rebalancer and failure handling  2026-05-23 23:57:53 -04:00
Cargo.toml  P2.5 Task ID reconciliation: Add test helpers and fix error tests        2026-05-23 23:02:42 -04:00