Implements propose/verify/commit flow for distributed settings consistency:

- Phase 1 (Propose): Parallel PATCH to all nodes, collect task UIDs
- Phase 2 (Verify): GET settings, verify SHA256 fingerprints match
- Phase 3 (Commit): Increment settings_version, persist to task store
- Retry with exponential backoff on hash mismatch
- Drift reconciler background task detects/repairs out-of-band changes
- Client-pinned freshness via X-Miroir-Min-Settings-Version header
- Covering set excludes nodes below version floor (returns 503 if none)
- Legacy sequential strategy still supported for rollback compatibility

All 8 acceptance tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P5.5 §13.5 Two-phase settings broadcast + drift reconciler - COMPLETED
Summary
Implemented the two-phase settings broadcast with drift reconciler as specified in plan §13.5. The propose/verify/commit pattern takes over from the sequential settings flow to give distributed consistency; the sequential strategy remains available for rollback compatibility.
Implementation Details
1. Two-Phase Settings Broadcast (crates/miroir-core/src/settings.rs)
- Phase 1 (Propose): Parallel PATCH requests to all nodes, collect task UIDs
- Phase 2 (Verify): GET settings from all nodes, verify SHA256 fingerprints
- Phase 3 (Commit): Increment cluster-wide settings_version, persist to task store
- Retry logic: Exponential backoff on hash mismatch (up to max_repair_retries)
- Version tracking: Per-(index, node_id) version tracking in memory and task store
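The retry logic above can be sketched roughly as follows. This is a minimal illustration and not the code in settings.rs: the helper names (backoff_delay, verify_with_retries), the 100 ms base delay, and the 5 s cap are all assumptions, and the verify step is passed in as a closure so the sketch stays self-contained. Only max_repair_retries comes from the configuration described here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
/// Hypothetical helper; base and cap are illustrative values.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

/// Runs `verify` once, then up to `max_repair_retries` more times,
/// sleeping with exponential backoff between attempts.
/// Returns Ok(retries_used) on success, Err(retries_used) when exhausted.
fn verify_with_retries<F, S>(max_repair_retries: u32, mut verify: F, mut sleep: S) -> Result<u32, u32>
where
    F: FnMut() -> bool,
    S: FnMut(Duration),
{
    let base = Duration::from_millis(100); // assumed base delay
    let cap = Duration::from_secs(5); // assumed upper bound
    for attempt in 0..=max_repair_retries {
        if verify() {
            return Ok(attempt);
        }
        // Only sleep if another attempt will follow.
        if attempt < max_repair_retries {
            sleep(backoff_delay(attempt, base, cap));
        }
    }
    Err(max_repair_retries)
}
```

Passing the sleep function as a parameter keeps the sketch testable without real delays; a real caller would hand in an actual sleep.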
2. Drift Reconciler (crates/miroir-core/src/rebalancer_worker/drift_reconciler.rs)
- Background task runs every settings_drift_check.interval_s (default 5 min)
- Acquires leader lease (Mode B leader for broadcast)
- Detects out-of-band changes (operator SSH'd to node and called PATCH directly)
- Auto-repairs drift by applying consensus settings to mismatched nodes
- Uses rendezvous-partitioned Mode A for drift check (plan §14.6)
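As a rough sketch of the detect-and-repair decision, the check below compares each node's observed settings fingerprint against the fingerprint of the last committed settings. It assumes that "consensus settings" means the last committed broadcast, which this note does not spell out. The names DriftAction and plan_drift_actions are hypothetical, and leader leasing and rendezvous partitioning are left out.

```rust
use std::collections::BTreeMap;

/// What the reconciler would do for one drifted node (hypothetical type).
#[derive(Debug, PartialEq)]
enum DriftAction {
    /// Re-apply the committed settings to this node.
    Repair(String),
    /// auto_repair is off: only report the drift.
    ReportOnly(String),
}

/// Compares observed per-node fingerprints with the committed one and
/// returns one action per mismatched node, ordered by node id.
/// Assumption: the committed fingerprint is the source of truth.
fn plan_drift_actions(
    committed: &str,
    observed: &BTreeMap<String, String>,
    auto_repair: bool,
) -> Vec<DriftAction> {
    observed
        .iter()
        .filter(|(_, fingerprint)| fingerprint.as_str() != committed)
        .map(|(node_id, _)| {
            if auto_repair {
                DriftAction::Repair(node_id.clone())
            } else {
                DriftAction::ReportOnly(node_id.clone())
            }
        })
        .collect()
}
```

Using a BTreeMap keeps the output order deterministic, which makes the planned repairs easy to log and assert on.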
3. Response Headers (crates/miroir-proxy/src/routes/search.rs)
- X-Miroir-Settings-Version: Current settings version for the index
- X-Miroir-Min-Settings-Version: Client-pinned freshness floor
- X-Miroir-Settings-Inconsistent: Set during broadcast phases 1-2
4. Covering Set Filtering (crates/miroir-core/src/router.rs)
- covering_set_with_version_floor() excludes nodes below version floor
- Returns None when no covering set can be assembled
- Search handler returns 503 SERVICE_UNAVAILABLE in this case
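The filtering rule can be illustrated with a simplified stand-in. This sketch assumes a covering set means one eligible replica per shard, and it uses hypothetical names (Replica, covering_set_sketch, status_for); the real covering_set_with_version_floor() in router.rs may have a different signature and selection policy.

```rust
/// One replica of a shard and the settings version it has confirmed.
/// Hypothetical type for illustration.
struct Replica {
    node_id: String,
    settings_version: u64,
}

/// Picks, for every shard, the first replica whose settings version is at
/// least `floor`. Returns None if any shard has no such replica.
fn covering_set_sketch<'a>(shards: &'a [Vec<Replica>], floor: u64) -> Option<Vec<&'a str>> {
    shards
        .iter()
        .map(|replicas| {
            replicas
                .iter()
                .find(|r| r.settings_version >= floor)
                .map(|r| r.node_id.as_str())
        })
        // Collecting into Option<Vec<_>> short-circuits on the first None.
        .collect()
}

/// Maps the routing result to an HTTP status code: 503 when no covering
/// set satisfies the floor, 200 otherwise.
fn status_for(cover: &Option<Vec<&str>>) -> u16 {
    if cover.is_some() { 200 } else { 503 }
}
```

With replicas at versions 3 and 5 on one shard and 5 on another, a floor of 4 still yields a covering set, while a floor of 6 yields None and therefore a 503.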
5. Configuration (crates/miroir-core/src/config/advanced.rs)
settings_broadcast:
  strategy: two_phase
  verify_timeout_s: 60
  max_repair_retries: 3
  freeze_writes_on_unrepairable: true
settings_drift_check:
  interval_s: 300
  auto_repair: true
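For orientation, the keys above could map onto Rust types along these lines. This is only a sketch: the type names and the example_config helper are hypothetical, the real structures in advanced.rs are not shown in this note, and deserialization is left out to keep the example dependency-free. The two strategy values are the ones named in this document.

```rust
use std::str::FromStr;

/// Broadcast strategies named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BroadcastStrategy {
    TwoPhase,
    Sequential,
}

impl FromStr for BroadcastStrategy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "two_phase" => Ok(Self::TwoPhase),
            "sequential" => Ok(Self::Sequential),
            other => Err(format!("unknown settings_broadcast.strategy: {other}")),
        }
    }
}

/// Hypothetical mirror of the settings_broadcast section.
#[derive(Debug, Clone, PartialEq)]
struct SettingsBroadcastConfig {
    strategy: BroadcastStrategy,
    verify_timeout_s: u64,
    max_repair_retries: u32,
    freeze_writes_on_unrepairable: bool,
}

/// Hypothetical mirror of the settings_drift_check section.
#[derive(Debug, Clone, PartialEq)]
struct SettingsDriftCheckConfig {
    interval_s: u64,
    auto_repair: bool,
}

/// Builds the values shown in the configuration snippet above.
fn example_config() -> (SettingsBroadcastConfig, SettingsDriftCheckConfig) {
    (
        SettingsBroadcastConfig {
            strategy: BroadcastStrategy::TwoPhase,
            verify_timeout_s: 60,
            max_repair_retries: 3,
            freeze_writes_on_unrepairable: true,
        },
        SettingsDriftCheckConfig { interval_s: 300, auto_repair: true },
    )
}
```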
6. Metrics (crates/miroir-proxy/src/middleware.rs)
- miroir_settings_broadcast_phase: Current phase (0=idle, 1=propose, 2=verify, 3=commit)
- miroir_settings_hash_mismatch_total: Hash mismatches during verify
- miroir_settings_drift_repair_total: Drift repairs performed
- miroir_settings_version: Current settings version per index
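The phase encoding of miroir_settings_broadcast_phase, together with the rule that X-Miroir-Settings-Inconsistent is set during phases 1-2, can be captured in a small enum. The type and method names below are hypothetical and no metrics library is involved; the sketch only shows the mapping.

```rust
/// Broadcast phases as reported by the phase gauge (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BroadcastPhase {
    Idle,
    Propose,
    Verify,
    Commit,
}

impl BroadcastPhase {
    /// Numeric value exported on the gauge: 0=idle, 1=propose, 2=verify, 3=commit.
    fn gauge_value(self) -> i64 {
        match self {
            Self::Idle => 0,
            Self::Propose => 1,
            Self::Verify => 2,
            Self::Commit => 3,
        }
    }

    /// True while nodes may disagree, i.e. during phases 1 and 2.
    fn settings_inconsistent(self) -> bool {
        matches!(self, Self::Propose | Self::Verify)
    }
}
```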
7. Task Store Integration (crates/miroir-core/src/task_store/mod.rs)
- upsert_node_settings_version(): Persist version for (index, node)
- get_node_settings_version(): Retrieve version from task store
- Table 2: node_settings_version for persistence across restarts
Acceptance Tests
All 8 acceptance tests pass (crates/miroir-proxy/tests/p5_5_two_phase_settings_broadcast.rs):
- ✅ Normal flow: Add a synonym; propose + verify succeed; settings_version increments exactly once
- ✅ Mid-broadcast node failure: Phase 2 verify fails on one node → reissue succeeds after backoff
- ✅ Out-of-band drift: PATCH a node directly → drift reconciler detects within interval_s and repairs
- ✅ Client-pinned freshness: X-Miroir-Min-Settings-Version floor excludes stale nodes; returns 503 when no floor-satisfying covering set exists
- ✅ Legacy sequential: strategy: sequential still works for rollback compatibility
Key Features
- Parallel broadcast: Phase 1 sends PATCH to all nodes concurrently (vs sequential)
- Hash verification: Phase 2 ensures settings match exactly (SHA256 of canonical JSON)
- Automatic retry: Transient mismatches trigger exponential backoff retry
- Drift detection: Background task catches out-of-band changes
- Version-based freshness: Clients can pin to minimum version for consistency
- Rollback compatibility: Legacy sequential strategy still supported
Files Modified
Core implementation:
- crates/miroir-core/src/settings.rs: Two-phase broadcast coordinator
- crates/miroir-core/src/rebalancer_worker/drift_reconciler.rs: Background drift detection
- crates/miroir-core/src/config/advanced.rs: Configuration structures
- crates/miroir-core/src/task_store/mod.rs: Version persistence methods
- crates/miroir-core/src/router.rs: Covering set with version floor
Proxy layer:
- crates/miroir-proxy/src/routes/indexes.rs: Settings handlers (PATCH/GET)
- crates/miroir-proxy/src/routes/search.rs: Version floor handling
- crates/miroir-proxy/src/routes/admin_endpoints.rs: AppState with drift_reconciler
- crates/miroir-proxy/src/middleware.rs: Metrics for settings broadcast
- crates/miroir-proxy/src/main.rs: Drift reconciler startup
Tests:
- crates/miroir-proxy/tests/p5_5_two_phase_settings_broadcast.rs: Acceptance tests
- Unit tests in crates/miroir-core/src/settings.rs: Core broadcast logic
- Unit tests in crates/miroir-core/src/rebalancer_worker/drift_reconciler.rs: Drift detection