P6.2: Verify peer discovery implementation (plan §14.5)
Verified that peer discovery via headless Service + Downward API is fully implemented:

- Helm templates: miroir-headless.yaml Service + POD_NAME/POD_IP env vars
- Rust module: peer_discovery.rs with SRV lookup via trust-dns-resolver
- Config: peer_discovery section with service_name + refresh_interval_s
- Main loop: Background refresh task that updates miroir_peer_pod_count metric
- Metrics: miroir_peer_pod_count, miroir_leader, miroir_owned_shards_count gauges
- Verification script: tests/verify_p6_2_peer_discovery.sh (NixOS-compatible shebang)

All unit tests pass. The implementation requires a Kubernetes deployment for full acceptance testing (3-pod discovery, scale events, pod eviction).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent cf9ae11c3a
commit bddfeb366c
1 changed file with 69 additions and 0 deletions
69
notes/miroir-m9q.2-verification-summary.md
Normal file
@@ -0,0 +1,69 @@
# P6.2 Peer Discovery Implementation Verification

## Summary

Verified that peer discovery per plan §14.5 is fully implemented:

### 1. Helm Templates ✓
- `charts/miroir/templates/miroir-headless.yaml` - Headless Service with label selector
- `charts/miroir/templates/miroir-deployment.yaml` - POD_NAME, POD_NAMESPACE, POD_IP env vars via Downward API

### 2. Rust Implementation ✓
- `crates/miroir-core/src/peer_discovery.rs` - SRV-based peer discovery module
  - `PeerSet` struct with `peers: Vec<PeerId>` and `refreshed_at: Instant`
  - `PeerDiscovery::refresh()` method for SRV lookup
  - Feature flag: `peer-discovery` (uses `trust-dns-resolver`)
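
To illustrate the shape described above, here is a minimal sketch; it is not the code in `peer_discovery.rs`. The string newtype used for `PeerId` and the helpers `peer_name_from_srv_target` and `peer_set_from_targets` are hypothetical, and the sketch assumes that the first DNS label of each SRV target identifies the peer. The DNS query itself is left out so the example needs only the standard library.

```rust
use std::time::Instant;

/// Hypothetical stand-in for the crate's `PeerId`; the real type is not shown in these notes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PeerId(pub String);

/// Mirrors the two fields listed in the notes.
#[derive(Debug, Clone)]
pub struct PeerSet {
    pub peers: Vec<PeerId>,
    pub refreshed_at: Instant,
}

/// Assumption: the peer name is the first DNS label of the SRV target,
/// e.g. "miroir-0.miroir-headless.default.svc.cluster.local." -> "miroir-0".
pub fn peer_name_from_srv_target(target: &str) -> Option<PeerId> {
    let label = target.trim_end_matches('.').split('.').next()?;
    if label.is_empty() {
        None
    } else {
        Some(PeerId(label.to_string()))
    }
}

/// Build a sorted, de-duplicated peer set from raw SRV targets.
pub fn peer_set_from_targets(targets: &[&str]) -> PeerSet {
    let mut names: Vec<String> = targets
        .iter()
        .filter_map(|t| peer_name_from_srv_target(t))
        .map(|p| p.0)
        .collect();
    names.sort();
    names.dedup();
    PeerSet {
        peers: names.into_iter().map(PeerId).collect(),
        refreshed_at: Instant::now(),
    }
}
```

Sorting and de-duplicating gives every pod the same view of the peer list regardless of the order in which DNS returns records; whether the real module does this is not stated in these notes.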

### 3. Configuration ✓
- `crates/miroir-core/src/config.rs` - `PeerDiscoveryConfig` struct
  - `service_name: "miroir-headless"` (default)
  - `refresh_interval_s: 15` (default)
- `charts/miroir/values.yaml` - Config section with same defaults
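
The defaults listed above could be expressed roughly as follows. This is a sketch under assumptions: the field types (`String`, `u64`) and the `refresh_interval` helper are hypothetical, and the real `PeerDiscoveryConfig` in `config.rs` may carry additional fields or derive deserialization traits that are omitted here.

```rust
use std::time::Duration;

/// Sketch of the config section; field types are assumed, not taken from `config.rs`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PeerDiscoveryConfig {
    pub service_name: String,
    pub refresh_interval_s: u64,
}

impl Default for PeerDiscoveryConfig {
    fn default() -> Self {
        Self {
            // Defaults as stated in the notes.
            service_name: "miroir-headless".to_string(),
            refresh_interval_s: 15,
        }
    }
}

impl PeerDiscoveryConfig {
    /// Hypothetical convenience helper: the refresh interval as a `Duration`.
    pub fn refresh_interval(&self) -> Duration {
        Duration::from_secs(self.refresh_interval_s)
    }
}
```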

### 4. Main Loop Integration ✓
- `crates/miroir-proxy/src/main.rs` (lines 407-438)
  - Creates `PeerDiscovery` instance when POD_NAME is set
  - Spawns background refresh loop with configurable interval
  - Calls `metrics.set_peer_pod_count(count)` on successful refresh
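
The refresh loop described above can be sketched with plain threads. This is an illustration only: the actual code in `main.rs` may well use an async runtime, and `refresh_once`, `spawn_refresh_loop`, the closure-based lookup, and the `AtomicI64` gauge are all hypothetical stand-ins. The sketch assumes that a failed lookup leaves the last known count in place, which matches "on successful refresh" but is not confirmed by these notes.

```rust
use std::sync::atomic::{AtomicBool, AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// One refresh step: update the gauge only when the lookup succeeds.
/// Returns true if the gauge was updated.
pub fn refresh_once<F>(lookup: &mut F, peer_pod_count: &AtomicI64) -> bool
where
    F: FnMut() -> Result<Vec<String>, String>,
{
    match lookup() {
        Ok(peers) => {
            peer_pod_count.store(peers.len() as i64, Ordering::Relaxed);
            true
        }
        // Assumption: on failure the previous value is kept.
        Err(_) => false,
    }
}

/// Run `refresh_once` every `interval` until `stop` is set.
pub fn spawn_refresh_loop<F>(
    mut lookup: F,
    interval: Duration,
    peer_pod_count: Arc<AtomicI64>,
    stop: Arc<AtomicBool>,
) -> thread::JoinHandle<()>
where
    F: FnMut() -> Result<Vec<String>, String> + Send + 'static,
{
    thread::spawn(move || {
        while !stop.load(Ordering::Relaxed) {
            refresh_once(&mut lookup, &peer_pod_count);
            thread::sleep(interval);
        }
    })
}
```

Separating the single step from the loop keeps the step testable without timers or a DNS server.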

### 5. Metrics ✓
- `crates/miroir-proxy/src/middleware.rs` (lines 823-825, 1582-1584)
  - `miroir_peer_pod_count` gauge metric
  - `miroir_leader` gauge metric
  - `miroir_owned_shards_count` gauge metric

### 6. Verification Script ✓
- `tests/verify_p6_2_peer_discovery.sh` - Checks metrics and env vars
  - Shebang: `#!/usr/bin/env bash` (NixOS compatible)

## Acceptance Tests (require K8s environment)

The following acceptance tests require a real Kubernetes deployment:

1. **3-pod deployment**: Each pod sees all 3 peer names within 30 s of the last pod becoming ready
2. **Scale 3→5**: New peers are discovered within `refresh_interval_s × 2`
3. **Pod eviction**: A crashed pod drops from the peer set within `refresh_interval_s × 2`
4. **Metric verification**: `miroir_peer_pod_count` matches `kube_deployment_status_replicas_ready`
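
With the default `refresh_interval_s` of 15, the `refresh_interval_s × 2` bound works out to 30 s, the same figure used in the first test. A small sketch of that arithmetic, using hypothetical helper names (`discovery_deadline`, `within_deadline`) that a test harness could reuse:

```rust
use std::time::Duration;

/// Deadline used by the scale and eviction tests: twice the refresh interval.
pub fn discovery_deadline(refresh_interval_s: u64) -> Duration {
    Duration::from_secs(refresh_interval_s.saturating_mul(2))
}

/// True if an observed discovery latency meets the deadline.
pub fn within_deadline(observed: Duration, refresh_interval_s: u64) -> bool {
    observed <= discovery_deadline(refresh_interval_s)
}
```

For example, `discovery_deadline(15)` is 30 seconds, so an observed latency of 29 s passes and 31 s fails.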

## Unit Tests

All peer discovery unit tests pass:
- `test_peer_set_empty` ✓
- `test_peer_set_with_peers` ✓
- `test_srv_target_pod_name_extraction` ✓

## Implementation Notes

- The peer discovery implementation was already complete in the codebase
- No code changes were required; this task was verification-only
- The `peer-discovery` feature flag must be enabled for SRV lookups to work
- Peer discovery is automatically disabled when `POD_NAME=unknown` (local dev)
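
The enable/disable rule in the last note can be sketched as a single predicate. The function name `peer_discovery_enabled` is hypothetical, and treating an empty `POD_NAME` as disabled is an assumption; the notes only state that discovery runs when `POD_NAME` is set and is off when it equals `unknown`.

```rust
/// Decide whether to start peer discovery from the POD_NAME value.
/// Assumption: an empty value is treated like an unset one.
pub fn peer_discovery_enabled(pod_name: Option<&str>) -> bool {
    matches!(pod_name, Some(name) if !name.is_empty() && name != "unknown")
}
```

A caller could pass `std::env::var("POD_NAME").ok().as_deref()` to evaluate the rule against the process environment.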

## Plan §14.5 Alignment

Fully implements plan §14.5 "Peer discovery" with:
- Headless Service SRV lookup mechanism
- 15-second refresh interval (configurable)
- Zero-config operation (uses Downward API env vars)
- No K8s API calls from pods
- Transient double-work is acceptable (idempotent operations)