P6.2: Verify peer discovery implementation (plan §14.5)

Verified that peer discovery via headless Service + Downward API is
fully implemented:

- Helm templates: miroir-headless.yaml Service + POD_NAME/POD_IP env vars
- Rust module: peer_discovery.rs with SRV lookup via trust-dns-resolver
- Config: peer_discovery section with service_name + refresh_interval_s
- Main loop: Background refresh task that updates miroir_peer_pod_count metric
- Metrics: miroir_peer_pod_count, miroir_leader, miroir_owned_shards_count gauges
- Verification script: tests/verify_p6_2_peer_discovery.sh (NixOS-compatible shebang)

All unit tests pass. The implementation requires a Kubernetes deployment
for full acceptance testing (3-pod discovery, scale events, pod eviction).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
jedarden 2026-05-23 02:51:14 -04:00
parent cf9ae11c3a
commit bddfeb366c

View file

@ -0,0 +1,69 @@
# P6.2 Peer Discovery Implementation Verification
## Summary
Verified that peer discovery per plan §14.5 is fully implemented:
### 1. Helm Templates ✓
- `charts/miroir/templates/miroir-headless.yaml` - Headless Service with label selector
- `charts/miroir/templates/miroir-deployment.yaml` - POD_NAME, POD_NAMESPACE, POD_IP env vars via Downward API
### 2. Rust Implementation ✓
- `crates/miroir-core/src/peer_discovery.rs` - SRV-based peer discovery module
- `PeerSet` struct with `peers: Vec<PeerId>` and `refreshed_at: Instant`
- `PeerDiscovery::refresh()` method for SRV lookup
- Feature flag: `peer-discovery` (uses `trust-dns-resolver`)
### 3. Configuration ✓
- `crates/miroir-core/src/config.rs` - `PeerDiscoveryConfig` struct
- `service_name: "miroir-headless"` (default)
- `refresh_interval_s: 15` (default)
- `charts/miroir/values.yaml` - Config section with same defaults
### 4. Main Loop Integration ✓
- `crates/miroir-proxy/src/main.rs` (lines 407-438)
- Creates `PeerDiscovery` instance when POD_NAME is set
- Spawns background refresh loop with configurable interval
- Calls `metrics.set_peer_pod_count(count)` on successful refresh
### 5. Metrics ✓
- `crates/miroir-proxy/src/middleware.rs` (line 823-825, 1582-1584)
- `miroir_peer_pod_count` gauge metric
- `miroir_leader` gauge metric
- `miroir_owned_shards_count` gauge metric
### 6. Verification Script ✓
- `tests/verify_p6_2_peer_discovery.sh` - Checks metrics and env vars
- Shebang: `#!/usr/bin/env bash` (NixOS compatible)
## Acceptance Tests (require K8s environment)
The following acceptance tests require a real Kubernetes deployment:
1. **3-pod deployment**: Each pod sees all 3 peer names within 30s of last pod ready
2. **Scale 3→5**: New peers discovered within `refresh_interval_s × 2`
3. **Pod eviction**: Crashed pod drops from peer set within `refresh_interval_s × 2`
4. **Metric verification**: `miroir_peer_pod_count` matches `kube_deployment_status_replicas_ready`
## Unit Tests
All peer discovery unit tests pass:
- `test_peer_set_empty`
- `test_peer_set_with_peers`
- `test_srv_target_pod_name_extraction`
## Implementation Notes
- The peer discovery implementation was already complete in the codebase
- No code changes were required - this task was verification-only
- The `peer-discovery` feature flag must be enabled for SRV lookups to work
- Peer discovery automatically disables when `POD_NAME=unknown` (local dev)
## Plan §14.5 Alignment
Fully implements plan §14.5 "Peer discovery" with:
- Headless Service SRV lookup mechanism
- 15-second refresh interval (configurable)
- Zero-config operation (uses Downward API env vars)
- No K8s API calls from pods
- Transient double-work is acceptable (idempotent operations)