# Z.AI Proxy Blue-Green Deployment - Traffic Switchover

## Current Status

- **V1 (Old)**: `zai-proxy` deployment running `ronaldraygun/zai-proxy:1.1.0`
- **V2 (New)**: `zai-proxy-v2` deployment running `ronaldraygun/zai-proxy:1.3.0`
- **Service**: Currently routes to V1 (`selector: app=zai-proxy`, with no `version` label)

## Switchover Procedure

### Step 1: Verify V2 is Running and Healthy

```bash
kubectl get deployment zai-proxy-v2 -n devpod
kubectl get pods -n devpod -l version=v2
kubectl logs -n devpod -l version=v2 --tail=20

# Test V2 directly (bypass service)
POD_IP=$(kubectl get pod -n devpod -l version=v2 -o jsonpath='{.items[0].status.podIP}')
curl "http://$POD_IP:8080/health"
curl -s "http://$POD_IP:8080/metrics" | grep zai_proxy_rate_limit
```
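
The manual checks above can also be gated in a script. The following is a minimal sketch, assuming the deployment name and namespace used in this document; `wait_for_v2` is a hypothetical helper name and the 120s default timeout is an arbitrary choice. `kubectl rollout status` blocks until the rollout completes or the timeout expires.

```bash
# Hypothetical helper: block until zai-proxy-v2 has finished rolling out.
# Returns non-zero if the rollout is not complete within the timeout.
wait_for_v2() {
  timeout="${1:-120s}"   # assumed default, adjust to the pod's startup time
  kubectl rollout status deployment/zai-proxy-v2 -n devpod --timeout="$timeout"
}
```

For example, `wait_for_v2 60s && echo "V2 ready"` proceeds only once the V2 pods report ready.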

### Step 2: Update Service Selector to Route to V2

```bash
kubectl patch service zai-proxy -n devpod --type=merge -p '
{
  "spec": {
    "selector": {
      "app": "zai-proxy",
      "version": "v2"
    }
  }
}'
```
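
If the switch is scripted, it may help to build the patch body in one place and read the selector back after applying it. This is a sketch under the assumption that the version label is the only thing that changes; `selector_patch` and `switch_to` are hypothetical helper names.

```bash
# Hypothetical helper: print the merge-patch body that pins the Service
# selector to one version label (for example v2).
selector_patch() {
  printf '{"spec":{"selector":{"app":"zai-proxy","version":"%s"}}}\n' "$1"
}

# Hypothetical helper: apply the patch, then print the selector the Service now has.
switch_to() {
  kubectl patch service zai-proxy -n devpod --type=merge -p "$(selector_patch "$1")" &&
    kubectl get service zai-proxy -n devpod -o jsonpath='{.spec.selector}'
}
```

Calling `switch_to v2` applies the same patch as the command above and then shows the resulting selector, which should list both `app` and `version`.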

### Step 3: Verify Traffic is Flowing to V2

```bash
# Check service endpoints
kubectl get endpoints zai-proxy -n devpod

# Test through service
curl http://zai-proxy.devpod.svc.cluster.local:8080/health
curl -s http://zai-proxy.devpod.svc.cluster.local:8080/metrics | grep "deployment_variant"

# Should see: deployment_variant="v2"
```
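
Reading the `grep` output by eye gets tedious when many metric lines carry the label. The sketch below reduces the metrics text to the distinct `deployment_variant` values it contains; `variants_seen` is a hypothetical helper name, and it assumes the label appears in the standard Prometheus text format.

```bash
# Hypothetical helper: list the distinct deployment_variant label values
# found in Prometheus text-format metrics read from stdin.
variants_seen() {
  grep -o 'deployment_variant="[^"]*"' | sort -u
}
```

Usage: `curl -s http://zai-proxy.devpod.svc.cluster.local:8080/metrics | variants_seen`. Each request reaches a single pod, so repeating it a few times gives more confidence that only `v2` is answering.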

### Step 4: Monitor Metrics in Grafana

Check that the metrics introduced in V2 are now available:
- Current Rate Limit
- Token counting metrics
- Adaptive rate limit adjustments
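
Grafana can only show a metric that the proxy exposes, so a quick check against `/metrics` can rule out the proxy side when a panel stays empty. This is a minimal sketch; `has_metric` is a hypothetical helper name, and `zai_proxy_rate_limit` is the only metric prefix this document names, so the token counting and adaptive adjustment metric names are left for the reader to fill in.

```bash
# Hypothetical helper: succeed if the metrics text on stdin contains at least
# one sample (non-comment line) whose name starts with the given prefix.
has_metric() {
  grep -v '^#' | grep -q "^$1"
}
```

Usage: `curl -s http://zai-proxy.devpod.svc.cluster.local:8080/metrics | has_metric zai_proxy_rate_limit && echo "rate limit metrics exposed"`.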

### Step 5: Delete Old V1 Deployment (Optional - Keep for Rollback)

**Option A: Keep V1 for Quick Rollback (Recommended for 24h)**
```bash
# Scale V1 to 0 replicas but keep deployment
kubectl scale deployment zai-proxy -n devpod --replicas=0
```

**Option B: Delete V1 Completely**
```bash
kubectl delete deployment zai-proxy -n devpod
```
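
Before scaling down or deleting V1, it is worth confirming that the Service really is pinned to V2. The sketch below is one way to guard the destructive step; `service_on_v2` is a hypothetical helper name.

```bash
# Hypothetical guard: succeed only if the Service selector is pinned to v2,
# meaning V1 no longer receives traffic through the Service.
service_on_v2() {
  current=$(kubectl get service zai-proxy -n devpod -o jsonpath='{.spec.selector.version}')
  [ "$current" = "v2" ]
}
```

Usage: `service_on_v2 && kubectl delete deployment zai-proxy -n devpod` deletes V1 only when the selector check passes.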

## Rollback Procedure (If Needed)

If V2 has issues, roll back to V1 immediately:

```bash
# If V1 is scaled to 0
kubectl scale deployment zai-proxy -n devpod --replicas=1

# Switch service back to V1 by dropping the version label from the selector.
# A merge patch removes a key only when it is set to null; omitting the key
# would leave version=v2 in place.
kubectl patch service zai-proxy -n devpod --type=merge -p '
{
  "spec": {
    "selector": {
      "app": "zai-proxy",
      "version": null
    }
  }
}'

# Or remove the version label with a JSON patch
kubectl patch service zai-proxy -n devpod --type=json -p='[
  {"op": "remove", "path": "/spec/selector/version"}
]'
```

With no `version` key, the selector matches every pod labeled `app=zai-proxy`. If the V2 pods carry that label too, scale `zai-proxy-v2` to 0 so that only V1 receives traffic.
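
The rollback commands can be chained so that the Service is repointed only after V1 is ready again. This is a sketch, not a tested runbook: `rollback_to_v1` is a hypothetical helper name and the 120s timeout is an assumption. Note that the JSON patch `remove` operation fails if the selector has no `version` key, which makes the function return non-zero when the Service was never switched.

```bash
# Hypothetical helper: bring V1 back, wait until it is ready,
# then remove the version label from the Service selector.
rollback_to_v1() {
  kubectl scale deployment zai-proxy -n devpod --replicas=1 &&
    kubectl rollout status deployment/zai-proxy -n devpod --timeout=120s &&
    kubectl patch service zai-proxy -n devpod --type=json \
      -p='[{"op": "remove", "path": "/spec/selector/version"}]'
}
```

Each step runs only if the previous one succeeded, so a V1 pod that fails to start leaves the Service pointing at V2 rather than at nothing.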

## Benefits of This Approach

1. **Zero Downtime**: V2 starts before V1 stops
2. **Instant Rollback**: V1 stays running, or scaled to 0, until V2 is proven
3. **Gradual Verification**: V2 is tested directly before traffic is switched
4. **Safe**: Testing V2 does not affect users

## Worker Impact

- Workers continue to use the proxy through the same service, with no configuration change
- Existing connections may be briefly reset during the service selector change
- Rate limiting will reset to initial values on V2 (`RATE_LIMIT_INITIAL=2`)
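
Because connections may be reset briefly during the selector change, workers that call the proxy from shell scripts can wrap the call in a small retry. The helper below is a generic sketch and not part of the proxy or the workers described here; `retry` and its arguments are hypothetical.

```bash
# Hypothetical worker-side helper: run a command up to $1 times,
# sleeping $2 seconds between failed attempts.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  until "$@"; do
    # Give up once the allowed number of attempts has been used.
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Usage: `retry 3 1 curl -sf http://zai-proxy.devpod.svc.cluster.local:8080/health` tries the health endpoint up to three times, one second apart.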

## Monitoring Checklist

- [ ] V2 pod is Running
- [ ] V2 health check passes
- [ ] V2 metrics endpoint accessible
- [ ] Service endpoints point to V2 pod
- [ ] Workers can make requests successfully
- [ ] Grafana shows new metrics
- [ ] No 429 or 502 errors in V2 logs
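
The last checklist item can be approximated by counting log lines that mention either status code. This is a rough sketch: `count_errors` is a hypothetical helper name, and the proxy's log format is not described in this document, so the pattern only assumes that the status appears as a standalone number.

```bash
# Hypothetical helper: count log lines on stdin that contain 429 or 502
# as a standalone number. The log format is an assumption.
count_errors() {
  grep -Ec '(^|[^0-9])(429|502)([^0-9]|$)'
}
```

Usage: `kubectl logs -n devpod -l version=v2 --tail=200 | count_errors`. Any count above zero is a prompt to read the matching lines, since the pattern can also match unrelated numbers such as byte counts.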