zai-proxy/docs/notes/DEPLOYMENT.md
jedarden e7c24a0c08 feat: initial zai-proxy ecosystem repo
Extracted from ardenone-cluster/containers/zai-proxy and
ardenone-cluster/containers/zai-proxy-dashboard.

- proxy/: OpenAI-compatible ZAI reverse proxy (Go, v1.10.0)
  - Token counting, rate limiting, Prometheus metrics, canary support
- dashboard/: Metrics dashboard backend + React frontend (Go, v1.0.0)
  - Prometheus collector, SQLite storage, SSE live updates
- docs/: Operational notes, research, and plan subdirs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:53:52 -04:00


# Z.AI Proxy - Dual Deployment Workflow Guide
This guide covers managing workers across production and canary deployments of the zai-proxy service.
## Overview
The zai-proxy service supports **dual deployment mode** for safe testing and gradual rollout. Production and canary run side by side, and a third service splits traffic between them:

| Deployment | Service Address | Purpose | Worker Endpoint URL |
|------------|-----------------|---------|---------------------|
| **Production** | `zai-proxy.mcp.svc.cluster.local:8080` | Live traffic | `http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic` |
| **Canary** | `zai-proxy-test.mcp.svc.cluster.local:8080` | Testing new versions | `http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic` |
| **Split Traffic** | `zai-proxy-canary.mcp.svc.cluster.local:8080` | Weighted traffic split | `http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic` |
## Architecture
```
               ┌─────────────────────────────────────┐
               │      Workers (claude-code-glm)      │
               └──────────────────┬──────────────────┘
               ┌──────────────────▼──────────────────┐
               │     ANTHROPIC_BASE_URL Setting      │
               │   (selects production or canary)    │
               └──────────────────┬──────────────────┘
            ┌─────────────────────┼──────────────────────┐
            │                     │                      │
            ▼                     ▼                      ▼
┌──────────────────────┐  ┌────────────────┐  ┌─────────────────────┐
│ Production           │  │ Canary         │  │ Split Traffic       │
│ zai-proxy:8080       │  │ zai-proxy-test │  │ zai-proxy-canary    │
│ (variant=production) │  │ (variant=test) │  │ (weighted by pods)  │
└──────────────────────┘  └────────────────┘  └─────────────────────┘
```
## 1. Configuring Workers for Production vs Canary
### Method 1: Direct Endpoint Configuration
Workers are configured via their `settings.json` file in the agent directory.
**Production Endpoint (Default):**
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
  }
}
```
**Canary Endpoint:**
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
  }
}
```
**Split Traffic Endpoint:**
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic"
  }
}
```
### Method 2: Override via Environment Variable
When launching workers, override the endpoint without modifying settings:
```bash
# Launch worker with canary endpoint
export ANTHROPIC_BASE_URL="http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-canary-test
# Launch worker with production endpoint
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-prod
```
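To avoid retyping the three URLs, the override can be wrapped in a small shell function. This is only a sketch: the function name `zai_endpoint` and the variant keywords (`production`, `canary`, `split`) are illustrative, and the URLs are the ones listed in this guide.

```bash
# Hypothetical helper: map a deployment variant to its worker endpoint URL.
zai_endpoint() {
  local host
  case "$1" in
    production) host="zai-proxy" ;;
    canary)     host="zai-proxy-test" ;;
    split)      host="zai-proxy-canary" ;;
    *) echo "unknown variant: $1" >&2; return 1 ;;
  esac
  echo "http://${host}.devpod.svc.cluster.local:8080/api/anthropic"
}
```

With the function sourced, `export ANTHROPIC_BASE_URL="$(zai_endpoint canary)"` before calling `launch.sh` should have the same effect as the explicit export above.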
### Example: claude-code-glm-47 Configuration
**Location:** `/home/coder/claude-config/agents/claude-code-glm-47/settings.json`
**Current (Production):**
```json
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "proxy-handles-auth",
    "ANTHROPIC_BASE_URL": "http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7"
  }
}
```
## 2. Verifying Worker Endpoint Configuration
### Check Active Worker Configuration
```bash
# Attach to a worker session
tmux attach -t claude-code-glm-47-alpha
# Inside the session, check environment
echo "$ANTHROPIC_BASE_URL"
# Expected output for production:
# http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic
# Check settings.json
grep ANTHROPIC_BASE_URL "$CLAUDE_CONFIG_DIR/settings.json"
```
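When checking several sessions, it can help to turn the URL into a variant name instead of comparing hostnames by eye. The following is a sketch under the assumption that workers only ever use the three endpoint hosts listed in this guide; `variant_of_url` is a hypothetical name.

```bash
# Hypothetical helper: classify an ANTHROPIC_BASE_URL by the service host it targets.
variant_of_url() {
  case "$1" in
    http://zai-proxy-test.*)   echo "canary" ;;
    http://zai-proxy-canary.*) echo "split" ;;
    http://zai-proxy.*)        echo "production" ;;
    *)                         echo "unknown" ;;
  esac
}
```

Inside a worker session, `variant_of_url "$ANTHROPIC_BASE_URL"` would then print the variant that session is pointed at.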
### Verify Service Availability
```bash
# Test production endpoint
curl -s http://zai-proxy.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}
# Test canary endpoint
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}
# Test split traffic endpoint
curl -s http://zai-proxy-canary.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}
```
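The three checks above can be folded into one loop that compares each response against the expected body. This is a minimal sketch: `health_ok` and `check_all` are hypothetical names, and it assumes the health body is JSON containing `"status":"ok"` as shown in the expected output.

```bash
# Hypothetical helper: succeed only if a /health response body reports status "ok".
# The body is passed as an argument, so the check itself needs no network access.
health_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Query each service named in this guide; curl -f turns HTTP errors into failures.
check_all() {
  local svc body
  for svc in zai-proxy zai-proxy-test zai-proxy-canary; do
    body=$(curl -fsS --max-time 5 "http://${svc}.mcp.svc.cluster.local:8080/health") || body=""
    if health_ok "$body"; then echo "$svc: ok"; else echo "$svc: FAILED"; fi
  done
}
```

Running `check_all` from a pod that can resolve the `mcp` namespace prints one line per service.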
### Check Which Pods a Worker Is Using
```bash
# From within a devpod, check DNS resolution
nslookup zai-proxy.mcp.svc.cluster.local
nslookup zai-proxy-test.mcp.svc.cluster.local
# Check service endpoints
kubectl get endpoints -n mcp | grep zai-proxy
```
### Monitor Metrics with Deployment Variant Labels
The proxy exports metrics with the `deployment_variant` label:
```promql
# Check requests per variant
sum by (deployment_variant) (rate(zai_proxy_requests_total[5m]))
# Check token usage per variant
sum by (deployment_variant) (rate(zai_proxy_tokens_total[5m]))
```
## 3. Testing Procedure: Canary Deployments
### Step 1: Launch Test Worker Against Canary
```bash
cd /home/coder/claude-config
# Launch worker configured for canary
export ANTHROPIC_BASE_URL="http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-canary-test
```
### Step 2: Verify Canary Configuration
```bash
# Attach to verify
tmux attach -t claude-glm-canary-test
# In the worker session, verify
echo "Using endpoint: $ANTHROPIC_BASE_URL"
# Should show: http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic
# Detach: Ctrl+B, D
```
### Step 3: Run Test Tasks
```bash
# In the worker session or via bead assignment
# Test simple task
br create "Test canary endpoint connectivity" \
--description "Verify worker can successfully make API calls through canary endpoint" \
--labels testing,canary
# Worker should process this and make API calls through canary
```
### Step 4: Monitor Canary Metrics
```bash
# Check canary proxy metrics
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_requests_total
# Verify variant label in metrics
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | grep deployment_variant
# Should show: deployment_variant="canary"
```
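Raw `/metrics` output is split across many label combinations, so a per-variant total is easier to read. The sketch below sums `zai_proxy_requests_total` by `deployment_variant` from Prometheus text-format input. It assumes each sample line ends with the value (no trailing timestamp); the function name `requests_by_variant` is hypothetical.

```bash
# Hypothetical helper: sum zai_proxy_requests_total per deployment_variant
# from Prometheus text-format metrics read on stdin.
requests_by_variant() {
  awk '
    /^zai_proxy_requests_total[{]/ && match($0, /deployment_variant="[^"]*"/) {
      # Drop the 20-character prefix (label name, equals sign, opening quote)
      # and the closing quote to keep only the label value.
      variant = substr($0, RSTART + 20, RLENGTH - 21)
      sum[variant] += $NF
    }
    END { for (v in sum) print v, sum[v] }
  ' | sort
}
```

For example, `curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | requests_by_variant` prints one `variant total` line per variant seen by that pod.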
### Step 5: Verify Functionality
**Checklist:**
- [ ] Worker makes successful API calls through canary
- [ ] Token counting works (check logs for "Token usage")
- [ ] No errors in worker logs
- [ ] Metrics show `deployment_variant="canary"`
- [ ] Response times are acceptable
## 4. Migration Checklist: Production to Canary
### Pre-Migration
- [ ] **Verify canary deployment is healthy**
  ```bash
  kubectl get pods -n mcp -l app=zai-proxy,variant=test
  kubectl logs -n mcp deployment/zai-proxy-test --tail=50
  ```
- [ ] **Run smoke tests on canary endpoint**
  ```bash
  # Mirror worker traffic: clients append /v1/messages to ANTHROPIC_BASE_URL
  # and request the glm-4.7 model configured in settings.json.
  curl -X POST http://zai-proxy-test.mcp.svc.cluster.local:8080/api/anthropic/v1/messages \
    -H "Content-Type: application/json" \
    -H "x-api-key: $ZAI_API_KEY" \
    -d '{"model":"glm-4.7","messages":[{"role":"user","content":"test"}],"max_tokens":10}'
  ```
- [ ] **Review canary metrics for baseline**
  ```bash
  curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics
  ```
### Migration
- [ ] **Stop workers using production endpoint**
  ```bash
  # List active workers
  tlist
  # Kill production workers (one by one or all)
  tkill claude-code-glm-47-alpha
  tkill claude-code-glm-47-bravo
  ```
- [ ] **Update worker settings.json to use canary**
  ```bash
  # Backup current config
  cp /home/coder/claude-config/agents/claude-code-glm-47/settings.json \
     /home/coder/claude-config/agents/claude-code-glm-47/settings.json.bak
  # Update ANTHROPIC_BASE_URL (use Edit tool or manual edit)
  # Change from: http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic
  # Change to:   http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic
  ```
- [ ] **Launch workers with new canary configuration**
  ```bash
  cd /home/coder/claude-config
  ./scripts/spawn-workers.sh --workspace=/path/to/project --executor=claude-code-glm-47 --workers=3
  ```
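The backup-and-edit step in the checklist above can also be scripted. The following is a sketch, not the project's tooling: `set_base_url` is a hypothetical helper, and it assumes `ANTHROPIC_BASE_URL` appears once in the file as a plain `"key": "value"` string and that the new URL contains no `#`, `&`, or backslash characters.

```bash
# Hypothetical helper: point a worker settings.json at a new endpoint,
# keeping the previous version as <file>.bak (the same name the checklist uses).
set_base_url() {
  local file=$1 url=$2
  cp "$file" "$file.bak" || return 1
  # '#' is the sed delimiter, so the slashes in the URL need no escaping.
  sed -E "s#(\"ANTHROPIC_BASE_URL\"[[:space:]]*:[[:space:]]*\")[^\"]*\"#\\1${url}\"#" \
    "$file.bak" > "$file"
}
```

A call such as `set_base_url /home/coder/claude-config/agents/claude-code-glm-47/settings.json "http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"` rewrites only that one value and leaves the other `env` entries untouched.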
### Post-Migration Verification
- [ ] **Verify workers are using canary endpoint**
  ```bash
  # Attach to each worker and check
  tmux attach -t <worker-name>
  echo $ANTHROPIC_BASE_URL
  ```
- [ ] **Monitor canary metrics for increased load**
  ```promql
  rate(zai_proxy_requests_total{deployment_variant="canary"}[5m])
  ```
- [ ] **Check worker logs for errors**
  ```bash
  tail -f ~/.beads-workers/*.log
  ```
- [ ] **Verify production metrics show decreased load**
  ```promql
  rate(zai_proxy_requests_total{deployment_variant="production"}[5m])
  ```
## 5. Emergency Fallback to Production
### Quick Fallback (Single Worker)
```bash
# Kill worker using problematic endpoint
tkill <worker-name>
# Relaunch with production endpoint override
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
cd /home/coder/claude-config
./agents/claude-code-glm-47/launch.sh <worker-name>-fallback
```
### Bulk Fallback (All Workers)
```bash
# 1. Kill all affected workers
for session in $(tmux list-sessions -F '#{session_name}' | grep claude-code-glm); do
  tkill "$session"
done
# 2. Restore production settings from backup
cp /home/coder/claude-config/agents/claude-code-glm-47/settings.json.bak \
/home/coder/claude-config/agents/claude-code-glm-47/settings.json
# 3. Relaunch all workers
cd /home/coder/claude-config
./scripts/spawn-workers.sh --workspace=/path/to/project --executor=claude-code-glm-47 --workers=3
```
### Temporary Override (Without Config Change)
```bash
# For immediate testing or debugging
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh emergency-prod-worker
```
## 6. Traffic Splitting (Gradual Rollout)
For gradual rollout, use the `zai-proxy-canary` service. It load-balances across the pods of both deployments, so the traffic split roughly follows the ratio of production to canary replicas.
### Configure Traffic Split
```bash
# Current configuration: 90% production, 10% canary
# 9 production pods + 1 canary pod = 90/10 split
# To change to 50/50 split:
kubectl scale deployment/zai-proxy -n mcp --replicas=5
kubectl scale deployment/zai-proxy-test -n mcp --replicas=5
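# To change to 100% canary (full cutover):
# Note: scaling zai-proxy to 0 also leaves the production service without pods,
# so workers pointed directly at the production endpoint will fail.
kubectl scale deployment/zai-proxy -n mcp --replicas=0
kubectl scale deployment/zai-proxy-test -n mcp --replicas=10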
```
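The replica arithmetic behind these commands can be written down as a small function. This is a sketch that assumes traffic is spread evenly across ready pods, as described above; the name `split_replicas` is hypothetical.

```bash
# Hypothetical helper: replica counts for a target canary percentage.
# Usage: split_replicas <total-replicas> <canary-percent>
# Prints "<production-replicas> <canary-replicas>".
split_replicas() {
  local total=$1 pct=$2 canary
  canary=$(( (total * pct + 50) / 100 ))   # round to the nearest whole pod
  echo "$(( total - canary )) $canary"
}
```

For instance, `split_replicas 10 25` prints `7 3`, which would translate to `--replicas=7` for `deployment/zai-proxy` and `--replicas=3` for `deployment/zai-proxy-test`. Because pods are whole units, small totals cannot express fine-grained percentages: with 4 pods a 10% target rounds down to no canary pods at all.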
### Configure Workers for Split Traffic
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic"
  }
}
```
Workers will then receive a mix of production and canary responses based on the traffic split.
## 7. Monitoring and Troubleshooting
### Check Deployment Status
```bash
# Production pods
kubectl get pods -n mcp -l app=zai-proxy,variant=production
# Canary pods
kubectl get pods -n mcp -l app=zai-proxy,variant=test
# All zai-proxy pods
kubectl get pods -n mcp -l app=zai-proxy
```
### View Logs
```bash
# Production logs
kubectl logs -f -n mcp deployment/zai-proxy
# Canary logs
kubectl logs -f -n mcp deployment/zai-proxy-test
# Worker logs
tail -f ~/.beads-workers/<worker-name>.log
```
### Prometheus Queries
```promql
# Request rate by deployment variant
sum by (deployment_variant) (
rate(zai_proxy_requests_total[5m])
)
# 5xx error rate (failed requests per second) by deployment variant
sum by (deployment_variant) (
rate(zai_proxy_requests_total{status=~"5.."}[5m])
)
# Token usage by deployment variant
sum by (deployment_variant) (
rate(zai_proxy_tokens_total[5m])
)
# P95 latency by deployment variant
histogram_quantile(0.95,
sum by (deployment_variant, le) (
rate(zai_proxy_request_duration_seconds_bucket[5m])
)
)
```
### Common Issues
**Issue: Workers getting connection errors**
- Verify service is running: `kubectl get svc -n mcp | grep zai-proxy`
- Check DNS resolution from devpod: `nslookup zai-proxy.mcp.svc.cluster.local`
- Verify endpoint URL in worker settings
**Issue: Workers using wrong deployment**
- Check `ANTHROPIC_BASE_URL` in worker session
- Verify settings.json configuration
- Look for environment variable overrides
**Issue: High error rate on canary**
- Check canary deployment logs: `kubectl logs -n mcp deployment/zai-proxy-test`
- Compare metrics between production and canary
- Consider rollback to production
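For the high-error-rate case, the comparison between production and canary can be made explicit. The sketch below works on raw request and error counts (for example, totals read from each `/metrics` endpoint) and uses integer math only. The function names and the 100 basis point (1%) default margin are illustrative assumptions, not project settings.

```bash
# Hypothetical helper: error ratio in basis points (1 bp = 0.01%).
error_bp() {
  local errors=$1 total=$2
  if [ "$total" -eq 0 ]; then echo 0; else echo $(( errors * 10000 / total )); fi
}

# Succeeds (exit 0) when the canary error ratio exceeds the production ratio
# by more than margin_bp.
# Usage: canary_worse <canary_errors> <canary_total> <prod_errors> <prod_total> [margin_bp]
canary_worse() {
  local canary_bp prod_bp margin_bp
  canary_bp=$(error_bp "$1" "$2")
  prod_bp=$(error_bp "$3" "$4")
  margin_bp=${5:-100}
  [ "$canary_bp" -gt $(( prod_bp + margin_bp )) ]
}
```

A check such as `canary_worse 30 1000 5 1000 && echo "consider rollback"` compares 3% canary errors against 0.5% production errors and, with the default margin, flags the canary.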
## 8. Reference: Service Endpoints
| Service | URL | Use Case |
|---------|-----|----------|
| Production | `http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic` | All production traffic |
| Canary | `http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic` | Testing new versions |
| Split | `http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic` | Weighted traffic splitting |
| Metrics (Prod) | `http://zai-proxy.mcp.svc.cluster.local:8080/metrics` | Production metrics |
| Metrics (Canary) | `http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics` | Canary metrics |
| Health (Prod) | `http://zai-proxy.mcp.svc.cluster.local:8080/health` | Production health check |
| Health (Canary) | `http://zai-proxy-test.mcp.svc.cluster.local:8080/health` | Canary health check |
## Summary
1. **Configure** workers via `settings.json` or environment variable
2. **Verify** endpoint configuration before launching workers
3. **Test** canary endpoint with isolated workers first
4. **Migrate** gradually using traffic split or full cutover
5. **Monitor** metrics for both deployments during migration
6. **Fallback** to production if issues arise
For questions or issues, check:
- Worker logs: `~/.beads-workers/*.log`
- Proxy logs: `kubectl logs -n mcp deployment/zai-proxy` (production) or `kubectl logs -n mcp deployment/zai-proxy-test` (canary)
- Metrics: Service `/metrics` endpoints