Z.AI Proxy - Dual Deployment Workflow Guide
This guide covers managing workers across production and canary deployments of the zai-proxy service.
Overview
The zai-proxy service supports dual deployment mode for safe testing and gradual rollout:
| Deployment | Service Name | Purpose | Endpoint URL |
|---|---|---|---|
| Production | zai-proxy.mcp.svc.cluster.local:8080 | Live traffic | http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic |
| Canary | zai-proxy-test.mcp.svc.cluster.local:8080 | Testing new versions | http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic |
| Split Traffic | zai-proxy-canary.mcp.svc.cluster.local:8080 | Weighted traffic split | http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic |
Architecture
              ┌─────────────────────────────────────┐
              │      Workers (claude-code-glm)      │
              └──────────────┬──────────────────────┘
                             │
              ┌──────────────▼──────────────────────┐
              │     ANTHROPIC_BASE_URL Setting      │
              │   (selects production or canary)    │
              └──────────────┬──────────────────────┘
                             │
          ┌──────────────────┼──────────────────┐
          │                  │                  │
          ▼                  ▼                  ▼
┌───────────────────┐ ┌──────────────┐ ┌─────────────────────┐
│    Production     │ │    Canary    │ │    Split Traffic    │
│  zai-proxy:8080   │ │zai-proxy-test│ │  zai-proxy-canary   │
│  (variant=prod)   │ │(variant=test)│ │ (weighted by pods)  │
└───────────────────┘ └──────────────┘ └─────────────────────┘
1. Configuring Workers for Production vs Canary
Method 1: Direct Endpoint Configuration
Workers are configured via their settings.json file in the agent directory.
Production Endpoint (Default):
{
"env": {
"ANTHROPIC_BASE_URL": "http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
}
}
Canary Endpoint:
{
"env": {
"ANTHROPIC_BASE_URL": "http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
}
}
Split Traffic Endpoint:
{
"env": {
"ANTHROPIC_BASE_URL": "http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic"
}
}
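The three endpoints above differ only in the service name, so the mapping can live in one small helper instead of being retyped. This is a sketch: the function name `zai_endpoint` and the keywords `production`, `canary`, and `split` are illustrative and not part of the existing tooling; the URLs are the ones listed in this guide.

```shell
# Hypothetical helper: map a variant keyword to the worker-facing URL.
# Runs in a subshell so the temporary variable does not leak.
zai_endpoint() (
  case "$1" in
    production) svc="zai-proxy" ;;
    canary)     svc="zai-proxy-test" ;;
    split)      svc="zai-proxy-canary" ;;
    *) echo "unknown variant: $1" >&2; exit 1 ;;
  esac
  printf 'http://%s.devpod.svc.cluster.local:8080/api/anthropic\n' "$svc"
)
```

A caller could then write `export ANTHROPIC_BASE_URL="$(zai_endpoint canary)"` and get a non-zero exit status on a mistyped variant.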
Method 2: Override via Environment Variable
When launching workers, override the endpoint without modifying settings:
# Launch worker with canary endpoint
export ANTHROPIC_BASE_URL="http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-canary-test
# Launch worker with production endpoint
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-prod
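Note that `export` keeps the variable set for every later command in the same shell, so a worker launched afterwards inherits whichever endpoint was exported last. One way to avoid that is to scope the override to a single command with `env`. The wrapper name `with_endpoint` is hypothetical.

```shell
# Hypothetical wrapper: run one command with ANTHROPIC_BASE_URL set,
# leaving the calling shell's environment untouched.
with_endpoint() (
  url="$1"
  shift
  env ANTHROPIC_BASE_URL="$url" "$@"
)
```

For example: `with_endpoint "http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic" ./agents/claude-code-glm-47/launch.sh claude-glm-canary-test`.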
Example: claude-code-glm-47 Configuration
Location: /home/coder/claude-config/agents/claude-code-glm-47/settings.json
Current (Production):
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "proxy-handles-auth",
"ANTHROPIC_BASE_URL": "http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.7",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-4.7",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7"
}
}
2. Verifying Worker Endpoint Configuration
Check Active Worker Configuration
# Attach to a worker session
tmux attach -t claude-code-glm-47-alpha
# Inside the session, check environment
echo $ANTHROPIC_BASE_URL
# Expected output for production:
# http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic
# Check settings.json
grep ANTHROPIC_BASE_URL "$CLAUDE_CONFIG_DIR/settings.json"
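Because the three service names share a prefix, reading the URL by eye is easy to get wrong. A small classifier can name the deployment a URL points at. This is a sketch; `classify_endpoint` is a hypothetical name and it only recognizes the three hostnames used in this guide.

```shell
# Hypothetical helper: report which deployment a base URL targets.
# The literal "." after each service name keeps "zai-proxy." from
# matching "zai-proxy-test." or "zai-proxy-canary.".
classify_endpoint() {
  case "$1" in
    http://zai-proxy-test.*)   echo canary ;;
    http://zai-proxy-canary.*) echo split ;;
    http://zai-proxy.*)        echo production ;;
    *)                         echo unknown ;;
  esac
}
```

Inside a worker session this would be called as `classify_endpoint "$ANTHROPIC_BASE_URL"`.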
Verify Service Availability
# Test production endpoint
curl -s http://zai-proxy.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}
# Test canary endpoint
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}
# Test split traffic endpoint
curl -s http://zai-proxy-canary.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}
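The three checks can be folded into one loop that reports each service on its own line. This sketch assumes the health body is exactly `{"status":"ok"}`, as shown above; both function names are hypothetical.

```shell
# Returns 0 only when the body matches the expected health response.
health_body_ok() {
  [ "$1" = '{"status":"ok"}' ]
}

# Probe each service named in this guide and print one result line per service.
# Runs in a subshell so loop variables do not leak into the caller.
check_all_health() (
  for svc in zai-proxy zai-proxy-test zai-proxy-canary; do
    body=$(curl -s --max-time 5 "http://$svc.mcp.svc.cluster.local:8080/health")
    if health_body_ok "$body"; then
      echo "$svc: ok"
    else
      echo "$svc: FAILED"
    fi
  done
)
```

The exact-match comparison is deliberately strict; if the proxy ever adds fields to the health body, `health_body_ok` would need to be loosened.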
Check Which Pods Worker Is Using
# From within a devpod, check DNS resolution
nslookup zai-proxy.mcp.svc.cluster.local
nslookup zai-proxy-test.mcp.svc.cluster.local
# Check service endpoints
kubectl get endpoints -n mcp | grep zai-proxy
Monitor Metrics with Deployment Variant Labels
The proxy exports metrics with the deployment_variant label:
# Check requests per variant
sum by (deployment_variant) (rate(zai_proxy_requests_total[5m]))
# Check token usage per variant
sum by (deployment_variant) (rate(zai_proxy_tokens_total[5m]))
3. Testing Procedure: Canary Deployments
Step 1: Launch Test Worker Against Canary
cd /home/coder/claude-config
# Launch worker configured for canary
export ANTHROPIC_BASE_URL="http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-canary-test
Step 2: Verify Canary Configuration
# Attach to verify
tmux attach -t claude-glm-canary-test
# In the worker session, verify
echo "Using endpoint: $ANTHROPIC_BASE_URL"
# Should show: http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic
# Detach: Ctrl+B, D
Step 3: Run Test Tasks
# In the worker session or via bead assignment
# Test simple task
br create "Test canary endpoint connectivity" \
--description "Verify worker can successfully make API calls through canary endpoint" \
--labels testing,canary
# Worker should process this and make API calls through canary
Step 4: Monitor Canary Metrics
# Check canary proxy metrics
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_requests_total
# Verify variant label in metrics
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | grep deployment_variant
# Should show: deployment_variant="canary"
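To compare variants without going through Prometheus, the raw `/metrics` text can be summed locally. The sketch below assumes the standard Prometheus text exposition format with no trailing timestamps, so the sample value is the last field on each line; `sum_requests_by_variant` is a hypothetical name.

```shell
# Hypothetical helper: read /metrics text on stdin and print the total of
# zai_proxy_requests_total per deployment_variant, one "variant count" per line.
sum_requests_by_variant() {
  awk '
    index($0, "zai_proxy_requests_total{") == 1 {
      if (match($0, /deployment_variant="[^"]*"/)) {
        # Skip the 20-character prefix deployment_variant=" and the closing quote.
        variant = substr($0, RSTART + 20, RLENGTH - 21)
        total[variant] += $NF
      }
    }
    END { for (v in total) printf "%s %d\n", v, total[v] }
  ' | sort
}
```

It would be fed from curl, for example `curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | sum_requests_by_variant`.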
Step 5: Verify Functionality
Checklist:
- Worker makes successful API calls through canary
- Token counting works (check logs for "Token usage")
- No errors in worker logs
- Metrics show deployment_variant="canary"
- Response times are acceptable
4. Migration Checklist: Production to Canary
Pre-Migration
- Verify canary deployment is healthy
  kubectl get pods -n mcp -l app=zai-proxy,variant=test
  kubectl logs -n mcp deployment/zai-proxy-test --tail=50
- Run smoke tests on canary endpoint
  curl -X POST http://zai-proxy-test.mcp.svc.cluster.local:8080/v1/messages \
    -H "Content-Type: application/json" \
    -H "x-api-key: $ZAI_API_KEY" \
    -d '{"model":"claude-3-sonnet","messages":[{"role":"user","content":"test"}],"max_tokens":10}'
- Review canary metrics for baseline
  curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics
Migration
- Stop workers using production endpoint
  # List active workers
  tlist
  # Kill production workers (one by one or all)
  tkill claude-code-glm-47-alpha
  tkill claude-code-glm-47-bravo
- Update worker settings.json to use canary
  # Backup current config
  cp /home/coder/claude-config/agents/claude-code-glm-47/settings.json \
    /home/coder/claude-config/agents/claude-code-glm-47/settings.json.bak
  # Update ANTHROPIC_BASE_URL (use Edit tool or manual edit)
  # Change from: http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic
  # Change to: http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic
- Launch workers with new canary configuration
  cd /home/coder/claude-config
  ./scripts/spawn-workers.sh --workspace=/path/to/project --executor=claude-code-glm-47 --workers=3
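The backup and URL edit in the migration steps can also be scripted. The following is a sketch under stated assumptions: `set_base_url` is a hypothetical name, the file contains `ANTHROPIC_BASE_URL` once as a JSON string value, and the new URL contains no `|` or `&` characters (both are special in the `sed` expression used here).

```shell
# Hypothetical helper: copy settings.json to settings.json.bak, then rewrite
# only the ANTHROPIC_BASE_URL value. Other keys are left untouched.
set_base_url() (
  file="$1"
  new_url="$2"
  cp "$file" "$file.bak" || exit 1
  sed "s|\(\"ANTHROPIC_BASE_URL\"[[:space:]]*:[[:space:]]*\)\"[^\"]*\"|\1\"$new_url\"|" \
    "$file.bak" > "$file"
)
```

Run it once per migration: a second run overwrites the `.bak` file with the already-edited settings, which would leave nothing to restore from during a fallback.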
Post-Migration Verification
- Verify workers are using canary endpoint
  # Attach to each worker and check
  tmux attach -t <worker-name>
  echo $ANTHROPIC_BASE_URL
- Monitor canary metrics for increased load
  rate(zai_proxy_requests_total{deployment_variant="canary"}[5m])
- Check worker logs for errors
  tail -f ~/.beads-workers/*.log
- Verify production metrics show decreased load
  rate(zai_proxy_requests_total{deployment_variant="production"}[5m])
5. Emergency Fallback to Production
Quick Fallback (Single Worker)
# Kill worker using problematic endpoint
tkill <worker-name>
# Relaunch with production endpoint override
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
cd /home/coder/claude-config
./agents/claude-code-glm-47/launch.sh <worker-name>-fallback
Bulk Fallback (All Workers)
# 1. Kill all affected workers
tkill $(tmux list-sessions | grep claude-code-glm | cut -d: -f1)
# 2. Restore production settings from backup
cp /home/coder/claude-config/agents/claude-code-glm-47/settings.json.bak \
/home/coder/claude-config/agents/claude-code-glm-47/settings.json
# 3. Relaunch all workers
cd /home/coder/claude-config
./scripts/spawn-workers.sh --workspace=/path/to/project --executor=claude-code-glm-47 --workers=3
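Step 1 can be made a little safer by matching on the session name only and killing sessions one at a time. This sketch assumes the default `tmux list-sessions` output, where each line starts with the session name followed by a colon, and that `tkill` accepts a single session name as in the single-worker example. Both function names are hypothetical.

```shell
# Read `tmux list-sessions` output on stdin and print the session names
# that begin with the given prefix.
sessions_with_prefix() {
  cut -d: -f1 | grep "^$1"
}

# Kill every tmux session whose name starts with the prefix, one by one.
kill_matching_workers() {
  tmux list-sessions 2>/dev/null | sessions_with_prefix "$1" | while read -r name; do
    tkill "$name"
  done
}
```

Sessions launched under other names, such as `claude-glm-canary-test` from the canary testing procedure, do not start with `claude-code-glm` and need their own call.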
Temporary Override (Without Config Change)
# For immediate testing or debugging
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh emergency-prod-worker
6. Traffic Splitting (Gradual Rollout)
For gradual rollout, use the zai-proxy-canary service, which splits traffic in proportion to the replica counts of the two deployments:
Configure Traffic Split
# Current configuration: 90% production, 10% canary
# 9 production pods + 1 canary pod = 90/10 split
# To change to 50/50 split:
kubectl scale deployment/zai-proxy -n mcp --replicas=5
kubectl scale deployment/zai-proxy-test -n mcp --replicas=5
# To change to 100% canary (full cutover):
kubectl scale deployment/zai-proxy -n mcp --replicas=0
kubectl scale deployment/zai-proxy-test -n mcp --replicas=10
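Because the split follows pod counts, the achievable percentages are multiples of 1/N for N total pods. The sketch below computes replica counts for a target canary share. It assumes, as the comments above imply, that the service spreads requests roughly evenly across ready pods; `replicas_for_split` is a hypothetical name.

```shell
# Given a total pod count and a desired canary percentage, print
# "<production replicas> <canary replicas>", rounding canary to the nearest pod.
replicas_for_split() (
  total="$1"
  canary_pct="$2"
  canary=$(( (total * canary_pct + 50) / 100 ))
  echo "$(( total - canary )) $canary"
)
```

With 10 pods, `replicas_for_split 10 10` prints `9 1` and `replicas_for_split 10 50` prints `5 5`, matching the examples above. With only 3 pods, a 10% target rounds to `3 0`, which shows that small pod counts cannot express small canary shares.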
Configure Workers for Split Traffic
{
"env": {
"ANTHROPIC_BASE_URL": "http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic"
}
}
Workers will then receive a mix of production and canary responses based on the traffic split.
7. Monitoring and Troubleshooting
Check Deployment Status
# Production pods
kubectl get pods -n mcp -l app=zai-proxy,variant=production
# Canary pods
kubectl get pods -n mcp -l app=zai-proxy,variant=test
# All zai-proxy pods
kubectl get pods -n mcp -l app=zai-proxy
View Logs
# Production logs
kubectl logs -f -n mcp deployment/zai-proxy
# Canary logs
kubectl logs -f -n mcp deployment/zai-proxy-test
# Worker logs
tail -f ~/.beads-workers/<worker-name>.log
Prometheus Queries
# Request rate by deployment variant
sum by (deployment_variant) (
rate(zai_proxy_requests_total[5m])
)
# Error rate by deployment variant
sum by (deployment_variant) (
rate(zai_proxy_requests_total{status=~"5.."}[5m])
)
# Token usage by deployment variant
sum by (deployment_variant) (
rate(zai_proxy_tokens_total[5m])
)
# P95 latency by deployment variant
histogram_quantile(0.95,
sum by (deployment_variant, le) (
rate(zai_proxy_request_duration_seconds_bucket[5m])
)
)
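The error-rate query above returns errors per second, which is hard to compare between deployments of different sizes, such as 9 production pods against 1 canary pod. Dividing by the total request rate gives a ratio that is comparable across variants. The sketch below prints such a query using the same metric and labels; the helper name `error_ratio_query` and its window argument are hypothetical.

```shell
# Hypothetical helper: print a PromQL expression for the 5xx error ratio
# per deployment variant. The optional argument is the rate window.
error_ratio_query() (
  window="${1:-5m}"
  cat <<EOF
sum by (deployment_variant) (rate(zai_proxy_requests_total{status=~"5.."}[$window]))
/
sum by (deployment_variant) (rate(zai_proxy_requests_total[$window]))
EOF
)
```

The printed expression can be pasted into the Prometheus UI or a dashboard panel; a canary ratio well above the production ratio is a reasonable trigger for the fallback procedure.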
Common Issues
Issue: Workers getting connection errors
- Verify service is running: kubectl get svc -n mcp | grep zai-proxy
- Check DNS resolution from devpod: nslookup zai-proxy.mcp.svc.cluster.local
- Verify endpoint URL in worker settings
Issue: Workers using wrong deployment
- Check ANTHROPIC_BASE_URL in worker session
- Verify settings.json configuration
- Look for environment variable overrides
Issue: High error rate on canary
- Check canary deployment logs: kubectl logs -n mcp deployment/zai-proxy-test
- Compare metrics between production and canary
- Consider rollback to production
8. Reference: Service Endpoints
| Service | URL | Use Case |
|---|---|---|
| Production | http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic | All production traffic |
| Canary | http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic | Testing new versions |
| Split | http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic | Weighted traffic splitting |
| Metrics (Prod) | http://zai-proxy.mcp.svc.cluster.local:8080/metrics | Production metrics |
| Metrics (Canary) | http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | Canary metrics |
| Health (Prod) | http://zai-proxy.mcp.svc.cluster.local:8080/health | Production health check |
| Health (Canary) | http://zai-proxy-test.mcp.svc.cluster.local:8080/health | Canary health check |
Summary
- Configure workers via settings.json or environment variable
- Verify endpoint configuration before launching workers
- Test canary endpoint with isolated workers first
- Migrate gradually using traffic split or full cutover
- Monitor metrics for both deployments during migration
- Fall back to production if issues arise
For questions or issues, check:
- Worker logs: ~/.beads-workers/*.log
- Proxy logs: kubectl logs -n mcp deployment/zai-proxy (production) or kubectl logs -n mcp deployment/zai-proxy-test (canary)
- Metrics: Service /metrics endpoints