zai-proxy/docs/notes/DEPLOYMENT.md
jedarden e7c24a0c08 feat: initial zai-proxy ecosystem repo
2026-05-16 15:53:52 -04:00

Z.AI Proxy - Dual Deployment Workflow Guide

This guide covers managing workers across production and canary deployments of the zai-proxy service.

Overview

The zai-proxy service supports dual deployment mode for safe testing and gradual rollout:

| Deployment | Service Name | Purpose | Endpoint URL |
|---|---|---|---|
| Production | zai-proxy.mcp.svc.cluster.local:8080 | Live traffic | http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic |
| Canary | zai-proxy-test.mcp.svc.cluster.local:8080 | Testing new versions | http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic |
| Split Traffic | zai-proxy-canary.mcp.svc.cluster.local:8080 | Weighted traffic split | http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic |

Architecture

                    ┌─────────────────────────────────────┐
                    │       Workers (claude-code-glm)      │
                    └──────────────┬──────────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │      ANTHROPIC_BASE_URL Setting      │
                    │  (selects production or canary)      │
                    └──────────────┬──────────────────────┘
                                   │
                ┌──────────────────┼──────────────────┐
                │                  │                  │
                ▼                  ▼                  ▼
    ┌───────────────────┐ ┌──────────────┐ ┌─────────────────────┐
    │   Production      │ │   Canary     │ │  Split Traffic      │
    │  zai-proxy:8080   │ │zai-proxy-test│ │ zai-proxy-canary    │
    │  (variant=prod)   │ │(variant=test)│ │ (weighted by pods)  │
    └───────────────────┘ └──────────────┘ └─────────────────────┘

1. Configuring Workers for Production vs Canary

Method 1: Direct Endpoint Configuration

Workers are configured via their settings.json file in the agent directory.

Production Endpoint (Default):

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
  }
}

Canary Endpoint:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
  }
}

Split Traffic Endpoint:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic"
  }
}
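Switching a worker between the three endpoints comes down to rewriting one value in settings.json. The sketch below is one way to script that edit. The function name set_base_url is hypothetical. It assumes the key and its value sit on a single line, as in the examples above, and that the URL contains no `#` or `&` characters. It uses sed rather than a JSON-aware tool, so treat it as a convenience, not a validator.

```shell
# Hypothetical helper: rewrite ANTHROPIC_BASE_URL in a settings.json file.
# Assumes the key and its value sit on one line, as in the examples above.
set_base_url() {
  settings_file=$1
  new_url=$2

  [ -f "$settings_file" ] || { echo "missing: $settings_file" >&2; return 1; }

  # Keep a backup next to the original before editing.
  cp "$settings_file" "$settings_file.bak" || return 1

  # '#' is the sed delimiter so the slashes in the URL need no escaping.
  sed -E "s#(\"ANTHROPIC_BASE_URL\": *\")[^\"]*\"#\\1${new_url}\"#" \
    "$settings_file.bak" > "$settings_file"
}

# Example:
# set_base_url /home/coder/claude-config/agents/claude-code-glm-47/settings.json \
#   "http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
```

Only the ANTHROPIC_BASE_URL line is touched. The previous file is left alongside as settings.json.bak.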

Method 2: Override via Environment Variable

When launching workers, override the endpoint without modifying settings:

# Launch worker with canary endpoint
export ANTHROPIC_BASE_URL="http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-canary-test

# Launch worker with production endpoint
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-prod
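To avoid retyping the long URLs, the override can be wrapped in a small lookup. This is a sketch only: the function name zai_endpoint_url and the short names production, canary, and split are hypothetical, while the service hostnames are the ones listed in this guide.

```shell
# Hypothetical helper: map a deployment name to the worker endpoint URL
# listed in this guide.
zai_endpoint_url() {
  case "$1" in
    production) svc="zai-proxy" ;;
    canary)     svc="zai-proxy-test" ;;
    split)      svc="zai-proxy-canary" ;;
    *) echo "unknown deployment: $1" >&2; return 1 ;;
  esac
  echo "http://${svc}.devpod.svc.cluster.local:8080/api/anthropic"
}

# Example: launch a canary worker without editing settings.json.
# ANTHROPIC_BASE_URL="$(zai_endpoint_url canary)" \
#   ./agents/claude-code-glm-47/launch.sh claude-glm-canary-test
```

Prefixing the launch command with the variable assignment scopes the override to that one process, so the surrounding shell keeps its own value.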

Example: claude-code-glm-47 Configuration

Location: /home/coder/claude-config/agents/claude-code-glm-47/settings.json

Current (Production):

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "proxy-handles-auth",
    "ANTHROPIC_BASE_URL": "http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7"
  }
}

2. Verifying Worker Endpoint Configuration

Check Active Worker Configuration

# Attach to a worker session
tmux attach -t claude-code-glm-47-alpha

# Inside the session, check environment
echo $ANTHROPIC_BASE_URL
# Expected output for production:
# http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic

# Check settings.json
grep ANTHROPIC_BASE_URL "$CLAUDE_CONFIG_DIR/settings.json"

Verify Service Availability

# Test production endpoint
curl -s http://zai-proxy.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}

# Test canary endpoint
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}

# Test split traffic endpoint
curl -s http://zai-proxy-canary.mcp.svc.cluster.local:8080/health
# Expected: {"status":"ok"}
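The three checks above can be folded into one loop that fails loudly when any service is unhealthy. A minimal sketch, with the function name check_zai_health and the 5-second timeout chosen here as assumptions:

```shell
# Hypothetical helper: probe /health on each zai-proxy service and report
# any that fail. Service names are the ones used in this guide.
check_zai_health() {
  failed=0
  for svc in zai-proxy zai-proxy-test zai-proxy-canary; do
    url="http://${svc}.mcp.svc.cluster.local:8080/health"
    # --fail turns HTTP errors into a non-zero exit; --max-time bounds the wait.
    if body=$(curl --silent --fail --max-time 5 "$url"); then
      echo "OK   $svc $body"
    else
      echo "FAIL $svc"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
# check_zai_health || echo "at least one zai-proxy service is unhealthy"
```

The function returns non-zero if any probe fails, which makes it usable as a gate in a script.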

Check Which Pods a Worker Is Using

# From within a devpod, check DNS resolution
nslookup zai-proxy.mcp.svc.cluster.local
nslookup zai-proxy-test.mcp.svc.cluster.local

# Check service endpoints
kubectl get endpoints -n mcp | grep zai-proxy

Monitor Metrics with Deployment Variant Labels

The proxy exports metrics with the deployment_variant label:

# Check requests per variant
sum by (deployment_variant) (rate(zai_proxy_requests_total[5m]))

# Check token usage per variant
sum by (deployment_variant) (rate(zai_proxy_tokens_total[5m]))

3. Testing Procedure: Canary Deployments

Step 1: Launch Test Worker Against Canary

cd /home/coder/claude-config

# Launch worker configured for canary
export ANTHROPIC_BASE_URL="http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh claude-glm-canary-test

Step 2: Verify Canary Configuration

# Attach to verify
tmux attach -t claude-glm-canary-test

# In the worker session, verify
echo "Using endpoint: $ANTHROPIC_BASE_URL"
# Should show: http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic

# Detach: Ctrl+B, D

Step 3: Run Test Tasks

# In the worker session or via bead assignment
# Test simple task
br create "Test canary endpoint connectivity" \
  --description "Verify worker can successfully make API calls through canary endpoint" \
  --labels testing,canary

# Worker should process this and make API calls through canary

Step 4: Monitor Canary Metrics

# Check canary proxy metrics
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_requests_total

# Verify variant label in metrics
curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | grep deployment_variant
# Should show: deployment_variant="canary"
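Grepping the raw metrics shows individual series. To see a single total per variant, the text output can be summed with awk. The sketch below assumes the counter lines carry a deployment_variant label and no trailing timestamp. The function name is hypothetical.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the total of zai_proxy_requests_total per deployment_variant.
sum_requests_by_variant() {
  awk '
    /^zai_proxy_requests_total\{/ {
      # Pull the deployment_variant value out of the label set.
      # The prefix deployment_variant=" is 20 characters long.
      if (match($0, /deployment_variant="[^"]*"/)) {
        variant = substr($0, RSTART + 20, RLENGTH - 21)
        total[variant] += $NF
      }
    }
    END { for (v in total) printf "%s %d\n", v, total[v] }
  ' | sort
}

# Example:
# curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics \
#   | sum_requests_by_variant
```

Running it before and after the test tasks gives a quick view of how many requests the canary actually served.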

Step 5: Verify Functionality

Checklist:

  • Worker makes successful API calls through canary
  • Token counting works (check logs for "Token usage")
  • No errors in worker logs
  • Metrics show deployment_variant="canary"
  • Response times are acceptable

4. Migration Checklist: Production to Canary

Pre-Migration

  • Verify canary deployment is healthy

    kubectl get pods -n mcp -l app=zai-proxy,variant=test
    kubectl logs -n mcp deployment/zai-proxy-test --tail=50
    
  • Run smoke tests on canary endpoint

    curl -X POST http://zai-proxy-test.mcp.svc.cluster.local:8080/v1/messages \
      -H "Content-Type: application/json" \
      -H "x-api-key: $ZAI_API_KEY" \
      -d '{"model":"claude-3-sonnet","messages":[{"role":"user","content":"test"}],"max_tokens":10}'
    
  • Review canary metrics for baseline

    curl -s http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics
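The canary health check above can also be expressed as a scriptable gate that blocks migration until the canary has a ready pod. A sketch, assuming the deployment name and namespace used throughout this guide. The function name canary_is_ready is hypothetical.

```shell
# Hypothetical pre-migration gate: succeed only if the canary deployment
# reports at least one ready replica.
canary_is_ready() {
  ready=$(kubectl get deployment zai-proxy-test -n mcp \
    -o jsonpath='{.status.readyReplicas}') || return 1
  # readyReplicas is omitted from the status when no pod is ready.
  [ "${ready:-0}" -ge 1 ]
}

# Example:
# canary_is_ready || { echo "canary not ready; aborting migration" >&2; exit 1; }
```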
    

Migration

  • Stop workers using production endpoint

    # List active workers
    tlist
    
    # Kill production workers (one by one or all)
    tkill claude-code-glm-47-alpha
    tkill claude-code-glm-47-bravo
    
  • Update worker settings.json to use canary

    # Backup current config
    cp /home/coder/claude-config/agents/claude-code-glm-47/settings.json \
       /home/coder/claude-config/agents/claude-code-glm-47/settings.json.bak
    
    # Update ANTHROPIC_BASE_URL (use Edit tool or manual edit)
    # Change from: http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic
    # Change to:   http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic
    
  • Launch workers with new canary configuration

    cd /home/coder/claude-config
    ./scripts/spawn-workers.sh --workspace=/path/to/project --executor=claude-code-glm-47 --workers=3
    

Post-Migration Verification

  • Verify workers are using canary endpoint

    # Attach to each worker and check
    tmux attach -t <worker-name>
    echo $ANTHROPIC_BASE_URL
    
  • Monitor canary metrics for increased load

    rate(zai_proxy_requests_total{deployment_variant="canary"}[5m])
    
  • Check worker logs for errors

    tail -f ~/.beads-workers/*.log
    
  • Verify production metrics show decreased load

    rate(zai_proxy_requests_total{deployment_variant="production"}[5m])
    

5. Emergency Fallback to Production

Quick Fallback (Single Worker)

# Kill worker using problematic endpoint
tkill <worker-name>

# Relaunch with production endpoint override
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
cd /home/coder/claude-config
./agents/claude-code-glm-47/launch.sh <worker-name>-fallback

Bulk Fallback (All Workers)

# 1. Kill all affected workers
for session in $(tmux list-sessions | grep claude-code-glm | cut -d: -f1); do
  tkill "$session"
done

# 2. Restore production settings from backup
cp /home/coder/claude-config/agents/claude-code-glm-47/settings.json.bak \
   /home/coder/claude-config/agents/claude-code-glm-47/settings.json

# 3. Relaunch all workers
cd /home/coder/claude-config
./scripts/spawn-workers.sh --workspace=/path/to/project --executor=claude-code-glm-47 --workers=3
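The restore step depends on the backup taken during migration. The sketch below guards against a missing backup and confirms that the restored file points at the production service. The function name restore_settings is hypothetical.

```shell
# Hypothetical helper: restore settings.json from its .bak copy, refusing
# to proceed when no backup exists.
restore_settings() {
  agent_dir=$1
  backup="$agent_dir/settings.json.bak"
  if [ ! -f "$backup" ]; then
    echo "no backup at $backup; edit settings.json by hand" >&2
    return 1
  fi
  cp "$backup" "$agent_dir/settings.json" || return 1
  # Confirm the restored file points at the production service.
  grep -q 'zai-proxy\.devpod' "$agent_dir/settings.json"
}

# Example:
# restore_settings /home/coder/claude-config/agents/claude-code-glm-47
```

A non-zero return means either that no backup was found or that the backup does not contain the production endpoint. In both cases settings.json should be checked by hand before relaunching workers.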

Temporary Override (Without Config Change)

# For immediate testing or debugging
export ANTHROPIC_BASE_URL="http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic"
./agents/claude-code-glm-47/launch.sh emergency-prod-worker

6. Traffic Splitting (Gradual Rollout)

For gradual rollout, use the zai-proxy-canary service which splits traffic based on replica counts:

Configure Traffic Split

# Current configuration: 90% production, 10% canary
# 9 production pods + 1 canary pod = 90/10 split

# To change to 50/50 split:
kubectl scale deployment/zai-proxy -n mcp --replicas=5
kubectl scale deployment/zai-proxy-test -n mcp --replicas=5

# To change to 100% canary (full cutover):
kubectl scale deployment/zai-proxy -n mcp --replicas=0
kubectl scale deployment/zai-proxy-test -n mcp --replicas=10
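Because the split follows pod counts, a target percentage has to be converted into replica numbers. The arithmetic can be sketched as below. The function name split_replicas is hypothetical, and the result is rounded to the nearest whole pod. The achieved split is only approximate, since a Kubernetes Service balances connections rather than individual requests.

```shell
# Hypothetical helper: given a total pod count and a target canary
# percentage, print "<production replicas> <canary replicas>".
split_replicas() {
  total=$1
  canary_pct=$2
  # Round to the nearest whole pod using integer arithmetic.
  canary=$(( (total * canary_pct + 50) / 100 ))
  echo "$(( total - canary )) $canary"
}

# Example: 10 pods at 50% canary.
# set -- $(split_replicas 10 50)
# kubectl scale deployment/zai-proxy      -n mcp --replicas="$1"
# kubectl scale deployment/zai-proxy-test -n mcp --replicas="$2"
```

With 10 pods, a 10% target yields 9 production and 1 canary replica, matching the configuration described above.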

Configure Workers for Split Traffic

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic"
  }
}

Workers then receive a mix of production and canary responses, roughly in proportion to each deployment's replica count.

7. Monitoring and Troubleshooting

Check Deployment Status

# Production pods
kubectl get pods -n mcp -l app=zai-proxy,variant=production

# Canary pods
kubectl get pods -n mcp -l app=zai-proxy,variant=test

# All zai-proxy pods
kubectl get pods -n mcp -l app=zai-proxy

View Logs

# Production logs
kubectl logs -f -n mcp deployment/zai-proxy

# Canary logs
kubectl logs -f -n mcp deployment/zai-proxy-test

# Worker logs
tail -f ~/.beads-workers/<worker-name>.log

Prometheus Queries

# Request rate by deployment variant
sum by (deployment_variant) (
  rate(zai_proxy_requests_total[5m])
)

# Error rate by deployment variant
sum by (deployment_variant) (
  rate(zai_proxy_requests_total{status=~"5.."}[5m])
)

# Token usage by deployment variant
sum by (deployment_variant) (
  rate(zai_proxy_tokens_total[5m])
)

# P95 latency by deployment variant
histogram_quantile(0.95,
  sum by (deployment_variant, le) (
    rate(zai_proxy_request_duration_seconds_bucket[5m])
  )
)
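These queries can also be run from a shell through the Prometheus HTTP API, which is convenient during a migration. The sketch below is illustrative. PROMETHEUS_URL and both function names are assumptions, since this guide does not name a Prometheus address. The status label is taken from the error-rate query above.

```shell
# Hypothetical helper: run an instant PromQL query through the Prometheus
# HTTP API. PROMETHEUS_URL is an assumption; this guide does not name one.
prom_query() {
  curl --silent --fail --get \
    --data-urlencode "query=$1" \
    "${PROMETHEUS_URL:-http://localhost:9090}/api/v1/query"
}

# Build the 5xx share of requests for one deployment variant.
error_ratio_query() {
  variant=$1
  printf 'sum(rate(zai_proxy_requests_total{deployment_variant="%s",status=~"5.."}[5m])) / sum(rate(zai_proxy_requests_total{deployment_variant="%s"}[5m]))\n' \
    "$variant" "$variant"
}

# Example:
# prom_query "$(error_ratio_query canary)"
```

Dividing the 5xx rate by the total rate gives a ratio that can be compared directly between production and canary, even when the two receive very different traffic volumes.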

Common Issues

Issue: Workers getting connection errors

  • Verify service is running: kubectl get svc -n mcp | grep zai-proxy
  • Check DNS resolution from devpod: nslookup zai-proxy.mcp.svc.cluster.local
  • Verify endpoint URL in worker settings

Issue: Workers using wrong deployment

  • Check ANTHROPIC_BASE_URL in worker session
  • Verify settings.json configuration
  • Look for environment variable overrides

Issue: High error rate on canary

  • Check canary deployment logs: kubectl logs -n mcp deployment/zai-proxy-test
  • Compare metrics between production and canary
  • Consider rollback to production

8. Reference: Service Endpoints

| Service | URL | Use Case |
|---|---|---|
| Production | http://zai-proxy.devpod.svc.cluster.local:8080/api/anthropic | All production traffic |
| Canary | http://zai-proxy-test.devpod.svc.cluster.local:8080/api/anthropic | Testing new versions |
| Split | http://zai-proxy-canary.devpod.svc.cluster.local:8080/api/anthropic | Weighted traffic splitting |
| Metrics (Prod) | http://zai-proxy.mcp.svc.cluster.local:8080/metrics | Production metrics |
| Metrics (Canary) | http://zai-proxy-test.mcp.svc.cluster.local:8080/metrics | Canary metrics |
| Health (Prod) | http://zai-proxy.mcp.svc.cluster.local:8080/health | Production health check |
| Health (Canary) | http://zai-proxy-test.mcp.svc.cluster.local:8080/health | Canary health check |

Summary

  1. Configure workers via settings.json or environment variable
  2. Verify endpoint configuration before launching workers
  3. Test canary endpoint with isolated workers first
  4. Migrate gradually using traffic split or full cutover
  5. Monitor metrics for both deployments during migration
  6. Fall back to production if issues arise

For questions or issues, check:

  • Worker logs: ~/.beads-workers/*.log
  • Proxy logs: kubectl logs -n mcp deployment/zai-proxy (production) or deployment/zai-proxy-test (canary)
  • Metrics: Service /metrics endpoints