zai-proxy/docs/notes/MONITORING_SETUP.md
jedarden e7c24a0c08 feat: initial zai-proxy ecosystem repo
Extracted from ardenone-cluster/containers/zai-proxy and
ardenone-cluster/containers/zai-proxy-dashboard.

- proxy/: OpenAI-compatible ZAI reverse proxy (Go, v1.10.0)
  - Token counting, rate limiting, Prometheus metrics, canary support
- dashboard/: Metrics dashboard backend + React frontend (Go, v1.0.0)
  - Prometheus collector, SQLite storage, SSE live updates
- docs/: Operational notes, research, and plan subdirs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:53:52 -04:00

ZAI-Proxy Monitoring Setup - Dual Deployment

Overview

This document describes the monitoring configuration for the zai-proxy dual deployment (production and canary). The setup consists of ServiceMonitors for Prometheus scraping, PrometheusRules for alerting, and a Grafana dashboard for visualization.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    apexalgo-iad Cluster                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────────┐         ┌──────────────────┐             │
│  │  Production      │         │     Canary       │             │
│  │  (mcp namespace) │         │ (devpod namespace)│             │
│  │                  │         │                  │             │
│  │  zai-proxy:1.0.0 │         │zai-proxy:1.2.0   │             │
│  │  TOKEN_COUNTING  │         │TOKEN_COUNTING    │             │
│  │  = false         │         │= true            │             │
│  └────────┬─────────┘         └────────┬─────────┘             │
│           │                             │                        │
│           │ /metrics                    │ /metrics               │
│           │ variant="production"        │ variant="canary"       │
│           ▼                             ▼                        │
│  ┌────────────────────────────────────────────────────┐        │
│  │         Prometheus (monitoring namespace)           │        │
│  │                                                     │        │
│  │  ServiceMonitor: zai-proxy-production              │        │
│  │    selector: app=zai-proxy, version=production     │        │
│  │    namespace: mcp                                  │        │
│  │    relabels: deployment_variant=production         │        │
│  │                                                     │        │
│  │  ServiceMonitor: zai-proxy-canary                  │        │
│  │    selector: app=zai-proxy-canary, version=canary  │        │
│  │    namespace: devpod                               │        │
│  │    relabels: deployment_variant=canary             │        │
│  │                                                     │        │
│  │  PrometheusRules: zai-proxy-canary-alerts          │        │
│  │    - Canary-specific alerts                        │        │
│  │    - Comparison alerts vs production               │        │
│  └────────────────────────────────────────────────────┘        │
│           │                             │                        │
│           ▼                             ▼                        │
│  ┌────────────────────────────────────────────────────┐        │
│  │          Grafana Dashboard                         │        │
│  │  zai-proxy-dual-deployment.json                    │        │
│  │                                                     │        │
│  │  Panels:                                            │        │
│  │    - Worker Utilization (gauge)                    │        │
│  │    - Request Rate (timeseries)                     │        │
│  │    - Error Rate (timeseries)                       │        │
│  │    - Latency Comparison (P50/P95)                  │        │
│  │    - Current Rate Limit                            │        │
│  │    - Upstream Errors                               │        │
│  │    - Concurrent Requests (gauge)                   │        │
│  │    - Token Throughput (canary only)                │        │
│  │    - Token Counting Duration (canary only)         │        │
│  │    - Rate Limit Adjustments (canary only)          │        │
│  └────────────────────────────────────────────────────┘        │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘

File Structure

k8s/
├── production/
│   ├── deployment.yml       # Production deployment (mcp namespace)
│   └── service.yml          # Production service with version label
├── canary/
│   ├── deployment.yml       # Canary deployment (devpod namespace)
│   └── service.yml          # Canary service with version label
└── monitoring/
    ├── servicemonitor-production.yml    # Prometheus scraping for production
    ├── servicemonitor-canary.yml        # Prometheus scraping for canary
    ├── prometheus-rules.yml             # Canary-specific alerts
    └── grafana-dashboard-configmap.yml  # Grafana dashboard JSON

ServiceMonitor Configuration

Production ServiceMonitor

File: k8s/monitoring/servicemonitor-production.yml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zai-proxy-production
  namespace: monitoring
  labels:
    app: zai-proxy
    release: kube-prometheus-stack-arde
    variant: production
spec:
  selector:
    matchLabels:
      app: zai-proxy
      version: production
  namespaceSelector:
    matchNames:
    - mcp
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    relabelings:
    - sourceLabels: [__meta_kubernetes_service_label_version]
      targetLabel: deployment_variant

Canary ServiceMonitor

File: k8s/monitoring/servicemonitor-canary.yml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zai-proxy-canary
  namespace: monitoring
  labels:
    app: zai-proxy
    release: kube-prometheus-stack-arde
    variant: canary
spec:
  selector:
    matchLabels:
      app: zai-proxy-canary
      version: canary
  namespaceSelector:
    matchNames:
    - devpod
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    relabelings:
    - sourceLabels: [__meta_kubernetes_service_label_version]
      targetLabel: deployment_variant

Key Points:

  • Both ServiceMonitors copy the Service's version label into a deployment_variant label via relabeling
  • Production scrapes from mcp namespace
  • Canary scrapes from devpod namespace
  • Scrape interval: 30 seconds
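
For the selectors above to match, each Service needs the app and version labels and a port named http. A minimal sketch of what k8s/production/service.yml might contain, assuming the proxy listens on port 8080 (the port used in the port-forward commands in this document); the pod selector is also an assumption, taken from the deployment labels:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zai-proxy
  namespace: mcp
  labels:
    app: zai-proxy          # matched by the ServiceMonitor selector
    version: production     # copied into deployment_variant by the relabeling
spec:
  selector:                 # assumed to mirror the pod labels in deployment.yml
    app: zai-proxy
    version: production
  ports:
  - name: http              # must match "port: http" in the ServiceMonitor endpoint
    port: 8080              # assumption: same port as the container
    targetPort: 8080
```

The canary Service would follow the same shape with name zai-proxy-canary, namespace devpod, app: zai-proxy-canary, and version: canary.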

Metrics

Application Metrics (from metrics.go)

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| zai_proxy_requests_total | Counter | method, path, status_code, variant | Total requests |
| zai_proxy_request_duration_seconds | Histogram | method, path, status_code, variant | Request latency |
| zai_proxy_concurrent_requests | Gauge | variant | Active requests |
| zai_proxy_worker_utilization_ratio | Gauge | variant | Worker utilization % |
| zai_proxy_rate_limit_requests_per_second | Gauge | variant | Current rate limit |
| zai_proxy_tokens_total | Counter | direction, model, variant | Token counts |
| zai_proxy_token_count_duration_seconds | Histogram | variant | Token counting time |
| zai_proxy_build_info | Gauge | version, variant, commit, build_time | Build metadata |
Label Mapping

| Application Label | ServiceMonitor Relabel | Dashboard Query |
|---|---|---|
| variant="production" | deployment_variant="production" | deployment_variant="production" |
| variant="canary" | deployment_variant="canary" | deployment_variant="canary" |

Grafana Dashboard

File: k8s/monitoring/grafana-dashboard-configmap.yml

Dashboard: "ZAI Proxy - Production vs Canary"

Panels

  1. Worker Utilization (Gauge)

    • Shows concurrent requests vs max workers
    • Separate gauges for production and canary
  2. Request Rate (Time Series)

    • sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))
    • sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))
  3. Error Rate (Time Series)

    • 5xx errors as percentage of total requests
    • Separate lines for production and canary
  4. Latency Comparison (Time Series)

    • P50 and P95 percentiles
    • Separate lines for each deployment
  5. Current Rate Limit (Time Series)

    • zai_proxy_rate_limit_requests_per_second
  6. Upstream Errors (Time Series)

    • zai_proxy_upstream_errors_total by error_type
  7. Concurrent Requests (Gauge)

    • Current active requests per deployment
  8. Token Throughput (Time Series) - Canary Only

    • zai_proxy_tokens_total by direction and model
    • Only for canary since production has token counting disabled
  9. Token Counting Duration (Time Series) - Canary Only

    • P95 of zai_proxy_token_count_duration_seconds
  10. Rate Limit Adjustments (Time Series) - Canary Only

    • zai_proxy_rate_limit_adjustments_total by direction
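
The dashboard is shipped as a ConfigMap. The sketch below shows the general shape with only the Request Rate panel, heavily trimmed compared with a real Grafana export. The ConfigMap name and JSON key come from elsewhere in this document; the grafana_dashboard label is an assumption about how the Grafana sidecar discovers dashboards and should be checked against the cluster's Grafana values:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zai-proxy-grafana-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # assumed sidecar discovery label
data:
  zai-proxy-dual-deployment.json: |
    {
      "title": "ZAI Proxy - Production vs Canary",
      "tags": ["zai-proxy", "canary", "production", "monitoring"],
      "panels": [
        {
          "type": "timeseries",
          "title": "Request Rate",
          "targets": [
            {
              "expr": "sum(rate(zai_proxy_requests_total{deployment_variant=\"production\"}[5m]))",
              "legendFormat": "production"
            },
            {
              "expr": "sum(rate(zai_proxy_requests_total{deployment_variant=\"canary\"}[5m]))",
              "legendFormat": "canary"
            }
          ]
        }
      ]
    }
```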

Dashboard Labels

"tags": ["zai-proxy", "canary", "production", "monitoring"]

Prometheus Rules - Canary Alerts

File: k8s/monitoring/prometheus-rules.yml

Alert Rules

| Alert Name | Severity | Condition | Duration |
|---|---|---|---|
| ZaiProxyCanaryHighErrorRate | warning | Error rate > 5% | 5 min |
| ZaiProxyCanaryHighLatency | warning | P95 > 10s | 5 min |
| ZaiProxyCanaryCrashLooping | critical | Restart rate > 0 | 5 min |
| ZaiProxyCanaryNotReady | critical | 0 ready pods | 2 min |
| ZaiProxyCanaryDegradedVsProduction | warning | 2x error rate vs production | 10 min |
| ZaiProxyCanarySlowerThanProduction | warning | 50% higher P95 vs production | 10 min |
| ZaiProxyCanaryTokenCountingSlow | warning | Token counting P95 > 100ms | 5 min |
| ZaiProxyCanaryRateLimitAdjustingDown | info | Rate limit decreasing | 5 min |

Alert Examples

High Error Rate:

(
  sum(rate(zai_proxy_requests_total{deployment_variant="canary",status_code=~"5.."}[5m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))
) > 0.05
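
In the PrometheusRule manifest, an expression like the one above sits inside a rule together with the duration and severity listed in the alert table. A sketch of how that might look; the resource and group names are the ones used in the troubleshooting commands in this document, while the release label and the annotation text are assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: zai-proxy-canary-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack-arde   # assumed to match the Prometheus rule selector
spec:
  groups:
  - name: zai_proxy_canary_alerts
    rules:
    - alert: ZaiProxyCanaryHighErrorRate
      expr: |
        (
          sum(rate(zai_proxy_requests_total{deployment_variant="canary",status_code=~"5.."}[5m]))
          /
          sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))
        ) > 0.05
      for: 5m                 # condition must hold for 5 minutes before firing
      labels:
        severity: warning
      annotations:
        summary: "zai-proxy canary 5xx error rate above 5%"   # illustrative wording
```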

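Degraded vs Production (the + 0.01 term is a floor: when production's error rate is near zero, the canary must still exceed 2% before the alert fires):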

(
  sum(rate(zai_proxy_requests_total{deployment_variant="canary",status_code=~"5.."}[10m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[10m]))
) > 2 * (
  sum(rate(zai_proxy_requests_total{deployment_variant="production",status_code=~"5.."}[10m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="production"}[10m])) + 0.01
)
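
The latency comparison alert from the table (50% higher P95 vs production, 10 min, warning) can be written in the same style. This is a sketch of a single rule entry, not the rule from the repository; it assumes the request duration histogram exposes the usual _bucket series with an le label:

```yaml
# Entry for the "rules:" list of a PrometheusRule group (sketch).
- alert: ZaiProxyCanarySlowerThanProduction
  expr: |
    histogram_quantile(0.95,
      sum by (le) (rate(zai_proxy_request_duration_seconds_bucket{deployment_variant="canary"}[10m]))
    )
    > 1.5 *
    histogram_quantile(0.95,
      sum by (le) (rate(zai_proxy_request_duration_seconds_bucket{deployment_variant="production"}[10m]))
    )
  for: 10m
  labels:
    severity: warning
```

Aggregating both sides with sum by (le) leaves each quantile with an empty label set, so the two sides of the comparison match one-to-one.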

Deployment Labels

Production Deployment

# k8s/production/deployment.yml
spec:
  template:
    metadata:
      labels:
        app: zai-proxy
        version: production
    spec:
      containers:
      - name: proxy
        env:
        - name: DEPLOYMENT_VARIANT
          value: "production"
        - name: TOKEN_COUNTING_ENABLED
          value: "false"

Canary Deployment

# k8s/canary/deployment.yml
spec:
  template:
    metadata:
      labels:
        app: zai-proxy-canary
        version: canary
    spec:
      containers:
      - name: proxy
        env:
        - name: DEPLOYMENT_VARIANT
          value: "canary"
        - name: TOKEN_COUNTING_ENABLED
          value: "true"

Verification

Run the verification script to check the monitoring setup:

./scripts/verify-monitoring.sh

This will check:

  • Monitoring namespace exists
  • ServiceMonitors are configured correctly
  • PrometheusRules are deployed
  • Grafana dashboard exists
  • Relabel configs are correct

Manual Metrics Testing

To verify metrics are being exported correctly:

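# Production metrics endpoint (port-forward blocks, so run it in the background)
kubectl port-forward -n mcp deployment/zai-proxy 8080:8080 &
sleep 2  # give the tunnel time to open
curl -s http://localhost:8080/metrics | grep zai_proxy
kill $!  # stop the port-forward so local port 8080 is free again

# Canary metrics endpoint
kubectl port-forward -n devpod deployment/zai-proxy-canary 8080:8080 &
sleep 2
curl -s http://localhost:8080/metrics | grep zai_proxy
kill $!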

Prometheus Queries

Compare request rates

sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))
sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))

Check token counting (canary only)

sum(rate(zai_proxy_tokens_total{deployment_variant="canary"}[5m])) by (direction, model)

Compare error rates

sum(rate(zai_proxy_requests_total{deployment_variant="production",status_code=~"5.."}[5m])) /
sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))
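
The query above covers production only. Grouping by deployment_variant returns the ratio for both deployments from one expression. A sketch as a recording rule entry; the rule name is hypothetical and not part of the repository:

```yaml
# Entry for the "rules:" list of a PrometheusRule group (sketch).
- record: zai_proxy:error_ratio:rate5m   # hypothetical rule name
  expr: |
    sum by (deployment_variant) (rate(zai_proxy_requests_total{status_code=~"5.."}[5m]))
    /
    sum by (deployment_variant) (rate(zai_proxy_requests_total[5m]))
```

A variant that has never returned a 5xx response has no numerator series, so it produces no result rather than a zero.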

Troubleshooting

Metrics not appearing

  1. Check ServiceMonitor exists:

    kubectl get servicemonitor -n monitoring | grep zai-proxy
    
  2. Check Service labels match ServiceMonitor selector:

    kubectl get service -n mcp zai-proxy -o jsonpath='{.metadata.labels}'
    kubectl get service -n devpod zai-proxy-canary -o jsonpath='{.metadata.labels}'
    
  3. Check Prometheus is scraping:

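    kubectl port-forward -n monitoring prometheus-kube-prometheus-prometheus-0 9090:9090 &
    sleep 2; curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.scrapePool | test("zai-proxy")) | {scrapePool, health, lastError}'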
    

Dashboard not loading

  1. Check ConfigMap exists:

    kubectl get configmap -n monitoring zai-proxy-grafana-dashboard
    
  2. Verify dashboard JSON is valid:

    kubectl get configmap -n monitoring zai-proxy-grafana-dashboard -o jsonpath='{.data.*}' | jq .
    

Alerts not firing

  1. Check PrometheusRule exists:

    kubectl get prometheusrule -n monitoring zai-proxy-canary-alerts
    
  2. Verify rules are loaded:

    kubectl port-forward -n monitoring prometheus-kube-prometheus-prometheus-0 9090:9090 &
    sleep 2  # port-forward blocks, so it runs in the background; wait for the tunnel
    curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name=="zai_proxy_canary_alerts")'
    

Summary

The monitoring setup provides:

  • Separate scraping for production and canary deployments
  • Version labels for metric filtering
  • Grafana dashboard with side-by-side comparison
  • Token counting metrics (canary only)
  • Request rates, error rates, latency for both
  • Canary-specific alerts that don't affect production
  • Comparison alerts (canary vs production)

All monitoring resources are managed via GitOps (ArgoCD) by committing manifests to the repository.
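
For orientation, an Argo CD Application that syncs the monitoring manifests could look roughly like the sketch below. Everything in it is an assumption except the k8s/monitoring path and the monitoring namespace: the Application name, the argocd namespace, the project, the branch, and the repository URL are placeholders, not values from this repository.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: zai-proxy-monitoring          # hypothetical name
  namespace: argocd                   # assumed Argo CD namespace
spec:
  project: default                    # assumed project
  source:
    repoURL: https://git.example.com/your-org/zai-proxy.git   # placeholder URL
    targetRevision: main              # assumed branch
    path: k8s/monitoring
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true                     # remove resources deleted from git
      selfHeal: true                  # revert manual drift in the cluster
```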