jedarden e7c24a0c08 feat: initial zai-proxy ecosystem repo

Extracted from ardenone-cluster/containers/zai-proxy and
ardenone-cluster/containers/zai-proxy-dashboard.

- proxy/: OpenAI-compatible ZAI reverse proxy (Go, v1.10.0)
  - Token counting, rate limiting, Prometheus metrics, canary support
- dashboard/: Metrics dashboard backend + React frontend (Go, v1.0.0)
  - Prometheus collector, SQLite storage, SSE live updates
- docs/: Operational notes, research, and plan subdirs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-16 15:53:52 -04:00


ZAI-Proxy Monitoring Setup - Dual Deployment

Overview

This document describes the monitoring configuration for the zai-proxy dual deployment (production + canary). The setup consists of two ServiceMonitors for Prometheus scraping, a PrometheusRule for canary alerting, and a Grafana dashboard for side-by-side visualization.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    apexalgo-iad Cluster                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────────┐         ┌──────────────────┐             │
│  │  Production      │         │     Canary       │             │
│  │  (mcp namespace) │         │ (devpod namespace)│             │
│  │                  │         │                  │             │
│  │  zai-proxy:1.0.0 │         │zai-proxy:1.2.0   │             │
│  │  TOKEN_COUNTING  │         │TOKEN_COUNTING    │             │
│  │  = false         │         │= true            │             │
│  └────────┬─────────┘         └────────┬─────────┘             │
│           │                             │                        │
│           │ /metrics                    │ /metrics               │
│           │ variant="production"        │ variant="canary"       │
│           ▼                             ▼                        │
│  ┌────────────────────────────────────────────────────┐        │
│  │         Prometheus (monitoring namespace)           │        │
│  │                                                     │        │
│  │  ServiceMonitor: zai-proxy-production              │        │
│  │    selector: app=zai-proxy, version=production     │        │
│  │    namespace: mcp                                  │        │
│  │    relabels: deployment_variant=production         │        │
│  │                                                     │        │
│  │  ServiceMonitor: zai-proxy-canary                  │        │
│  │    selector: app=zai-proxy-canary, version=canary  │        │
│  │    namespace: devpod                               │        │
│  │    relabels: deployment_variant=canary             │        │
│  │                                                     │        │
│  │  PrometheusRules: zai-proxy-canary-alerts          │        │
│  │    - Canary-specific alerts                        │        │
│  │    - Comparison alerts vs production               │        │
│  └────────────────────────────────────────────────────┘        │
│           │                             │                        │
│           ▼                             ▼                        │
│  ┌────────────────────────────────────────────────────┐        │
│  │          Grafana Dashboard                         │        │
│  │  zai-proxy-dual-deployment.json                    │        │
│  │                                                     │        │
│  │  Panels:                                            │        │
│  │    - Worker Utilization (gauge)                    │        │
│  │    - Request Rate (timeseries)                     │        │
│  │    - Error Rate (timeseries)                       │        │
│  │    - Latency Comparison (P50/P95)                  │        │
│  │    - Current Rate Limit                            │        │
│  │    - Upstream Errors                               │        │
│  │    - Concurrent Requests (gauge)                   │        │
│  │    - Token Throughput (canary only)                │        │
│  │    - Token Counting Duration (canary only)         │        │
│  │    - Rate Limit Adjustments (canary only)          │        │
│  └────────────────────────────────────────────────────┘        │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘

File Structure

k8s/
├── production/
│   ├── deployment.yml       # Production deployment (mcp namespace)
│   └── service.yml          # Production service with version label
├── canary/
│   ├── deployment.yml       # Canary deployment (devpod namespace)
│   └── service.yml          # Canary service with version label
└── monitoring/
    ├── servicemonitor-production.yml    # Prometheus scraping for production
    ├── servicemonitor-canary.yml        # Prometheus scraping for canary
    ├── prometheus-rules.yml             # Canary-specific alerts
    └── grafana-dashboard-configmap.yml  # Grafana dashboard JSON

ServiceMonitor Configuration

Production ServiceMonitor

File: k8s/monitoring/servicemonitor-production.yml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zai-proxy-production
  namespace: monitoring
  labels:
    app: zai-proxy
    release: kube-prometheus-stack-arde
    variant: production
spec:
  selector:
    matchLabels:
      app: zai-proxy
      version: production
  namespaceSelector:
    matchNames:
    - mcp
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    relabelings:
    - sourceLabels: [__meta_kubernetes_service_label_version]
      targetLabel: deployment_variant

Canary ServiceMonitor

File: k8s/monitoring/servicemonitor-canary.yml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zai-proxy-canary
  namespace: monitoring
  labels:
    app: zai-proxy
    release: kube-prometheus-stack-arde
    variant: canary
spec:
  selector:
    matchLabels:
      app: zai-proxy-canary
      version: canary
  namespaceSelector:
    matchNames:
    - devpod
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    relabelings:
    - sourceLabels: [__meta_kubernetes_service_label_version]
      targetLabel: deployment_variant

Key Points:

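- Both ServiceMonitors add a deployment_variant label via relabeling; its value is copied from the Service's version label
- The production ServiceMonitor scrapes the mcp namespace
- The canary ServiceMonitor scrapes the devpod namespace
- Scrape interval: 30 seconds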
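To make the relabeling step concrete, here is a minimal Python sketch of what a relabel rule with the default `replace` action does to a target's labels. It is a simplified model for illustration, not Prometheus code: the function name `apply_replace_relabel` is hypothetical, and only the defaults (`regex: (.*)`, `replacement: $1`, `separator: ;`) are modelled.

```python
import re


def apply_replace_relabel(labels, source_labels, target_label,
                          regex="(.*)", replacement="$1", separator=";"):
    """Simplified model of a Prometheus relabel rule with the default `replace` action."""
    # Source label values are joined with the separator; a missing label is an empty string.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends, hence fullmatch.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: the target's labels are left untouched
    # Expand $1-style group references in the replacement string.
    value = re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))) or "", replacement)
    result = dict(labels)
    if value:
        result[target_label] = value
    else:
        result.pop(target_label, None)  # an empty result leaves the label unset
    return result


# Labels as service discovery would present the canary Service (values from this document).
canary_target = {
    "__meta_kubernetes_service_label_app": "zai-proxy-canary",
    "__meta_kubernetes_service_label_version": "canary",
}
relabeled = apply_replace_relabel(
    canary_target, ["__meta_kubernetes_service_label_version"], "deployment_variant"
)
print(relabeled["deployment_variant"])

# A Service without a `version` label ends up with no deployment_variant at all.
unlabeled = apply_replace_relabel(
    {"__meta_kubernetes_service_label_app": "zai-proxy"},
    ["__meta_kubernetes_service_label_version"], "deployment_variant",
)
```

One consequence worth keeping in mind: under this model, a Service that lacks the `version` label produces series without `deployment_variant`, so the dashboard and alert queries would silently miss them.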

Metrics

Application Metrics (from `metrics.go`)

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `zai_proxy_requests_total` | Counter | method, path, status_code, variant | Total requests |
| `zai_proxy_request_duration_seconds` | Histogram | method, path, status_code, variant | Request latency |
| `zai_proxy_concurrent_requests` | Gauge | variant | Active requests |
| `zai_proxy_worker_utilization_ratio` | Gauge | variant | Worker utilization % |
| `zai_proxy_rate_limit_requests_per_second` | Gauge | variant | Current rate limit |
| `zai_proxy_tokens_total` | Counter | direction, model, variant | Token counts |
| `zai_proxy_token_count_duration_seconds` | Histogram | variant | Token counting time |
| `zai_proxy_build_info` | Gauge | version, variant, commit, build_time | Build metadata |

Label Mapping

| Application Label | ServiceMonitor Relabel | Dashboard Query |
|---|---|---|
| `variant="production"` | `deployment_variant="production"` | `deployment_variant="production"` |
| `variant="canary"` | `deployment_variant="canary"` | `deployment_variant="canary"` |

Grafana Dashboard

File: k8s/monitoring/grafana-dashboard-configmap.yml

Dashboard: "ZAI Proxy - Production vs Canary"

Panels

1. Worker Utilization (Gauge)
   - Shows concurrent requests vs max workers
   - Separate gauges for production and canary
2. Request Rate (Time Series)
   - sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))
   - sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))
3. Error Rate (Time Series)
   - 5xx errors as percentage of total requests
   - Separate lines for production and canary
4. Latency Comparison (Time Series)
   - P50 and P95 percentiles
   - Separate lines for each deployment
5. Current Rate Limit (Time Series)
   - zai_proxy_rate_limit_requests_per_second
6. Upstream Errors (Time Series)
   - zai_proxy_upstream_errors_total by error_type
7. Concurrent Requests (Gauge)
   - Current active requests per deployment
8. Token Throughput (Time Series) - Canary Only
   - zai_proxy_tokens_total by direction and model
   - Only for canary since production has token counting disabled
9. Token Counting Duration (Time Series) - Canary Only
   - P95 of zai_proxy_token_count_duration_seconds
10. Rate Limit Adjustments (Time Series) - Canary Only
    - zai_proxy_rate_limit_adjustments_total by direction
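The P50/P95 values in the latency panels are estimates derived from histogram buckets, not exact per-request timings. As a rough illustration, and assuming the dashboard computes them with PromQL's `histogram_quantile()` (the dashboard JSON is not reproduced here), the Python sketch below mirrors the linear interpolation applied to classic cumulative buckets. The bucket bounds and counts are invented for the example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total_count).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: the best available
                # answer is the highest finite bucket boundary.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative counts for request durations in seconds.
sample_buckets = [(0.5, 50), (1.0, 90), (5.0, 98), (math.inf, 100)]
p50 = histogram_quantile(0.50, sample_buckets)
p95 = histogram_quantile(0.95, sample_buckets)
p99 = histogram_quantile(0.99, sample_buckets)
print(p50, p95, p99)
```

Because the estimate is capped at the highest finite bucket boundary, a threshold such as "P95 > 10s" can only be crossed if the histogram defines a bucket above 10 seconds. It may be worth checking the bucket layout in `metrics.go`, since the default buckets in the Go Prometheus client top out at 10.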

Dashboard Labels

"tags": ["zai-proxy", "canary", "production", "monitoring"]

Prometheus Rules - Canary Alerts

File: k8s/monitoring/prometheus-rules.yml

Alert Rules

| Alert Name | Severity | Condition | Duration |
|---|---|---|---|
| `ZaiProxyCanaryHighErrorRate` | warning | Error rate > 5% | 5min |
| `ZaiProxyCanaryHighLatency` | warning | P95 > 10s | 5min |
| `ZaiProxyCanaryCrashLooping` | critical | Restart rate > 0 | 5min |
| `ZaiProxyCanaryNotReady` | critical | 0 ready pods | 2min |
| `ZaiProxyCanaryDegradedVsProduction` | warning | 2x error rate vs production | 10min |
| `ZaiProxyCanarySlowerThanProduction` | warning | 50% higher P95 vs production | 10min |
| `ZaiProxyCanaryTokenCountingSlow` | warning | Token counting P95 > 100ms | 5min |
| `ZaiProxyCanaryRateLimitAdjustingDown` | info | Rate limit decreasing | 5min |

Alert Examples

High Error Rate:

(
  sum(rate(zai_proxy_requests_total{deployment_variant="canary",status_code=~"5.."}[5m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))
) > 0.05

Degraded vs Production:

(
  sum(rate(zai_proxy_requests_total{deployment_variant="canary",status_code=~"5.."}[10m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[10m]))
) > 2 * (
  sum(rate(zai_proxy_requests_total{deployment_variant="production",status_code=~"5.."}[10m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="production"}[10m])) + 0.01
)
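The comparison expression is easy to misread. In PromQL, `/` binds tighter than `+`, so the `0.01` is added to the production error ratio (not to its denominator), and the sum is then doubled. The following Python sketch works through that arithmetic with made-up request counts; the function names are hypothetical and the no-traffic handling is a simplification of how PromQL treats NaN.

```python
def error_ratio(errors, total):
    """Return errors / total, or None when there was no traffic in the window."""
    if total == 0:
        return None
    return errors / total


def canary_degraded(canary_5xx, canary_total, prod_5xx, prod_total,
                    factor=2.0, floor=0.01):
    """Mirror of: canary_ratio > factor * (production_ratio + floor)."""
    canary = error_ratio(canary_5xx, canary_total)
    production = error_ratio(prod_5xx, prod_total)
    if canary is None or production is None:
        # PromQL would yield NaN here, and a NaN comparison never matches.
        return False
    return canary > factor * (production + floor)


# Production at 0.5% errors: the canary must exceed 2 * (0.005 + 0.01) = 3%.
print(canary_degraded(40, 1000, 5, 1000))  # canary at 4%
print(canary_degraded(25, 1000, 5, 1000))  # canary at 2.5%

# Production with zero errors: the floor keeps the threshold at 2%, not 0%.
print(canary_degraded(15, 1000, 0, 1000))  # canary at 1.5%
```

The floor means a canary is never flagged merely because production happens to be error-free. One caveat, to be verified against the real rule: if production has never emitted a 5xx series at all, the right-hand `sum(rate(...))` returns an empty result in PromQL and the whole expression yields nothing, so the alert would stay silent regardless of the canary's error rate.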

Deployment Labels

Production Deployment

# k8s/production/deployment.yml
spec:
  template:
    metadata:
      labels:
        app: zai-proxy
        version: production
    spec:
      containers:
      - name: proxy
        env:
        - name: DEPLOYMENT_VARIANT
          value: "production"
        - name: TOKEN_COUNTING_ENABLED
          value: "false"

Canary Deployment

# k8s/canary/deployment.yml
spec:
  template:
    metadata:
      labels:
        app: zai-proxy-canary
        version: canary
    spec:
      containers:
      - name: proxy
        env:
        - name: DEPLOYMENT_VARIANT
          value: "canary"
        - name: TOKEN_COUNTING_ENABLED
          value: "true"

Verification

Run the verification script to check the monitoring setup:

./scripts/verify-monitoring.sh

This will check:

- Monitoring namespace exists
- ServiceMonitors are configured correctly
- PrometheusRules are deployed
- Grafana dashboard exists
- Relabel configs are correct

Manual Metrics Testing

To verify metrics are being exported correctly:

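# Production metrics endpoint
# port-forward runs in the foreground: run curl from a second terminal,
# then stop the port-forward (Ctrl-C) before reusing local port 8080 for the canary
kubectl port-forward -n mcp deployment/zai-proxy 8080:8080
curl -s http://localhost:8080/metrics | grep zai_proxy

# Canary metrics endpoint
kubectl port-forward -n devpod deployment/zai-proxy-canary 8080:8080
curl -s http://localhost:8080/metrics | grep zai_proxy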
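Scanning the `grep` output by eye makes it easy to miss an absent metric. As a hedged sketch, the Python snippet below reads the `# TYPE` lines of a `/metrics` response and reports which of the metric families listed in the Metrics table are not exported. The sample text and the request path in it are invented, and the helper names are hypothetical.

```python
# Metric families and types as listed in the Metrics table of this document.
EXPECTED_FAMILIES = {
    "zai_proxy_requests_total": "counter",
    "zai_proxy_request_duration_seconds": "histogram",
    "zai_proxy_concurrent_requests": "gauge",
    "zai_proxy_worker_utilization_ratio": "gauge",
    "zai_proxy_rate_limit_requests_per_second": "gauge",
    "zai_proxy_tokens_total": "counter",
    "zai_proxy_token_count_duration_seconds": "histogram",
    "zai_proxy_build_info": "gauge",
}


def exported_family_types(metrics_text):
    """Map metric family name to its type, based on '# TYPE <name> <type>' lines."""
    types = {}
    for line in metrics_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


def missing_families(metrics_text, expected=EXPECTED_FAMILIES):
    """Return the expected family names that the endpoint does not export."""
    exported = exported_family_types(metrics_text)
    return sorted(name for name in expected if name not in exported)


# Invented sample of what the endpoint might return (the path value is hypothetical).
sample_metrics = """\
# HELP zai_proxy_requests_total Total requests
# TYPE zai_proxy_requests_total counter
zai_proxy_requests_total{method="POST",path="/example",status_code="200",variant="canary"} 42
# TYPE zai_proxy_concurrent_requests gauge
zai_proxy_concurrent_requests{variant="canary"} 3
"""
missing = missing_families(sample_metrics)
print(missing)
```

A missing family is not always a fault: labelled metrics typically appear only after their first observation, and the token metrics are expected to be absent on production, where token counting is disabled.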

Prometheus Queries

Compare request rates

sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))
sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))

Check token counting (canary only)

sum(rate(zai_proxy_tokens_total{deployment_variant="canary"}[5m])) by (direction, model)

Compare error rates

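sum(rate(zai_proxy_requests_total{deployment_variant="production",status_code=~"5.."}[5m])) /
sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))

sum(rate(zai_proxy_requests_total{deployment_variant="canary",status_code=~"5.."}[5m])) /
sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))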

Troubleshooting

Metrics not appearing

Check ServiceMonitor exists:

kubectl get servicemonitor -n monitoring | grep zai-proxy

Check Service labels match ServiceMonitor selector:

kubectl get service -n mcp zai-proxy -o jsonpath='{.metadata.labels}'
kubectl get service -n devpod zai-proxy-canary -o jsonpath='{.metadata.labels}'

Check Prometheus is scraping:

kubectl get configmap -n monitoring prometheus-kube-prometheus-prometheus-targets -o jsonpath='{.data}'
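Another way to confirm scraping is the Prometheus HTTP API: with Prometheus port-forwarded to localhost:9090 (as in the "Alerts not firing" steps), `curl -s http://localhost:9090/api/v1/targets` returns the active targets as JSON. The sketch below, in Python, filters such a response down to the two zai-proxy variants. The sample payload is invented and trimmed to the fields used, and `zai_proxy_target_health` is a hypothetical helper name.

```python
def zai_proxy_target_health(targets_payload, variants=("production", "canary")):
    """Summarise active targets whose deployment_variant label is one of `variants`."""
    rows = []
    for target in targets_payload["data"]["activeTargets"]:
        variant = target.get("labels", {}).get("deployment_variant")
        if variant not in variants:
            continue  # some other job scraped by the same Prometheus
        rows.append({
            "variant": variant,
            "health": target.get("health", "unknown"),
            "scrape_url": target.get("scrapeUrl", ""),
            "last_error": target.get("lastError", ""),
        })
    return rows


# Invented sample, shaped like the /api/v1/targets response.
sample_targets = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "zai-proxy", "deployment_variant": "production"},
             "scrapeUrl": "http://10.0.0.1:8080/metrics", "health": "up", "lastError": ""},
            {"labels": {"job": "zai-proxy-canary", "deployment_variant": "canary"},
             "scrapeUrl": "http://10.0.0.2:8080/metrics", "health": "down",
             "lastError": "connection refused"},
            {"labels": {"job": "node-exporter"},
             "scrapeUrl": "http://10.0.0.3:9100/metrics", "health": "up", "lastError": ""},
        ]
    },
}
health = zai_proxy_target_health(sample_targets)
unhealthy = [row for row in health if row["health"] != "up"]
for row in unhealthy:
    print(row["variant"], row["scrape_url"], row["last_error"])
```

If a variant does not show up in the result at all, the ServiceMonitor selector or namespaceSelector most likely does not match the Service, which points back to the label check in the previous step.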

Dashboard not loading

Check ConfigMap exists:

kubectl get configmap -n monitoring zai-proxy-grafana-dashboard

Verify dashboard JSON is valid:

kubectl get configmap -n monitoring zai-proxy-grafana-dashboard -o jsonpath='{.data.*}' | jq .
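Beyond syntactic validity, it can help to confirm that the JSON actually describes the expected dashboard. The Python sketch below parses a dashboard document and lists its title, tags, and top-level panel titles. The sample is a heavily trimmed, invented stand-in for the real dashboard, and `dashboard_summary` is a hypothetical helper name.

```python
import json


def dashboard_summary(dashboard_json):
    """Parse Grafana dashboard JSON and return its title, tags and panel titles.

    Raises json.JSONDecodeError if the document is not valid JSON.
    Only top-level panels are listed; panels nested inside rows are ignored.
    """
    dashboard = json.loads(dashboard_json)
    return {
        "title": dashboard.get("title", ""),
        "tags": dashboard.get("tags", []),
        "panel_titles": [panel.get("title", "") for panel in dashboard.get("panels", [])],
    }


# Trimmed, invented sample; the title and tags are the ones given in this document.
sample_dashboard = """
{
  "title": "ZAI Proxy - Production vs Canary",
  "tags": ["zai-proxy", "canary", "production", "monitoring"],
  "panels": [
    {"title": "Worker Utilization", "type": "gauge"},
    {"title": "Request Rate", "type": "timeseries"}
  ]
}
"""
summary = dashboard_summary(sample_dashboard)
print(summary["title"], len(summary["panel_titles"]))
```

Against the real ConfigMap, the panel list would be expected to contain the ten panels described in the Grafana Dashboard section.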

Alerts not firing

Check PrometheusRule exists:

kubectl get prometheusrule -n monitoring zai-proxy-canary-alerts

Verify rules are loaded:

kubectl port-forward -n monitoring prometheus-kube-prometheus-prometheus-0 9090:9090
curl http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name=="zai_proxy_canary_alerts")'
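To go one step further than checking that the group exists, the response from `/api/v1/rules` can be compared against the alert names in the Alert Rules table. The Python sketch below does that for a parsed response. The sample payload is invented and trimmed, and `missing_alerts` is a hypothetical helper name; the group name is the one used in the `jq` filter above.

```python
# Alert names as listed in the Alert Rules table of this document.
EXPECTED_ALERTS = {
    "ZaiProxyCanaryHighErrorRate",
    "ZaiProxyCanaryHighLatency",
    "ZaiProxyCanaryCrashLooping",
    "ZaiProxyCanaryNotReady",
    "ZaiProxyCanaryDegradedVsProduction",
    "ZaiProxyCanarySlowerThanProduction",
    "ZaiProxyCanaryTokenCountingSlow",
    "ZaiProxyCanaryRateLimitAdjustingDown",
}


def missing_alerts(rules_payload, group_name="zai_proxy_canary_alerts",
                   expected=EXPECTED_ALERTS):
    """Return expected alert names that are not loaded in the given rule group."""
    loaded = set()
    for group in rules_payload["data"]["groups"]:
        if group.get("name") != group_name:
            continue
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting":  # skip recording rules
                loaded.add(rule["name"])
    return sorted(set(expected) - loaded)


# Invented sample, shaped like the /api/v1/rules response.
sample_rules = {
    "status": "success",
    "data": {
        "groups": [
            {"name": "zai_proxy_canary_alerts", "rules": [
                {"name": "ZaiProxyCanaryHighErrorRate", "type": "alerting", "state": "inactive"},
                {"name": "ZaiProxyCanaryHighLatency", "type": "alerting", "state": "inactive"},
            ]},
            {"name": "some_other_group", "rules": [
                {"name": "ZaiProxyCanaryNotReady", "type": "alerting", "state": "inactive"},
            ]},
        ]
    },
}
not_loaded = missing_alerts(sample_rules)
print(not_loaded)
```

An empty result means every alert from the table is loaded in the group; any names returned point to rules that were dropped, renamed, or placed in a different group.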

Summary

The monitoring setup provides:

✅ Separate scraping for production and canary deployments
✅ Version labels for metric filtering
✅ Grafana dashboard with side-by-side comparison
✅ Token counting metrics (canary only)
✅ Request rates, error rates, latency for both
✅ Canary-specific alerts that don't affect production
✅ Comparison alerts (canary vs production)

All monitoring resources are managed via GitOps (ArgoCD) by committing manifests to the repository.
