zai-proxy/docs/notes/MONITORING_SETUP.md

# ZAI-Proxy Monitoring Setup - Dual Deployment

## Overview

This document describes the monitoring configuration for the zai-proxy dual deployment (production + canary). The setup includes ServiceMonitors for Prometheus scraping, PrometheusRules for alerting, and a Grafana dashboard for visualization.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    apexalgo-iad Cluster                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────────┐         ┌──────────────────┐             │
│  │  Production      │         │     Canary       │             │
│  │  (mcp namespace) │         │ (devpod namespace)│             │
│  │                  │         │                  │             │
│  │  zai-proxy:1.0.0 │         │zai-proxy:1.2.0   │             │
│  │  TOKEN_COUNTING  │         │TOKEN_COUNTING    │             │
│  │  = false         │         │= true            │             │
│  └────────┬─────────┘         └────────┬─────────┘             │
│           │                             │                        │
│           │ /metrics                    │ /metrics               │
│           │ variant="production"        │ variant="canary"       │
│           ▼                             ▼                        │
│  ┌────────────────────────────────────────────────────┐        │
│  │         Prometheus (monitoring namespace)           │        │
│  │                                                     │        │
│  │  ServiceMonitor: zai-proxy-production              │        │
│  │    selector: app=zai-proxy, version=production     │        │
│  │    namespace: mcp                                  │        │
│  │    relabels: deployment_variant=production         │        │
│  │                                                     │        │
│  │  ServiceMonitor: zai-proxy-canary                  │        │
│  │    selector: app=zai-proxy-canary, version=canary  │        │
│  │    namespace: devpod                               │        │
│  │    relabels: deployment_variant=canary             │        │
│  │                                                     │        │
│  │  PrometheusRules: zai-proxy-canary-alerts          │        │
│  │    - Canary-specific alerts                        │        │
│  │    - Comparison alerts vs production               │        │
│  └────────────────────────────────────────────────────┘        │
│           │                             │                        │
│           ▼                             ▼                        │
│  ┌────────────────────────────────────────────────────┐        │
│  │          Grafana Dashboard                         │        │
│  │  zai-proxy-dual-deployment.json                    │        │
│  │                                                     │        │
│  │  Panels:                                            │        │
│  │    - Worker Utilization (gauge)                    │        │
│  │    - Request Rate (timeseries)                     │        │
│  │    - Error Rate (timeseries)                       │        │
│  │    - Latency Comparison (P50/P95)                  │        │
│  │    - Current Rate Limit                            │        │
│  │    - Upstream Errors                               │        │
│  │    - Concurrent Requests (gauge)                   │        │
│  │    - Token Throughput (canary only)                │        │
│  │    - Token Counting Duration (canary only)         │        │
│  │    - Rate Limit Adjustments (canary only)          │        │
│  └────────────────────────────────────────────────────┘        │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘
```

## File Structure

```
k8s/
├── production/
│   ├── deployment.yml       # Production deployment (mcp namespace)
│   └── service.yml          # Production service with version label
├── canary/
│   ├── deployment.yml       # Canary deployment (devpod namespace)
│   └── service.yml          # Canary service with version label
└── monitoring/
    ├── servicemonitor-production.yml    # Prometheus scraping for production
    ├── servicemonitor-canary.yml        # Prometheus scraping for canary
    ├── prometheus-rules.yml             # Canary-specific alerts
    └── grafana-dashboard-configmap.yml  # Grafana dashboard JSON
```

## ServiceMonitor Configuration

### Production ServiceMonitor

**File**: `k8s/monitoring/servicemonitor-production.yml`

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zai-proxy-production
  namespace: monitoring
  labels:
    app: zai-proxy
    release: kube-prometheus-stack-arde
    variant: production
spec:
  selector:
    matchLabels:
      app: zai-proxy
      version: production
  namespaceSelector:
    matchNames:
    - mcp
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    relabelings:
    - sourceLabels: [__meta_kubernetes_service_label_version]
      targetLabel: deployment_variant
```

### Canary ServiceMonitor

**File**: `k8s/monitoring/servicemonitor-canary.yml`

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zai-proxy-canary
  namespace: monitoring
  labels:
    app: zai-proxy
    release: kube-prometheus-stack-arde
    variant: canary
spec:
  selector:
    matchLabels:
      app: zai-proxy-canary
      version: canary
  namespaceSelector:
    matchNames:
    - devpod
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    relabelings:
    - sourceLabels: [__meta_kubernetes_service_label_version]
      targetLabel: deployment_variant
```

**Key Points**:
- Both ServiceMonitors add `deployment_variant` label via relabeling
- Production scrapes from `mcp` namespace
- Canary scrapes from `devpod` namespace
- Scrape interval: 30 seconds

## Metrics

### Application Metrics (from `metrics.go`)

| Metric Name | Type | Labels | Description |
|------------|------|--------|-------------|
| `zai_proxy_requests_total` | Counter | method, path, status_code, variant | Total requests |
| `zai_proxy_request_duration_seconds` | Histogram | method, path, status_code, variant | Request latency |
| `zai_proxy_concurrent_requests` | Gauge | variant | Active requests |
| `zai_proxy_worker_utilization_ratio` | Gauge | variant | Worker utilization ratio (concurrent requests vs max workers) |
| `zai_proxy_rate_limit_requests_per_second` | Gauge | variant | Current rate limit |
| `zai_proxy_tokens_total` | Counter | direction, model, variant | Token counts |
| `zai_proxy_token_count_duration_seconds` | Histogram | variant | Token counting time |
| `zai_proxy_build_info` | Gauge | version, variant, commit, build_time | Build metadata |

### Label Mapping

| Application Label | ServiceMonitor Relabel | Dashboard Query |
|------------------|----------------------|-----------------|
| `variant="production"` | `deployment_variant="production"` | `deployment_variant="production"` |
| `variant="canary"` | `deployment_variant="canary"` | `deployment_variant="canary"` |

## Grafana Dashboard

**File**: `k8s/monitoring/grafana-dashboard-configmap.yml`

**Dashboard**: "ZAI Proxy - Production vs Canary"

### Panels

1. **Worker Utilization** (Gauge)
   - Shows concurrent requests vs max workers
   - Separate gauges for production and canary

2. **Request Rate** (Time Series)
   - `sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))`
   - `sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))`

3. **Error Rate** (Time Series)
   - 5xx errors as percentage of total requests
   - Separate lines for production and canary

4. **Latency Comparison** (Time Series)
   - P50 and P95 percentiles
   - Separate lines for each deployment

5. **Current Rate Limit** (Time Series)
   - `zai_proxy_rate_limit_requests_per_second`

6. **Upstream Errors** (Time Series)
   - `zai_proxy_upstream_errors_total` by error_type

7. **Concurrent Requests** (Gauge)
   - Current active requests per deployment

8. **Token Throughput** (Time Series) - Canary Only
   - `zai_proxy_tokens_total` by direction and model
   - Only for canary since production has token counting disabled

9. **Token Counting Duration** (Time Series) - Canary Only
   - P95 of `zai_proxy_token_count_duration_seconds`

10. **Rate Limit Adjustments** (Time Series) - Canary Only
    - `zai_proxy_rate_limit_adjustments_total` by direction
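
Panels 3, 4, and 9 are described above without their queries. The following is a minimal sketch of how such queries could be generated, assuming the histograms expose the standard `_bucket` series and that the panels filter on `deployment_variant`. The helper names `latency_quantile_query` and `error_percent_query` are hypothetical, and the expressions stored in the dashboard JSON may differ.

```bash
# Hypothetical helpers that print PromQL for the panels listed without a query.
# They only build strings; paste the output into Grafana or the Prometheus UI.

# latency_quantile_query QUANTILE VARIANT [WINDOW]
# Quantile of request latency, computed from the histogram's _bucket series.
latency_quantile_query() {
  printf 'histogram_quantile(%s, sum(rate(zai_proxy_request_duration_seconds_bucket{deployment_variant="%s"}[%s])) by (le))\n' \
    "$1" "$2" "${3:-5m}"
}

# error_percent_query VARIANT [WINDOW]
# 5xx responses as a percentage of all requests.
error_percent_query() {
  printf '100 * sum(rate(zai_proxy_requests_total{deployment_variant="%s",status_code=~"5.."}[%s])) / sum(rate(zai_proxy_requests_total{deployment_variant="%s"}[%s]))\n' \
    "$1" "${2:-5m}" "$1" "${2:-5m}"
}
```

For example, `latency_quantile_query 0.95 canary` prints a P95 expression for the canary, and `latency_quantile_query 0.50 production` prints the P50 expression for production.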

### Dashboard Tags

```json
"tags": ["zai-proxy", "canary", "production", "monitoring"]
```

## Prometheus Rules - Canary Alerts

**File**: `k8s/monitoring/prometheus-rules.yml`

### Alert Rules

| Alert Name | Severity | Condition | Duration |
|-----------|----------|-----------|----------|
| `ZaiProxyCanaryHighErrorRate` | warning | Error rate > 5% | 5min |
| `ZaiProxyCanaryHighLatency` | warning | P95 > 10s | 5min |
| `ZaiProxyCanaryCrashLooping` | critical | Restart rate > 0 | 5min |
| `ZaiProxyCanaryNotReady` | critical | 0 ready pods | 2min |
| `ZaiProxyCanaryDegradedVsProduction` | warning | 2x error rate vs production | 10min |
| `ZaiProxyCanarySlowerThanProduction` | warning | 50% higher P95 vs production | 10min |
| `ZaiProxyCanaryTokenCountingSlow` | warning | Token counting P95 > 100ms | 5min |
| `ZaiProxyCanaryRateLimitAdjustingDown` | info | Rate limit decreasing | 5min |

### Alert Examples

**High Error Rate**:
```promql
(
  sum(rate(zai_proxy_requests_total{deployment_variant="canary",status_code=~"5.."}[5m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))
) > 0.05
```

**Degraded vs Production**:
```promql
(
  sum(rate(zai_proxy_requests_total{deployment_variant="canary",status_code=~"5.."}[10m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[10m]))
) > 2 * (
  sum(rate(zai_proxy_requests_total{deployment_variant="production",status_code=~"5.."}[10m]))
  /
  sum(rate(zai_proxy_requests_total{deployment_variant="production"}[10m])) + 0.01
)
```
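
The `+ 0.01` term in the comparison above acts as a floor: when the production error ratio is 0, the canary must still exceed 0.02 (2%) before the condition holds. The sketch below reproduces that arithmetic on plain numbers. The helper `is_degraded` is hypothetical and is not part of the rule file.

```bash
# is_degraded CANARY_RATIO PRODUCTION_RATIO
# Exit status 0 when canary > 2 * (production + 0.01), mirroring the
# "Degraded vs Production" expression; non-zero otherwise.
is_degraded() {
  awk -v c="$1" -v p="$2" 'BEGIN { exit !(c > 2 * (p + 0.01)) }'
}

is_degraded 0.030 0.000 && echo "0.030 vs 0.000: condition holds"
is_degraded 0.015 0.000 || echo "0.015 vs 0.000: below the 0.02 floor"
is_degraded 0.050 0.020 || echo "0.050 vs 0.020: threshold is 0.06"
```

Without the floor, any non-zero canary error ratio would satisfy the comparison whenever production reported no errors at all.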

## Deployment Labels

### Production Deployment

```yaml
# k8s/production/deployment.yml
spec:
  template:
    metadata:
      labels:
        app: zai-proxy
        version: production
    spec:
      containers:
      - name: proxy
        env:
        - name: DEPLOYMENT_VARIANT
          value: "production"
        - name: TOKEN_COUNTING_ENABLED
          value: "false"
```

### Canary Deployment

```yaml
# k8s/canary/deployment.yml
spec:
  template:
    metadata:
      labels:
        app: zai-proxy-canary
        version: canary
    spec:
      containers:
      - name: proxy
        env:
        - name: DEPLOYMENT_VARIANT
          value: "canary"
        - name: TOKEN_COUNTING_ENABLED
          value: "true"
```

## Verification

Run the verification script to check the monitoring setup:

```bash
./scripts/verify-monitoring.sh
```

This will check:
- Monitoring namespace exists
- ServiceMonitors are configured correctly
- PrometheusRules are deployed
- Grafana dashboard exists
- Relabel configs are correct
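
The script itself is not reproduced in this document. As a rough sketch only, the existence checks in the list above could look like the following, using the resource names that appear elsewhere in this document. The function names `check_resource` and `verify_monitoring` are hypothetical, the relabel check is not covered, and the real `scripts/verify-monitoring.sh` may work differently.

```bash
# Hypothetical sketch; the real scripts/verify-monitoring.sh may differ.

# check_resource KIND NAME NAMESPACE: report whether the object exists.
check_resource() {
  if kubectl get "$1" "$2" -n "$3" >/dev/null 2>&1; then
    echo "PASS $1/$2 ($3)"
  else
    echo "FAIL $1/$2 ($3)"
    return 1
  fi
}

# verify_monitoring: run every existence check and count the failures.
# Defined only; call it explicitly against a configured cluster.
verify_monitoring() {
  failures=0
  kubectl get namespace monitoring >/dev/null 2>&1 || failures=$((failures + 1))
  for target in \
    servicemonitor/zai-proxy-production \
    servicemonitor/zai-proxy-canary \
    prometheusrule/zai-proxy-canary-alerts \
    configmap/zai-proxy-grafana-dashboard
  do
    # Split "kind/name" into its two halves.
    check_resource "${target%%/*}" "${target#*/}" monitoring || failures=$((failures + 1))
  done
  echo "$failures check(s) failed"
  [ "$failures" -eq 0 ]
}
```

Calling `verify_monitoring` prints one PASS or FAIL line per resource and returns a non-zero status if anything is missing, which makes it usable as a CI gate.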

## Manual Metrics Testing

To verify metrics are being exported correctly:

```bash
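# Production metrics endpoint (port-forward blocks, so run it in the background)
kubectl port-forward -n mcp deployment/zai-proxy 8080:8080 &
PF_PID=$!
sleep 2  # give the tunnel time to open
curl -s http://localhost:8080/metrics | grep zai_proxy
kill "$PF_PID"  # free port 8080 before the next forward

# Canary metrics endpoint
kubectl port-forward -n devpod deployment/zai-proxy-canary 8080:8080 &
PF_PID=$!
sleep 2
curl -s http://localhost:8080/metrics | grep zai_proxy
kill "$PF_PID"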
```
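
To confirm that a pod reports the expected application-level `variant` label, the raw `/metrics` output can be reduced to its distinct label values. This is a minimal sketch; the helper name `variant_values` is hypothetical.

```bash
# variant_values: read Prometheus text exposition on stdin and print the
# distinct values of the application-level `variant` label.
variant_values() {
  grep '^zai_proxy_' \
    | grep -oE '[{,]variant="[^"]*"' \
    | cut -d'"' -f2 \
    | sort -u
}
```

With a port-forward to the production pod open, `curl -s http://localhost:8080/metrics | variant_values` is expected to print only `production`. Any other value suggests that `DEPLOYMENT_VARIANT` is set incorrectly on that deployment.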

## Prometheus Queries

### Compare request rates

```promql
# Production
sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))
# Canary (run as a separate query)
sum(rate(zai_proxy_requests_total{deployment_variant="canary"}[5m]))
```

### Check token counting (canary only)

```promql
sum(rate(zai_proxy_tokens_total{deployment_variant="canary"}[5m])) by (direction, model)
```

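### Compare error rates

The query below returns the production error ratio. Substitute `deployment_variant="canary"` in both selectors to get the canary ratio.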

```promql
sum(rate(zai_proxy_requests_total{deployment_variant="production",status_code=~"5.."}[5m])) /
sum(rate(zai_proxy_requests_total{deployment_variant="production"}[5m]))
```
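
The queries in this section can also be run from a terminal through the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at `http://localhost:9090` (for example through `kubectl port-forward`). The `PROM_URL` variable and the function names are hypothetical.

```bash
# Assumes Prometheus is reachable at PROM_URL, for example through
# `kubectl port-forward` to port 9090 (an assumption; adjust as needed).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# prom_query EXPR: run an instant query through the Prometheus HTTP API.
prom_query() {
  curl -s --get "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# request_rate VARIANT: the request-rate query for one deployment variant.
request_rate() {
  prom_query "sum(rate(zai_proxy_requests_total{deployment_variant=\"$1\"}[5m]))"
}

# compare_request_rates: print the raw API response for both variants.
# Defined only; call it once the port-forward is open.
compare_request_rates() {
  for v in production canary; do
    printf '%s: ' "$v"
    request_rate "$v"
    printf '\n'
  done
}
```

Piping the output of `request_rate canary` through `jq '.data.result'` shows just the returned samples.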

## Troubleshooting

### Metrics not appearing

1. Check ServiceMonitor exists:
   ```bash
   kubectl get servicemonitor -n monitoring | grep zai-proxy
   ```

2. Check Service labels match ServiceMonitor selector:
   ```bash
   kubectl get service -n mcp zai-proxy -o jsonpath='{.metadata.labels}'
   kubectl get service -n devpod zai-proxy-canary -o jsonpath='{.metadata.labels}'
   ```

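3. Check Prometheus is scraping by listing the active targets through the Prometheus API (run the port-forward in a separate terminal):
   ```bash
   kubectl port-forward -n monitoring prometheus-kube-prometheus-prometheus-0 9090:9090
   curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.deployment_variant) | {variant: .labels.deployment_variant, health: .health, lastError: .lastError}'
   ```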

### Dashboard not loading

1. Check ConfigMap exists:
   ```bash
   kubectl get configmap -n monitoring zai-proxy-grafana-dashboard
   ```

2. Verify dashboard JSON is valid:
   ```bash
   kubectl get configmap -n monitoring zai-proxy-grafana-dashboard -o jsonpath='{.data.*}' | jq .
   ```

### Alerts not firing

1. Check PrometheusRule exists:
   ```bash
   kubectl get prometheusrule -n monitoring zai-proxy-canary-alerts
   ```

2. Verify rules are loaded (run the port-forward in a separate terminal):
   ```bash
   kubectl port-forward -n monitoring prometheus-kube-prometheus-prometheus-0 9090:9090
   curl http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name=="zai_proxy_canary_alerts")'
   ```

## Summary

The monitoring setup provides:
- ✅ Separate scraping for production and canary deployments
- ✅ Version labels for metric filtering
- ✅ Grafana dashboard with side-by-side comparison
- ✅ Token counting metrics (canary only)
- ✅ Request rates, error rates, latency for both
- ✅ Canary-specific alerts that don't affect production
- ✅ Comparison alerts (canary vs production)

All monitoring resources are managed via GitOps (ArgoCD) by committing manifests to the repository.