jedarden e7c24a0c08 feat: initial zai-proxy ecosystem repo

Extracted from ardenone-cluster/containers/zai-proxy and
ardenone-cluster/containers/zai-proxy-dashboard.

- proxy/: OpenAI-compatible ZAI reverse proxy (Go, v1.10.0)
  - Token counting, rate limiting, Prometheus metrics, canary support
- dashboard/: Metrics dashboard backend + React frontend (Go, v1.0.0)
  - Prometheus collector, SQLite storage, SSE live updates
- docs/: Operational notes, research, and plan subdirs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-16 15:53:52 -04:00

zai-proxy Token Consumption - Grafana Integration

Date: 2026-02-14
Status: ✅ Dashboards Updated - Pending Verification

Summary

The zai-proxy already collects token consumption metrics, but they were not visualized in Grafana. This document tracks the integration of token panels into the Grafana dashboards and the verification steps that remain.

✅ Step 1: Grafana Dashboard Updates (COMPLETED)

Changes Made

Added 7 new panels to both Grafana dashboards:

cluster-configuration/apexalgo-iad/monitoring/grafana-dashboard-zai-proxy.yml
cluster-configuration/ardenone-cluster/monitoring/grafana-dashboard-zai-proxy.yml

New Panels

Row 1: Token Consumption Stats (y=62, h=4)

Total Tokens (1h) - Stat panel showing cumulative tokens over the last hour
- Query: sum(increase(zai_proxy_tokens_total[1h]))
- Thresholds: green below 50k, yellow from 50k to 100k, orange at 100k and above
Input Tokens (1h) - Stat panel for input tokens only
- Query: sum(increase(zai_proxy_tokens_total{direction="input"}[1h]))
- Thresholds: green below 25k, yellow from 25k to 50k, orange at 50k and above
Output Tokens (1h) - Stat panel for output tokens only
- Query: sum(increase(zai_proxy_tokens_total{direction="output"}[1h]))
- Thresholds: green below 25k, yellow from 25k to 50k, orange at 50k and above
Output/Input Token Ratio - Stat panel showing an efficiency metric
- Query: sum(rate(zai_proxy_tokens_total{direction="output"}[5m])) / sum(rate(zai_proxy_tokens_total{direction="input"}[5m]))
- Shows how many output tokens are generated per input token
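
To make the threshold notation concrete, here is a minimal sketch of how a stat panel value would map to a color, assuming Grafana's absolute thresholds pick the highest step at or below the value. The function name `threshold_color` and the step table are hypothetical; only the 50k and 100k boundaries come from the Total Tokens (1h) panel above.

```python
from typing import List, Tuple

# Steps for the "Total Tokens (1h)" panel, taken from the thresholds above.
# Each entry is (lower bound, color), sorted ascending; the first is the base color.
TOTAL_TOKEN_STEPS: List[Tuple[float, str]] = [
    (float("-inf"), "green"),
    (50_000, "yellow"),
    (100_000, "orange"),
]


def threshold_color(value: float, steps: List[Tuple[float, str]]) -> str:
    """Return the color of the highest step whose lower bound is <= value.

    Assumes `steps` is sorted by lower bound in ascending order.
    """
    color = steps[0][1]
    for lower_bound, step_color in steps:
        if value >= lower_bound:
            color = step_color
    return color
```

With these steps, a reading of 12,000 tokens stays green, while exactly 50,000 already counts as yellow.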

Row 2: Token Rate Time Series (y=66, h=8)

Token Rate by Direction - Time series comparing input vs output
- Queries:
  - Input: sum(rate(zai_proxy_tokens_total{direction="input"}[5m]))
  - Output: sum(rate(zai_proxy_tokens_total{direction="output"}[5m]))
  - Total: sum(rate(zai_proxy_tokens_total[5m]))
- Shows tokens/sec over time
Token Rate by Deployment Variant - Time series by stable/canary
- Query: sum(rate(zai_proxy_tokens_total[5m])) by (variant)
- Useful for A/B testing deployments
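
For intuition about what the `rate(...[5m])` queries report, the sketch below computes a simplified per-second rate from raw counter samples, treating any drop in the counter as a reset (for example after a pod restart). This is an approximation only: Prometheus's own `rate()` also extrapolates toward the edges of the window, which is omitted here. The name `simple_rate` and the sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the samples, tolerating resets."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")

    increase = 0.0
    prev_value = samples[0][1]
    for _, value in samples[1:]:
        if value < prev_value:
            # Counter reset: the counter restarted from zero, so count it whole.
            increase += value
        else:
            increase += value - prev_value
        prev_value = value
    return increase / elapsed
```

Three scrapes 15 seconds apart with counter values 100, 160, and 250 give 150 tokens over 30 seconds, or 5 tokens/sec.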

Row 3: Token Throughput Performance (y=74, h=8)

Token Processing Throughput (p90/p99) - Performance metrics
- Queries:
  - p90: histogram_quantile(0.90, sum(rate(zai_proxy_token_rate_bucket[5m])) by (le, direction))
  - p99: histogram_quantile(0.99, sum(rate(zai_proxy_token_rate_bucket[5m])) by (le, direction))
- Shows tokenization performance (tokens/sec at percentiles)
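
The p90/p99 values are estimates derived from bucket counts, not exact percentiles. A minimal sketch of the idea behind `histogram_quantile`, assuming cumulative buckets with non-negative bounds and linear interpolation inside the matching bucket, looks like this. The function name `bucket_quantile` is hypothetical, and the bucket bounds in the usage note are made up for illustration (only `le="1000"` appears in the sample output in this document).

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def bucket_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative buckets via linear interpolation."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lands in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            width = bound - prev_bound
            return prev_bound + width * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan
```

For cumulative buckets `(100, 10), (500, 60), (1000, 90), (+Inf, 100)`, the median falls in the 100-500 bucket and interpolates to 420 tokens/sec, while p99 lands in the +Inf bucket and is capped at 1000. This is why bucket boundaries limit how precise the panel can be.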

⏳ Step 2: ServiceMonitor Verification (BLOCKED BY HOOK)

What Needs to be Checked

The ServiceMonitor configuration exists in the repo but verification was blocked by pre-tool hooks.

ServiceMonitor Files:

cluster-configuration/apexalgo-iad/mcp/zai-proxy-servicemonitor.yml
cluster-configuration/ardenone-cluster/devpod/zai-proxy-servicemonitor.yml

ServiceMonitor Configuration:

endpoints:
  - port: http
    interval: 15s
    path: /metrics
    scrapeTimeout: 10s

Manual Verification Steps

Run these commands to verify ServiceMonitor is deployed and scraping:

# Check ServiceMonitor exists in apexalgo-iad
export KUBECONFIG=/home/coder/.kube/apexalgo-iad.kubeconfig
kubectl get servicemonitor -n mcp zai-proxy

# Check ServiceMonitor exists in ardenone-cluster
# (point KUBECONFIG at the ardenone-cluster kubeconfig first; the export above still targets apexalgo-iad)
kubectl get servicemonitor -n devpod zai-proxy

# Verify Prometheus is scraping the targets
# Open the Prometheus UI and check:
# - Status > Targets: look for the "zai-proxy" jobs
# - Endpoints should show "UP" status
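
The target check can also be scripted against the Prometheus HTTP API instead of the UI. The sketch below filters the response of `GET /api/v1/targets` for zai-proxy targets. It assumes Prometheus is reachable at `http://localhost:9090` (for example through a port-forward) and that the `job` label contains `zai-proxy`; both are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def zai_proxy_targets(payload: Dict[str, Any], needle: str = "zai-proxy") -> List[Dict[str, str]]:
    """Pick active targets whose job label contains `needle` from a /api/v1/targets response."""
    found = []
    for target in payload.get("data", {}).get("activeTargets", []):
        job = target.get("labels", {}).get("job", "")
        if needle in job:
            found.append({
                "job": job,
                "scrapeUrl": target.get("scrapeUrl", ""),
                "health": target.get("health", "unknown"),
            })
    return found


def fetch_targets(base_url: str = "http://localhost:9090") -> Dict[str, Any]:
    """Fetch the target list; assumes Prometheus is reachable at base_url."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


# Usage (requires a reachable Prometheus):
#   for t in zai_proxy_targets(fetch_targets()):
#       print(t["job"], t["scrapeUrl"], t["health"])
```

A healthy target reports `health` as `up`; an empty result suggests the ServiceMonitor is missing or not selected by Prometheus.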

⏳ Step 3: Prometheus Metrics Verification (BLOCKED BY HOOK)

What Needs to be Checked

Verify that token metrics are being scraped by Prometheus.

Manual Verification Steps

Option 1: Direct Metrics Endpoint Check

# Port-forward to zai-proxy pod
kubectl port-forward -n devpod svc/zai-proxy 8080:8080

# In another terminal, check metrics
curl -s http://localhost:8080/metrics | grep "zai_proxy_tokens"

# Expected output (example):
# zai_proxy_tokens_total{direction="input",model="glm-4",variant="production"} 12345
# zai_proxy_tokens_total{direction="output",model="glm-4",variant="production"} 67890
# zai_proxy_token_count_duration_seconds_bucket{variant="production",le="0.001"} 100
# zai_proxy_token_rate_bucket{direction="input",model="glm-4",variant="production",le="1000"} 50
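
If the `grep` output is long, a small script can total the counters per direction. This is a simplified sketch that handles lines shaped like the expected output above; it is not a full parser for the Prometheus text format (for example, it does not handle `}` inside label values). The function name `tokens_by_direction` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches lines such as:
#   zai_proxy_tokens_total{direction="input",model="glm-4",variant="production"} 12345
# An optional trailing timestamp is tolerated.
SAMPLE_RE = re.compile(
    r'^zai_proxy_tokens_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def tokens_by_direction(metrics_text: str) -> Dict[str, float]:
    """Sum zai_proxy_tokens_total samples per `direction` label."""
    totals: Dict[str, float] = defaultdict(float)
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # comments, other metrics, blank lines
        labels = dict(LABEL_RE.findall(match.group("labels")))
        totals[labels.get("direction", "unknown")] += float(match.group("value"))
    return dict(totals)
```

Feeding it the body of `curl -s http://localhost:8080/metrics` returns one total per direction, summed across models and variants.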

Option 2: Prometheus UI Query

# Access Prometheus UI (port-forward if needed)
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090

# Open browser: http://localhost:9090
# Run these queries in the UI:

# 1. Check if metric exists
zai_proxy_tokens_total

# 2. Total tokens in last hour
sum(increase(zai_proxy_tokens_total[1h]))

# 3. Token rate by direction
sum(rate(zai_proxy_tokens_total[5m])) by (direction)

# 4. Input vs Output ratio
sum(rate(zai_proxy_tokens_total{direction="output"}[5m]))
/
sum(rate(zai_proxy_tokens_total{direction="input"}[5m]))

Option 3: PromQL via kubectl

# Query Prometheus via API
POD=$(kubectl get pod -n monitoring -l app.kubernetes.io/name=prometheus -o jsonpath='{.items[0].metadata.name}')

kubectl exec -n monitoring $POD -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=sum(increase(zai_proxy_tokens_total[1h]))'
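
When calling the query API from a script rather than `wget`, it is safer to percent-encode the PromQL expression, since it contains parentheses and brackets. The sketch below builds the instant-query URL and reads a single aggregated value from the response. It assumes Prometheus is reachable at `http://localhost:9090`; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def query_url(promql: str, base_url: str = "http://localhost:9090") -> str:
    """Build an instant-query URL with the PromQL expression percent-encoded."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def instant_query(promql: str, base_url: str = "http://localhost:9090") -> Dict[str, Any]:
    """Run an instant query; assumes Prometheus is reachable at base_url."""
    with urllib.request.urlopen(query_url(promql, base_url), timeout=10) as resp:
        return json.load(resp)


def scalar_result(response: Dict[str, Any]) -> float:
    """Extract the value of the first series from an instant-vector response."""
    result = response["data"]["result"]
    if not result:
        raise LookupError("query returned no series")
    # Each sample is [unix_timestamp, "value as string"].
    return float(result[0]["value"][1])


# Usage (requires a reachable Prometheus):
#   total = scalar_result(instant_query("sum(increase(zai_proxy_tokens_total[1h]))"))
```

An empty `result` list here corresponds to the "No data" state in Grafana.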

Expected Metrics

The zai-proxy exposes these token-related metrics:

Primary Metrics

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `zai_proxy_tokens_total` | Counter | `direction`, `model`, `variant` | Total tokens processed |
| `zai_proxy_token_count_duration_seconds` | Histogram | `variant` | Duration of token counting |
| `zai_proxy_token_rate_seconds` | Histogram | `direction`, `model`, `variant` | Time per token |
| `zai_proxy_token_rate` | Histogram | `direction`, `model`, `variant` | Tokens per second throughput |

Label Values

direction: input, output
model: glm-4 (default), others if configured
variant: production, canary, stable, test

Deployment Status

apexalgo-iad (Production)

Pod Status: ✅ Running (1/1) - zai-proxy-95fc547d7-gjn7q
ServiceMonitor: ⏳ Not confirmed (verification blocked)
Dashboard: ✅ Updated with token panels

ardenone-cluster (Dev/Local)

Pod Status: ⚠️ Mixed (1 running, 3 failing - image pull issues)
ServiceMonitor: ⏳ Not confirmed (verification blocked)
Dashboard: ✅ Updated with token panels

Next Steps

Immediate Actions

✅ DONE: Add token panels to Grafana dashboards
⏳ TODO: Verify ServiceMonitor is deployed and scraping
⏳ TODO: Confirm Prometheus is collecting token metrics
⏳ TODO: Access Grafana UI to view the new token panels

Follow-up Actions

Deploy ServiceMonitor (if not already deployed):

# For apexalgo-iad
kubectl apply -f cluster-configuration/apexalgo-iad/mcp/zai-proxy-servicemonitor.yml

# For ardenone-cluster
kubectl apply -f cluster-configuration/ardenone-cluster/devpod/zai-proxy-servicemonitor.yml

Fix image pull errors in ardenone-cluster:
- Check failing pods: kubectl describe pod -n devpod zai-proxy-64f66d59d6-7f7g7
- Verify image exists and is accessible

Commit and push dashboard changes:

cd /home/coder/ardenone-cluster
git add cluster-configuration/*/monitoring/grafana-dashboard-zai-proxy.yml
git commit -m "feat(zai-proxy): add token consumption panels to Grafana dashboard

- Add 7 new panels for token metrics visualization
- Total tokens (1h), Input/Output breakdown
- Token rate time series by direction and variant
- Token processing throughput (p90/p99)
- Metrics already collected, now visualized

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"
git push origin main

Wait for ArgoCD sync:
- ArgoCD will detect the ConfigMap changes
- Grafana will reload the dashboard automatically
- Check sync status: kubectl get application -n argocd

Troubleshooting

Dashboard doesn't show token data

Symptom: Panels show "No data"

Diagnosis:

Check if ServiceMonitor is deployed:

kubectl get servicemonitor -n devpod zai-proxy

Check Prometheus targets:
- Open Prometheus UI
- Status > Targets
- Look for zai-proxy job
- Should be "UP" status

Verify metrics endpoint:

kubectl exec -n devpod deploy/zai-proxy -- wget -qO- http://localhost:8080/metrics | grep tokens

Solutions:

If the ServiceMonitor is missing: apply it manually (see Follow-up Actions)
If the endpoint is not exposing token metrics: check the proxy version (should be v1.1.0+)
If Prometheus is not scraping: check that the ServiceMonitor labels match the serviceMonitorSelector of the Prometheus resource

Queries return no results

Symptom: PromQL queries return empty result

Diagnosis:

# Check if metric name exists in Prometheus
# (access Prometheus UI and search for "zai_proxy_tokens")

Solutions:

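The metric might be new: wait 15-30 seconds for the first scrapes (rate() and increase() need at least two samples in the window)
Check the scrape interval: 15s, as set in the ServiceMonitor
Verify the time range in Grafana (use "Last 5 minutes" for testing)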

References

Metrics Documentation: ardenone-cluster/docs/zai-proxy-metrics.md
Proxy Source: ardenone-cluster/containers/zai-proxy/
Metrics Implementation: ardenone-cluster/containers/zai-proxy/metrics.go
Token Tracking Code: ardenone-cluster/containers/zai-proxy/main.go:323,496,521

Verification Checklist

Once hooks are resolved or manual verification is performed:

ServiceMonitor deployed in mcp namespace (apexalgo-iad)
ServiceMonitor deployed in devpod namespace (ardenone-cluster)
Prometheus targets show zai-proxy as "UP"
/metrics endpoint exposes zai_proxy_tokens_total
Grafana dashboard shows token consumption data
All 7 new panels render correctly
Token metrics update in real-time (15s interval)
Commit pushed to main branch
ArgoCD synced the dashboard changes
