Token Metrics Quick Reference
Quick Access
# View metrics
curl http://zai-proxy:8080/metrics | grep zai_proxy_tokens
# Kubernetes cluster
kubectl port-forward -n mcp svc/zai-proxy 8080:8080
curl http://localhost:8080/metrics | grep zai_proxy_tokens
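The same filtering can be done programmatically. The following is a minimal sketch, not part of the proxy: `filterMetricLines` is a hypothetical helper and `sampleExposition` is made-up output used only for illustration. To run it against a live endpoint, the body returned by `http.Get` on the `/metrics` URL could be read and passed in instead of the sample.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// filterMetricLines returns every line of a Prometheus text exposition that
// contains substr, mirroring `grep substr`. Hypothetical helper for
// illustration; it is not part of the proxy.
func filterMetricLines(exposition, substr string) []string {
	var matches []string
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, substr) {
			matches = append(matches, line)
		}
	}
	return matches
}

// sampleExposition is made-up /metrics output, not captured from a real proxy.
const sampleExposition = `# TYPE zai_proxy_tokens_total counter
zai_proxy_tokens_total{direction="input",model="glm-4",variant="stable"} 150
zai_proxy_tokens_total{direction="output",model="glm-4",variant="stable"} 420
zai_proxy_requests_total{variant="stable"} 3
`

func main() {
	// Print only the token metric lines, as the curl | grep pipeline would.
	for _, line := range filterMetricLines(sampleExposition, "zai_proxy_tokens") {
		fmt.Println(line)
	}
}
```

Like `grep`, this keeps the `# TYPE` comment line because it also contains the metric name.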
Helper Functions Reference
Recording Token Counts
// Record input tokens
RecordInputTokens(model, version, count)
// Example: RecordInputTokens("glm-4", "stable", 150)
// Record output tokens
RecordOutputTokens(model, version, count)
// Example: RecordOutputTokens("glm-4", "stable", 420)
Recording Token Rates
// Record token processing rate (both time and throughput)
RecordTokenRate(direction, model, version, duration, tokenCount)
// Example: RecordTokenRate("input", "glm-4", "stable", 5*time.Millisecond, 100)
// Convenience wrappers
RecordInputTokenRate(model, version, duration, tokenCount)
RecordOutputTokenRate(model, version, duration, tokenCount)
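As a rough illustration of what "both time and throughput" means for `RecordTokenRate`, the sketch below uses an in-memory stand-in rather than the real metrics backend. The `recorder` and `observation` types, the `tokensPerSecond` helper, and the argument types (`string`, `time.Duration`, `int`) are assumptions; only the argument order is taken from the signatures above. The `version` argument appears to carry the variant label value, since the examples pass `"stable"`.

```go
package main

import (
	"fmt"
	"time"
)

// tokensPerSecond converts a token count and an elapsed time into throughput.
// Hypothetical helper; the proxy's actual calculation may differ.
func tokensPerSecond(tokenCount int, d time.Duration) float64 {
	if tokenCount <= 0 || d <= 0 {
		return 0
	}
	return float64(tokenCount) * float64(time.Second) / float64(d)
}

// observation is one recorded sample: the latency and the derived throughput.
type observation struct {
	direction, model, version string
	seconds                   float64 // would feed zai_proxy_token_rate_seconds
	perSecond                 float64 // would feed zai_proxy_token_rate
}

// recorder is an in-memory stand-in for the metrics backend (assumption).
type recorder struct{ observations []observation }

// RecordTokenRate follows the argument order of the helper listed above.
// Zero counts are skipped, matching the troubleshooting note that only
// non-zero counts are recorded.
func (r *recorder) RecordTokenRate(direction, model, version string, d time.Duration, tokenCount int) {
	if tokenCount <= 0 {
		return
	}
	r.observations = append(r.observations, observation{
		direction: direction, model: model, version: version,
		seconds:   d.Seconds(),
		perSecond: tokensPerSecond(tokenCount, d),
	})
}

func main() {
	var r recorder
	r.RecordTokenRate("input", "glm-4", "stable", 5*time.Millisecond, 100)
	r.RecordTokenRate("output", "glm-4", "stable", 2*time.Second, 0) // skipped
	for _, o := range r.observations {
		fmt.Printf("%s %s %s: %.3fs, %.0f tokens/s\n",
			o.direction, o.model, o.version, o.seconds, o.perSecond)
	}
}
```

With the example values from above, 100 tokens in 5 ms works out to 20,000 tokens per second.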
Essential Prometheus Queries
Token Consumption
# Total tokens per second (all types)
sum(rate(zai_proxy_tokens_total{variant="stable"}[5m]))
# Input vs output
sum(rate(zai_proxy_tokens_total{variant="stable"}[5m])) by (direction)
# By model
sum(rate(zai_proxy_tokens_total{variant="stable"}[5m])) by (model)
# Total in last hour
sum(increase(zai_proxy_tokens_total{variant="stable"}[1h]))
Performance
# P95 tokenization latency
histogram_quantile(0.95,
rate(zai_proxy_token_rate_seconds_bucket{variant="stable"}[5m]))
# Average tokens per second
rate(zai_proxy_token_rate_sum{variant="stable"}[5m]) /
rate(zai_proxy_token_rate_count{variant="stable"}[5m])
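To make the P95 query less opaque, here is a simplified sketch of how a quantile can be estimated from cumulative histogram buckets by linear interpolation, which is the general approach `histogram_quantile` takes for classic histograms. It is not Prometheus's actual code: `bucket`, `histogramQuantile`, and the bucket boundaries and counts in `sampleBuckets` are all made up for illustration, and the proxy's real bucket layout is not stated in this reference.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is one cumulative histogram bucket: the number of observations
// less than or equal to upperBound (the "le" label).
type bucket struct {
	upperBound float64
	cumulative float64
}

// histogramQuantile estimates quantile q (0 < q <= 1) from buckets sorted by
// upperBound, where the last bucket is +Inf. Simplified sketch only.
func histogramQuantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].cumulative
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	// Find the first finite bucket whose cumulative count reaches the rank.
	i := sort.Search(len(buckets)-1, func(i int) bool {
		return buckets[i].cumulative >= rank
	})
	if i == len(buckets)-1 {
		// Rank falls in the +Inf bucket: report the highest finite bound.
		return buckets[len(buckets)-2].upperBound
	}
	lower, prevCount := 0.0, 0.0
	if i > 0 {
		lower, prevCount = buckets[i-1].upperBound, buckets[i-1].cumulative
	}
	b := buckets[i]
	// Interpolate linearly inside the bucket that contains the rank.
	return lower + (b.upperBound-lower)*(rank-prevCount)/(b.cumulative-prevCount)
}

// sampleBuckets holds made-up latency buckets in seconds.
var sampleBuckets = []bucket{
	{0.001, 50}, {0.005, 90}, {0.01, 98}, {math.Inf(1), 100},
}

func main() {
	fmt.Printf("estimated P95: %.6fs\n", histogramQuantile(0.95, sampleBuckets))
}
```

With these sample buckets the 95th observation falls in the 5 ms to 10 ms bucket, so the estimate lands between those bounds (about 8.1 ms). The result is only as precise as the bucket boundaries allow.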
Canary Comparison
# Compare token rates
sum(rate(zai_proxy_tokens_total[5m])) by (variant, direction)
# Compare performance
histogram_quantile(0.95,
sum(rate(zai_proxy_token_rate_seconds_bucket[5m])) by (le, variant))
Metrics Summary
| Metric | Type | Labels | Purpose |
|---|---|---|---|
| zai_proxy_tokens_total | Counter | direction, model, variant | Total tokens processed |
| zai_proxy_token_rate_seconds | Histogram | direction, model, variant | Tokenization latency |
| zai_proxy_token_rate | Histogram | direction, model, variant | Tokens per second |
Labels
- direction: input (prompts) or output (completions)
- model: glm-4, claude-3, etc. (set via TOKENIZER_MODEL env var)
- variant: stable (production) or canary (testing) (set via DEPLOYMENT_VARIANT env var)
Common Alerts
# Slow tokenization
- alert: SlowTokenization
  expr: |
    histogram_quantile(0.95,
      rate(zai_proxy_token_rate_seconds_bucket[5m])) > 0.01
  for: 10m
  annotations:
    summary: "P95 tokenization latency above 10ms"
# High token rate anomaly
- alert: TokenRateAnomaly
  expr: |
    abs(rate(zai_proxy_tokens_total[5m]) -
      rate(zai_proxy_tokens_total[5m] offset 1h)) > 100
  for: 15m
  annotations:
    summary: "Token processing rate changed significantly"
Grafana Quick Panels
Token Rate (Time Series)
sum(rate(zai_proxy_tokens_total{variant="stable"}[5m])) by (direction)
Cost Estimate (Stat)
# $0.01/1k input, $0.03/1k output
(sum(increase(zai_proxy_tokens_total{direction="input",variant="stable"}[24h])) * 0.00001) +
(sum(increase(zai_proxy_tokens_total{direction="output",variant="stable"}[24h])) * 0.00003)
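The multipliers in the cost query are the per-1k prices divided by 1,000 ($0.01/1k = 0.00001 per token, $0.03/1k = 0.00003 per token). A small sketch of the same arithmetic follows; `estimateCostUSD` is a hypothetical helper and the token totals in `main` are made-up values, not measurements.

```go
package main

import "fmt"

// Prices taken from the comment on the cost panel query.
const (
	inputPricePer1K  = 0.01 // USD per 1,000 input tokens
	outputPricePer1K = 0.03 // USD per 1,000 output tokens
)

// estimateCostUSD applies the per-1k prices to token totals, matching the
// arithmetic of the PromQL expression. Hypothetical helper.
func estimateCostUSD(inputTokens, outputTokens float64) float64 {
	return inputTokens/1000*inputPricePer1K + outputTokens/1000*outputPricePer1K
}

func main() {
	// Made-up 24h totals: 1,000,000 input tokens and 500,000 output tokens.
	fmt.Printf("estimated cost: $%.2f\n", estimateCostUSD(1_000_000, 500_000))
}
```

For those sample totals the estimate is $10 for input plus $15 for output, $25 in all.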
Latency Heatmap
sum(rate(zai_proxy_token_rate_seconds_bucket{variant="stable"}[5m])) by (le)
Environment Variables
# Enable token counting (default: true)
TOKEN_COUNTING_ENABLED=true
# Model name for metrics (default: glm-4)
TOKENIZER_MODEL=glm-4
# Deployment variant (default: production)
DEPLOYMENT_VARIANT=stable # or "canary"
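One way these variables could be read with their documented defaults is sketched below. This is an assumption about the general pattern, not the proxy's actual startup code: `tokenMetricsConfig` and `loadConfig` are hypothetical names, and the fallback values simply repeat the defaults stated in the comments above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// tokenMetricsConfig holds the three settings described above.
// Hypothetical type for illustration.
type tokenMetricsConfig struct {
	CountingEnabled bool
	Model           string
	Variant         string
}

// loadConfig reads the settings through lookup (os.LookupEnv in production),
// falling back to the defaults stated in this reference when a variable is
// unset, empty, or not a valid boolean.
func loadConfig(lookup func(string) (string, bool)) tokenMetricsConfig {
	cfg := tokenMetricsConfig{
		CountingEnabled: true,         // default: true
		Model:           "glm-4",      // default: glm-4
		Variant:         "production", // default as stated in this reference
	}
	if v, ok := lookup("TOKEN_COUNTING_ENABLED"); ok {
		if enabled, err := strconv.ParseBool(v); err == nil {
			cfg.CountingEnabled = enabled
		}
	}
	if v, ok := lookup("TOKENIZER_MODEL"); ok && v != "" {
		cfg.Model = v
	}
	if v, ok := lookup("DEPLOYMENT_VARIANT"); ok && v != "" {
		cfg.Variant = v
	}
	return cfg
}

func main() {
	cfg := loadConfig(os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in makes the defaults easy to check without touching the process environment. If the default variant really is `production`, the `variant="stable"` filters used in the queries only match when `DEPLOYMENT_VARIANT=stable` is set explicitly.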
Troubleshooting
No metrics appearing
- Check token counting is enabled: TOKEN_COUNTING_ENABLED=true
- Verify tokenizer initialized: Check logs for "Token counting enabled"
- Check metrics endpoint: curl http://zai-proxy:8080/metrics | grep tokens_total
Metrics not incrementing
- Verify requests are being processed: Check zai_proxy_requests_total
- Check for tokenization errors in logs
- Verify token counts > 0: Only non-zero counts are recorded
Wrong labels
- Check TOKENIZER_MODEL environment variable
- Check DEPLOYMENT_VARIANT environment variable
- Verify labels in Prometheus: zai_proxy_tokens_total{}
Full Documentation
See /home/coder/ardenone-cluster/containers/zai-proxy/docs/metrics.md for:
- Complete metric descriptions
- All Prometheus query examples
- Grafana dashboard templates
- Alerting rule suggestions
- Integration examples