# ZAI Proxy Prometheus Metrics Documentation

## Overview

The zai-proxy exports comprehensive metrics for monitoring token consumption, request performance, rate limiting, and system health. All metrics are exposed on the `/metrics` endpoint in Prometheus text format.

## Metrics Endpoint

```bash
# Access metrics
curl http://zai-proxy:8080/metrics

# Query from within Kubernetes cluster
curl http://zai-proxy.mcp.svc.cluster.local:8080/metrics
```
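
The endpoint returns plain text with one sample per line, so its output can be inspected without a Prometheus server. The following is a minimal sketch of a parser for that format; the function name `parse_metrics` is hypothetical, the sample lines are copied from the examples in this document, and the regular expressions are a simplification that does not handle timestamps or escaped quotes in label values.

```python
import re

# Matches lines like: name{label="value",...} 123.4
# Simplified: lines with timestamps or escaped quotes are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


example = '''
# TYPE zai_proxy_tokens_total counter
zai_proxy_tokens_total{direction="input",model="glm-4",variant="stable"} 1250000
zai_proxy_tokens_total{direction="output",model="glm-4",variant="stable"} 3500000
'''

for name, labels, value in parse_metrics(example):
    print(name, labels["direction"], value)
```

In practice the `example` string would be replaced by the body returned from the `curl` commands above.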

## Token Consumption Metrics

### `zai_proxy_tokens_total`

**Type:** Counter

**Description:** Total number of tokens processed, tracking both input (prompt) and output (completion) tokens separately.

**Labels:**
- `direction` - Token direction: `input` (prompt tokens) or `output` (completion tokens)
- `model` - Tokenizer model name (e.g., `glm-4`, `claude-3`)
- `variant` - Deployment variant: `stable` (production) or `canary` (testing)

**Example values:**
```prometheus
zai_proxy_tokens_total{direction="input",model="glm-4",variant="stable"} 1250000
zai_proxy_tokens_total{direction="output",model="glm-4",variant="stable"} 3500000
zai_proxy_tokens_total{direction="input",model="glm-4",variant="canary"} 15000
zai_proxy_tokens_total{direction="output",model="glm-4",variant="canary"} 42000
```

**Use cases:**
- Track total token consumption over time
- Calculate cost based on token usage
- Compare input vs output token ratios
- Monitor canary deployment token usage separately from production
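
As an illustration of the cost use case, the sketch below converts token counts into an estimated dollar amount. The prices are placeholders (the same illustrative $0.01 and $0.03 per 1000 tokens used in the dashboard section of this document), and `estimate_cost` is a hypothetical helper, not part of the proxy.

```python
# Hypothetical prices, in dollars per 1000 tokens; substitute your real rates.
PRICE_PER_1K = {"input": 0.01, "output": 0.03}


def estimate_cost(token_counts, price_per_1k=PRICE_PER_1K):
    """Return the estimated cost in dollars for a {direction: tokens} mapping."""
    return sum(
        tokens / 1000 * price_per_1k[direction]
        for direction, tokens in token_counts.items()
    )


# Stable-variant counter values from the example above.
stable = {"input": 1_250_000, "output": 3_500_000}
print(f"${estimate_cost(stable):.2f}")  # prints $117.50
```

Because the counter is cumulative since process start, a cost for a specific period should be computed from the increase over that period rather than from the raw counter value.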

### `zai_proxy_token_rate_seconds`

**Type:** Histogram

**Description:** Time taken to process tokens (tokenization speed). Lower values indicate faster tokenization. Measures the duration of the tokenization operation itself.

**Labels:**
- `direction` - Token direction: `input` or `output`
- `model` - Tokenizer model name
- `variant` - Deployment variant: `stable` or `canary`

**Buckets:** `[.00001, .00005, .0001, .0005, .001, .005, .01, .05, .1]` (seconds)

**Example values:**
```prometheus
zai_proxy_token_rate_seconds_bucket{direction="input",model="glm-4",variant="stable",le="0.001"} 9500
zai_proxy_token_rate_seconds_bucket{direction="input",model="glm-4",variant="stable",le="0.005"} 9980
zai_proxy_token_rate_seconds_bucket{direction="input",model="glm-4",variant="stable",le="+Inf"} 10000
zai_proxy_token_rate_seconds_sum{direction="input",model="glm-4",variant="stable"} 8.234
zai_proxy_token_rate_seconds_count{direction="input",model="glm-4",variant="stable"} 10000
```

**Use cases:**
- Monitor tokenization performance
- Detect tokenizer slowdowns
- Compare performance between models
- Alert on slow tokenization (>10ms P95)
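
To make the P95 figure concrete, here is a rough sketch of how a quantile can be estimated from cumulative histogram buckets by linear interpolation, which is the idea behind PromQL's `histogram_quantile()`. The Python function is an illustrative reimplementation, not Prometheus code, and it uses only the three buckets shown in the example values above, so real results with the full bucket list will differ.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    `buckets` must include the +Inf bucket, whose count is the total.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


example_buckets = [(0.001, 9500), (0.005, 9980), (math.inf, 10000)]
p95 = histogram_quantile(0.95, example_buckets)
p99 = histogram_quantile(0.99, example_buckets)
print(f"P95 ~ {p95 * 1000:.2f} ms, P99 ~ {p99 * 1000:.2f} ms")
```

With these example counts, 95% of observations fall at or below the 1 ms bucket, so the estimated P95 sits well under the 10 ms alert threshold.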

### `zai_proxy_token_rate`

**Type:** Histogram

**Description:** Token processing throughput in tokens per second. Higher values indicate faster processing. Measures how many tokens are processed per unit time.

**Labels:**
- `direction` - Token direction: `input` or `output`
- `model` - Tokenizer model name
- `variant` - Deployment variant: `stable` or `canary`

**Buckets:** `[10, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000]` (tokens/second)

**Example values:**
```prometheus
zai_proxy_token_rate_bucket{direction="input",model="glm-4",variant="stable",le="1000"} 120
zai_proxy_token_rate_bucket{direction="input",model="glm-4",variant="stable",le="5000"} 850
zai_proxy_token_rate_bucket{direction="input",model="glm-4",variant="stable",le="+Inf"} 1000
zai_proxy_token_rate_sum{direction="input",model="glm-4",variant="stable"} 2500000
zai_proxy_token_rate_count{direction="input",model="glm-4",variant="stable"} 1000
```

**Use cases:**
- Monitor tokenization throughput
- Compare throughput between input and output tokenization
- Identify performance bottlenecks
- Capacity planning based on tokens/second

### `zai_proxy_token_count_duration_seconds`

**Type:** Histogram

**Description:** Overall duration of token counting operations, including both tokenization and any overhead.

**Labels:**
- `variant` - Deployment variant: `stable` or `canary`

**Buckets:** `[.0001, .0005, .001, .005, .01, .025, .05, .1]` (seconds)

**Use cases:**
- Monitor total token counting overhead
- Ensure token counting latency stays below target (<5ms P95)
- Compare performance between stable and canary deployments

## Request Performance Metrics

### `zai_proxy_requests_total`

**Type:** Counter

**Description:** Total number of requests processed.

**Labels:**
- `method` - HTTP method (GET, POST, etc.)
- `path` - Request path
- `status_code` - HTTP status code
- `variant` - Deployment variant

**Example query:**
```promql
# Request rate by status code
sum by (status_code) (rate(zai_proxy_requests_total{variant="stable"}[5m]))

# Rate of 5xx responses (requests per second)
rate(zai_proxy_requests_total{variant="stable",status_code=~"5.."}[5m])
```
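
Most queries in this document wrap a counter in `rate()`. As a rough illustration of what that computes, the sketch below derives a per-second rate from timestamped counter samples and treats a drop in value as a counter reset. It is a simplification: unlike PromQL's `rate()`, it does not extrapolate to the window boundaries, and `counter_rate` is a hypothetical helper name.

```python
def counter_rate(samples):
    """Per-second increase of a counter from [(timestamp_seconds, value), ...].

    A drop in value is treated as a counter reset (for example a process
    restart), so the post-reset value is counted as new increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    return increase / elapsed


# Hypothetical scrapes 60 s apart; the counter resets between the 2nd and 3rd.
scrapes = [(0, 100), (60, 160), (120, 10), (180, 70)]
print(f"{counter_rate(scrapes):.3f} requests/second")
```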

### `zai_proxy_request_duration_seconds`

**Type:** Histogram

**Description:** Request duration from start to completion.

**Labels:**
- `method`, `path`, `status_code`, `variant`

**Buckets:** `[.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10, 30, 60, 120, 300]` (seconds)

**Example query:**
```promql
# P95 latency
histogram_quantile(0.95, rate(zai_proxy_request_duration_seconds_bucket{variant="stable"}[5m]))

# Average latency
rate(zai_proxy_request_duration_seconds_sum{variant="stable"}[5m]) /
rate(zai_proxy_request_duration_seconds_count{variant="stable"}[5m])
```

### `zai_proxy_request_size_bytes` / `zai_proxy_response_size_bytes`

**Type:** Histogram

**Description:** Request and response body sizes in bytes.

**Labels:**
- Request: `method`, `path`, `variant`
- Response: `method`, `path`, `status_code`, `variant`

**Buckets:** Exponential (100, 1000, 10000, ...)

**Example query:**
```promql
# Average response size
rate(zai_proxy_response_size_bytes_sum{variant="stable"}[5m]) /
rate(zai_proxy_response_size_bytes_count{variant="stable"}[5m])
```

## Concurrency & Worker Metrics

### `zai_proxy_concurrent_requests`

**Type:** Gauge

**Description:** Number of requests currently being processed.

**Labels:**
- `variant` - Deployment variant

**Example query:**
```promql
# Current load
zai_proxy_concurrent_requests{variant="stable"}
```

### `zai_proxy_max_workers`

**Type:** Gauge

**Description:** Maximum number of concurrent workers allowed (configured limit).

**Labels:**
- `variant` - Deployment variant

### `zai_proxy_worker_utilization_ratio`

**Type:** Gauge

**Description:** Worker utilization ratio (concurrent_requests / max_workers). Value ranges from 0.0 to 1.0 (or higher if overloaded).

**Labels:**
- `variant` - Deployment variant

**Example query:**
```promql
# Worker utilization percentage
zai_proxy_worker_utilization_ratio{variant="stable"} * 100

# Alert when utilization exceeds 80%
zai_proxy_worker_utilization_ratio{variant="stable"} > 0.8
```
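
The ratio itself is simple arithmetic. The sketch below reproduces it for a few hypothetical load levels against an assumed limit of 10 workers and applies the 80% threshold from the query above. How the proxy handles a zero or unset limit is not documented here, so the zero guard is an assumption.

```python
def worker_utilization(concurrent_requests, max_workers):
    """Return concurrent_requests / max_workers, or 0.0 when no limit is set."""
    if max_workers <= 0:
        return 0.0  # assumption: treat a missing or zero limit as no utilization
    return concurrent_requests / max_workers


for concurrent in (4, 9, 12):
    ratio = worker_utilization(concurrent, max_workers=10)
    status = "ALERT" if ratio > 0.8 else "ok"
    print(f"{concurrent}/10 -> {ratio:.0%} {status}")
```

The last case shows a ratio above 1.0, which corresponds to the overloaded state mentioned in the description.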

## Rate Limiting Metrics

### `zai_proxy_rate_limit_requests_per_second`

**Type:** Gauge

**Description:** Current rate limit in requests per second. This value adjusts automatically based on upstream 429 responses.

**Labels:**
- `variant` - Deployment variant
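
This document does not describe the adjustment algorithm, so the following is only a hypothetical sketch of one common approach, additive increase with multiplicative decrease (AIMD), to show how a limit might move in response to 429s and how `increase` and `decrease` adjustments could be counted. The class name, all parameters, and their defaults are assumptions and are not taken from the zai-proxy source.

```python
class AdaptiveRateLimit:
    """Hypothetical AIMD controller; not taken from the zai-proxy source."""

    def __init__(self, initial=10.0, minimum=1.0, maximum=50.0,
                 increase_step=1.0, decrease_factor=0.5):
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.increase_step = increase_step
        self.decrease_factor = decrease_factor
        # Adjustment counts keyed by direction.
        self.adjustments = {"increase": 0, "decrease": 0}

    def on_response(self, status_code):
        """Update the limit after an upstream response and return it."""
        if status_code == 429:
            new_limit = max(self.minimum, self.limit * self.decrease_factor)
            direction = "decrease"
        else:
            new_limit = min(self.maximum, self.limit + self.increase_step)
            direction = "increase"
        if new_limit != self.limit:  # only count real changes
            self.adjustments[direction] += 1
            self.limit = new_limit
        return self.limit


limiter = AdaptiveRateLimit()
history = [limiter.on_response(status) for status in (200, 200, 429, 200)]
print(history, limiter.adjustments)
```

Under these assumed parameters a single 429 halves the limit, while each successful response raises it by one request per second until the maximum is reached.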

### `zai_proxy_rate_limit_wait_seconds`

**Type:** Histogram

**Description:** Time spent waiting for rate limiter before processing request.

**Labels:**
- `variant` - Deployment variant

**Buckets:** `[.001, .005, .01, .025, .05, .1, .25, .5, 1, 2, 5, 10]` (seconds)

### `zai_proxy_rate_limit_adjustments_total`

**Type:** Counter

**Description:** Number of times the rate limit was adjusted (increased or decreased).

**Labels:**
- `direction` - Adjustment direction: `increase` or `decrease`
- `variant` - Deployment variant

**Example query:**
```promql
# Rate limit adjustments over time
rate(zai_proxy_rate_limit_adjustments_total{variant="stable"}[10m])
```

### `zai_proxy_rate_limit_rejections_total`

**Type:** Counter

**Description:** Number of requests rejected due to rate limiting.

**Labels:**
- `variant` - Deployment variant

## Error Metrics

### `zai_proxy_upstream_errors_total`

**Type:** Counter

**Description:** Total number of upstream errors by error type.

**Labels:**
- `error_type` - Error type: `request_creation`, `upstream_connection`, `read_error`, `write_error`
- `variant` - Deployment variant

**Example query:**
```promql
# Error rate by type
rate(zai_proxy_upstream_errors_total{variant="stable"}[5m])
```

### `zai_proxy_retry_attempts_total`

**Type:** Counter

**Description:** Total number of retry attempts.

**Labels:**
- `reason` - Retry reason: `429` (rate limited), `network_error`, or `retry` (general)
- `variant` - Deployment variant

## Build Info Metric

### `zai_proxy_build_info`

**Type:** Gauge (always 1)

**Description:** Build information including version, variant, commit hash, and build time. This metric always has value 1 and exists solely to export build metadata as labels.

**Labels:**
- `version` - Version number (e.g., `v1.3.0`)
- `variant` - Deployment variant: `stable` or `canary`
- `commit` - Git commit hash
- `build_time` - Build timestamp

**Example query:**
```promql
# View current deployed version
zai_proxy_build_info{variant="stable"}
```

## Example Prometheus Queries

### Token Consumption Analysis

```promql
# Total tokens processed per hour (input + output)
sum(increase(zai_proxy_tokens_total{variant="stable"}[1h]))

# Input vs output token ratio
sum(rate(zai_proxy_tokens_total{direction="input",variant="stable"}[5m])) /
sum(rate(zai_proxy_tokens_total{direction="output",variant="stable"}[5m]))

# Token usage by model
sum by (model) (rate(zai_proxy_tokens_total{variant="stable"}[5m]))

# Compare stable vs canary token usage
sum(rate(zai_proxy_tokens_total{variant="stable"}[5m])) by (direction)
sum(rate(zai_proxy_tokens_total{variant="canary"}[5m])) by (direction)
```

### Tokenization Performance

```promql
# P95 tokenization latency (input)
histogram_quantile(0.95,
  rate(zai_proxy_token_rate_seconds_bucket{direction="input",variant="stable"}[5m]))

# Average tokenization throughput (tokens/second)
rate(zai_proxy_token_rate_sum{direction="input",variant="stable"}[5m]) /
rate(zai_proxy_token_rate_count{direction="input",variant="stable"}[5m])

# Slow tokenization alert (P95 > 10ms)
histogram_quantile(0.95,
  rate(zai_proxy_token_rate_seconds_bucket{variant="stable"}[5m])) > 0.01
```

### Request Performance

```promql
# Requests per second
sum(rate(zai_proxy_requests_total{variant="stable"}[5m]))

# P95 request latency
histogram_quantile(0.95,
  rate(zai_proxy_request_duration_seconds_bucket{variant="stable"}[5m]))

# Error rate (5xx responses)
sum(rate(zai_proxy_requests_total{variant="stable",status_code=~"5.."}[5m])) /
sum(rate(zai_proxy_requests_total{variant="stable"}[5m]))
```

### Canary vs Production Comparison

```promql
# Token processing rate comparison
sum(rate(zai_proxy_tokens_total{variant="stable"}[5m])) by (direction)
sum(rate(zai_proxy_tokens_total{variant="canary"}[5m])) by (direction)

# Latency comparison (P95)
histogram_quantile(0.95,
  rate(zai_proxy_request_duration_seconds_bucket{variant="stable"}[5m]))
histogram_quantile(0.95,
  rate(zai_proxy_request_duration_seconds_bucket{variant="canary"}[5m]))

# Error rate comparison
sum(rate(zai_proxy_requests_total{variant="stable",status_code=~"5.."}[5m])) /
sum(rate(zai_proxy_requests_total{variant="stable"}[5m]))
sum(rate(zai_proxy_requests_total{variant="canary",status_code=~"5.."}[5m])) /
sum(rate(zai_proxy_requests_total{variant="canary"}[5m]))
```

### Capacity Planning

```promql
# Worker utilization trend
zai_proxy_worker_utilization_ratio{variant="stable"}

# Concurrent requests vs max workers
zai_proxy_concurrent_requests{variant="stable"}
zai_proxy_max_workers{variant="stable"}

# Rate limiting pressure
rate(zai_proxy_rate_limit_adjustments_total{direction="decrease",variant="stable"}[10m])
```

## Grafana Dashboard Suggestions

### Dashboard 1: Token Consumption Overview

**Panels:**

1. **Total Tokens Processed (Time Series)**
   ```promql
   sum(rate(zai_proxy_tokens_total{variant="stable"}[5m])) by (direction)
   ```

2. **Token Rate by Model (Time Series)**
   ```promql
   sum(rate(zai_proxy_tokens_total{variant="stable"}[5m])) by (model, direction)
   ```

3. **Token Cost Estimate (Stat Panel)**
   ```promql
   # Assuming $0.01 per 1000 input tokens, $0.03 per 1000 output tokens
   (sum(increase(zai_proxy_tokens_total{direction="input",variant="stable"}[24h])) * 0.00001) +
   (sum(increase(zai_proxy_tokens_total{direction="output",variant="stable"}[24h])) * 0.00003)
   ```

4. **Tokenization Latency (Heatmap)**
   ```promql
   sum(rate(zai_proxy_token_rate_seconds_bucket{variant="stable"}[5m])) by (le)
   ```

5. **Input vs Output Token Ratio (Gauge)**
   ```promql
   sum(rate(zai_proxy_tokens_total{direction="input",variant="stable"}[5m])) /
   sum(rate(zai_proxy_tokens_total{direction="output",variant="stable"}[5m]))
   ```

### Dashboard 2: Performance & Health

**Panels:**

1. **Request Rate (Time Series)**
   ```promql
   sum(rate(zai_proxy_requests_total{variant="stable"}[5m])) by (status_code)
   ```

2. **Request Latency Percentiles (Time Series)**
   ```promql
   histogram_quantile(0.50, rate(zai_proxy_request_duration_seconds_bucket{variant="stable"}[5m]))
   histogram_quantile(0.95, rate(zai_proxy_request_duration_seconds_bucket{variant="stable"}[5m]))
   histogram_quantile(0.99, rate(zai_proxy_request_duration_seconds_bucket{variant="stable"}[5m]))
   ```

3. **Error Rate (Time Series)**
   ```promql
   sum(rate(zai_proxy_requests_total{variant="stable",status_code=~"5.."}[5m])) /
   sum(rate(zai_proxy_requests_total{variant="stable"}[5m]))
   ```

4. **Worker Utilization (Gauge)**
   ```promql
   zai_proxy_worker_utilization_ratio{variant="stable"} * 100
   ```

5. **Concurrent Requests (Time Series)**
   ```promql
   zai_proxy_concurrent_requests{variant="stable"}
   zai_proxy_max_workers{variant="stable"}
   ```

6. **Upstream Errors (Time Series)**
   ```promql
   sum(rate(zai_proxy_upstream_errors_total{variant="stable"}[5m])) by (error_type)
   ```

### Dashboard 3: Canary Deployment Comparison

**Panels:**

1. **Token Usage: Stable vs Canary (Time Series)**
   ```promql
   sum(rate(zai_proxy_tokens_total[5m])) by (variant, direction)
   ```

2. **Latency Comparison (Time Series)**
   ```promql
   histogram_quantile(0.95, sum by (variant, le) (rate(zai_proxy_request_duration_seconds_bucket[5m])))
   ```

3. **Error Rate Comparison (Time Series)**
   ```promql
   sum(rate(zai_proxy_requests_total{status_code=~"5.."}[5m])) by (variant) /
   sum(rate(zai_proxy_requests_total[5m])) by (variant)
   ```

4. **Tokenization Performance Comparison (Time Series)**
   ```promql
   histogram_quantile(0.95, sum by (variant, le) (rate(zai_proxy_token_rate_seconds_bucket[5m])))
   ```

5. **Request Rate Comparison (Time Series)**
   ```promql
   sum(rate(zai_proxy_requests_total[5m])) by (variant)
   ```

### Dashboard 4: Rate Limiting & Capacity

**Panels:**

1. **Current Rate Limit (Gauge)**
   ```promql
   zai_proxy_rate_limit_requests_per_second{variant="stable"}
   ```

2. **Rate Limit Adjustments (Time Series)**
   ```promql
   sum by (direction) (rate(zai_proxy_rate_limit_adjustments_total{variant="stable"}[5m]))
   ```

3. **Rate Limit Wait Time (Heatmap)**
   ```promql
   sum(rate(zai_proxy_rate_limit_wait_seconds_bucket{variant="stable"}[5m])) by (le)
   ```

4. **Retry Attempts (Time Series)**
   ```promql
   sum by (reason) (rate(zai_proxy_retry_attempts_total{variant="stable"}[5m]))
   ```

## Alerting Rules

### Critical Alerts

```yaml
# High error rate
- alert: HighErrorRate
  expr: |
    sum(rate(zai_proxy_requests_total{status_code=~"5.."}[5m])) /
    sum(rate(zai_proxy_requests_total[5m])) > 0.05
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "High error rate detected (>5%)"

# Worker capacity exhausted
- alert: WorkerCapacityExhausted
  expr: zai_proxy_worker_utilization_ratio > 0.9
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Worker utilization above 90%"

# Slow tokenization
- alert: SlowTokenization
  expr: |
    histogram_quantile(0.95,
      rate(zai_proxy_token_rate_seconds_bucket[5m])) > 0.01
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "P95 tokenization latency above 10ms"
```

### Warning Alerts

```yaml
# Frequent rate limit adjustments
- alert: FrequentRateLimitAdjustments
  expr: |
    rate(zai_proxy_rate_limit_adjustments_total{direction="decrease"}[10m]) > 0.1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Frequent rate limit decreases detected"

# High retry rate
- alert: HighRetryRate
  expr: rate(zai_proxy_retry_attempts_total[5m]) > 1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High retry attempt rate"
```

## Configuration

Token counting metrics can be configured via environment variables:

```bash
# Enable/disable token counting (default: true)
TOKEN_COUNTING_ENABLED=true

# Tokenizer model name for metrics labels (default: glm-4)
TOKENIZER_MODEL=glm-4

# Deployment variant (default: production)
DEPLOYMENT_VARIANT=stable  # or "canary"
```
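
To illustrate how the documented defaults apply when a variable is unset, here is a small sketch that resolves the three variables from an environment mapping. It is not the proxy's own loader (the proxy is written in Go); `load_metrics_config` is a hypothetical helper, and the set of strings accepted as "true" is an assumption.

```python
import os

# Defaults as documented in the configuration block above.
DEFAULTS = {
    "TOKEN_COUNTING_ENABLED": "true",
    "TOKENIZER_MODEL": "glm-4",
    "DEPLOYMENT_VARIANT": "production",
}

# Assumption: these strings (case-insensitive) enable token counting.
TRUTHY = {"true", "1", "yes"}


def load_metrics_config(env=None):
    """Resolve the three documented variables from `env` (default: os.environ)."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "token_counting_enabled": raw["TOKEN_COUNTING_ENABLED"].strip().lower() in TRUTHY,
        "tokenizer_model": raw["TOKENIZER_MODEL"],
        "deployment_variant": raw["DEPLOYMENT_VARIANT"],
    }


# A canary deployment that overrides only the variant.
print(load_metrics_config({"DEPLOYMENT_VARIANT": "canary"}))
```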

## Notes

- Histogram buckets are sized to each metric's expected range, from 10 µs to 100 ms for tokenization time up to 5 ms to 300 s for request duration
- Metrics are designed to support dual-deployment monitoring (stable + canary)
- Token metrics track both count and processing rate
- The `variant` label allows filtering by deployment variant to isolate canary testing from production
- The build info metric enables version tracking across deployments