Token Counting Troubleshooting Guide
Quick reference for diagnosing and fixing token counting issues.
Quick Diagnostics
1. Check if Token Counting is Enabled
Command:
kubectl logs -n mcp deployment/zai-proxy --tail=100 | grep -i "token counting"
Expected Output (enabled):
Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)
Expected Output (disabled):
Token counting disabled (TOKEN_COUNTING_ENABLED=false)
Expected Output (fallback mode):
Warning: Failed to initialize TikToken counter: <error>
Falling back to SimpleTokenCounter
Token counting enabled (fallback mode, model: glm-4)
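The three startup outcomes above can also be told apart programmatically, for example in a smoke test that runs after a rollout. The following is a minimal sketch that assumes the log wording is exactly as shown in this guide; the `Mode` type and `classifyStartup` function are hypothetical names and are not part of zai-proxy.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Mode describes how token counting came up, judged only from log text.
type Mode string

const (
	ModeTiktoken Mode = "tiktoken"
	ModeFallback Mode = "fallback"
	ModeDisabled Mode = "disabled"
	ModeUnknown  Mode = "unknown"
)

// classifyStartup scans log output for the startup messages listed in this
// guide. A fallback message wins over the generic "enabled" line because the
// fallback case also prints "Token counting enabled".
func classifyStartup(logs string) Mode {
	mode := ModeUnknown
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		line := strings.ToLower(sc.Text())
		switch {
		case strings.Contains(line, "fallback"), strings.Contains(line, "falling back"):
			return ModeFallback
		case strings.Contains(line, "token counting disabled"):
			mode = ModeDisabled
		case strings.Contains(line, "token counting enabled"):
			mode = ModeTiktoken
		}
	}
	return mode
}

func main() {
	logs := "Warning: Failed to initialize TikToken counter: <error>\n" +
		"Falling back to SimpleTokenCounter\n" +
		"Token counting enabled (fallback mode, model: glm-4)\n"
	fmt.Println(classifyStartup(logs))
}
```

Feed it the output of the kubectl logs command from this step. A result of "unknown" usually means the startup lines have already scrolled out of the --tail window.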
2. Check Token Metrics Availability
Command:
curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_tokens_total
Expected Output:
zai_proxy_tokens_total{direction="input",model="glm-4"} 1234
zai_proxy_tokens_total{direction="output",model="glm-4"} 5678
If no output: Token counting is disabled or no requests have been processed yet.
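If the same check needs to run from a script rather than by eye, the metrics text can be parsed directly. This is a rough sketch, assuming the label layout shown in the expected output above; `sumTokensByDirection` is a hypothetical helper, and splitting labels on commas is a simplification that would break on label values containing commas.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumTokensByDirection reads Prometheus text-format output and totals
// zai_proxy_tokens_total samples per "direction" label value.
func sumTokensByDirection(metrics string) map[string]float64 {
	totals := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "zai_proxy_tokens_total{") {
			continue // skips comments, blank lines, and other metrics
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		labels := line[strings.Index(line, "{")+1 : end]
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			continue
		}
		value, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		// Simplification: assumes no commas inside label values.
		for _, pair := range strings.Split(labels, ",") {
			k, v, ok := strings.Cut(pair, "=")
			if ok && strings.TrimSpace(k) == "direction" {
				totals[strings.Trim(v, `"`)] += value
			}
		}
	}
	return totals
}

func main() {
	sample := "# TYPE zai_proxy_tokens_total counter\n" +
		"zai_proxy_tokens_total{direction=\"input\",model=\"glm-4\"} 1234\n" +
		"zai_proxy_tokens_total{direction=\"output\",model=\"glm-4\"} 5678\n"
	totals := sumTokensByDirection(sample)
	fmt.Printf("input=%.0f output=%.0f\n", totals["input"], totals["output"])
}
```

An empty map corresponds to the "no output" case: token counting is disabled or no requests have been processed yet.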
3. Check Configuration
Command:
kubectl get deployment zai-proxy -n mcp -o jsonpath='{.spec.template.spec.containers[0].env}' | jq
Look for:
[
  {
    "name": "TOKEN_COUNTING_ENABLED",
    "value": "true"
  },
  {
    "name": "TOKENIZER_MODEL",
    "value": "glm-4"
  }
]
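For scripted checks, the env array returned by the jsonpath query can be decoded and searched. This is a minimal sketch; `envVar` and `lookupEnv` are hypothetical names, and variables populated through valueFrom (ConfigMap or Secret references) carry no "value" field, so they would come back with an empty value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envVar mirrors the "name"/"value" pairs in a container's env list.
type envVar struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// lookupEnv decodes the JSON env array and returns the value for name.
// The bool reports whether the variable is present at all.
func lookupEnv(raw []byte, name string) (string, bool, error) {
	var vars []envVar
	if err := json.Unmarshal(raw, &vars); err != nil {
		return "", false, err
	}
	for _, v := range vars {
		if v.Name == name {
			return v.Value, true, nil
		}
	}
	return "", false, nil
}

func main() {
	raw := []byte(`[{"name":"TOKEN_COUNTING_ENABLED","value":"true"},{"name":"TOKENIZER_MODEL","value":"glm-4"}]`)
	for _, name := range []string{"TOKEN_COUNTING_ENABLED", "TOKENIZER_MODEL", "LOG_LEVEL"} {
		value, found, err := lookupEnv(raw, name)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		fmt.Printf("%s: found=%t value=%q\n", name, found, value)
	}
}
```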
4. Live Token Usage Monitoring
Command:
kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage"
Expected Output (per request):
Token usage: input=123, output=456
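To aggregate these per-request lines outside Prometheus, they can be parsed with a regular expression. The sketch below assumes the log format is exactly "Token usage: input=N, output=M" as shown above; `parseUsage` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// usageRe matches the per-request log line shown in this guide.
var usageRe = regexp.MustCompile(`Token usage: input=(\d+), output=(\d+)`)

// parseUsage extracts input and output token counts from one log line.
// ok is false when the line does not contain a token usage record.
func parseUsage(line string) (input, output int, ok bool) {
	m := usageRe.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	in, errIn := strconv.Atoi(m[1])
	out, errOut := strconv.Atoi(m[2])
	if errIn != nil || errOut != nil {
		return 0, 0, false // digit run too large for int
	}
	return in, out, true
}

func main() {
	lines := []string{
		"Token usage: input=123, output=456",
		"Z.AI proxy listening on :8080",
		"Token usage: input=8, output=5",
	}
	var totalIn, totalOut int
	for _, l := range lines {
		if in, out, ok := parseUsage(l); ok {
			totalIn += in
			totalOut += out
		}
	}
	fmt.Printf("input=%d output=%d\n", totalIn, totalOut)
}
```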
Common Issues
Issue 1: No Token Metrics in Prometheus
Symptoms:
- zai_proxy_tokens_total metric is missing
- No "Token usage" logs
Diagnosis:
# Step 1: Check if token counting is enabled
kubectl logs -n mcp deployment/zai-proxy --tail=50 | grep "Token counting"
# Step 2: Check environment variable
kubectl get deployment zai-proxy -n mcp -o yaml | grep -A 2 TOKEN_COUNTING_ENABLED
Solution:
# Enable token counting
kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=true
# Restart deployment
kubectl rollout restart deployment/zai-proxy -n mcp
# Wait for rollout
kubectl rollout status deployment/zai-proxy -n mcp
# Verify
kubectl logs -n mcp deployment/zai-proxy --tail=10 | grep "Token counting"
Issue 2: Inaccurate Token Counts (>10% variance)
Symptoms:
- Token counts significantly differ from expected
- Large variance compared to Anthropic's counts
Diagnosis:
# Check if fallback tokenizer is active
kubectl logs -n mcp deployment/zai-proxy --tail=100 | grep -i fallback
If fallback mode detected:
Falling back to SimpleTokenCounter
Token counting enabled (fallback mode, model: glm-4)
Root Cause: Tiktoken initialization failed. Fallback uses word count approximation (~30% variance).
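To see why a word-count approximation drifts from a real tokenizer, here is a rough illustration. It is not the proxy's SimpleTokenCounter, whose implementation is not shown in this guide; the 4/3 tokens-per-word ratio is a common rule of thumb for English text and is an assumption here, as is the `approxTokens` name.

```go
package main

import (
	"fmt"
	"strings"
)

// approxTokens estimates a token count from whitespace-separated words.
// Assumption: roughly 4 tokens per 3 words, rounded up. Punctuation,
// code, and non-English text break this ratio, which is where the
// variance comes from.
func approxTokens(text string) int {
	words := len(strings.Fields(text))
	return (words*4 + 2) / 3 // integer ceiling of words * 4/3
}

func main() {
	for _, s := range []string{
		"Hello, world!",
		"What is 2+2?",
		"func main() { fmt.Println(\"hi\") }",
	} {
		fmt.Printf("%q -> ~%d tokens\n", s, approxTokens(s))
	}
}
```

Short, punctuation-heavy inputs show the largest relative error: "Hello, world!" is two words but four cl100k_base tokens.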
Solution:
Option 1: Rebuild with tiktoken dependencies
cd /home/coder/ardenone-cluster/containers/zai-proxy
# Check tiktoken dependency
go list -m github.com/tiktoken-go/tokenizer
# If missing, download
go mod download
# Rebuild
go build -o zai-proxy main.go tokenizer.go
# Test locally
./zai-proxy
# Should see: "Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)"
Option 2: Rebuild Docker image
cd /home/coder/ardenone-cluster/containers/zai-proxy
# Rebuild image
docker build -t zai-proxy:latest .
# Push to registry
docker tag zai-proxy:latest ghcr.io/ardenone/zai-proxy:latest
docker push ghcr.io/ardenone/zai-proxy:latest
# Update deployment
kubectl rollout restart deployment/zai-proxy -n mcp
Verification:
kubectl logs -n mcp deployment/zai-proxy --tail=20 | grep "Token counting"
# Should see: "Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)"
# Should NOT see: "fallback mode"
Issue 3: High Token Counting Latency (>5ms)
Symptoms:
- Token counting is slow
- zai_proxy_token_count_duration_seconds > 5ms
Diagnosis:
# Query 99th percentile latency
histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m]))
Expected: <1ms for 99th percentile
Problem threshold: >5ms
Root Causes:
- High CPU usage
  kubectl top pod -n mcp -l app=zai-proxy
  If CPU is throttled (near limits):
  # Increase CPU limits
  resources:
    limits:
      cpu: 2000m # Increase from 1000m
- High concurrent requests (mutex contention)
  # Check concurrent requests
  zai_proxy_concurrent_requests
  If >50 concurrent requests:
  # Increase MAX_WORKERS
  kubectl set env deployment/zai-proxy -n mcp MAX_WORKERS=100
- Large response bodies
  - Token counting processes the entire response
  - Large outputs (>10K tokens) may take longer
  Workaround: Disable token counting for specific endpoints if needed
Verification:
# Check latency after changes
histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m]))
Issue 4: Token Usage Not in Response Body
Expected Behavior: Token counts are NOT included in response bodies (feature not implemented yet).
Current Behavior:
- ✅ Token counts logged: Token usage: input=X, output=Y
- ✅ Prometheus metrics: zai_proxy_tokens_total
- ❌ No usage field in response JSON/SSE
Workaround 1: Use Logs
# Monitor token usage in real-time
kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage"
Workaround 2: Use Prometheus
# Average tokens per request (approximation)
# sum() drops the direction/model labels so both sides of the division match
sum(rate(zai_proxy_tokens_total[5m])) / sum(rate(zai_proxy_requests_total[5m]))
Future Enhancement: Usage injection planned (see InjectTokenUsage() in tokenizer.go)
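For orientation, usage injection into a non-streaming JSON response could look roughly like the sketch below. This is not the InjectTokenUsage() function from tokenizer.go, whose signature and behavior are not shown in this guide. The function name `injectUsageSketch` is hypothetical, and the "input_tokens" and "output_tokens" field names are an assumption based on the Anthropic Messages response format used by the /v1/messages requests elsewhere in this guide.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// injectUsageSketch adds (or overwrites) a top-level "usage" object in a
// JSON response body. Assumption: the body is a single JSON object; SSE
// streams would need per-event handling that is out of scope here.
func injectUsageSketch(body []byte, input, output int) ([]byte, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	if doc == nil {
		return nil, errors.New("response body is JSON null, not an object")
	}
	usage, err := json.Marshal(map[string]int{
		"input_tokens":  input,
		"output_tokens": output,
	})
	if err != nil {
		return nil, err
	}
	doc["usage"] = usage
	return json.Marshal(doc)
}

func main() {
	out, err := injectUsageSketch([]byte(`{"id":"msg_1","type":"message"}`), 8, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Keeping the other fields as json.RawMessage avoids re-encoding the upstream payload, though key order in the output is not preserved.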
Issue 5: Metrics Not Reset After Deployment
Symptoms:
- Token counts seem to persist across pod restarts
- Metrics show large cumulative numbers
Explanation: Prometheus metrics are cumulative counters (lifetime totals).
Expected Behavior:
- Counters reset to 0 when pod restarts
- Use rate() or increase() for meaningful queries
Incorrect Query:
# DON'T: Raw counter (lifetime total since pod start)
zai_proxy_tokens_total
Correct Queries:
# Tokens per minute (rolling 5m window)
rate(zai_proxy_tokens_total[5m]) * 60
# Total tokens in last hour
increase(zai_proxy_tokens_total[1h])
# Tokens per request (sum() drops the direction/model labels so both sides match)
sum(rate(zai_proxy_tokens_total[5m])) / sum(rate(zai_proxy_requests_total[5m]))
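The reason rate() and increase() stay meaningful across pod restarts is that they treat any drop in a counter as a reset to zero rather than as a negative change. The sketch below shows that idea on a plain slice of samples. It is a simplification: Prometheus also extrapolates to the edges of the query window, which is omitted here, and `counterIncrease` is a hypothetical name.

```go
package main

import "fmt"

// counterIncrease sums the growth of a counter over consecutive samples.
// A sample lower than its predecessor is treated as a restart from 0,
// so the whole new value counts as growth since the reset.
func counterIncrease(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		if samples[i] >= samples[i-1] {
			total += samples[i] - samples[i-1]
		} else {
			total += samples[i] // counter reset (for example, pod restart)
		}
	}
	return total
}

func main() {
	// The pod restarts between the third and fourth scrape.
	samples := []float64{100, 250, 400, 30, 90}
	fmt.Println(counterIncrease(samples)) // 150 + 150 + 30 + 60 = 390
}
```

Reading the raw counter in this example would show 90 at the end, even though 390 tokens were counted during the window.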
Advanced Debugging
Enable Debug Logging
Modify deployment:
env:
  - name: LOG_LEVEL
    value: "debug" # Enable debug logs
Apply:
kubectl apply -f deployment.yaml
kubectl rollout status deployment/zai-proxy -n mcp
Watch logs:
kubectl logs -f -n mcp deployment/zai-proxy
Inspect Request/Response Bodies
Add temporary debug logging in code:
// In main.go, after CountRequestTokens()
log.Printf("DEBUG: Request body: %s", string(requestBody))
log.Printf("DEBUG: Input tokens: %d", inputTokens)
// After CountOutputTokens()
log.Printf("DEBUG: Response body: %s", string(bodyCapture.GetCapturedContent()))
log.Printf("DEBUG: Output tokens: %d", outputTokens)
Rebuild and deploy:
go build -o zai-proxy main.go tokenizer.go
docker build -t zai-proxy:debug .
⚠️ Warning: Only use in development. Large logs will impact performance.
Test Token Counting Locally
Create test request:
cat > test_request.json <<EOF
{
  "model": "claude-3-sonnet",
  "messages": [
    {"role": "user", "content": "What is 2+2?"}
  ],
  "max_tokens": 50
}
EOF
Run proxy locally:
export ZAI_API_KEY="your-api-key"
export TOKEN_COUNTING_ENABLED=true
go run main.go tokenizer.go
Send test request:
curl -X POST http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $ZAI_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d @test_request.json
Check logs:
Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)
Z.AI proxy listening on :8080
Token usage: input=8, output=5
Verify Tiktoken Installation
Check dependencies:
cd /home/coder/ardenone-cluster/containers/zai-proxy
# List tiktoken dependency
go list -m github.com/tiktoken-go/tokenizer
# Expected output:
# github.com/tiktoken-go/tokenizer v0.2.0
Test tiktoken directly:
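cat > test_tiktoken.go <<'EOF'
package main

import (
	"fmt"

	"github.com/tiktoken-go/tokenizer"
)

func main() {
	// Load the same cl100k_base encoding the proxy reports at startup.
	enc, err := tokenizer.Get(tokenizer.Cl100kBase)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}
	text := "Hello, world!"
	// Encode returns token IDs, token strings, and an error.
	ids, _, err := enc.Encode(text)
	if err != nil {
		fmt.Printf("Encode error: %v\n", err)
		return
	}
	fmt.Printf("Text: %s\n", text)
	fmt.Printf("Tokens: %d\n", len(ids))
}
EOF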
go run test_tiktoken.go
# Expected output:
# Text: Hello, world!
# Tokens: 4
Compare Token Counts with Anthropic
Send the same request through zai-proxy and directly to Anthropic:
# Request via zai-proxy
curl -X POST http://zai-proxy:8080/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $ZAI_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-3-sonnet","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}' \
-o zai_response.json
# Request directly to Anthropic (requires an Anthropic API key)
curl -X POST https://api.anthropic.com/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-3-sonnet-20240229","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}' \
-o anthropic_response.json
# Compare usage fields
cat anthropic_response.json | jq '.usage'
# zai-proxy logs: Token usage: input=X, output=Y
Expected variance: <3% for Claude models, <10% for GLM-4
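The variance figure is the absolute difference between the proxy's count and the reference count, as a percentage of the reference. A small sketch of that arithmetic, with `variancePct` as a hypothetical helper name and illustrative numbers:

```go
package main

import (
	"errors"
	"fmt"
)

// variancePct returns |proxy - reference| as a percentage of reference.
// reference is the count reported in the upstream "usage" field; proxy is
// the count from the zai-proxy "Token usage" log line.
func variancePct(proxy, reference int) (float64, error) {
	if reference <= 0 {
		return 0, errors.New("reference count must be positive")
	}
	diff := proxy - reference
	if diff < 0 {
		diff = -diff
	}
	return float64(diff) * 100 / float64(reference), nil
}

func main() {
	// Illustrative numbers, not measured results.
	pct, err := variancePct(8, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("variance: %.1f%%\n", pct)
}
```

On very short prompts a difference of one or two tokens already exceeds the thresholds above, so compare on requests with at least a few hundred tokens before concluding the tokenizer is off.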
Diagnostic Checklist
Use this checklist when troubleshooting token counting issues:
- Token counting enabled in environment variables
- Startup logs show "tiktoken cl100k_base encoding" (not "fallback mode")
- Prometheus metrics zai_proxy_tokens_total are present
- "Token usage" logs appear for each request
- Token counting latency <1ms (99th percentile)
- Token counts within expected variance (<10%)
- CPU usage not throttled
- Sufficient MAX_WORKERS for concurrent requests
Getting Help
If you've tried all troubleshooting steps and still have issues:
- Collect diagnostic information:
  # Startup logs
  kubectl logs -n mcp deployment/zai-proxy --tail=100 > startup_logs.txt
  # Recent request logs
  kubectl logs -n mcp deployment/zai-proxy --tail=500 | grep "Token usage" > token_logs.txt
  # Configuration
  kubectl get deployment zai-proxy -n mcp -o yaml > deployment.yaml
  # Metrics snapshot
  curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_token > metrics.txt
- Check documentation:
  - TOKENIZATION.md - Comprehensive guide
  - ENVIRONMENT_VARIABLES.md - Configuration reference
- File an issue:
  - Include diagnostic files above
  - Describe expected vs actual behavior
  - Include relevant logs and metrics
Quick Reference Commands
Enable token counting:
kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=true
kubectl rollout restart deployment/zai-proxy -n mcp
Disable token counting:
kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=false
kubectl rollout restart deployment/zai-proxy -n mcp
Watch token usage logs:
kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage"
Query token metrics:
curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_tokens
Check latency:
histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m]))
Restart proxy:
kubectl rollout restart deployment/zai-proxy -n mcp
kubectl rollout status deployment/zai-proxy -n mcp