# Token Counting Troubleshooting Guide Quick reference for diagnosing and fixing token counting issues. ## Table of Contents 1. [Quick Diagnostics](#quick-diagnostics) 2. [Common Issues](#common-issues) 3. [Advanced Debugging](#advanced-debugging) --- ## Quick Diagnostics ### 1. Check if Token Counting is Enabled **Command:** ```bash kubectl logs -n mcp deployment/zai-proxy --tail=100 | grep -i "token counting" ``` **Expected Output (enabled):** ``` Token counting enabled (tiktoken cl100k_base encoding, model: glm-4) ``` **Expected Output (disabled):** ``` Token counting disabled (TOKEN_COUNTING_ENABLED=false) ``` **Expected Output (fallback mode):** ``` Warning: Failed to initialize TikToken counter: Falling back to SimpleTokenCounter Token counting enabled (fallback mode, model: glm-4) ``` ### 2. Check Token Metrics Availability **Command:** ```bash curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_tokens_total ``` **Expected Output:** ``` zai_proxy_tokens_total{direction="input",model="glm-4"} 1234 zai_proxy_tokens_total{direction="output",model="glm-4"} 5678 ``` **If no output:** Token counting is disabled or no requests have been processed yet. ### 3. Check Configuration **Command:** ```bash kubectl get deployment zai-proxy -n mcp -o jsonpath='{.spec.template.spec.containers[0].env}' | jq ``` **Look for:** ```json [ { "name": "TOKEN_COUNTING_ENABLED", "value": "true" }, { "name": "TOKENIZER_MODEL", "value": "glm-4" } ] ``` ### 4. Live Token Usage Monitoring **Command:** ```bash kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage" ``` **Expected Output (per request):** ``` Token usage: input=123, output=456 ``` --- ## Common Issues ### Issue 1: No Token Metrics in Prometheus **Symptoms:** - `zai_proxy_tokens_total` metric is missing - No "Token usage" logs **Diagnosis:** ```bash # Step 1: Check if token counting is enabled kubectl logs -n mcp deployment/zai-proxy --tail=50 | grep "Token counting" # Step 2: Check environment variable kubectl get deployment zai-proxy -n mcp -o yaml | grep -A 2 TOKEN_COUNTING_ENABLED ``` **Solution:** ```bash # Enable token counting kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=true # Restart deployment kubectl rollout restart deployment/zai-proxy -n mcp # Wait for rollout kubectl rollout status deployment/zai-proxy -n mcp # Verify kubectl logs -n mcp deployment/zai-proxy --tail=10 | grep "Token counting" ``` --- ### Issue 2: Inaccurate Token Counts (>10% variance) **Symptoms:** - Token counts significantly differ from expected - Large variance compared to Anthropic's counts **Diagnosis:** ```bash # Check if fallback tokenizer is active kubectl logs -n mcp deployment/zai-proxy --tail=100 | grep -i fallback ``` **If fallback mode detected:** ``` Falling back to SimpleTokenCounter Token counting enabled (fallback mode, model: glm-4) ``` **Root Cause:** Tiktoken initialization failed. Fallback uses word count approximation (~30% variance). **Solution:** **Option 1: Rebuild with tiktoken dependencies** ```bash cd /home/coder/ardenone-cluster/containers/zai-proxy # Check tiktoken dependency go list -m github.com/tiktoken-go/tokenizer # If missing, download go mod download # Rebuild go build -o zai-proxy main.go tokenizer.go # Test locally ./zai-proxy # Should see: "Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)" ``` **Option 2: Rebuild Docker image** ```bash cd /home/coder/ardenone-cluster/containers/zai-proxy # Rebuild image docker build -t zai-proxy:latest . # Push to registry docker tag zai-proxy:latest ghcr.io/ardenone/zai-proxy:latest docker push ghcr.io/ardenone/zai-proxy:latest # Update deployment kubectl rollout restart deployment/zai-proxy -n mcp ``` **Verification:** ```bash kubectl logs -n mcp deployment/zai-proxy --tail=20 | grep "Token counting" # Should see: "Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)" # Should NOT see: "fallback mode" ``` --- ### Issue 3: High Token Counting Latency (>5ms) **Symptoms:** - Token counting is slow - `zai_proxy_token_count_duration_seconds` > 5ms **Diagnosis:** ```promql # Query 99th percentile latency histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m])) ``` **Expected:** <1ms for 99th percentile **Problem threshold:** >5ms **Root Causes:** 1. **High CPU usage** ```bash kubectl top pod -n mcp -l app=zai-proxy ``` **If CPU is throttled (near limits):** ```yaml # Increase CPU limits resources: limits: cpu: 2000m # Increase from 1000m ``` 2. **High concurrent requests (mutex contention)** ```promql # Check concurrent requests zai_proxy_concurrent_requests ``` **If >50 concurrent requests:** ```bash # Increase MAX_WORKERS kubectl set env deployment/zai-proxy -n mcp MAX_WORKERS=100 ``` 3. **Large response bodies** - Token counting processes entire response - Large outputs (>10K tokens) may take longer **Workaround:** Disable token counting for specific endpoints if needed **Verification:** ```promql # Check latency after changes histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m])) ``` --- ### Issue 4: Token Usage Not in Response Body **Expected Behavior:** Token counts are **NOT** included in response bodies (feature not implemented yet). **Current Behavior:** - ✅ Token counts logged: `Token usage: input=X, output=Y` - ✅ Prometheus metrics: `zai_proxy_tokens_total` - ❌ **No** `usage` field in response JSON/SSE **Workaround 1: Use Logs** ```bash # Monitor token usage in real-time kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage" ``` **Workaround 2: Use Prometheus** ```promql # Total tokens per request (approximation) rate(zai_proxy_tokens_total[5m]) / rate(zai_proxy_requests_total[5m]) ``` **Future Enhancement:** Usage injection planned (see `InjectTokenUsage()` in `tokenizer.go`) --- ### Issue 5: Metrics Not Reset After Deployment **Symptoms:** - Token counts seem to persist across pod restarts - Metrics show large cumulative numbers **Explanation:** Prometheus metrics are cumulative counters (lifetime totals). **Expected Behavior:** - Counters reset to 0 when pod restarts - Use `rate()` or `increase()` for meaningful queries **Incorrect Query:** ```promql # DON'T: Raw counter (lifetime total since pod start) zai_proxy_tokens_total ``` **Correct Queries:** ```promql # Tokens per minute (rolling 5m window) rate(zai_proxy_tokens_total[5m]) * 60 # Total tokens in last hour increase(zai_proxy_tokens_total[1h]) # Tokens per request rate(zai_proxy_tokens_total[5m]) / rate(zai_proxy_requests_total[5m]) ``` --- ## Advanced Debugging ### Enable Debug Logging **Modify deployment:** ```yaml env: - name: LOG_LEVEL value: "debug" # Enable debug logs ``` **Apply:** ```bash kubectl apply -f deployment.yaml kubectl rollout status deployment/zai-proxy -n mcp ``` **Watch logs:** ```bash kubectl logs -f -n mcp deployment/zai-proxy ``` ### Inspect Request/Response Bodies **Add temporary debug logging in code:** ```go // In main.go, after CountRequestTokens() log.Printf("DEBUG: Request body: %s", string(requestBody)) log.Printf("DEBUG: Input tokens: %d", inputTokens) // After CountOutputTokens() log.Printf("DEBUG: Response body: %s", string(bodyCapture.GetCapturedContent())) log.Printf("DEBUG: Output tokens: %d", outputTokens) ``` **Rebuild and deploy:** ```bash go build -o zai-proxy main.go tokenizer.go docker build -t zai-proxy:debug . ``` ⚠️ **Warning:** Only use in development. Large logs will impact performance. ### Test Token Counting Locally **Create test request:** ```bash cat > test_request.json < test_tiktoken.go < startup_logs.txt # Recent request logs kubectl logs -n mcp deployment/zai-proxy --tail=500 | grep "Token usage" > token_logs.txt # Configuration kubectl get deployment zai-proxy -n mcp -o yaml > deployment.yaml # Metrics snapshot curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_token > metrics.txt ``` 2. **Check documentation:** - [TOKENIZATION.md](./TOKENIZATION.md) - Comprehensive guide - [ENVIRONMENT_VARIABLES.md](./ENVIRONMENT_VARIABLES.md) - Configuration reference 3. **File an issue:** - Include diagnostic files above - Describe expected vs actual behavior - Include relevant logs and metrics --- ## Quick Reference Commands **Enable token counting:** ```bash kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=true kubectl rollout restart deployment/zai-proxy -n mcp ``` **Disable token counting:** ```bash kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=false kubectl rollout restart deployment/zai-proxy -n mcp ``` **Watch token usage logs:** ```bash kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage" ``` **Query token metrics:** ```bash curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_tokens ``` **Check latency:** ```promql histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m])) ``` **Restart proxy:** ```bash kubectl rollout restart deployment/zai-proxy -n mcp kubectl rollout status deployment/zai-proxy -n mcp ```