zai-proxy/docs/notes/TROUBLESHOOTING.md
jedarden e7c24a0c08 feat: initial zai-proxy ecosystem repo
Extracted from ardenone-cluster/containers/zai-proxy and
ardenone-cluster/containers/zai-proxy-dashboard.

- proxy/: OpenAI-compatible ZAI reverse proxy (Go, v1.10.0)
  - Token counting, rate limiting, Prometheus metrics, canary support
- dashboard/: Metrics dashboard backend + React frontend (Go, v1.0.0)
  - Prometheus collector, SQLite storage, SSE live updates
- docs/: Operational notes, research, and plan subdirs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:53:52 -04:00

12 KiB

Token Counting Troubleshooting Guide

Quick reference for diagnosing and fixing token counting issues.

Table of Contents

  1. Quick Diagnostics
  2. Common Issues
  3. Advanced Debugging

Quick Diagnostics

1. Check if Token Counting is Enabled

Command:

kubectl logs -n mcp deployment/zai-proxy --tail=100 | grep -i "token counting"

Expected Output (enabled):

Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)

Expected Output (disabled):

Token counting disabled (TOKEN_COUNTING_ENABLED=false)

Expected Output (fallback mode):

Warning: Failed to initialize TikToken counter: <error>
Falling back to SimpleTokenCounter
Token counting enabled (fallback mode, model: glm-4)

2. Check Token Metrics Availability

Command:

curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_tokens_total

Expected Output:

zai_proxy_tokens_total{direction="input",model="glm-4"} 1234
zai_proxy_tokens_total{direction="output",model="glm-4"} 5678

If no output: Token counting is disabled or no requests have been processed yet.

3. Check Configuration

Command:

kubectl get deployment zai-proxy -n mcp -o jsonpath='{.spec.template.spec.containers[0].env}' | jq

Look for:

[
  {
    "name": "TOKEN_COUNTING_ENABLED",
    "value": "true"
  },
  {
    "name": "TOKENIZER_MODEL",
    "value": "glm-4"
  }
]

4. Live Token Usage Monitoring

Command:

kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage"

Expected Output (per request):

Token usage: input=123, output=456

Common Issues

Issue 1: No Token Metrics in Prometheus

Symptoms:

  • zai_proxy_tokens_total metric is missing
  • No "Token usage" logs

Diagnosis:

# Step 1: Check if token counting is enabled
kubectl logs -n mcp deployment/zai-proxy --tail=50 | grep "Token counting"

# Step 2: Check environment variable
kubectl get deployment zai-proxy -n mcp -o yaml | grep -A 2 TOKEN_COUNTING_ENABLED

Solution:

# Enable token counting
kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=true

# Restart deployment
kubectl rollout restart deployment/zai-proxy -n mcp

# Wait for rollout
kubectl rollout status deployment/zai-proxy -n mcp

# Verify
kubectl logs -n mcp deployment/zai-proxy --tail=10 | grep "Token counting"

Issue 2: Inaccurate Token Counts (>10% variance)

Symptoms:

  • Token counts significantly differ from expected
  • Large variance compared to Anthropic's counts

Diagnosis:

# Check if fallback tokenizer is active
kubectl logs -n mcp deployment/zai-proxy --tail=100 | grep -i fallback

If fallback mode detected:

Falling back to SimpleTokenCounter
Token counting enabled (fallback mode, model: glm-4)

Root Cause: Tiktoken initialization failed. Fallback uses word count approximation (~30% variance).

Solution:

Option 1: Rebuild with tiktoken dependencies

cd /home/coder/ardenone-cluster/containers/zai-proxy

# Check tiktoken dependency
go list -m github.com/tiktoken-go/tokenizer

# If missing, download
go mod download

# Rebuild
go build -o zai-proxy main.go tokenizer.go

# Test locally
./zai-proxy
# Should see: "Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)"

Option 2: Rebuild Docker image

cd /home/coder/ardenone-cluster/containers/zai-proxy

# Rebuild image
docker build -t zai-proxy:latest .

# Push to registry
docker tag zai-proxy:latest ghcr.io/ardenone/zai-proxy:latest
docker push ghcr.io/ardenone/zai-proxy:latest

# Update deployment
kubectl rollout restart deployment/zai-proxy -n mcp

Verification:

kubectl logs -n mcp deployment/zai-proxy --tail=20 | grep "Token counting"
# Should see: "Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)"
# Should NOT see: "fallback mode"

Issue 3: High Token Counting Latency (>5ms)

Symptoms:

  • Token counting is slow
  • zai_proxy_token_count_duration_seconds > 5ms

Diagnosis:

# Query 99th percentile latency
histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m]))

Expected: <1ms for 99th percentile Problem threshold: >5ms

Root Causes:

  1. High CPU usage

    kubectl top pod -n mcp -l app=zai-proxy
    

    If CPU is throttled (near limits):

    # Increase CPU limits
    resources:
      limits:
        cpu: 2000m  # Increase from 1000m
    
  2. High concurrent requests (mutex contention)

    # Check concurrent requests
    zai_proxy_concurrent_requests
    

    If >50 concurrent requests:

    # Increase MAX_WORKERS
    kubectl set env deployment/zai-proxy -n mcp MAX_WORKERS=100
    
  3. Large response bodies

    • Token counting processes entire response
    • Large outputs (>10K tokens) may take longer

    Workaround: Disable token counting for specific endpoints if needed

Verification:

# Check latency after changes
histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m]))

Issue 4: Token Usage Not in Response Body

Expected Behavior: Token counts are NOT included in response bodies (feature not implemented yet).

Current Behavior:

  • Token counts logged: Token usage: input=X, output=Y
  • Prometheus metrics: zai_proxy_tokens_total
  • No usage field in response JSON/SSE

Workaround 1: Use Logs

# Monitor token usage in real-time
kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage"

Workaround 2: Use Prometheus

# Total tokens per request (approximation)
rate(zai_proxy_tokens_total[5m]) / rate(zai_proxy_requests_total[5m])

Future Enhancement: Usage injection planned (see InjectTokenUsage() in tokenizer.go)


Issue 5: Metrics Not Reset After Deployment

Symptoms:

  • Token counts seem to persist across pod restarts
  • Metrics show large cumulative numbers

Explanation: Prometheus metrics are cumulative counters (lifetime totals).

Expected Behavior:

  • Counters reset to 0 when pod restarts
  • Use rate() or increase() for meaningful queries

Incorrect Query:

# DON'T: Raw counter (lifetime total since pod start)
zai_proxy_tokens_total

Correct Queries:

# Tokens per minute (rolling 5m window)
rate(zai_proxy_tokens_total[5m]) * 60

# Total tokens in last hour
increase(zai_proxy_tokens_total[1h])

# Tokens per request
rate(zai_proxy_tokens_total[5m]) / rate(zai_proxy_requests_total[5m])

Advanced Debugging

Enable Debug Logging

Modify deployment:

env:
- name: LOG_LEVEL
  value: "debug"  # Enable debug logs

Apply:

kubectl apply -f deployment.yaml
kubectl rollout status deployment/zai-proxy -n mcp

Watch logs:

kubectl logs -f -n mcp deployment/zai-proxy

Inspect Request/Response Bodies

Add temporary debug logging in code:

// In main.go, after CountRequestTokens()
log.Printf("DEBUG: Request body: %s", string(requestBody))
log.Printf("DEBUG: Input tokens: %d", inputTokens)

// After CountOutputTokens()
log.Printf("DEBUG: Response body: %s", string(bodyCapture.GetCapturedContent()))
log.Printf("DEBUG: Output tokens: %d", outputTokens)

Rebuild and deploy:

go build -o zai-proxy main.go tokenizer.go
docker build -t zai-proxy:debug .

⚠️ Warning: Only use in development. Large logs will impact performance.

Test Token Counting Locally

Create test request:

cat > test_request.json <<EOF
{
  "model": "claude-3-sonnet",
  "messages": [
    {"role": "user", "content": "What is 2+2?"}
  ],
  "max_tokens": 50
}
EOF

Run proxy locally:

export ZAI_API_KEY="your-api-key"
export TOKEN_COUNTING_ENABLED=true
go run main.go tokenizer.go

Send test request:

curl -X POST http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ZAI_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d @test_request.json

Check logs:

Token counting enabled (tiktoken cl100k_base encoding, model: glm-4)
Z.AI proxy listening on :8080
Token usage: input=8, output=5

Verify Tiktoken Installation

Check dependencies:

cd /home/coder/ardenone-cluster/containers/zai-proxy

# List tiktoken dependency
go list -m github.com/tiktoken-go/tokenizer

# Expected output:
# github.com/tiktoken-go/tokenizer v0.2.0

Test tiktoken directly:

cat > test_tiktoken.go <<EOF
package main

import (
    "fmt"
    "github.com/tiktoken-go/tokenizer"
)

func main() {
    enc, err := tokenizer.Get(tokenizer.Cl100kBase)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    text := "Hello, world!"
    ids, _, _ := enc.Encode(text)
    fmt.Printf("Text: %s\n", text)
    fmt.Printf("Tokens: %d\n", len(ids))
}
EOF

go run test_tiktoken.go

# Expected output:
# Text: Hello, world!
# Tokens: 4

Compare Token Counts with Anthropic

Send same request to both proxies:

# Request via zai-proxy
curl -X POST http://zai-proxy:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ZAI_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-3-sonnet","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}' \
  -o zai_response.json

# Request directly to Anthropic (if you have API key)
curl -X POST https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-3-sonnet-20240229","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}' \
  -o anthropic_response.json

# Compare usage fields
cat anthropic_response.json | jq '.usage'
# zai-proxy logs: Token usage: input=X, output=Y

Expected variance: <3% for Claude models, <10% for GLM-4


Diagnostic Checklist

Use this checklist when troubleshooting token counting issues:

  • Token counting enabled in environment variables
  • Startup logs show "tiktoken cl100k_base encoding" (not "fallback mode")
  • Prometheus metrics zai_proxy_tokens_total are present
  • "Token usage" logs appear for each request
  • Token counting latency <1ms (99th percentile)
  • Token counts within expected variance (<10%)
  • CPU usage not throttled
  • Sufficient MAX_WORKERS for concurrent requests

Getting Help

If you've tried all troubleshooting steps and still have issues:

  1. Collect diagnostic information:

    # Startup logs
    kubectl logs -n mcp deployment/zai-proxy --tail=100 > startup_logs.txt
    
    # Recent request logs
    kubectl logs -n mcp deployment/zai-proxy --tail=500 | grep "Token usage" > token_logs.txt
    
    # Configuration
    kubectl get deployment zai-proxy -n mcp -o yaml > deployment.yaml
    
    # Metrics snapshot
    curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_token > metrics.txt
    
  2. Check documentation:

  3. File an issue:

    • Include diagnostic files above
    • Describe expected vs actual behavior
    • Include relevant logs and metrics

Quick Reference Commands

Enable token counting:

kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=true
kubectl rollout restart deployment/zai-proxy -n mcp

Disable token counting:

kubectl set env deployment/zai-proxy -n mcp TOKEN_COUNTING_ENABLED=false
kubectl rollout restart deployment/zai-proxy -n mcp

Watch token usage logs:

kubectl logs -f -n mcp deployment/zai-proxy | grep "Token usage"

Query token metrics:

curl -s http://zai-proxy.mcp.svc.cluster.local:8080/metrics | grep zai_proxy_tokens

Check latency:

histogram_quantile(0.99, rate(zai_proxy_token_count_duration_seconds_bucket[5m]))

Restart proxy:

kubectl rollout restart deployment/zai-proxy -n mcp
kubectl rollout status deployment/zai-proxy -n mcp