zai-proxy/docs/notes/REGRESSION_TEST_QUICKREF.md
jedarden e7c24a0c08 feat: initial zai-proxy ecosystem repo
2026-05-16 15:53:52 -04:00

Regression Test Quick Reference Card

🎯 Purpose

Prevent future breakage of token counting functionality by maintaining a comprehensive regression test suite.

📊 Status

  • Total Coverage: 95%+ (Target: 90%+)
  • Regression Tests: 9 test functions, 38+ scenarios
  • Total Test Code: 2,609 lines across 4 test files

Quick Commands

Run Regression Tests

# All regression tests
go test -v -run "^TestRegression_" -timeout 30m

# Specific test
go test -v -run "TestRegression_BasicTokenCounts"

# With coverage
go test -v -cover -coverprofile=coverage.out -run "^TestRegression_"

# Automated runner (full suite + coverage report)
./tests/run_regression_tests.sh

Run in Docker (No Go Installed)

docker build -t zai-proxy:test .
docker run --rm zai-proxy:test go test -v -run "^TestRegression_"

📝 Adding a Test Case

1. Choose Category

| Category | When to Use | Test Function |
|---|---|---|
| BasicTokenCounts | Golden test cases with known good outputs | TestRegression_BasicTokenCounts() |
| EdgeCases | Edge cases that could crash or fail | TestRegression_EdgeCases() |
| RequestParsing | Request body parsing edge cases | TestRegression_RequestParsing() |
| StreamingResponses | SSE streaming token counting | TestRegression_StreamingResponses() |
| JSONResponses | Non-streaming response counting | TestRegression_JSONResponses() |
| UsageInjection | Token usage injection validation | TestRegression_UsageInjection() |
| ConcurrentAccess | Thread safety validation | TestRegression_ConcurrentAccess() |
| FallbackCounter | SimpleTokenCounter fallback | TestRegression_FallbackCounter() |
| StreamingPreservation | Streaming integrity | TestRegression_StreamingPreservation() |

2. Add Test Case

// In tokenizer_regression_test.go:
// find the appropriate test function and add an entry to its test cases slice.

{
    name:        "Short descriptive name",
    text:        "Input text to test",
    expectedMin: 5,   // lower bound of the accepted range (see Expected Value Guidelines)
    expectedMax: 10,  // upper bound of the accepted range
    description: "Why this exists - BD-XYZ reference",
},
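
A case like this is typically consumed by a table-driven loop. The sketch below is a minimal, self-contained illustration, not the suite's actual code: `countTokens` is a hypothetical stand-in (a whitespace split) for the real tokenizer, and `regressionCase` and `checkRange` are illustrative names. In a real test file the loop body would presumably sit inside `t.Run(tc.name, ...)` and report failures with `t.Errorf`.

```go
package main

import (
	"fmt"
	"strings"
)

// regressionCase mirrors the fields used in the test case literal.
type regressionCase struct {
	name        string
	text        string
	expectedMin int
	expectedMax int
	description string
}

// countTokens is a hypothetical stand-in for the real tokenizer:
// it counts whitespace-separated words.
func countTokens(text string) int {
	return len(strings.Fields(text))
}

// checkRange returns an error when got falls outside [lo, hi].
func checkRange(got, lo, hi int) error {
	if got < lo || got > hi {
		return fmt.Errorf("got %d tokens, expected %d-%d", got, lo, hi)
	}
	return nil
}

func main() {
	cases := []regressionCase{
		{
			name:        "Short descriptive name",
			text:        "one two three four five six",
			expectedMin: 5,
			expectedMax: 10,
			description: "Why this exists - BD-XYZ reference",
		},
	}
	for _, tc := range cases {
		got := countTokens(tc.text)
		if err := checkRange(got, tc.expectedMin, tc.expectedMax); err != nil {
			fmt.Printf("FAIL %s: %v\n", tc.name, err)
			continue
		}
		fmt.Printf("ok   %s: %d tokens\n", tc.name, got)
	}
}
```

Checking a range rather than an exact count keeps the case stable across minor tokenizer changes while still catching large regressions.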

3. Validate

# Run your new test
go test -v -run "TestRegression_YourCategory/Short_descriptive_name"

# Check output, adjust expectedMin/expectedMax if needed

4. Commit

git add tokenizer_regression_test.go
git commit -m "test(bd-10d): Add regression test for [feature]

Prevents re-introduction of [bug/issue]. Expected: X-Y tokens.

Co-Authored-By: Claude Worker <noreply@anthropic.com>"
git push origin main

🧪 Test Case Template

Basic Token Count Test

{
    name:        "Technical documentation",
    text:        "The API endpoint returns a JSON response.",
    expectedMin: 7,
    expectedMax: 11,
    description: "Technical sentence - validated in BD-XYZ",
},

Edge Case Test

{
    name:        "Binary data",
    text:        "\x00\x01\x02\xff\xfe",
    shouldError: false,
    description: "Binary characters - must not crash",
},
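
One way to enforce "must not crash" is to wrap the call so that a panic becomes an ordinary error. The sketch below assumes a counter with the signature `func(string) (int, error)`; `safeCount` and `byteCounter` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// safeCount calls count and converts a panic into an error, so a single
// bad input is reported as a failure instead of aborting the whole run.
func safeCount(count func(string) (int, error), text string) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			n = 0
			err = fmt.Errorf("counter panicked: %v", r)
		}
	}()
	return count(text)
}

// byteCounter is a hypothetical counter used only for this illustration.
func byteCounter(text string) (int, error) {
	return len(text), nil
}

func main() {
	n, err := safeCount(byteCounter, "\x00\x01\x02\xff\xfe")
	fmt.Println(n, err)
}
```

Without such a wrapper, a panic inside one subtest stops the test binary, so later cases never run.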

Streaming Response Test

{
    name: "Code block stream",
    response: `data: {"type":"content_block_delta","delta":{"text":"def hello():\n"}}

data: {"type":"content_block_delta","delta":{"text":"    return 42\n"}}
`,
    expectedMin: 6,
    expectedMax: 12,
    description: "Code with formatting in streaming response",
},
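
To see which text such a stream contributes to the count, the sketch below concatenates the `delta.text` fields of the `content_block_delta` events. It is an illustration under assumptions, not the proxy's parser: `deltaEvent` and `extractDeltaText` are hypothetical names, and skipping a `[DONE]` sentinel is assumed from OpenAI-style streams.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// deltaEvent models only the fields that appear in the example stream.
type deltaEvent struct {
	Type  string `json:"type"`
	Delta struct {
		Text string `json:"text"`
	} `json:"delta"`
}

// extractDeltaText concatenates the text of every content_block_delta
// event found on "data: " lines; all other lines are ignored.
func extractDeltaText(stream string) (string, error) {
	var sb strings.Builder
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" { // assumed end-of-stream sentinel
			continue
		}
		var ev deltaEvent
		if err := json.Unmarshal([]byte(payload), &ev); err != nil {
			return "", fmt.Errorf("bad event %q: %w", payload, err)
		}
		if ev.Type == "content_block_delta" {
			sb.WriteString(ev.Delta.Text)
		}
	}
	return sb.String(), sc.Err()
}

func main() {
	stream := `data: {"type":"content_block_delta","delta":{"text":"def hello():\n"}}

data: {"type":"content_block_delta","delta":{"text":"    return 42\n"}}
`
	text, err := extractDeltaText(stream)
	fmt.Printf("%q %v\n", text, err)
}
```

Note that `bufio.Scanner` rejects lines longer than its default buffer (64 KiB), so very large events would need `Scanner.Buffer` or a different reader.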

📏 Expected Value Guidelines

| Text Length | Tolerance | Example |
|---|---|---|
| <10 tokens | ±1 token | min: 4, max: 6 for ~5 tokens |
| 10-100 tokens | ±10% | min: 45, max: 55 for ~50 tokens |
| >100 tokens | ±15% | min: 85, max: 115 for ~100 tokens |
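
These tolerances can be turned into bounds mechanically. The helper below is a hypothetical sketch (`expectedRange` is not part of the suite); it treats an estimate of exactly 100 as belonging to the ±15% bucket so that it reproduces the 85-115 example, and it uses integer division, so margins round down.

```go
package main

import "fmt"

// expectedRange derives (lo, hi) bounds from an estimated token count:
// <10 tokens -> ±1 token, 10-99 -> ±10%, 100 and above -> ±15%.
func expectedRange(estimate int) (lo, hi int) {
	var margin int
	switch {
	case estimate < 10:
		margin = 1
	case estimate < 100:
		margin = estimate * 10 / 100
	default:
		margin = estimate * 15 / 100
	}
	lo = estimate - margin
	if lo < 0 {
		lo = 0
	}
	return lo, estimate + margin
}

func main() {
	for _, n := range []int{5, 50, 100} {
		lo, hi := expectedRange(n)
		fmt.Printf("~%d tokens: min %d, max %d\n", n, lo, hi)
	}
}
```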

Best Practices

DO:

  • Use table-driven tests
  • Set realistic token ranges (not exact counts)
  • Include description with BD-XXX reference
  • Log success cases with t.Logf()
  • Validate that errors are handled gracefully
  • Add a test for every bug fix
  • Run tests before committing

DON'T:

  • Use exact token counts (brittle)
  • Ignore errors silently
  • Hardcode large text (use strings.Repeat() instead)
  • Skip validation of expected values
  • Commit without running tests
  • Add tests without descriptions
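
As a small illustration of the `strings.Repeat()` advice, the sketch below builds a large input at run time instead of pasting it into the test file. `largeText` is a hypothetical helper, and the repeated unit is arbitrary.

```go
package main

import (
	"fmt"
	"strings"
)

// largeText builds an input of n repetitions of unit at run time, which
// keeps the test file small and makes the intended size explicit.
func largeText(unit string, n int) string {
	return strings.Repeat(unit, n)
}

func main() {
	text := largeText("lorem ipsum ", 5000)
	fmt.Println(len(text)) // 12 bytes per unit * 5000 = 60000
}
```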

🐛 Debugging Failed Tests

Token Count Out of Range

FAIL: Got 45 tokens, expected 38-42

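Fix: Check the actual output first. Adjust expectedMin/expectedMax only if the text or encoding changed intentionally; otherwise treat the failure as a regression and fix the counting code.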

TikToken Not Available

Skipping regression tests: TikToken not available

Fix: Run go mod download && go mod tidy, then rebuild
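
The FallbackCounter category covers the related case where the primary encoder is missing at run time. The sketch below shows the general fallback pattern under stated assumptions: the `counter` type, `withFallback`, `unavailable`, and `wordCount` are hypothetical names, and the word count is only a crude stand-in for SimpleTokenCounter.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// counter is a hypothetical function type for anything that counts tokens.
type counter func(text string) (int, error)

// withFallback returns a counter that tries primary first and uses
// fallback whenever primary reports an error.
func withFallback(primary, fallback counter) counter {
	return func(text string) (int, error) {
		if n, err := primary(text); err == nil {
			return n, nil
		}
		return fallback(text)
	}
}

// unavailable simulates an encoder that failed to load.
func unavailable(string) (int, error) {
	return 0, errors.New("encoder not available")
}

// wordCount is a crude stand-in for a simple fallback counter.
func wordCount(text string) (int, error) {
	return len(strings.Fields(text)), nil
}

func main() {
	count := withFallback(unavailable, wordCount)
	n, err := count("The API endpoint returns a JSON response.")
	fmt.Println(n, err)
}
```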

Race Condition

WARNING: DATA RACE

Fix: Run go test -race -run TestName to identify the racing access, then protect the shared state with a mutex


📂 File Structure

zai-proxy/
├── tokenizer.go                      # Implementation (294 lines)
├── tokenizer_regression_test.go      # Regression suite (712 lines) ← ADD TESTS HERE
├── tokenizer_test.go                 # Unit tests (565 lines)
├── main_test.go                      # Integration tests (499 lines)
├── comprehensive_tokenizer_tests.go  # End-to-end tests (533 lines)
├── tests/
│   ├── README.md                     # Test overview
│   ├── COVERAGE_REPORT.md            # Coverage metrics
│   └── run_regression_tests.sh       # Automated test runner
└── docs/
    ├── REGRESSION_TEST_GUIDE.md      # Complete guide
    └── REGRESSION_TEST_QUICKREF.md   # This file

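📚 Documentation

  • docs/REGRESSION_TEST_GUIDE.md: complete guide
  • tests/README.md: test overview
  • tests/COVERAGE_REPORT.md: coverage metrics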

🎯 Coverage Targets

| Component | Target | Current | Status |
|---|---|---|---|
| Token counting core | 100% | 100% | Met |
| Request parsing | 95%+ | 98% | Met |
| Response parsing | 95%+ | 97% | Met |
| Edge cases | 90%+ | 95% | Met |
| Overall | 90%+ | 95%+ | Met |

Last Updated: 2026-02-08
Task: BD-10D - Create regression test suite
Status: Complete (95%+ coverage achieved)