GLM-4 Tokenizer Libraries for Go - Research Findings
Research Date: 2026-02-08
Bead ID: bd-dv2
Purpose: Identify suitable Go libraries for GLM-4 tokenization for z.ai proxy integration
Executive Summary
GLM-4 uses a custom tokenizer built on HuggingFace's Tokenizers library, with a vocabulary of ~155,000 tokens. It uses byte-level BPE encoding, and its vocabulary merges Chinese/multilingual tokens with tiktoken's cl100k_base vocabulary.
Recommended Solution: Use github.com/daulet/tokenizers with HuggingFace pretrained models (zai-org/GLM-4.7-Flash)
GLM-4 Tokenizer Specifications
Model Information
- Model Repository: zai-org/GLM-4.7-Flash on HuggingFace
- Tokenizer Class: PreTrainedTokenizer (HuggingFace format)
- Vocabulary Size: ~155,000 tokens (154,820 base + 36 special tokens = 154,856 total)
- Max Context Length: 128,000 tokens
- Encoding Method: Byte-level BPE (Byte Pair Encoding)
- Special Tokens: 36 special tokens including <|endoftext|>, [MASK], <|system|>, <|user|>, <|assistant|>, etc.
Tokenizer Configuration
{
  "tokenizer_class": "PreTrainedTokenizer",
  "model_max_length": 128000,
  "clean_up_tokenization_spaces": false,
  "do_lower_case": false,
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "padding_side": "left"
}
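The configuration above can be read into a Go struct when the proxy needs the context limit or the EOS token at runtime. The following is a minimal sketch using only the standard library; the `TokenizerConfig` type and the `exceedsContext` helper are hypothetical names introduced for illustration, and the embedded JSON simply repeats the fields listed above (a real `tokenizer_config.json` may contain more fields or different value types).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TokenizerConfig mirrors the fields shown in the configuration above.
// The type name is hypothetical and not part of any library.
type TokenizerConfig struct {
	TokenizerClass string `json:"tokenizer_class"`
	ModelMaxLength int    `json:"model_max_length"`
	CleanUpSpaces  bool   `json:"clean_up_tokenization_spaces"`
	DoLowerCase    bool   `json:"do_lower_case"`
	EOSToken       string `json:"eos_token"`
	PadToken       string `json:"pad_token"`
	PaddingSide    string `json:"padding_side"`
}

// rawConfig repeats the configuration from this document.
const rawConfig = `{
  "tokenizer_class": "PreTrainedTokenizer",
  "model_max_length": 128000,
  "clean_up_tokenization_spaces": false,
  "do_lower_case": false,
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "padding_side": "left"
}`

// parseConfig decodes the JSON configuration into a TokenizerConfig.
func parseConfig(data []byte) (TokenizerConfig, error) {
	var cfg TokenizerConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// exceedsContext reports whether a prompt of n tokens is over the configured limit.
func exceedsContext(cfg TokenizerConfig, n int) bool {
	return n > cfg.ModelMaxLength
}

func main() {
	cfg, err := parseConfig([]byte(rawConfig))
	if err != nil {
		panic(err)
	}
	fmt.Println("max length:", cfg.ModelMaxLength)
	fmt.Println("pad token equals EOS token:", cfg.PadToken == cfg.EOSToken)
	fmt.Println("130000 tokens over limit:", exceedsContext(cfg, 130000))
}
```

A check like `exceedsContext` could let the proxy reject oversized prompts before forwarding them upstream, though whether that belongs in the proxy is a design decision outside this research.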
Evaluated Libraries
1. github.com/daulet/tokenizers ⭐ RECOMMENDED
Repository: https://github.com/daulet/tokenizers
Status: Active, well-maintained
License: Apache 2.0
Pros
- ✅ Go bindings for HuggingFace Tokenizers (Rust-based)
- ✅ Supports FromPretrained() to load models from HuggingFace Hub
- ✅ Can load GLM-4.7 tokenizer: tokenizers.FromPretrained("zai-org/GLM-4.7-Flash")
- ✅ High performance (uses native Rust implementation)
- ✅ Supports encoding with options (attention mask, type IDs, offsets, etc.)
- ✅ Prebuilt binaries for linux-amd64, darwin-arm64, linux-arm64
- ✅ Actively maintained with recent releases
Cons
- ⚠️ Requires CGO (needs libtokenizers.a static library)
- ⚠️ More complex build process compared to pure Go libraries
- ⚠️ About 3x slower than tiktoken on OpenAI models (but tiktoken cannot tokenize GLM-4, so this is the only viable option evaluated here)
Installation
go get github.com/daulet/tokenizers
Usage Example
package main

import (
	"fmt"

	"github.com/daulet/tokenizers"
)

func main() {
	// Load GLM-4.7 tokenizer from HuggingFace
	tk, err := tokenizers.FromPretrained("zai-org/GLM-4.7-Flash")
	if err != nil {
		panic(err)
	}
	defer tk.Close()

	// Get vocabulary size
	fmt.Println("Vocab size:", tk.VocabSize())
	// Output: Vocab size: 154856

	// Encode text
	ids, tokens := tk.Encode("你好,世界!Hello, world!", false)
	fmt.Println("Token IDs:", ids)
	fmt.Println("Tokens:", tokens)

	// Encode with special tokens
	idsWithSpecial, tokensWithSpecial := tk.Encode("你好,世界!", true)
	fmt.Println("With special tokens:", idsWithSpecial, tokensWithSpecial)

	// Decode tokens
	text := tk.Decode(ids, true)
	fmt.Println("Decoded:", text)

	// Advanced encoding with options
	encOpts := []tokenizers.EncodeOption{
		tokenizers.WithReturnTypeIDs(),
		tokenizers.WithReturnAttentionMask(),
		tokenizers.WithReturnTokens(),
		tokenizers.WithReturnOffsets(),
	}
	encoding := tk.EncodeWithOptions("Sample text", false, encOpts...)
	fmt.Println("IDs:", encoding.IDs)
	fmt.Println("Attention Mask:", encoding.AttentionMask)
	fmt.Println("Tokens:", encoding.Tokens)
	fmt.Println("Offsets:", encoding.Offsets)
}
Performance Benchmarks
BenchmarkEncodeNTimes-10 133966 10456 ns/op 256 B/op 12 allocs/op
BenchmarkDecodeNTimes-10 817164 1489 ns/op 64 B/op 2 allocs/op
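To relate these figures to proxy load, the ns/op values can be converted to operations per second. The sketch below does only that arithmetic; `opsPerSecond` is a hypothetical helper, and the result describes whatever input the benchmark used (not stated in these notes), so it is a rough upper bound rather than a prediction for real chat payloads.

```go
package main

import "fmt"

// opsPerSecond converts a Go benchmark ns/op figure into operations per second.
func opsPerSecond(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	// Figures taken from the benchmark output above.
	fmt.Printf("encode: %.0f ops/s\n", opsPerSecond(10456))
	fmt.Printf("decode: %.0f ops/s\n", opsPerSecond(1489))
}
```

This works out to roughly 95,600 encodes and 671,600 decodes per second. The `-10` suffix in the benchmark names is the GOMAXPROCS value of the machine that ran them.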
Build Requirements
# Option 1: Use prebuilt binaries (recommended)
# Download from https://github.com/daulet/tokenizers/releases
# Extract libtokenizers.a to project directory
# Option 2: Build from source (requires Rust toolchain)
make build # Builds libtokenizers.a
# Set CGO flags
export CGO_LDFLAGS="-L./path/to/libtokenizers/directory"
2. github.com/gomlx/tokenizers ❌ NOT RECOMMENDED
Repository: https://github.com/gomlx/tokenizers
Status: DEPRECATED - Moved to github.com/gomlx/go-huggingface
Notes
- Marked as "UNDER CONSTRUCTION" and "NOT FUNCTIONAL YET"
- Deprecated in favor of integrated solution in go-huggingface
- Not suitable for production use
3. tiktoken-go / go-tiktoken ❌ NOT COMPATIBLE
Repositories:
- https://github.com/pkoukk/tiktoken-go
- https://github.com/tiktoken-go/tokenizer
- https://github.com/j178/tiktoken-go
Status: Active, but NOT compatible with GLM-4
Why Not Compatible
- ❌ Tiktoken only supports OpenAI models (GPT-3.5, GPT-4, etc.)
- ❌ Uses different encoding schemes (cl100k_base, o200k_base)
- ❌ GLM-4 uses custom vocabulary and BPE rules
- ❌ Will produce incorrect token counts for GLM-4 models
Supported Encodings (OpenAI only)
- Models: gpt-3.5-turbo, gpt-4, gpt-4-turbo
- Encodings: cl100k_base, p50k_base, r50k_base, o200k_base
4. ChatGLM API Wrappers (Not Tokenizer Libraries)
Repositories:
Status: API wrappers only, no tokenizer functionality
Notes
- These are REST API clients for ChatGLM service
- Do NOT provide local tokenization capabilities
- Not suitable for token counting in proxy middleware
Implementation Recommendation
Recommended Approach: daulet/tokenizers with HuggingFace Pretrained Models
package tokenizer

import (
	"fmt"
	"sync"

	"github.com/daulet/tokenizers"
)

// GLM4Tokenizer wraps the HuggingFace tokenizer for GLM-4 models
type GLM4Tokenizer struct {
	tk   *tokenizers.Tokenizer
	mu   sync.RWMutex
	name string
}

// NewGLM4Tokenizer creates a tokenizer for GLM-4.7 models
func NewGLM4Tokenizer() (*GLM4Tokenizer, error) {
	tk, err := tokenizers.FromPretrained("zai-org/GLM-4.7-Flash")
	if err != nil {
		return nil, fmt.Errorf("failed to load GLM-4.7 tokenizer: %w", err)
	}
	return &GLM4Tokenizer{
		tk:   tk,
		name: "GLM-4.7-Flash",
	}, nil
}

// Close releases native resources. It is safe to call more than once.
func (t *GLM4Tokenizer) Close() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.tk != nil {
		t.tk.Close()
		t.tk = nil
	}
}

// CountTokens returns the number of tokens in the text,
// or 0 if the tokenizer has already been closed
func (t *GLM4Tokenizer) CountTokens(text string) int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	if t.tk == nil {
		return 0
	}
	ids, _ := t.tk.Encode(text, false)
	return len(ids)
}

// CountTokensWithSpecial includes special tokens in the count
func (t *GLM4Tokenizer) CountTokensWithSpecial(text string) int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	if t.tk == nil {
		return 0
	}
	ids, _ := t.tk.Encode(text, true)
	return len(ids)
}

// Encode returns token IDs and token strings
func (t *GLM4Tokenizer) Encode(text string, addSpecialTokens bool) ([]uint32, []string) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	if t.tk == nil {
		return nil, nil
	}
	return t.tk.Encode(text, addSpecialTokens)
}

// Decode converts token IDs back to text
func (t *GLM4Tokenizer) Decode(tokenIDs []uint32) string {
	t.mu.RLock()
	defer t.mu.RUnlock()
	if t.tk == nil {
		return ""
	}
	return t.tk.Decode(tokenIDs, true)
}

// VocabSize returns the tokenizer vocabulary size
func (t *GLM4Tokenizer) VocabSize() uint {
	t.mu.RLock()
	defer t.mu.RUnlock()
	if t.tk == nil {
		return 0
	}
	// Explicit conversion so this compiles whatever unsigned type the library returns
	return uint(t.tk.VocabSize())
}
Integration with zai-proxy
package proxy

import (
	"fmt"
)

// ChatRequest represents an OpenAI-compatible chat request
type ChatRequest struct {
	Model    string        `json:"model"`
	Messages []ChatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// countRequestTokens estimates the prompt token count for a chat request.
// It assumes the Proxy type is defined elsewhere in this package and that
// NewGLM4Tokenizer (the wrapper constructor above) is reachable from here.
func (p *Proxy) countRequestTokens(req *ChatRequest) (int, error) {
	// Initialize tokenizer (cache this globally in production)
	tk, err := NewGLM4Tokenizer()
	if err != nil {
		return 0, err
	}
	defer tk.Close()

	totalTokens := 0

	// Count tokens for each message
	for _, msg := range req.Messages {
		// Format: <|role|>\nContent\n
		formatted := fmt.Sprintf("<|%s|>\n%s\n", msg.Role, msg.Content)
		totalTokens += tk.CountTokens(formatted)
	}

	// Add tokens for response priming
	totalTokens += 3 // estimated <|assistant|> token overhead

	return totalTokens, nil
}
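The comment in countRequestTokens says the tokenizer should be cached globally rather than loaded per request. One way to do that is `sync.Once`, sketched below under some assumptions: `TokenCounter`, `wordCounter`, `loadTokenizer`, and `sharedTokenizer` are hypothetical names, and `wordCounter` is a whitespace-splitting stand-in (not a GLM-4 tokenizer) so the sketch runs without CGO. In the proxy, `loadTokenizer` would call NewGLM4Tokenizer instead.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// TokenCounter is a hypothetical interface that the GLM4Tokenizer wrapper
// would satisfy through its CountTokens method.
type TokenCounter interface {
	CountTokens(text string) int
}

// wordCounter is a stand-in that counts whitespace-separated words.
// It is NOT a GLM-4 tokenizer; it only keeps this sketch runnable.
type wordCounter struct{}

func (wordCounter) CountTokens(text string) int {
	return len(strings.Fields(text))
}

var (
	once      sync.Once
	shared    TokenCounter
	loadErr   error
	loadCalls int // counts how often the expensive load ran
)

// loadTokenizer stands in for the expensive NewGLM4Tokenizer call.
func loadTokenizer() (TokenCounter, error) {
	loadCalls++
	return wordCounter{}, nil
}

// sharedTokenizer loads the tokenizer on first use and reuses it afterwards.
// A load error is remembered and returned on every later call.
func sharedTokenizer() (TokenCounter, error) {
	once.Do(func() {
		shared, loadErr = loadTokenizer()
	})
	return shared, loadErr
}

func main() {
	for i := 0; i < 3; i++ {
		tk, err := sharedTokenizer()
		if err != nil {
			panic(err)
		}
		fmt.Println("tokens:", tk.CountTokens("hello world from the proxy"))
	}
	fmt.Println("loads:", loadCalls)
}
```

With this pattern the shared instance lives for the lifetime of the process, so the per-request `defer tk.Close()` would move to shutdown handling.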
Testing & Validation
Test Plan
- Accuracy Test: Compare token counts with official GLM-4 API
- Performance Test: Measure encoding/decoding latency
- Edge Cases: Test with multilingual text, special characters, empty strings
- Memory Test: Check for memory leaks in long-running processes
Validation Methodology
# Download test corpus
curl -o test_corpus.txt https://huggingface.co/datasets/wikitext/resolve/main/wikitext-2-raw-v1/test.txt
# Run token counting tests
go test -v ./tokenizer -run TestGLM4TokenCount
# Performance benchmarks
go test -bench=. ./tokenizer -benchmem -benchtime=10s
Dependencies & Build Setup
go.mod Dependencies
require (
github.com/daulet/tokenizers v1.0.0 // Check latest version
)
Docker Build Configuration
FROM golang:1.22-alpine AS builder
# Install build dependencies
RUN apk add --no-cache gcc musl-dev
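# Download prebuilt libtokenizers.a
# NOTE: the version and asset name in this URL are placeholders; confirm the
# exact file on the releases page and extract it first if it ships as an archive
ADD https://github.com/daulet/tokenizers/releases/download/v1.0.0/libtokenizers-linux-amd64.a /lib/libtokenizers.a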
# Set CGO flags
ENV CGO_ENABLED=1
ENV CGO_LDFLAGS="-L/lib"
# Build application
COPY . /app
WORKDIR /app
RUN go build -o zai-proxy .
FROM alpine:latest
COPY --from=builder /app/zai-proxy /usr/local/bin/
CMD ["/usr/local/bin/zai-proxy"]
Alternatives Considered & Rejected
| Library | Reason for Rejection |
|---|---|
| tiktoken-go | Only supports OpenAI models, incompatible with GLM-4 |
| go-tiktoken | Same as above, fork of tiktoken-go |
| gomlx/tokenizers | Deprecated, not functional |
| ChatGLM API wrappers | Only API clients, no local tokenization |
| Manual BPE implementation | Too complex, error-prone, slower than Rust bindings |
References
Documentation
Related Research
- ChatGLM: A Family of Large Language Models (arXiv)
- GLM-4.6 Documentation - Z.AI
- Counting Tokens in Go (Medium)
Next Steps
- ✅ Complete research and document findings
- ⏭️ Implement GLM4Tokenizer wrapper (bd-dv2 follow-up)
- ⏭️ Write unit tests for token counting accuracy
- ⏭️ Benchmark performance vs. Python tokenizer
- ⏭️ Integrate into zai-proxy middleware
- ⏭️ Deploy and validate with production traffic
Appendix: GLM-4 Special Tokens
<|endoftext|> # EOS token (ID: 154820)
[MASK] # Mask token
[gMASK] # Global mask
[sMASK] # Sentence mask
<sop> # Start of passage
<eop> # End of passage
<|system|> # System message role
<|user|> # User message role
<|assistant|> # Assistant message role
<|observation|> # Observation/tool result
<|begin_of_image|> # Image boundary
<|end_of_image|>
<|begin_of_video|> # Video boundary
<|end_of_video|>
<|begin_of_audio|> # Audio boundary
<|end_of_audio|>
<tool_call> # Function calling
</tool_call>
<tool_response>
</tool_response>
<think> # Chain-of-thought
</think>
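The role markers in this list matter for token counting because the proxy builds each message as `<|role|>\ncontent\n`. The sketch below validates a role against the four role markers listed above before formatting; `roleMarkers` and `formatMessage` are hypothetical names, and rejecting unknown roles is one possible policy rather than documented GLM-4 behavior.

```go
package main

import "fmt"

// roleMarkers maps request roles to the role tokens listed in this appendix.
var roleMarkers = map[string]string{
	"system":      "<|system|>",
	"user":        "<|user|>",
	"assistant":   "<|assistant|>",
	"observation": "<|observation|>",
}

// formatMessage renders a message as "<|role|>\ncontent\n".
// It returns an error for roles that have no marker in the appendix.
func formatMessage(role, content string) (string, error) {
	marker, ok := roleMarkers[role]
	if !ok {
		return "", fmt.Errorf("unknown role %q", role)
	}
	return marker + "\n" + content + "\n", nil
}

func main() {
	s, err := formatMessage("user", "Hello")
	fmt.Printf("%q %v\n", s, err)

	// "tool" is a common OpenAI-style role with no marker in the list above.
	_, err = formatMessage("tool", "result")
	fmt.Println(err)
}
```

Without such a check, a role outside this list would be formatted into a string like `<|tool|>` that is not one of the special tokens above, which could skew the count.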
Research Completed: 2026-02-08
Confidence Level: High - daulet/tokenizers is production-ready for GLM-4 tokenization