ai-code-battle/notes/bf-3u9.md

# Matchmaker Job Creation Verification (bf-3u9)
## Task
Verify matchmaker job creation by checking acb-matchmaker logs for successful job creation.
## Findings
### Cluster Status
The matchmaker deployment exists, but its pod is **not running** because the scheduler cannot place it on any node:
- **Matchmaker Pod**: `acb-matchmaker-64f6dc5985-9vh67` in namespace `ai-code-battle`
- **Status**: `Pending` (not running)
- **Age**: 35 minutes
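
The pod phase reported above can be re-checked from a script. This is a minimal sketch: `pod_phase` is a hypothetical helper name, and it assumes `kubectl` is already pointed at the right cluster (for this cluster that may mean adding the `--server` flag used elsewhere in these notes).

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints the phase (Pending, Running, ...) of a pod.
# $1 = pod name, $2 = namespace (defaults to ai-code-battle).
pod_phase() {
  kubectl get pod "$1" -n "${2:-ai-code-battle}" -o jsonpath='{.status.phase}'
}
```

For example, `pod_phase acb-matchmaker-64f6dc5985-9vh67` would be expected to print `Pending` while the pod remains unscheduled.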
### Root Cause
The matchmaker pod cannot be scheduled due to:
1. **Node Health Issues**:
   - `prod-instance-17825591427380770`: `NotReady` (6h40m)
   - Two nodes with `untolerated taint` (`node.kubernetes.io/not-ready`, `node.kubernetes.io/unreachable`)
2. **Resource Constraints**:
   - `FailedScheduling` events show: `0/3 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint...`
   - Repeated scheduling warnings over the pod's 35-minute lifetime indicate that the capacity problem is ongoing
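
A `FailedScheduling` message packs several per-node reasons into one line, which is easy to misread. The sketch below splits such a message into one reason per line. `split_scheduling_reasons` is a hypothetical helper, and it assumes the simple `N/M nodes are available: reason, reason` shape quoted above. The message in these notes is truncated, and a full message may carry extra text that this split does not handle.

```bash
#!/usr/bin/env bash

# Hypothetical helper: reads one FailedScheduling message on stdin and
# prints each comma-separated reason on its own line.
split_scheduling_reasons() {
  # 1. Drop the "N/M nodes are available: " prefix.
  # 2. Turn each comma into a newline.
  # 3. Trim the leading whitespace left behind by the split.
  sed -E 's/^[0-9]+\/[0-9]+ nodes are available: //' \
    | tr ',' '\n' \
    | sed -E 's/^[[:space:]]+//'
}
```

Fed the truncated message from the events above, this prints `1 Insufficient cpu` on the first line and `1 node(s) had untolerated taint...` on the second.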
### Expected Job Creation Log Format
When the matchmaker is running and creates a job, it logs a line in this format:
```
matchmaker: created %d-player match %s (seed=%s vs %v), job %s, map=%s
```
This format string is defined at `cmd/acb-matchmaker/tickers.go:483`.
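
The format string can be turned into a stricter filter than a loose substring match. This is a minimal sketch, not the project's own tooling: the function names are hypothetical, and the pattern assumes that the match ID and job name contain no spaces and that any logger prefix (such as a timestamp) may precede the message.

```bash
#!/usr/bin/env bash

# Extended regex derived from the format string:
#   matchmaker: created %d-player match %s (seed=%s vs %v), job %s, map=%s
# Assumption: match ID and job name are single tokens without spaces.
JOB_CREATED_RE='matchmaker: created [0-9]+-player match [^ ]+ \(seed=.+ vs .+\), job [^ ,]+, map=.+'

# Hypothetical helper: prints only the job-creation lines read from stdin.
filter_job_creations() {
  grep -E "$JOB_CREATED_RE" || true
}

# Hypothetical helper: prints the number of job-creation lines read from stdin.
# "|| true" keeps the exit status at 0 when grep finds no matches.
count_job_creations() {
  grep -cE "$JOB_CREATED_RE" || true
}
```

Piping the matchmaker's log output into `count_job_creations` would then report how many jobs were created, with `0` meaning that no creation line was logged.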
## Conclusion
**Cannot verify job creation logs because the matchmaker is not running.** The pod is stuck in `Pending` state due to cluster capacity constraints and node health issues.
## Recommendations
1. Fix the NotReady node (`prod-instance-17825591427380770`)
2. Scale down non-critical workloads or add cluster capacity
3. Once the matchmaker is running, verify job creation with:
```bash
kubectl --server=http://traefik-apexalgo-iad:8001 logs -n ai-code-battle deployment/acb-matchmaker | grep 'created.*player match'
```