# Matchmaker Job Creation Verification (bf-3u9)

## Task

Verify that the matchmaker creates jobs by checking the `acb-matchmaker` logs for successful job-creation entries.

## Findings

### Cluster Status

The `acb-matchmaker` deployment exists, but its pod is **not running** because the cluster cannot schedule it:

- **Pod**: `acb-matchmaker-64f6dc5985-9vh67` in namespace `ai-code-battle`
- **Status**: `Pending` (not running)
- **Age**: 35 minutes

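To re-check this state without scanning the full `kubectl get pods` table, a small `awk` filter can list only `Pending` pods and their age. This is a sketch, not part of the original investigation: `pending_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods --no-headers` column layout (NAME, READY, STATUS, RESTARTS, AGE).

```bash
# Print NAME and AGE for every pod whose STATUS column is "Pending".
# Assumes default `kubectl get pods --no-headers` columns:
#   NAME READY STATUS RESTARTS AGE
# AGE is read from the last field ($NF) because RESTARTS can contain
# spaces, e.g. "3 (5m ago)".
pending_pods() {
  awk '$3 == "Pending" { print $1, $NF }'
}

# Usage (assumed; same API server as in Recommendations):
#   kubectl --server=http://traefik-apexalgo-iad:8001 get pods \
#     -n ai-code-battle --no-headers | pending_pods
```
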
### Root Cause

The matchmaker pod cannot be scheduled for two reasons:

1. **Node Health Issues**:
   - `prod-instance-17825591427380770` has been `NotReady` for 6h40m.
   - Two nodes have taints the pod does not tolerate (`node.kubernetes.io/not-ready`, `node.kubernetes.io/unreachable`).
2. **Resource Constraints**:
   - `FailedScheduling` events report: `0/3 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint...`
   - These scheduling warnings have recurred throughout the pod's 35-minute lifetime, indicating an ongoing capacity problem.

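The node-side findings can be re-checked with a similar filter over `kubectl get nodes --no-headers`. The sketch below is hypothetical (`unready_nodes` is an illustrative name) and assumes the default column layout (NAME, STATUS, ROLES, AGE, VERSION). It reports any node whose STATUS is not `Ready`, while treating cordoned `Ready,SchedulingDisabled` nodes as healthy. The commented commands for taints and `FailedScheduling` events are standard `kubectl` invocations, but their output depends on the cluster.

```bash
# Print NAME and STATUS for every node that is not plainly Ready.
# Assumes default `kubectl get nodes --no-headers` columns:
#   NAME STATUS ROLES AGE VERSION
# "NotReady" and "Unknown" are reported; "Ready,SchedulingDisabled"
# (a cordoned but healthy node) is not.
unready_nodes() {
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1, $2 }'
}

# Usage (assumed; same API server as in Recommendations):
#   kubectl --server=http://traefik-apexalgo-iad:8001 get nodes --no-headers | unready_nodes
# Related checks for the taints and scheduling events listed above:
#   kubectl --server=http://traefik-apexalgo-iad:8001 get nodes \
#     -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key
#   kubectl --server=http://traefik-apexalgo-iad:8001 get events -n ai-code-battle \
#     --field-selector reason=FailedScheduling
```
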
### Expected Job Creation Log Format

When the matchmaker is running and creates a job, it logs a line with this format string (Go `fmt`-style verbs):

```
matchmaker: created %d-player match %s (seed=%s vs %v), job %s, map=%s
```

The format string is defined in `cmd/acb-matchmaker/tickers.go:483`.

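Because the format string fixes the layout of each line, the job name can be extracted from the logs rather than only grepped for. A minimal sketch, assuming the match ID contains no spaces and the job name contains no commas (the format string does not guarantee either), and that any prefix such as a timestamp comes before `matchmaker:`. `created_jobs` is a hypothetical helper name.

```bash
# Print the job name (the "job %s" field) from each job-creation line
# on stdin; non-matching lines are dropped (sed -n ... /p).
# Assumptions: match ID has no spaces, job name has no commas.
created_jobs() {
  sed -nE 's/.*matchmaker: created [0-9]+-player match [^ ]+ \(seed=.* vs .*\), job ([^,]+), map=.*/\1/p'
}

# Usage (assumed):
#   kubectl --server=http://traefik-apexalgo-iad:8001 logs -n ai-code-battle \
#     deployment/acb-matchmaker | created_jobs
```
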
## Conclusion

**Job creation could not be verified because the matchmaker is not running.** Its pod is stuck in the `Pending` state because of the node health issues and CPU shortage described above.

## Recommendations

1. Restore the `NotReady` node (`prod-instance-17825591427380770`) to a `Ready` state.
2. Free CPU by scaling down non-critical workloads, or add cluster capacity.
3. Once the matchmaker pod is running, verify job creation with:

   ```bash
   kubectl --server=http://traefik-apexalgo-iad:8001 logs -n ai-code-battle deployment/acb-matchmaker | grep 'created.*player match'
   ```

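To turn step 3 into a pass/fail check (for example in a script), the log output can be piped into a small counter that fails when no job-creation line is present. This is a hypothetical sketch: `require_job_created` is an illustrative name, the regex is derived from the format string in `tickers.go` rather than taken from the codebase, and the `rollout status` wait in the usage comment is one reasonable way to make sure the pod is up before reading logs.

```bash
# Count job-creation lines on stdin, print the count, and return
# non-zero when there are none so the check can gate a script.
# The pattern is an assumption based on the tickers.go format string.
require_job_created() {
  # grep -c prints 0 and exits 1 on no match; `|| true` keeps n="0"
  # instead of aborting callers that run under `set -e`.
  n=$(grep -cE 'matchmaker: created [0-9]+-player match ' || true)
  echo "job-creation lines: $n"
  [ "$n" -gt 0 ]
}

# Usage (assumed):
#   kubectl --server=http://traefik-apexalgo-iad:8001 -n ai-code-battle \
#     rollout status deployment/acb-matchmaker --timeout=5m &&
#   kubectl --server=http://traefik-apexalgo-iad:8001 -n ai-code-battle \
#     logs deployment/acb-matchmaker | require_job_created
```

Counting with `grep -c` keeps the check cheap, and the tighter regex avoids matching unrelated lines that merely contain "created" and "player match".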