ai-code-battle/notes/bf-3u9.md

Matchmaker Job Creation Verification (bf-3u9)

Task

Verify that the matchmaker creates jobs by checking the acb-matchmaker logs for job-creation entries.

Findings

Cluster Status

The matchmaker deployment exists, but its pod is not running because of cluster capacity and node health issues:

  • Matchmaker Pod: acb-matchmaker-64f6dc5985-9vh67 in namespace ai-code-battle
  • Status: Pending (not running)
  • Age: 35 minutes

Root Cause

The matchmaker pod cannot be scheduled due to:

  1. Node Health Issues:

    • prod-instance-17825591427380770: NotReady (6h40m)
    • Two nodes carry taints that the pod does not tolerate (node.kubernetes.io/not-ready, node.kubernetes.io/unreachable)
  2. Resource Constraints:

    • FailedScheduling events show: 0/3 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint...
    • Repeated scheduling warnings over 35 minutes indicate that the capacity problem is ongoing, not transient
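
To compare scheduling failures across events, the summary text can be broken into per-reason node counts. The following is a minimal sketch, not part of the matchmaker code: the type and function names are hypothetical, and it assumes the message keeps the "N/M nodes are available: ..." shape quoted above. The sample message is used exactly as recorded, including its truncation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// schedulingSummary (hypothetical name) holds the counts from one message.
type schedulingSummary struct {
	Available int            // nodes the scheduler could use
	Total     int            // nodes it considered
	Reasons   map[string]int // node count per rejection reason
}

var (
	headerRe = regexp.MustCompile(`^(\d+)/(\d+) nodes are available: (.*)$`)
	reasonRe = regexp.MustCompile(`^(\d+) (.+)$`)
)

// parseFailedScheduling splits a FailedScheduling summary into its counts.
// Fragments that do not look like "<count> <reason>" are skipped.
func parseFailedScheduling(msg string) (schedulingSummary, error) {
	m := headerRe.FindStringSubmatch(strings.TrimSpace(msg))
	if m == nil {
		return schedulingSummary{}, fmt.Errorf("unrecognised message: %q", msg)
	}
	// The regexp guarantees digits, so conversion errors are not expected.
	avail, _ := strconv.Atoi(m[1])
	total, _ := strconv.Atoi(m[2])
	s := schedulingSummary{Available: avail, Total: total, Reasons: map[string]int{}}
	for _, part := range strings.Split(m[3], ", ") {
		// Drop trailing dots so a truncated "..." does not end up in the key.
		part = strings.TrimRight(strings.TrimSpace(part), ".")
		r := reasonRe.FindStringSubmatch(part)
		if r == nil {
			continue
		}
		n, _ := strconv.Atoi(r[1])
		s.Reasons[r[2]] += n
	}
	return s, nil
}

func main() {
	// Message text as recorded in the note (truncated in the original).
	msg := "0/3 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint..."
	s, err := parseFailedScheduling(msg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("available=%d total=%d reasons=%v\n", s.Available, s.Total, s.Reasons)
}
```

Because the recorded message is truncated, the reason counts here may not add up to the total node count; the sketch reports only what the text contains.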

Expected Job Creation Log Format

When the matchmaker is running and creates a job, it logs a line in this format:

matchmaker: created %d-player match %s (seed=%s vs %v), job %s, map=%s

This format string appears at cmd/acb-matchmaker/tickers.go:483.

Conclusion

Job creation could not be verified because the matchmaker is not running. The pod is stuck in the Pending state due to cluster capacity constraints and node health issues.

Recommendations

  1. Fix the NotReady node (prod-instance-17825591427380770)
  2. Scale down non-critical workloads or add cluster capacity
  3. Once the matchmaker is running, verify job creation with:
    kubectl --server=http://traefik-apexalgo-iad:8001 logs -n ai-code-battle deployment/acb-matchmaker | grep 'created.*player match'