From 182e19eb7c85db18225be6be12f2b56d31808316 Mon Sep 17 00:00:00 2001 From: jedarden Date: Sat, 27 Jun 2026 14:08:56 -0400 Subject: [PATCH] docs(bf-3u9): document matchmaker job creation verification - cluster capacity blocks operation --- notes/bf-3u9.md | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 notes/bf-3u9.md diff --git a/notes/bf-3u9.md b/notes/bf-3u9.md new file mode 100644 index 0000000..a6a9bef --- /dev/null +++ b/notes/bf-3u9.md @@ -0,0 +1,43 @@ +# Matchmaker Job Creation Verification (bf-3u9) + +## Task +Verify matchmaker job creation by checking acb-matchmaker logs for successful job creation. + +## Findings + +### Cluster Status +The matchmaker deployment exists but is **not running** due to cluster capacity issues: + +- **Matchmaker Pod**: `acb-matchmaker-64f6dc5985-9vh67` in namespace `ai-code-battle` +- **Status**: `Pending` (not running) +- **Age**: 35 minutes + +### Root Cause +The matchmaker pod cannot be scheduled due to: + +1. **Node Health Issues**: + - `prod-instance-17825591427380770`: `NotReady` (6h40m) + - Two nodes with `untolerated taint` (node.kubernetes.io/not-ready, node.kubernetes.io/unreachable) + +2. **Resource Constraints**: + - `FailedScheduling` events show: `0/3 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint...` + - Multiple scheduling warnings over 35 minutes indicating ongoing capacity issues + +### Expected Job Creation Log Format +When the matchmaker is running and creates jobs, it logs: +``` +matchmaker: created %d-player match %s (seed=%s vs %v), job %s, map=%s +``` + +This log format is found in `cmd/acb-matchmaker/tickers.go:483` + +## Conclusion +**Cannot verify job creation logs because the matchmaker is not running.** The pod is stuck in `Pending` state due to cluster capacity constraints and node health issues. + +## Recommendations +1. Fix the NotReady node (`prod-instance-17825591427380770`) +2. Scale down non-critical workloads or add cluster capacity +3. Once matchmaker is running, verify job creation with: + ```bash + kubectl --server=http://traefik-apexalgo-iad:8001 logs -n ai-code-battle deployment/acb-matchmaker | grep 'created.*player match' + ```