ai-code-battle/notes/bf-21081.md

Deploy P0: acb-postgres-credentials SealedSecret - ALREADY EXISTS

Status

COMPLETE (pre-existing) - no change was needed because the SealedSecret was already in place

What Was Found

The acb-postgres-credentials SealedSecret was already created on 2026-05-26:

  • Commit: 2f40563 (feat(apexalgo-iad): add acb-postgres-credentials SealedSecret for ai-code-battle)
  • Repository: jedarden/declarative-config
  • File: k8s/apexalgo-iad/ai-code-battle/acb-postgres-sealedsecret.yml

The bead's premise was incorrect - the SealedSecret already exists and has been deployed.
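
The finding above can be re-checked from the cluster side. The following is a minimal sketch, assuming the cluster runs the Bitnami Sealed Secrets controller (so the resource is `sealedsecrets.bitnami.com`) and that the unsealed Secret carries the same name as the SealedSecret. The server URL and namespace are copied from the kubectl command in this note; the function name `check_secret` is hypothetical.

```shell
# Sketch: confirm the SealedSecret and its unsealed Secret are both present.
# SERVER and NS come from the kubectl command used elsewhere in this note.
SERVER="http://traefik-apexalgo-iad:8001"
NS="ai-code-battle"

check_secret() {
  name="$1"
  # The SealedSecret custom resource (assumes the Bitnami controller's CRD).
  kubectl --server="$SERVER" get sealedsecrets.bitnami.com "$name" -n "$NS" >/dev/null 2>&1 \
    || { echo "sealedsecret missing"; return 1; }
  # The plain Secret the controller is expected to create from it.
  kubectl --server="$SERVER" get secret "$name" -n "$NS" >/dev/null 2>&1 \
    || { echo "secret not unsealed"; return 2; }
  echo "present"
}

# Usage: check_secret acb-postgres-credentials
```

A result of `secret not unsealed` would point at the controller rather than at the committed manifest.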

Actual Blocker: Insufficient CPU

The deployments are NOT crashing because of a missing secret. No container has started at all: every pod is stuck in Pending because no node has enough unreserved CPU to schedule it:

kubectl --server=http://traefik-apexalgo-iad:8001 get pods -n ai-code-battle
NAME                                 READY   STATUS    RESTARTS   AGE
acb-api-5646489f75-l4zmq             0/1     Pending   0          75m
acb-api-7c46c9d5b6-jfl9w             0/1     Pending   0          116m
acb-evolver-7654d8b866-psvk5         0/1     Pending   0          75m
acb-evolver-85549b574d-pqbjd         0/1     Pending   0          28h
acb-index-builder-6669fdbc95-nxwhf   0/1     Pending   0          86m
acb-map-evolver-79ff4cdf6c-7ghg4     0/1     Pending   0          86m
acb-matchmaker-64f6dc5985-vkbbl      0/1     Pending   0          86m
acb-worker-bf5bfdb98-g9jnn           0/1     Pending   0          86m
acb-worker-bf5bfdb98-mhvn6           0/1     Pending   0          86m
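
To summarize a listing like the one above without reading it row by row, the STATUS column can be tallied. This is a small sketch; the helper name `count_by_status` is hypothetical, and it assumes the default `kubectl get pods` column layout (STATUS as the third column).

```shell
# Sketch: count pods per STATUS from `kubectl get pods` table output on stdin.
# Skips the header row and tallies the third column.
count_by_status() {
  awk 'NR > 1 && NF >= 3 { n[$3]++ } END { for (s in n) printf "%s %d\n", s, n[s] }' | sort
}

# Usage:
#   kubectl --server=http://traefik-apexalgo-iad:8001 get pods -n ai-code-battle | count_by_status
```

For the nine rows shown above, this would report a single line, `Pending 9`.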

Cluster capacity:

  • 3 nodes total (2 Ready, 1 NotReady)
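  • Node 1: CPU requests at 99% of allocatable (3492m / ~3500m), memory at 27%
  • Node 2: CPU usage at 44% (this is live usage, not requests; the scheduler reserves capacity by requests, so the figure does not show how much CPU is still schedulable)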
  • Node 3: NotReady (unreachable)
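
The scheduler places a pod only if its CPU request fits in what is left of a node's allocatable CPU after existing requests are subtracted. The arithmetic for Node 1 can be sketched as below, using the figures from this note (the allocatable value is approximate in the source). The helper names are hypothetical; the per-node request totals would normally be read from the "Allocated resources" section of `kubectl describe node`.

```shell
# Sketch: CPU request arithmetic in millicores.
# Arguments may be given with or without the trailing "m".

# Whole-percent share of allocatable CPU that is already requested.
cpu_request_pct() {
  req="${1%m}"
  alloc="${2%m}"
  echo $(( req * 100 / alloc ))
}

# Millicores still available for new pod requests.
cpu_headroom() {
  echo $(( ${2%m} - ${1%m} ))
}

# Figures for Node 1 from this note:
#   cpu_request_pct 3492m 3500m   # prints 99
#   cpu_headroom    3492m 3500m   # prints 8
```

With roughly 8m of headroom, any pod requesting more than a few millicores cannot be placed on that node.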

Scheduling failure:

Warning  FailedScheduling  23m (x404 over 85m)   default-scheduler
0/3 nodes are available: 3 Insufficient cpu. preemption: 0/3 nodes are available:
3 No preemption victims found for incoming pod.
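
Events like the one above can be pulled for the whole namespace rather than one pod at a time. This is a sketch, assuming the same server URL and namespace as the earlier kubectl command; the function name `failed_scheduling_events` is hypothetical.

```shell
# Sketch: list FailedScheduling events in the namespace, oldest first.
SERVER="http://traefik-apexalgo-iad:8001"
NS="ai-code-battle"

failed_scheduling_events() {
  kubectl --server="$SERVER" get events -n "$NS" \
    --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp
}

# Usage: failed_scheduling_events
```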

Root Cause

  1. One node is NotReady - prod-instance-17781842321795040 is unreachable, so its capacity is unavailable
  2. Insufficient CPU on the Ready nodes - all schedulable CPU is already requested by existing pods, so new pods cannot be scheduled

Next Steps (Infrastructure Issue)

This is an infrastructure capacity problem, not a missing secret:

  1. Scale up cluster capacity or add nodes
  2. Fix or replace the NotReady node
  3. Once CPU is available, pods should schedule successfully (secret is already present)
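
Once capacity has been restored, the last step can be verified with a rollout check per deployment. This is a sketch only: the deployment names are inferred from the pod name prefixes in the listing and are an assumption, the 120s timeout is arbitrary, and `wait_for_rollouts` is a hypothetical helper.

```shell
# Sketch: report which deployments finish rolling out after CPU is freed.
SERVER="http://traefik-apexalgo-iad:8001"
NS="ai-code-battle"
# Assumption: names inferred from the pod name prefixes in this note.
DEPLOYMENTS="acb-api acb-evolver acb-index-builder acb-map-evolver acb-matchmaker acb-worker"

wait_for_rollouts() {
  failed=0
  for d in $DEPLOYMENTS; do
    if kubectl --server="$SERVER" rollout status "deployment/$d" -n "$NS" --timeout=120s >/dev/null 2>&1; then
      echo "$d ok"
    else
      echo "$d not ready"
      failed=1
    fi
  done
  # Non-zero exit if any deployment did not become ready in time.
  return "$failed"
}

# Usage: wait_for_rollouts
```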