ai-code-battle/notes/bf-1bvca.md

bf-1bvca: combat_turns Migration

Task

Deploy P0: add combat_turns column migration to acb-schema-init (apexalgo-iad)

Status: COMPLETE (migration committed and pushed; verification on the cluster is blocked, see Current State)

What Was Done

The combat_turns migration had already been implemented in a previous session (commit 00e1f5c). This session verified the following:

  1. Schema Changes (declarative-config/k8s/apexalgo-iad/ai-code-battle/acb-schema-init.yml):

    • Line 46: combat_turns INTEGER NOT NULL DEFAULT 0 in CREATE TABLE
    • Line 305: ALTER TABLE matches ADD COLUMN IF NOT EXISTS combat_turns INTEGER NOT NULL DEFAULT 0;
  2. Rollout Annotation: Bumped to v13-combat-turns-2026-06-03-bf-1bvca-sync (latest)

  3. Deployed: declarative-config is at v13; the ArgoCD sync operation is still Running (waiting for a healthy state), so the cluster has not yet converged

  4. Pushed: declarative-config is up to date with origin/main
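
The reason the ALTER TABLE statement is safe to re-run on every schema-init rollout can be sketched as follows. This is an illustration only, not the deployed migration: it uses Python's built-in sqlite3 in place of Postgres, and because SQLite has no ADD COLUMN IF NOT EXISTS, the helper add_column_if_missing (a hypothetical name) emulates that guard by inspecting the table first.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl):
    """Add `column` to `table` only when it is absent; return True if added.

    Emulates Postgres' ADD COLUMN IF NOT EXISTS, which SQLite lacks.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True


conn = sqlite3.connect(":memory:")
# A stand-in matches table that predates the migration (assumed shape).
conn.execute("CREATE TABLE matches (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO matches (id) VALUES (1)")

column_ddl = "INTEGER NOT NULL DEFAULT 0"
first = add_column_if_missing(conn, "matches", "combat_turns", column_ddl)
second = add_column_if_missing(conn, "matches", "combat_turns", column_ddl)

# Rows that existed before the migration read back the default value.
backfilled = conn.execute(
    "SELECT combat_turns FROM matches WHERE id = 1"
).fetchone()[0]
```

The first call adds the column, the second is a no-op, and the pre-existing row reads back the default, which is the behavior the NOT NULL DEFAULT 0 clause is meant to guarantee.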

Current State (Infrastructure Blockers)

The migration code is correct and committed. However, cluster resource constraints prevent verification:

  1. Cluster CPU Capacity: All application pods (api, index-builder, worker, evolver, matchmaker, etc.) are stuck Pending due to "Insufficient cpu"

    • Only the old schema-init pod (v11 revision) is Running (1/1)
    • The new v13 schema-init pod is Pending, waiting for CPU
    • Cannot verify that index-builder succeeds until pods can schedule
  2. Cluster Status (2026-06-04):

    • declarative-config: Commit d3e9eab (v13) pushed
    • Cluster apexalgo-iad: Stuck at v11-fix-secret-name-2026-06-03-bf-1bvca
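    • Node CPU: 42%, 17%, and 42% utilization, yet pods cannot schedule (the scheduler counts CPU requests rather than live usage, so existing requests are likely exhausting allocatable CPU)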
    • 10+ pods Pending with FailedScheduling: 0/3 nodes are available: 3 Insufficient cpu
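
The gap between low utilization and "Insufficient cpu" can be illustrated with a small sketch of how a requests-based fit check behaves. All figures except the 42%, 17%, and 42% utilization values are hypothetical: the node names, the allocatable and requested millicores, and the 250m pod request were chosen only to reproduce the symptom, not measured on apexalgo-iad.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_millicpu: int  # CPU the node offers to pods
    requested_millicpu: int    # sum of CPU requests of pods already placed
    used_millicpu: int         # live usage, which the fit check ignores

    def utilization_pct(self):
        return 100 * self.used_millicpu / self.allocatable_millicpu

    def fits(self, pod_request_millicpu):
        # A pod fits only if its request is within the unreserved CPU.
        free = self.allocatable_millicpu - self.requested_millicpu
        return pod_request_millicpu <= free


def schedulable_nodes(nodes, pod_request_millicpu):
    """Return the names of nodes whose unreserved CPU covers the request."""
    return [n.name for n in nodes if n.fits(pod_request_millicpu)]


# Hypothetical nodes: lightly used, but almost fully reserved by requests.
nodes = [
    Node("node-a", 4000, 3900, 1680),
    Node("node-b", 4000, 3950, 680),
    Node("node-c", 4000, 3900, 1680),
]

utilization = [n.utilization_pct() for n in nodes]
candidates = schedulable_nodes(nodes, 250)
```

Under these assumed numbers no node qualifies even though none is above 42% utilization, which is why comparing summed requests against allocatable CPU per node is usually more informative here than looking at usage.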

Git History

Several commits were needed to apply this migration:

  • 00e1f5c feat(apexalgo-iad): add acb-schema-init deployment with combat_turns migration
  • 5abffac fix(ai-code-battle): correct schema-init secret name reference
  • 6d7439d fix(acb-schema-init): bump checksum to force reapply combat_turns migration
  • Plus 8+ annotation-bump commits attempting to force a rollout

Next Steps

To complete verification:

  1. Fix the Postgres cluster (cnpg-apexalgo), which has been broken for 23 days
  2. Scale up cluster CPU or scale down workloads to free capacity
  3. Once the index-builder pod runs, verify that its logs show no "combat_turns does not exist" errors
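
The log check in step 3 could be scripted along these lines. This is a minimal sketch: the function name and the sample lines are made up for illustration (they are not captured index-builder output), and the pattern allows an optional double quote after the column name because Postgres typically quotes the identifier in this error.

```python
import re

# Matches both `column "combat_turns" does not exist` and the
# table-qualified form `column matches.combat_turns does not exist`.
MISSING_COLUMN = re.compile(r'combat_turns"?\s+does not exist')


def missing_column_errors(log_text):
    """Return the log lines that report the combat_turns column as missing."""
    return [line for line in log_text.splitlines() if MISSING_COLUMN.search(line)]


# Hypothetical sample input, written by hand for this example.
sample = "\n".join([
    'ERROR: column "combat_turns" does not exist',
    "index build started",
    "ERROR: column matches.combat_turns does not exist",
])

hits = missing_column_errors(sample)
```

An empty result on the real logs would satisfy step 3; the text to scan would come from the index-builder pod's logs once it is scheduled.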