Commit graph

733 commits

Author SHA1 Message Date
jedarden
4435344a87 chore(bf-23j): remove committed test-replay JSON artifacts
Remove three tracked test-replay JSON files (test-replay-comprehensive.json,
test-replay-extended.json, test-replay-long-match.json) that were committed
before the .gitignore rule was added. These are generated artifacts and should
not be in version control.

The .gitignore already contains 'test-replay*.json' which will prevent
recurrence of these files being tracked in the future.

Related to bead bf-23j: repo hygiene cleanup
2026-07-02 17:36:26 -04:00
jedarden
6c803e41d2 docs(bf-1pwz): complete bot language and purpose analysis for all 21 bots
- Read each bot's entry point file to determine programming language
- Inferred purpose from code structure and comments
- Created annotated table with all 21 bots
- Language distribution: Rust(4), Java(3), Python(4), Go(4), TypeScript(2), JavaScript(2), C#(1), PHP(1)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 17:17:05 -04:00
jedarden
dd68792c23 docs(bf-2be6): complete inventory of all 21 bots with language distribution 2026-07-02 17:00:35 -04:00
jedarden
37ed34de59 docs(bf-28c4): complete inventory of all 21 bots with language distribution
- Documented all 21 bots from bots/ directory
- Each bot includes: language, file path, and purpose
- Language distribution: Python 4, Go 4, Rust 4, Java 3, TypeScript 2, JavaScript 2, PHP 1, C# 1
- Verified against README.md strategy bots section

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 15:39:03 -04:00
jedarden
7659332647 docs(bf-3gkv): verify plan section 5 already correctly documents 21 bots
- Plan section 5 already states 'Twenty-one built-in strategy bots'
- Complete table lists all 21 bots with correct strategies and expected ranks
- Language distribution section correctly documents 8 languages:
  * Go: 4 bots (farmer, gatherer, opportunist, siege)
  * Rust: 4 bots (assassin, phalanx, rusher, zone-driver)
  * Python: 4 bots (economist, nomad, random, scout)
  * Java: 3 bots (hunter, leader-targeter, raider)
  * TypeScript: 2 bots (coordinator, swarm)
  * JavaScript: 2 bots (kamikaze, pacifist)
  * PHP: 1 bot (guardian)
  * C#: 1 bot (defender)

Verified against README.md and bots/ directory - no changes needed.
2026-07-02 15:39:03 -04:00
jedarden
3fc9fff79e docs(bf-3gkv): verify plan section 5 already correctly documents 21 bots
- Section 5 already states 'Twenty-one built-in strategy bots'
- Bot table lists all 21 bots matching README.md
- Language distribution correctly counts 21 total (4+4+4+3+2+2+1+1)
- No updates needed - plan is already accurate
2026-07-02 15:39:03 -04:00
jedarden
8ff40a6bac docs(bf-2czk): audit user-facing URL references in web/src
- Documented all occurrences of ai-code-battle.pages.dev in web/src/
- Found that aicodebattle.com and b2.aicodebattle.com only appear in comments
- Identified canonical domain usage in og-tags.ts, embed.ts, clip-maker.ts
- Noted dynamic window.location.origin usage in playlists.ts for embed codes
- Confirmed standardization on ai-code-battle.pages.dev as public domain

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 14:52:20 -04:00
jedarden
687766fc38 docs(bf-2czk): audit user-facing URL references in web/src
- Verified all user-facing URLs use ai-code-battle.pages.dev
- Found 8 files with 23 URL references across OG tags, share URLs, embed viewer, and API docs
- Old domains (aicodebattle.com, b2.aicodebattle.com) only appear in documentation comments
- No action required - all URLs are correct

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 14:20:17 -04:00
jedarden
90f4ed7507 docs(bf-5kk): standardize on ai-code-battle.pages.dev as canonical public domain
- Update replay-schema-v1.json  to pages.dev
- Update robots.txt sitemap URL to pages.dev
- Update test-match-list.html thumbnail URLs to pages.dev/r2/
- Add decision note documenting standardization

All user-facing absolute URLs now use the working pages.dev origin.
The aicodebattle.com domain is NXDOMAIN and was never registered.
R2 data is served through Pages Functions (/r2/*) eliminating the
need for a separate b2.aicodebattle.com CDN host.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 14:13:44 -04:00
jedarden
ad73532da2 docs(bf-5kk): standardize on ai-code-battle.pages.dev as canonical public domain
Replace all references to aicodebattle.com with ai-code-battle.pages.dev
in docs/plan/plan.md. The domain aicodebattle.com is NXDOMAIN; the site is
only reachable at the Cloudflare Pages default domain.

Changes:
- Update shareable URL examples to use pages.dev
- Update API endpoint references to use api.ai-code-battle.pages.dev
- Update evolution feed URL to use /r2/ path (Pages Functions proxy to R2)
- Update DNS/bot card examples to reference pages.dev

The decision to use pages.dev instead of registering aicodebattle.com is
documented in docs/notes/bf-5kk-canonical-domain-decision.md.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 14:05:21 -04:00
jedarden
c4aaa5b1de docs(bf-5kk): standardize on ai-code-battle.pages.dev as canonical public domain
The domain aicodebattle.com is NXDOMAIN (not registered).
Decision made to use ai-code-battle.pages.dev as canonical domain.

All user-facing URLs in web/src already use pages.dev:
- OG tags (og-tags.ts)
- Share URLs (clip-maker.ts)
- API examples (docs*.ts pages)

Decision note: docs/notes/bf-5kk-canonical-domain-decision.md
2026-07-02 13:56:46 -04:00
jedarden
9b4c6fba26 chore(bf-23j): remove committed binaries and generated artifacts from repo root
Remove committed compiled binaries (acb-local-fixed, acb-local-test, acb-map-evolver, acb-maps-loader, arena.test - ~39MB total) and generated artifacts (test-combat.json, test-swarm-rusher.json, match logs). Also remove 39 incremental bf-22vc5 status notes, keeping only the consolidated final summary (notes/bf-22vc5.md).

Update .gitignore to prevent recurrence:
- Pattern-match all acb-* binaries and arena.test
- Ignore test-replay*.json and match-*.log files

This aligns the repo with the planned monorepo structure (docs/plan/plan.md section 11.1) and reduces clone size and git history bloat.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 13:39:45 -04:00
jedarden
b7799c4fec docs(bf-36wp): verify acb-site-build WorkflowTemplate configuration
- WorkflowTemplate exists on iad-ci ✓
- Currently builds container images, NOT Cloudflare Pages deployment ✗
- Documented required changes to deploy web/ → ai-code-battle Pages
- Reference pattern: website-build WorkflowTemplate uses wrangler pages deploy
2026-07-02 12:23:07 -04:00
jedarden
4aa1a59dfb docs(bf-5usp): verify existing Forgejo webhook for ai-code-battle
The Forgejo webhook for ai-code-battle was already registered and active:
- URL: https://webhooks-ci.ardenone.com/ai-code-battle
- Events: push
- Active: true

No configuration changes were needed.
2026-07-02 12:14:05 -04:00
jedarden
18e49154ce docs(bf-175): mark bot fleet consolidation complete 2026-07-02 11:52:17 -04:00
jedarden
fe4da19528 docs(bf-5usp): verify existing Forgejo webhook registration 2026-07-02 11:34:26 -04:00
jedarden
876a30e5db docs(bf-5usp): document existing Forgejo webhook for ai-code-battle
The webhook at webhooks-ci.ardenone.com/ai-code-battle is already
registered and active for push events to the master branch.
2026-07-02 11:01:10 -04:00
jedarden
b222a1d7e3 ci(bf-414): migrate Pages deploy from GitHub Actions to Argo
- Disable .github/workflows/deploy-pages.yml (renamed to .disabled)
- Deploy now runs via Argo Events sensor → acb-site-pages-build workflow
- Forgejo webhook at webhooks-ci.ardenone.com already registered and active
- Cloudflare API token secret already configured in argo-workflows namespace

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 10:34:42 -04:00
jedarden
6420c2e7b1 docs(bf-175): document bot fleet consolidation decision
Documented the decision to consolidate duplicate bot fleets from ai-code-battle
and acb-bots namespaces into the single canonical 6-strategy-bot fleet in
ai-code-battle namespace as specified in plan.md.
2026-07-02 09:59:35 -04:00
jedarden
ab7c320991 docs(bf-4ur): document secret templates and credential sources for ai-code-battle 2026-07-02 09:16:31 -04:00
jedarden
7360d24d8e docs(bf-4ur): document secret templates and credential sources for apexalgo-iad
Reviewed R2_ACCESS_KEY_SOURCE.md and IAD-ACB-R2-CREDENTIALS-FIX.md (for context on iad-acb).
Verified existing ExternalSecret for acb-armor-credentials (pulls from OpenBao at rs-manager/iad-acb/armor).
Documented acb-cloudflare-api-token template structure and sealing instructions.

Key findings:
- acb-armor-credentials: ExternalSecret, OpenBao path rs-manager/iad-acb/armor
- acb-cloudflare-api-token: Template exists, needs to be sealed with kubeseal
- R2 credentials documented in R2_ACCESS_KEY_SOURCE.md are for iad-acb cluster

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 08:33:04 -04:00
jedarden
7c18b5a4ce docs(bf-4ur): document secret templates and credential sources for apexalgo-iad
- Reviewed R2_ACCESS_KEY_SOURCE.md and IAD-ACB-R2-CREDENTIALS-FIX.md
- Documented acb-armor-credentials ExternalSecret structure
- Documented acb-cloudflare-api-token Secret template
- Identified credential sources and OpenBao paths
- Mapped environment variables for both secrets

Co-Authored-By: Claude <noreply@anthropic.com>
2026-07-02 08:27:48 -04:00
jedarden
78b30043b4 docs(bf-5ec): document Cloudflare Pages deployment completion
- Cloudflare Pages site successfully deployed to https://ai-code-battle.pages.dev
- GitHub Actions workflow completed successfully (123 files uploaded)
- GitHub secrets (CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID) already configured
- Custom domain aicodebattle.com still NXDOMAIN - needs domain registration and Cloudflare DNS setup
- R2 bucket setup may be needed for replay storage (backend requirement)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-27 17:54:28 -04:00
jedarden
14a0aa7fbd docs(bf-3lo): document ACB Kubernetes manifests sync completion
- Verify all 52 ACB manifests present in declarative-config
- Confirm ArgoCD sync status: Synced
- Document pod status issues due to dependencies (bf-7i6, bf-2z2)
- Confirm no drift between cluster and declarative-config

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-27 14:58:36 -04:00
jedarden
a973ba932a docs(bf-5y1): document forgejo push completion for ACB manifest sync
Bead-Id: bf-5y1
2026-06-27 14:58:36 -04:00
jedarden
d7f5bd7e7f docs(bf-3u9): document matchmaker job creation verification failure
- Cluster capacity insufficient to schedule acb-matchmaker pod
- All ACB pods stuck in Pending state due to insufficient CPU
- No jobs exist because matchmaker has never been able to start
- Verification cannot complete until cluster capacity is restored
- One node NotReady (prod-instance-17825591427380770)
- Total pending CPU requests: ~2250m vs ~4181m available (but fragmentation/blocking)
2026-06-27 14:40:24 -04:00
jedarden
c5bef98747 fix(bf-5ec): update wrangler version to 4.81.0 in workflow
Some checks failed
Deploy to Cloudflare Pages / Deploy to Cloudflare Pages (push) Has been cancelled
Update wranglerVersion from 3 to 4.81.0 to match installed version.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-27 14:17:36 -04:00
jedarden
b4155fc92c feat(bf-5ec): enable Cloudflare Pages deployment workflow
Some checks are pending
Deploy to Cloudflare Pages / Deploy to Cloudflare Pages (push) Waiting to run
Enable GitHub Actions workflow for automatic deployment of web frontend to Cloudflare Pages on pushes to master branch.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-27 14:17:00 -04:00
jedarden
034066085b docs(bf-5y1): document ACB manifest sync completion
Synced 5 deployment manifests from ai-code-battle/manifests/ to declarative-config.
All ACB components now managed by ArgoCD.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-27 14:15:25 -04:00
jedarden
182e19eb7c docs(bf-3u9): document matchmaker job creation verification - cluster capacity blocks operation 2026-06-27 14:09:12 -04:00
jedarden
986455b606 docs(bf-5jb): local match analysis with verbose logging and replay capture
- Ran multiple local matches with --verbose flag enabled
- Captured replay JSON data from 6-player, 4-player, and 3-player matches
- Analyzed combat events: 6 combat deaths, 4 energy collections, 7 bot spawns in primary match
- Created comprehensive analysis document with combat event counts
- No focus-fire behavior detected in test matches (no multi-killer combat events)
- All matches completed successfully without errors

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-27 12:48:51 -04:00
jedarden
e82b62d2de docs(bf-4dy): document cluster capacity issue blocking match pipeline
- acb-matchmaker and acb-worker pods cannot schedule due to CPU exhaustion
- iad-acb cluster at 99% CPU allocation (1497m/1500m) on only ready node
- Second node NotReady for 7+ hours
- Match pipeline non-functional: no job creation or worker execution possible
- Documented resolution steps and recommended actions

Co-Authored-By: Claude <noreply@anthropic.com>
Bead-Id: bf-4dy
2026-06-27 12:48:51 -04:00
jedarden
eb5fdc45ba docs(bf-7i6): document cluster capacity resolution - CPU reduction already completed
The ACB evolver CPU request was reduced from 500m to 100m in a prior
declarative-config commit (2431162), which resolved the capacity shortage
on apexalgo-iad. Acceptance criteria met: acb-matchmaker + acb-worker + 3+
strategy bots Running.
2026-06-27 12:05:15 -04:00
jedarden
a424d84c5c chore: update predispatch sha 2026-06-27 11:50:12 -04:00
jedarden
63b6f9916d docs(bf-2z2): update resolution details with image digest and manifest verification 2026-06-27 11:17:48 -04:00
jedarden
b1f6067131 docs(bf-7i6): document cluster capacity resolution - CPU reduction already completed 2026-06-27 11:10:35 -04:00
jedarden
1800520092 fix(bf-2z2): build and push acb-map-evolver image to Docker Hub
- Built acb-map-evolver Docker image from cmd/acb-map-evolver/Dockerfile
- Pushed ronaldraygun/acb-map-evolver:e5dc3bc to Docker Hub
- Verified manifest already exists in declarative-config
- Image digest: sha256:3d5a4a4dfa8bb73e46b3ec2d937846f5289d556853d5c3d41b180a42d4ed66d9

Resolves ImagePullBackOff for acb-map-evolver pod.
2026-06-27 10:57:22 -04:00
jedarden
a62c6279af fix(bf-7i6): reduce acb-evolver CPU request from 500m to 250m
This frees up 500m CPU capacity (2 pods × 250m reduction) to allow
pending ACB pods to schedule on apexalgo-iad cluster.

Related: bf-7i6
Bead-Id: bf-5hc
2026-06-27 09:05:19 -04:00
jedarden
d40afad625 docs(bf-4dy): add match pipeline verification report
- Document complete match pipeline verification
- Identify cluster capacity constraints blocking operation
- Matchmaker, workers, index-builder all Pending (unschedulable)
- One node NotReady, one node at capacity
- R2 credentials corrupted (secondary issue)
- No matches can be observed running

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-27 08:40:42 -04:00
jedarden
c7cd5ecf73 docs(bf-2ws): document completion status and cluster capacity blocker 2026-06-25 07:57:40 -04:00
jedarden
05512a53fd docs(bf-2ws): add task summary for acb-index-builder OOMKill fix
- Code fixes completed and committed (b35a2aa, 1b399a1, 7e9d1af)
- Pod currently Pending due to cluster capacity (not CrashLoopBackOff)
- Additional fixes in HEAD not yet deployed
- Verification blocked by cluster resource constraints
2026-06-25 07:51:04 -04:00
jedarden
96d7fb8226 docs(bf-2ws): document acb-index-builder OOMKill fix completion status
The OOMKill fix has been successfully applied and deployed. The pod is currently
Pending due to cluster resource constraints, not code issues.

Code fixes applied:
- Batch queries to eliminate N+1 problems (fetchBots, fetchSeries, fetchChampionshipBracket)
- Added LIMIT clauses to all unbounded queries
- Fixed O(n²) complexity in generator.go lookup maps

Next steps: Scale up iad-acb cluster resources to schedule the fixed pod.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-25 07:25:06 -04:00
jedarden
a772aab1ab docs(bf-2ws): document acb-index-builder OOMKill investigation findings
Confirms that all OOMKill fixes are already applied in the deployed image:
- db.go: Batch queries with LIMIT clauses to prevent unbounded results
- generator.go: O(1) lookup maps instead of O(n²) iteration
- main.go: Panic recovery mechanism for silent crashes

Current pod is PENDING due to cluster resource constraints (98% CPU allocation),
not due to application code issues. Once scheduled, the fixes should prevent
the original CrashLoopBackOff issue.
2026-06-25 07:03:07 -04:00
jedarden
f665ce0d04 docs(bf-2ws): add notes on acb-index-builder OOMKill fix 2026-06-25 06:55:15 -04:00
jedarden
1b399a1e55 fix(db): reduce query LIMITs and fix O(n²) complexity to prevent OOMKill
acb-index-builder has been in CrashLoopBackOff for 45 days with silent crashes
after "Copied web assets to output directory". Investigation revealed O(n²) N+1
query loops causing unbounded memory growth and OOMKill.

Changes:
- fetchSeries: batch games query (1000 queries → 1 query) with LIMIT 10000
- fetchChampionshipBracket: batch games query (500 queries → 1 query) with LIMIT 64
- fetchSeasonSnapshots: reduce LIMIT from 10000 to 500
- fetchLineage: reduce LIMIT from 10000 to 1000
- Add strings import for strings.Join in batch queries

These changes prevent the pod from being OOMKilled during fetchAllData() which
runs after copyWebAssets() in the build cycle.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-25 06:53:54 -04:00
jedarden
7e9d1af69c fix(db): reduce query LIMITs and fix O(n²) complexity to prevent OOMKill
- Reduce fetchBots LIMIT from 10000 to 2000
- Reduce fetchRatingHistory LIMIT from 10000 to 5000
- Reduce fetchFeedback LIMIT from 5000 to 1000
- Fix O(n²) participant name lookup in generateBotProfiles by using botNameMap
- Add panic recovery in runBuildCycle to log panics via slog before crashing
- Add R2/B2 client helper functions in s3.go

This fixes acb-index-builder CrashLoopBackOff caused by OOMKill after
web asset copy. The pod was silently crashing during fetchAllData()
due to unbounded query results consuming all memory.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-25 06:43:50 -04:00
jedarden
be9a070fbb fix(db): add LIMIT to bot match stats query to prevent OOMKill
The bot match stats query was introduced in b35a2aa to fix an N+1 query
problem, but it was unbounded and could return an unlimited number of rows.
With many bots in the database, this query could consume excessive memory
and cause OOMKill, resulting in silent crashes after 'Copied web assets'.

Add LIMIT 20000 to prevent unbounded result sets while supporting large
bot populations (the main bots query already limits to 10000 bots).

This fix continues the pattern of adding LIMITs to prevent OOMKill crashes
in acb-index-builder.

Fixes bead bf-2ws: acb-index-builder CrashLoopBackOff investigation
2026-06-25 06:29:12 -04:00
jedarden
b35a2aade0 fix(db): eliminate O(n²) N+1 query loop in fetchBots to prevent OOMKill
The previous implementation called getBotMatchStats for each bot in a loop,
causing 10,000+ separate database queries when there are many bots. This N+1
query problem caused the pod to exceed memory limits and get OOMKilled,
resulting in CrashLoopBackOff.

Replaced with a single batch query that fetches match stats for all bots at
once, then maps the results to each bot. This reduces database round trips
from O(n) to O(1).

Fixes bead bf-2ws: acb-index-builder CrashLoopBackOff (silent crash after web asset copy)
2026-06-25 06:04:51 -04:00
jedarden
c1cfcded23 fix(k8s): update acb-index-builder to latest image with OOMKill fixes
The pod was CrashLoopBackOff for 45 days because it was running an outdated
image without the LIMIT clause fixes added in June. Updated to the latest
image digest which includes:
- LIMIT on fetchSeriesGames query (ca48b60)
- LIMIT on fetchRecentMatchIds query (68b7864)
- O(n²) iteration fix in generateBotProfiles (7befe51)
- Other OOMKill prevention fixes

This should resolve the silent crash after web asset copy.
2026-06-25 05:44:29 -04:00
jedarden
ca48b60434 fix(db): add LIMIT to fetchSeriesGames query to prevent OOMKill
The fetchSeriesGames function was querying all games for a series without a limit.
With up to 1000 series being fetched, and potentially many games per series,
this could return an unbounded number of rows and cause OOMKill.

A typical series has 3-7 games (best-of-5 or best-of-7), so LIMIT 100 is
more than sufficient to handle edge cases while preventing memory exhaustion.

Fixes acb-index-builder CrashLoopBackOff caused by OOMKill after web asset copy.
2026-06-25 01:46:54 -04:00