jedarden/ai-code-battle

Author	SHA1	Message	Date
jedarden	f35477dd96	feat(evolution, web): add live match counter per plan §16.18 - Add matches_today and active_bots fields to LiveData Totals (evolver) - Query matches table for COUNT() WHERE completed_at >= today - Query bots table for COUNT() WHERE status = 'active' - Add fields to index builder EvolutionMeta struct - Update homepage to render "X matches today · Y bots active · Gen #Z evolving" - Add CSS styling for .home-live-stats section Closes: bf-4m8mo	2026-05-26 19:57:57 -04:00
jedarden	1478a9365c	fix(evolver): use ConfigForPlayers for 2-player matches per plan §3.4 The evolver arena was using DefaultConfig() which has attack_radius2=12 for all matches. Per plan §3.4, 2-player matches should have attack_radius2=36 (6 tiles) to achieve 65-80% combat density. This bug caused evolved bots to learn energy-farming strategies since enemies were rarely in attack range on 40x40 maps with only 3.5 tile radius. With the correct 6-tile radius, bots will experience actual combat during evolution and should develop fighting behaviors. Closes: bf-3lt3 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 22:35:15 -04:00
jedarden	ea04f4debb	style: apply gofmt alignment fixes across codebase Tab/space alignment consistency from running gofmt on all packages. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 10:40:33 -04:00
jedarden	d3d655b9c9	Evolver: Fix nsjail integration for complete sandbox coverage - Add /opt to nsjail bindmounts so Rust toolchain (/opt/rust) is accessible during sandboxed validation of Rust bots - Explicitly enable Alpine community repository in Dockerfile to ensure nsjail package can be installed (nsjail lives in community, not main) - nsjail integration was already optional (falls back to plain exec if unavailable), but these changes ensure it actually works when enabled This addresses bead bf-3f29: nsjail was listed in apk add but /opt wasn't bindmounted, causing Rust validation to fail when UseNsjail=true. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 15:18:21 -04:00
jedarden	a4bdeba8fd	Phase 10: Live evolution observatory - evolver live.json feed + observatory page Evolver writes live.json to R2 every cycle. Observatory page polls and renders live feed + lineage tree + meta shift chart. - Added ACB_R2_UPLOAD_ENABLED env var to enable automatic R2 upload during run loop - CycleState tracks real-time evolution cycle status (generation, phase, candidate, validation, evaluation) - Export() now includes cycle info when cycleState is provided - runCycle() integrated with live observatory exports at each phase transition - exportLiveQuiet() for mid-cycle status updates without verbose logging - Fixed function signature mismatches for exportLiveQuiet calls Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-08 14:52:17 -04:00
jedarden	9c5eb57fdd	fix(evolver): correct GROUP BY in island stats query b.bot_id was selected without being in the GROUP BY clause or wrapped in an aggregate, causing a Postgres error on live export. Replaced with a correlated subquery that finds the highest-rated bot per island. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 07:04:37 -04:00
jedarden	0813e36297	fix(evolver): wire Nash mixture and meta weaknesses into LLM prompts, fix 4-D diversity - Add NashMixture and MetaWeaknesses fields to meta.Description and compute them from island population proportions (§10.2 PSRO) - Update behaviorDistance to support N-D vectors for 4-D MAP-Elites grid (aggression, economy, exploration, formation) - Wire NashMixture/MetaWeaknesses through FromMetaDescription converter so they actually reach the LLM prompt (was dead code before) - Align LLM prompt with plan §15.1/§15.5: correct combat rules (focus-fire), fog of war, HTTP protocol section, Nash mixture target - Fix diversity normalization from sqrt(2) (2-D) to 2.0 (4-D max) - Rename handleUIFeedback to handleCreateFeedback (§13.6 naming) - Update tests for new fields and corrected prompt text Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-23 01:22:19 -04:00
jedarden	60b83a02d9	feat(§15.3): implement screen reader transcript for replay viewer - Add transcript panel with turn-by-turn summaries generated from replay events - Each turn shows: player moves, combat, deaths, captures, energy collection, spawns, win probability - Add 'T' key shortcut to toggle transcript panel - Panel supports three view modes: All Turns, ±10 Turns from Current, Recent 20 Turns - Click on transcript entry to jump to that turn - Current turn is highlighted in transcript with smooth scroll - Panel content is selectable/copyable for screen reader users - Transcript generation logic already existed in replay-viewer.ts; this adds the UI - Transcript button slides in from right side of screen Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 18:42:49 -04:00
jedarden	38f14e1997	fix: remove unused imports in evolver, misc pre-dispatch changes Remove unused encoding/json and net/http imports from cmd/acb-evolver/run.go that caused build failure. Include other pre-dispatch changes from prior work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 18:32:46 -04:00
jedarden	88bd70640a	fix(types): add missing ReplayPlayer import and type annotation for transcript feature - Add ReplayPlayer to type imports in replay-viewer.ts - Add explicit type annotation for entry parameter in replay.ts transcript map - Fixes TypeScript compilation errors for §15.3 screen reader transcript feature	2026-04-22 18:20:56 -04:00
jedarden	6c1f031071	feat(config): add season_id + rules_version to Config per §4.2 - SeasonID and RulesVersion already present in engine/types.go Config struct - Worker already populates from active season row via DB join - Config embedded in VisibleState sent to bots each turn (including turn 0) - All starter kits (go, python, rust, java, csharp) already expose and log fields - Add season_id/rules_version logging to JavaScript starter on turn 0 - TypeScript Config interface already includes season_id and rules_version Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 18:09:26 -04:00
jedarden	f4352c6304	feat(evolver): add workflow completion polling to promoter Per plan §10.8 (deployment pipeline) and §9.8 (Argo Workflows): - Add waitForWorkflowCompletion() that polls Argo Workflow API - Add getWorkflowStatus() to fetch workflow phase/status - Update Promote() to wait for workflow completion before inserting bot record - Update Promote() to wait for K8s deployment readiness (waitForDeployment) - Update triggerArgoWorkflow() to return workflow name for polling - Add acb-evolved-bot-deploy-workflowtemplate.yml to manifests The promotion flow now: 1. Writes bot source to bots/evolved/<bot_name>/ 2. Commits and pushes source to git 3. Triggers Argo WorkflowTemplate 4. Waits for workflow completion (build + manifest commit) 5. Waits for K8s deployment to be ready 6. Inserts bot record into bots table 7. Updates programs table with bot_id/bot_name This ensures evolved bots have running containers before being marked active.	2026-04-22 17:46:33 -04:00
jedarden	477a54c548	feat(matchmaker): implement §6.1 Pareto skill-proximity + LRU pairing algorithm Replace random 2-player pairing with the full §6.1 algorithm: - Seed selection: bot with oldest last-match timestamp (tiebreak: lowest bot ID) - Format selection: seed's least-played player count among {2, 3, 4, 6} - Opponent selection: Pareto 80%/16-rank skill proximity + oldest last-pairing with seed + fewest 24h games for game-count balance - Map selection: least-recently-used active map for the chosen player count, with map_scores.last_used_at updated after each match - Random player slot assignment for all participant counts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 17:35:00 -04:00
jedarden	7a0de02059	feat(evolver): persist cross-pollination state to Postgres per §10.2 Add crosspoll_state table to persist per-island generation counters across evolver restarts. Load state on startup and save after each cross-pollination check. Add persistence pattern and translation structure tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 16:04:15 -04:00
jedarden	80334c6e34	feat(evolver): expand MAP-Elites from 2-D to 4-D grid per §10.2 - Add Exploration and Formation axis definitions with feature extraction from source code pattern matching (exploration/formation indicators) - Extend Grid key from (x,y) to (x,y,z,w) with 3⁴=81-cell behavior grid - Update bin assignment, promotion gate, and persistence (JSON snapshot) - Add Slice() for 2-D dashboard visualization across any axis pair - Migration: old 2-D archives project at z=middle, w=middle - Update cross-pollination to pad 2-element behavior vectors to 4 - Add Prometheus metrics to matchmaker (bot crashes, stale job count) - Add rivalry detection to index builder (data/meta/rivalries.json) - Web: batched bot list loading, leaderboard keyboard accessibility, improved ARIA attributes on match/playlist cards Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 15:44:39 -04:00
jedarden	e90d2e37c9	test(evolver): integration tests for cross-pollination logic per §10.2 Adds mock store/LLM implementations and tests for CheckAndPollinate: generation boundaries, fitness penalties, translation triggers, multi-boundary catch-up, and empty island handling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 15:26:18 -04:00
jedarden	c56cc8bae6	fix(matchmaker): multi-match crash cooldown (3 strikes / 30 min) per §4.5 + §6.1 Add crash_strikes and cooldown_until columns to bots table. Worker increments strikes on crash (cooldown at 3), resets on success. Matchmaker excludes cooldown bots from pairing, series scheduling, and championship brackets. Fix erroneous cooldown filter on series table in finalizeCompletedSeries (column only exists on bots). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 15:22:12 -04:00
jedarden	d43cf83471	feat(evolver): island cross-pollination every 50 generations per §10.2 Adds cross-pollination logic that copies the top program from each island to a random other island every 50 generations. When source and target islands use different languages, the LLM translates the code. Generation boundaries are tracked per-island to prevent duplicate events. - New crosspoll package with boundary detection, migration, and LLM translation - Added MaxGenerationByIsland DB query for generation counter tracking - Integrated into RunEvolutionLoop with observability logging - Tests for boundary logic, translation prompts, and target selection Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 15:13:27 -04:00
jedarden	d8812b98ee	feat(playlists/replay): n-player win prob, annotations, evolver metrics Playlist curation per §10 is fully implemented in the index builder: - generatePlaylists() writes /data/playlists/index.json and {slug}.json - curateWeeklyHighlights() selects best-of-week by upsets, elite clashes, marathon turns, and closest finishes (last 7 days) - persistGeneratedPlaylists() upserts to playlists/playlist_matches DB tables - /data/playlists/ stub files seeded for all 12 curated collections Replay viewer improvements shipped alongside: - WinProbPoint refactored from {p0,p1} to {probs: number[]} for N players - renderWinProbSparkline draws one line per player with matching colors - replay.ts updated to build probs[] from replay.win_prob arrays - Dynamic legend generated from replay.players instead of hardcoded P0/P1 New annotation overlay component (§16.8): - AnnotationOverlay: timeline track, per-turn list, canvas markers - createAnnotationForm: type selector, author, body, localStorage + API - ANNOTATION_OVERLAY_STYLES: self-contained CSS for the overlay Evolver: add mutations_per_hour metric to Totals (live.json §14) Types: consolidate evolution types into types.ts, re-export from api-types.ts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 17:15:31 -04:00
jedarden	91d807cec2	feat(web,cmd): enhance evolution dashboard, series/seasons pages, and matchmaker - Evolution page: live polling (10s), activity feed, candidate tracking, statistics section, island overview with live.json schema - Series page: detailed series view with game-by-game results - Seasons page: season list with status and champion display - Predictions page: enhanced prediction UI with open matches - API types: add CycleInfo, Candidate, ActivityEntry, Totals for live.json - Embed: improved embeddable replay widget - Mobile CSS: responsive breakpoints and bottom tab bar - Exporter: enhanced live.json generation with full cycle/candidate data - Matchmaker: series scheduling support with config - Worker: additional database queries for series/season data Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-21 13:42:20 -04:00
jedarden	5215cd7e57	fix(web): remove unused colors parameter from drawThreatLines call - Minor fix to match function signature - Add lazy loaders for feedback and docs-api pages	2026-04-21 09:02:22 -04:00
jedarden	4ba39e3aa8	feat(evolver): complete Phase 7 LLM-driven evolution implementation - Complete autonomous evolution pipeline with island model (4 islands) - MAP-Elites behavior grid integration for diversity - LLM ensemble integration (fast + strong model tiers) - 3-stage validation pipeline (syntax → schema → sandbox smoke test) - Evaluation arena (10-match mini-tournament per candidate) - Promotion gate (Nash equilibrium PSRO + MAP-Elites niche fill) - Retirement policy (auto-retire low-rated bots, population cap) - Live export to R2 for evolution dashboard - Enhanced replay viewer with commentary and win probability - Added series, seasons, and predictions pages All tests passing. Phase 7 exit criteria met. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 16:38:48 -04:00
jedarden	f3e34c6736	fix(evolver): correct failing tests for ensemble and behavior distance - Fixed TestSelectBestCandidate_GoHttpBonus: HTTP bonus (1.5x) on 150-char code (225 score) doesn't beat 500-char plain text (500 score). Test now expects the longer code to win. - Fixed TestScoreCandidate_Bonuses: adjusted minScore expectations to match actual code lengths with 1.5x bonus applied. - Fixed TestBehaviorDistance: use epsilon comparison for floating-point precision instead of exact equality. sqrt(2) ≈ 1.414214 is not exactly representable in floating-point. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 16:36:50 -04:00
jedarden	f5924e8b15	feat(acb-evolver): add LLM prompt builder and ensemble integration - Add parent sampling via tournament selection (selector/tournament.go) - Add replay analyzer to extract key moments, strategies, weaknesses - Add meta builder for leaderboard summary and dominant strategies - Add prompt assembler combining parent code + replay + meta context - Add LLM ensemble with fast tier (GLM-5-Turbo) for bulk generation and strong tier (GLM-5) for refinement passes - Add code extraction from LLM responses with language validation - Add convert utilities for type conversion between packages - Comprehensive test coverage for all components Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-29 16:47:25 -04:00
jedarden	3d9326d767	feat(acb-evolver): add CRUD operations for programs database with island model Add Delete, List, ListTopByIsland, and GetLineage methods to the programs Store. These complete the CRUD operations needed for the evolution pipeline: - Delete: Remove programs by ID - List: Paginated listing of all programs - ListTopByIsland: Get top N programs by fitness for a specific island - GetLineage: Recursively traverse parent chain for lineage tracking Also adds comprehensive tests for all new operations including lineage tracking through grandparent-parent-child chains. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-29 12:08:21 -04:00
jedarden	1523c52e0a	Add R2 upload for live evolution observatory (Phase 10) - Add R2 client module (cmd/acb-evolver/internal/live/r2.go) with S3-compatible uploads to Cloudflare R2 - UploadLiveJSON() uploads evolution state to evolution/live.json with Cache-Control: max-age=10 for near-real-time updates - Add -r2 and -r2-only flags to live-export subcommand - Add tests for R2 config validation and credential handling - Update frontend to fetch live data from R2 URL instead of Pages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-29 04:53:35 -04:00
jedarden	f5d7553f98	Add Phase 7-9 features: evolution dashboard, WASM sandbox, enhanced replay Phase 7 Evolution: - Add live-export subcommand to acb-evolver for dashboard JSON generation - Export programs, stats, and generation log to live.json Phase 8 Enhanced Features: - Add WASM game engine build (cmd/acb-wasm/) with JS bindings - Add in-browser sandbox page with Monaco editor (web/src/pages/sandbox.ts) - Add win probability computation (web/src/win-probability.ts) - Add replay commentary generator (web/src/commentary.ts) - Add clip maker for GIF/MP4 export (web/src/pages/clip-maker.ts) - Add rivalry detection and pages (web/src/pages/rivalries.ts) - Add replay feedback system (web/src/pages/feedback.ts) - Add evolution dashboard page (web/src/pages/evolution.ts) Phase 9 Platform Depth: - Add predictions API (cmd/acb-api/predictions.go) - Add series management API (cmd/acb-api/series.go) - Add seasons API (cmd/acb-api/seasons.go) - Add narrative generator for rivalries (cmd/acb-indexer/src/narrative.ts) Engine Updates: - Add debug field to move response schema - Add match event timeline extraction - Add replay enrichment fields Web Updates: - Update app.html navigation for new pages - Add API client methods for predictions, series, seasons - Export engine types for browser use Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-29 01:13:23 -04:00
jedarden	76e8791e4d	Add evaluation arena, promotion gate, and retirement policy (Phase 7) - arena/arena.go: 10-match mini-tournament running candidate as a local subprocess against diverse live opponents sampled across the rating distribution; AES-GCM secret decryption for opponent auth - arena/psro.go: Nash equilibrium computation for the 1×K meta-game; FictitiousPlayNash included for future K×K support - arena/winrate.go: Wilson-score 95% CI for win-rate calculation; draws counted as 0.5 wins - arena/gate.go: two-part promotion gate — Nash value ≥ threshold AND MAP-Elites niche fill or improvement; detailed reason strings - promoter/promoter.go: full promotion pipeline — bot source + Dockerfile + K8s Secret/Deployment/Service manifests, docker build, git commit/push (ArgoCD sync), kubectl readiness poll, bots-table INSERT, programs-table update; RetireBot and EnforcePolicy (rating threshold + population cap 50) - db/db.go: add bot_name / bot_secret migration columns - db/programs.go: ListPromoted, SetBotNameAndSecret, UnsetPromoted, GetByBotID, PromotedCount helpers for promotion/retirement lifecycle - main.go: evaluate and retire subcommands wiring arena + gate + promoter; remove unused island flag from evaluate - arena/arena_test.go: 21 unit tests covering Nash, Wilson CI, Gate logic, and selectDiverse opponent sampling - promoter/promoter_test.go: tests for Dockerfiles, bot-ID/secret generation, AES-GCM helpers, and K8s manifest templates Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 23:32:37 -04:00
jedarden	5669688984	Add validation pipeline, sandbox, and evolution DB layer (Phase 7) Three-stage fail-fast validator for LLM-generated bot candidates: - syntax.go: language-aware parse (go/parser for Go; py_compile, rustfmt, tsc, javac, php -l for others; brace-balance fallback) - schema.go: regex detection of /health + /turn endpoints and "moves" field - sandbox.go: nsjail-isolated smoke test — builds bot, polls /health, sends 5 signed /turn requests, verifies JSON moves responses - validator.go: orchestrates stages with fail-fast short-circuit DB layer: - programs table + CRUD (create, get, list, updateFitness, setPromoted) - validation_log table with RecordValidation, IslandPassRates, IslandValidationStats for per-island pass-rate tracking - seed.go: 6 generation-0 bots across alpha/beta/gamma/delta islands MAP-Elites grid (mapelites/grid.go): 2-D behavior grid on aggression×economy axes; TryPlace keeps the fittest occupant per niche. acb-evolver CLI gains two new subcommands: validate <file> -lang <lang> [-island <island>] [-nsjail] [-nolog] validation-stats (tabular per-island pass-rate breakdown) cmd/acb-api/db.go: add programs table to API schema so the API can query promoted evolved bots. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 22:45:13 -04:00
jedarden	bd4b0d3244	Add LLM prompt builder and ensemble integration (Phase 7) - selector: tournament selection for parent sampling from island populations - prompt: assembles evolution prompts from parent code, replay analysis, and meta description - llm: OpenAI-compatible client routing to ZAI proxy with fast (GLM-5-Turbo) and strong (GLM-5) tiers, plus code block extraction from model responses - Tests for prompt assembly, code extraction, and tournament selection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 22:26:09 -04:00

30 commits