Commit graph

33 commits

Author SHA1 Message Date
jedarden
a4bdeba8fd Phase 10: Live evolution observatory - evolver live.json feed + observatory page
Evolver writes live.json to R2 every cycle. Observatory page polls and
renders live feed + lineage tree + meta shift chart.

- Added ACB_R2_UPLOAD_ENABLED env var to enable automatic R2 upload during run loop
- CycleState tracks real-time evolution cycle status (generation, phase, candidate, validation, evaluation)
- Export() now includes cycle info when cycleState is provided
- runCycle() integrated with live observatory exports at each phase transition
- exportLiveQuiet() for mid-cycle status updates without verbose logging
- Fixed function signature mismatches for exportLiveQuiet calls

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 14:52:17 -04:00
jedarden
5242d6037c feat(acb-evolver): add weekly automated map evolution ticker
Wire up the acb-map-evolver to run automatically on a weekly schedule
(Sunday 03:00 UTC by default) from the evolver deployment.

The map evolution ticker:
- Waits until the next scheduled time (weekday:hour:minute UTC)
- Runs acb-map-evolver --once to evolve maps for all player counts
- Repeats every 7 days

The schedule can be configured via ACB_MAP_EVOLUTION_SCHEDULE env var
(format: WEEKDAY:HH:MM, e.g., "0:03:00" for Sunday 03:00 UTC).

Enable via ACB_MAP_EVOLUTION_ENABLED=true or --enable-map-evolution flag.

Per plan §14.6: the weekly map evolution loads engagement scores,
runs MAP-Elites evolution, promotes high-scoring variants, and updates
the active map pool in the database.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 09:26:38 -04:00
jedarden
b31c306013 feat(acb-map-evolver): add weekly automated run wiring per plan §14.6
- Implement runWeeklyLoop() function that waits for scheduled time and
  runs evolution for all player counts (2, 3, 4, 6) weekly
- Add --weekly flag to enable weekly mode (default: Sunday 03:00 UTC)
- Add --weekly-schedule flag for custom schedule (WEEKDAY:HH:MM format)
- Add ACB_WEEKLY_SCHEDULE env var for configuration

feat(acb-evolver): add weekly map evolution ticker

- Add MapEvolutionEnabled and MapEvolutionSchedule to RunConfig
- Add --enable-map-evolution flag to acb-evolver run subcommand
- Add startMapEvolutionTicker() goroutine that runs weekly
- Ticker executes acb-map-evolver --once to trigger map breeding
- Configurable via ACB_MAP_EVOLUTION_ENABLED and ACB_MAP_EVOLUTION_SCHEDULE

This integrates map evolution into the bot evolver's deployment,
allowing weekly automated map evolution based on engagement scores
as specified in plan §14.6.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 09:15:19 -04:00
jedarden
9c5eb57fdd fix(evolver): correct GROUP BY in island stats query
b.bot_id was selected without being in the GROUP BY clause or wrapped
in an aggregate, causing a Postgres error on live export. Replaced with
a correlated subquery that finds the highest-rated bot per island.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:04:37 -04:00
jedarden
d0087a3241 fix(docker): add COPY metrics/ to all service Dockerfiles
The metrics package is a local module dependency imported by all services
but was missing from every Dockerfile's build context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 18:00:16 -04:00
jedarden
0813e36297 fix(evolver): wire Nash mixture and meta weaknesses into LLM prompts, fix 4-D diversity
- Add NashMixture and MetaWeaknesses fields to meta.Description and
  compute them from island population proportions (§10.2 PSRO)
- Update behaviorDistance to support N-D vectors for 4-D MAP-Elites
  grid (aggression, economy, exploration, formation)
- Wire NashMixture/MetaWeaknesses through FromMetaDescription converter
  so they actually reach the LLM prompt (was dead code before)
- Align LLM prompt with plan §15.1/§15.5: correct combat rules
  (focus-fire), fog of war, HTTP protocol section, Nash mixture target
- Fix diversity normalization from sqrt(2) (2-D) to 2.0 (4-D max)
- Rename handleUIFeedback to handleCreateFeedback (§13.6 naming)
- Update tests for new fields and corrected prompt text

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 01:22:19 -04:00
jedarden
60b83a02d9 feat(§15.3): implement screen reader transcript for replay viewer
- Add transcript panel with turn-by-turn summaries generated from replay events
- Each turn shows: player moves, combat, deaths, captures, energy collection, spawns, win probability
- Add 'T' key shortcut to toggle transcript panel
- Panel supports three view modes: All Turns, ±10 Turns from Current, Recent 20 Turns
- Click on transcript entry to jump to that turn
- Current turn is highlighted in transcript with smooth scroll
- Panel content is selectable/copyable for screen reader users
- Transcript generation logic already existed in replay-viewer.ts; this adds the UI
- Transcript button slides in from right side of screen

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 18:42:49 -04:00
jedarden
38f14e1997 fix: remove unused imports in evolver, misc pre-dispatch changes
Remove unused encoding/json and net/http imports from cmd/acb-evolver/run.go
that caused build failure. Include other pre-dispatch changes from prior work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 18:32:46 -04:00
jedarden
98a9f645c4 feat(evolver): update retirement ticker interval to daily (§10.8)
Changed RetirementCheckInterval from 1 hour to 24 hours to align
with the 7-day low-rating rule specified in §10.8. The retirement
automation is already fully implemented:

- startRetirementTicker: runs periodic checks (now daily)
- EnforcePolicy: retires bots below rating threshold (800) for 7
  consecutive days, enforces 50-bot population cap
- queryConsecutiveLowRating: uses rating_history table to track
  consecutive days below threshold
- RetireBot: handles K8s manifest deletion via declarative-config
- TestEnforcePolicy_CapEnforcement: integration test for cap enforcement

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 18:23:03 -04:00
jedarden
88bd70640a fix(types): add missing ReplayPlayer import and type annotation for transcript feature
- Add ReplayPlayer to type imports in replay-viewer.ts
- Add explicit type annotation for entry parameter in replay.ts transcript map
- Fixes TypeScript compilation errors for §15.3 screen reader transcript feature
2026-04-22 18:20:56 -04:00
jedarden
6c1f031071 feat(config): add season_id + rules_version to Config per §4.2
- SeasonID and RulesVersion already present in engine/types.go Config struct
- Worker already populates from active season row via DB join
- Config embedded in VisibleState sent to bots each turn (including turn 0)
- All starter kits (go, python, rust, java, csharp) already expose and log fields
- Add season_id/rules_version logging to JavaScript starter on turn 0
- TypeScript Config interface already includes season_id and rules_version

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 18:09:26 -04:00
jedarden
f4352c6304 feat(evolver): add workflow completion polling to promoter
Per plan §10.8 (deployment pipeline) and §9.8 (Argo Workflows):

- Add waitForWorkflowCompletion() that polls Argo Workflow API
- Add getWorkflowStatus() to fetch workflow phase/status
- Update Promote() to wait for workflow completion before inserting bot record
- Update Promote() to wait for K8s deployment readiness (waitForDeployment)
- Update triggerArgoWorkflow() to return workflow name for polling
- Add acb-evolved-bot-deploy-workflowtemplate.yml to manifests

The promotion flow now:
1. Writes bot source to bots/evolved/<bot_name>/
2. Commits and pushes source to git
3. Triggers Argo WorkflowTemplate
4. Waits for workflow completion (build + manifest commit)
5. Waits for K8s deployment to be ready
6. Inserts bot record into bots table
7. Updates programs table with bot_id/bot_name

This ensures evolved bots have running containers before being marked active.
2026-04-22 17:46:33 -04:00
jedarden
477a54c548 feat(matchmaker): implement §6.1 Pareto skill-proximity + LRU pairing algorithm
Replace random 2-player pairing with the full §6.1 algorithm:
- Seed selection: bot with oldest last-match timestamp (tiebreak: lowest bot ID)
- Format selection: seed's least-played player count among {2, 3, 4, 6}
- Opponent selection: Pareto 80%/16-rank skill proximity + oldest last-pairing
  with seed + fewest 24h games for game-count balance
- Map selection: least-recently-used active map for the chosen player count,
  with map_scores.last_used_at updated after each match
- Random player slot assignment for all participant counts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 17:35:00 -04:00
jedarden
7f2407ed00 feat: add Prometheus metrics instrumentation across services
Add metrics server startup and HTTP middleware to acb-api, generation
counter metric to evolver, and R2 cache size metric to index builder.

Also remove dead measureR2CacheSize reference from index builder main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 16:16:03 -04:00
jedarden
7a0de02059 feat(evolver): persist cross-pollination state to Postgres per §10.2
Add crosspoll_state table to persist per-island generation counters
across evolver restarts. Load state on startup and save after each
cross-pollination check. Add persistence pattern and translation
structure tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 16:04:15 -04:00
jedarden
80334c6e34 feat(evolver): expand MAP-Elites from 2-D to 4-D grid per §10.2
- Add Exploration and Formation axis definitions with feature extraction
  from source code pattern matching (exploration/formation indicators)
- Extend Grid key from (x,y) to (x,y,z,w) with 3⁴=81-cell behavior grid
- Update bin assignment, promotion gate, and persistence (JSON snapshot)
- Add Slice() for 2-D dashboard visualization across any axis pair
- Migration: old 2-D archives project at z=middle, w=middle
- Update cross-pollination to pad 2-element behavior vectors to 4
- Add Prometheus metrics to matchmaker (bot crashes, stale job count)
- Add rivalry detection to index builder (data/meta/rivalries.json)
- Web: batched bot list loading, leaderboard keyboard accessibility,
  improved ARIA attributes on match/playlist cards

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 15:44:39 -04:00
jedarden
e90d2e37c9 test(evolver): integration tests for cross-pollination logic per §10.2
Adds mock store/LLM implementations and tests for CheckAndPollinate:
generation boundaries, fitness penalties, translation triggers,
multi-boundary catch-up, and empty island handling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 15:26:18 -04:00
jedarden
c56cc8bae6 fix(matchmaker): multi-match crash cooldown (3 strikes / 30 min) per §4.5 + §6.1
Add crash_strikes and cooldown_until columns to bots table. Worker
increments strikes on crash (cooldown at 3), resets on success.
Matchmaker excludes cooldown bots from pairing, series scheduling,
and championship brackets. Fix erroneous cooldown filter on series
table in finalizeCompletedSeries (column only exists on bots).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 15:22:12 -04:00
jedarden
d43cf83471 feat(evolver): island cross-pollination every 50 generations per §10.2
Adds cross-pollination logic that copies the top program from each island
to a random other island every 50 generations. When source and target
islands use different languages, the LLM translates the code. Generation
boundaries are tracked per-island to prevent duplicate events.

- New crosspoll package with boundary detection, migration, and LLM translation
- Added MaxGenerationByIsland DB query for generation counter tracking
- Integrated into RunEvolutionLoop with observability logging
- Tests for boundary logic, translation prompts, and target selection

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 15:13:27 -04:00
jedarden
d8812b98ee feat(playlists/replay): n-player win prob, annotations, evolver metrics
Playlist curation per §10 is fully implemented in the index builder:
- generatePlaylists() writes /data/playlists/index.json and {slug}.json
- curateWeeklyHighlights() selects best-of-week by upsets, elite
  clashes, marathon turns, and closest finishes (last 7 days)
- persistGeneratedPlaylists() upserts to playlists/playlist_matches DB tables
- /data/playlists/ stub files seeded for all 12 curated collections

Replay viewer improvements shipped alongside:
- WinProbPoint refactored from {p0,p1} to {probs: number[]} for N players
- renderWinProbSparkline draws one line per player with matching colors
- replay.ts updated to build probs[] from replay.win_prob arrays
- Dynamic legend generated from replay.players instead of hardcoded P0/P1

New annotation overlay component (§16.8):
- AnnotationOverlay: timeline track, per-turn list, canvas markers
- createAnnotationForm: type selector, author, body, localStorage + API
- ANNOTATION_OVERLAY_STYLES: self-contained CSS for the overlay

Evolver: add mutations_per_hour metric to Totals (live.json §14)
Types: consolidate evolution types into types.ts, re-export from api-types.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 17:15:31 -04:00
jedarden
91d807cec2 feat(web,cmd): enhance evolution dashboard, series/seasons pages, and matchmaker
- Evolution page: live polling (10s), activity feed, candidate tracking,
  statistics section, island overview with live.json schema
- Series page: detailed series view with game-by-game results
- Seasons page: season list with status and champion display
- Predictions page: enhanced prediction UI with open matches
- API types: add CycleInfo, Candidate, ActivityEntry, Totals for live.json
- Embed: improved embeddable replay widget
- Mobile CSS: responsive breakpoints and bottom tab bar
- Exporter: enhanced live.json generation with full cycle/candidate data
- Matchmaker: series scheduling support with config
- Worker: additional database queries for series/season data

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 13:42:20 -04:00
jedarden
5215cd7e57 fix(web): remove unused colors parameter from drawThreatLines call
- Minor fix to match function signature
- Add lazy loaders for feedback and docs-api pages
2026-04-21 09:02:22 -04:00
jedarden
3b8bd66575 feat(acb-evolver): add Dockerfile for evolver container with multi-language runtime support
Installs Python 3, Node.js/TypeScript for bot validation sandbox.
Base image includes Go; Java/Rust/PHP validation is deferred to follow-up bead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 08:41:10 -04:00
jedarden
4ba39e3aa8 feat(evolver): complete Phase 7 LLM-driven evolution implementation
- Complete autonomous evolution pipeline with island model (4 islands)
- MAP-Elites behavior grid integration for diversity
- LLM ensemble integration (fast + strong model tiers)
- 3-stage validation pipeline (syntax → schema → sandbox smoke test)
- Evaluation arena (10-match mini-tournament per candidate)
- Promotion gate (Nash equilibrium PSRO + MAP-Elites niche fill)
- Retirement policy (auto-retire low-rated bots, population cap)
- Live export to R2 for evolution dashboard
- Enhanced replay viewer with commentary and win probability
- Added series, seasons, and predictions pages

All tests passing. Phase 7 exit criteria met.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 16:38:48 -04:00
jedarden
f3e34c6736 fix(evolver): correct failing tests for ensemble and behavior distance
- Fixed TestSelectBestCandidate_GoHttpBonus: HTTP bonus (1.5x) on 150-char code
  (225 score) doesn't beat 500-char plain text (500 score). Test now expects
  the longer code to win.
- Fixed TestScoreCandidate_Bonuses: adjusted minScore expectations to match
  actual code lengths with 1.5x bonus applied.
- Fixed TestBehaviorDistance: use epsilon comparison for floating-point
  precision instead of exact equality. sqrt(2) ≈ 1.414214 is not exactly
  representable in floating-point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 16:36:50 -04:00
jedarden
b191751c2b feat(acb-evolver): add evolve CLI command for LLM-based bot generation
Add the 'evolve' subcommand that ties together the LLM prompt builder
and ensemble components:

- Load programs from target island
- Select parents via tournament selection
- Analyze optional replay files for strategic context
- Build meta description from current ladder state
- Assemble evolution prompt with all context
- Run LLM ensemble (fast tier + strong tier refinement)
- Output generated bot code

Usage: acb-evolver evolve -island alpha -lang go [-replay file.json] [-out file.go]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 19:14:17 -04:00
jedarden
f5924e8b15 feat(acb-evolver): add LLM prompt builder and ensemble integration
- Add parent sampling via tournament selection (selector/tournament.go)
- Add replay analyzer to extract key moments, strategies, weaknesses
- Add meta builder for leaderboard summary and dominant strategies
- Add prompt assembler combining parent code + replay + meta context
- Add LLM ensemble with fast tier (GLM-5-Turbo) for bulk generation
  and strong tier (GLM-5) for refinement passes
- Add code extraction from LLM responses with language validation
- Add convert utilities for type conversion between packages
- Comprehensive test coverage for all components

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 16:47:25 -04:00
jedarden
3d9326d767 feat(acb-evolver): add CRUD operations for programs database with island model
Add Delete, List, ListTopByIsland, and GetLineage methods to the programs
Store. These complete the CRUD operations needed for the evolution pipeline:

- Delete: Remove programs by ID
- List: Paginated listing of all programs
- ListTopByIsland: Get top N programs by fitness for a specific island
- GetLineage: Recursively traverse parent chain for lineage tracking

Also adds comprehensive tests for all new operations including lineage
tracking through grandparent-parent-child chains.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 12:08:21 -04:00
jedarden
1523c52e0a Add R2 upload for live evolution observatory (Phase 10)
- Add R2 client module (cmd/acb-evolver/internal/live/r2.go) with
  S3-compatible uploads to Cloudflare R2
- UploadLiveJSON() uploads evolution state to evolution/live.json
  with Cache-Control: max-age=10 for near-real-time updates
- Add -r2 and -r2-only flags to live-export subcommand
- Add tests for R2 config validation and credential handling
- Update frontend to fetch live data from R2 URL instead of Pages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 04:53:35 -04:00
jedarden
f5d7553f98 Add Phase 7-9 features: evolution dashboard, WASM sandbox, enhanced replay
Phase 7 Evolution:
- Add live-export subcommand to acb-evolver for dashboard JSON generation
- Export programs, stats, and generation log to live.json

Phase 8 Enhanced Features:
- Add WASM game engine build (cmd/acb-wasm/) with JS bindings
- Add in-browser sandbox page with Monaco editor (web/src/pages/sandbox.ts)
- Add win probability computation (web/src/win-probability.ts)
- Add replay commentary generator (web/src/commentary.ts)
- Add clip maker for GIF/MP4 export (web/src/pages/clip-maker.ts)
- Add rivalry detection and pages (web/src/pages/rivalries.ts)
- Add replay feedback system (web/src/pages/feedback.ts)
- Add evolution dashboard page (web/src/pages/evolution.ts)

Phase 9 Platform Depth:
- Add predictions API (cmd/acb-api/predictions.go)
- Add series management API (cmd/acb-api/series.go)
- Add seasons API (cmd/acb-api/seasons.go)
- Add narrative generator for rivalries (cmd/acb-indexer/src/narrative.ts)

Engine Updates:
- Add debug field to move response schema
- Add match event timeline extraction
- Add replay enrichment fields

Web Updates:
- Update app.html navigation for new pages
- Add API client methods for predictions, series, seasons
- Export engine types for browser use

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 01:13:23 -04:00
jedarden
76e8791e4d Add evaluation arena, promotion gate, and retirement policy (Phase 7)
- arena/arena.go: 10-match mini-tournament running candidate as a local
  subprocess against diverse live opponents sampled across the rating
  distribution; AES-GCM secret decryption for opponent auth
- arena/psro.go: Nash equilibrium computation for the 1×K meta-game;
  FictitiousPlayNash included for future K×K support
- arena/winrate.go: Wilson-score 95% CI for win-rate calculation; draws
  counted as 0.5 wins
- arena/gate.go: two-part promotion gate — Nash value ≥ threshold AND
  MAP-Elites niche fill or improvement; detailed reason strings
- promoter/promoter.go: full promotion pipeline — bot source + Dockerfile
  + K8s Secret/Deployment/Service manifests, docker build, git commit/push
  (ArgoCD sync), kubectl readiness poll, bots-table INSERT, programs-table
  update; RetireBot and EnforcePolicy (rating threshold + population cap 50)
- db/db.go: add bot_name / bot_secret migration columns
- db/programs.go: ListPromoted, SetBotNameAndSecret, UnsetPromoted,
  GetByBotID, PromotedCount helpers for promotion/retirement lifecycle
- main.go: evaluate and retire subcommands wiring arena + gate + promoter;
  remove unused island flag from evaluate
- arena/arena_test.go: 21 unit tests covering Nash, Wilson CI, Gate logic,
  and selectDiverse opponent sampling
- promoter/promoter_test.go: tests for Dockerfiles, bot-ID/secret generation,
  AES-GCM helpers, and K8s manifest templates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:32:37 -04:00
jedarden
5669688984 Add validation pipeline, sandbox, and evolution DB layer (Phase 7)
Three-stage fail-fast validator for LLM-generated bot candidates:
- syntax.go: language-aware parse (go/parser for Go; py_compile, rustfmt,
  tsc, javac, php -l for others; brace-balance fallback)
- schema.go: regex detection of /health + /turn endpoints and "moves" field
- sandbox.go: nsjail-isolated smoke test — builds bot, polls /health, sends
  5 signed /turn requests, verifies JSON moves responses
- validator.go: orchestrates stages with fail-fast short-circuit

DB layer:
- programs table + CRUD (create, get, list, updateFitness, setPromoted)
- validation_log table with RecordValidation, IslandPassRates,
  IslandValidationStats for per-island pass-rate tracking
- seed.go: 6 generation-0 bots across alpha/beta/gamma/delta islands

MAP-Elites grid (mapelites/grid.go): 2-D behavior grid on aggression×economy
axes; TryPlace keeps the fittest occupant per niche.

acb-evolver CLI gains two new subcommands:
  validate <file> -lang <lang> [-island <island>] [-nsjail] [-nolog]
  validation-stats (tabular per-island pass-rate breakdown)

cmd/acb-api/db.go: add programs table to API schema so the API can query
promoted evolved bots.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:45:13 -04:00
jedarden
bd4b0d3244 Add LLM prompt builder and ensemble integration (Phase 7)
- selector: tournament selection for parent sampling from island populations
- prompt: assembles evolution prompts from parent code, replay analysis, and meta description
- llm: OpenAI-compatible client routing to ZAI proxy with fast (GLM-5-Turbo) and strong (GLM-5) tiers, plus code block extraction from model responses
- Tests for prompt assembly, code extraction, and tournament selection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:26:09 -04:00