ai-code-battle/docs/phase6-deployment-checklist.md
jedarden fb0ae2b603 docs(phase6): add deployment checklist and make scripts executable
- Add comprehensive Phase 6 deployment checklist (docs/phase6-deployment-checklist.md)
- Make all deployment scripts executable (chmod +x scripts/*.sh)
- Document remaining Cloudflare setup steps requiring account access
- Include verification commands and expected URLs
- Document data flow architecture

Phase 6 code is complete. Remaining infrastructure setup requires
Cloudflare account access for:
- Cloudflare Pages project creation
- R2 bucket creation and custom domain
- DNS configuration

All deployment scripts are ready to run once Cloudflare access is available.
2026-04-08 17:29:02 -04:00

Phase 6: Deployment & Production - Completion Checklist

Status: Code Complete, Infrastructure Setup Pending Cloudflare Access

This document lists the steps remaining to complete Phase 6. All code is written and tested; what remains is creating the Cloudflare resources, which requires Cloudflare account access.


Completed (Code & K8s)

Container Images

  • acb-matchmaker - Match scheduling, health checks, reaper
  • acb-worker - Match execution, B2 upload
  • acb-index-builder - PostgreSQL → JSON → Pages deploy, R2 management
  • acb-evolver - LLM evolution pipeline
  • acb-strategy-random - Python RandomBot
  • acb-strategy-gatherer - Go GathererBot
  • acb-strategy-rusher - Rust RusherBot
  • acb-strategy-guardian - PHP GuardianBot
  • acb-strategy-swarm - TypeScript SwarmBot
  • acb-strategy-hunter - Java HunterBot

Kubernetes Deployment

All K8s manifests are in the ardenone-cluster repo at: declarative-config/k8s/apexalgo-iad/ai-code-battle/

  • Namespace configuration
  • PostgreSQL schema (ext-postgres-operator)
  • Deployments for all services
  • Services for internal communication
  • SealedSecrets for credentials
  • ArgoCD Application manifest

CI/CD

  • GitHub Actions workflow (.github/workflows/ci.yml)
  • Go tests for engine and cmd packages
  • Web build with Vite
  • Build artifact upload

Monitoring & Alerting

  • Health endpoints (/health, /ready)
  • Prometheus metrics (/metrics)
  • Discord/Slack alerting webhooks
  • Liveness and readiness probes configured

Deployment Scripts

All scripts in the scripts/ directory are ready:

  • cloudflare-setup.sh - Full Cloudflare setup
  • setup-r2.sh - R2 bucket + custom domain
  • deploy-pages.sh - Deploy SPA to Pages
  • configure-dns.sh - DNS configuration
  • verify-deployment.sh - End-to-end verification

Remaining (Requires Cloudflare Account Access)

Cloudflare Pages Setup

Automated via script:

./scripts/cloudflare-setup.sh

Or manual steps:

  1. Create Pages project:

    • Go to Workers & Pages > Create application > Pages > Upload assets
    • Project name: aicodebattle
    • Or use wrangler:
      wrangler pages project create aicodebattle --production-branch master
      
  2. Deploy the SPA:

    cd web
    npm install
    npm run build
    cd ..
    wrangler pages deploy web/dist --project-name=aicodebattle
    
  3. Add custom domain:

    • Go to: Workers & Pages > aicodebattle > Settings > Custom domains
    • Add domain: aicodebattle.com
    • DNS CNAME will be auto-configured
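The manual build-and-deploy step can be wrapped in a small guard so that an empty or failed build is never uploaded. This is a minimal sketch, not the contents of scripts/deploy-pages.sh. The helper names `dist_ready` and `deploy_spa` are hypothetical, and the check assumes Vite writes index.html into web/dist, which is its default behaviour.

```shell
# Hypothetical helper: succeed only if DIR looks like a finished SPA build.
dist_ready() {
  [ -d "$1" ] && [ -s "$1/index.html" ]
}

# Hypothetical wrapper around the manual steps above; run from the repo root.
deploy_spa() {
  # Build in a subshell so the caller's working directory is unchanged.
  (cd web && npm install && npm run build) || return 1
  if ! dist_ready web/dist; then
    echo "web/dist is missing index.html; not deploying" >&2
    return 1
  fi
  wrangler pages deploy web/dist --project-name=aicodebattle
}
```

Calling `deploy_spa` then either deploys a verified build or stops with a non-zero exit status.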

Cloudflare R2 Setup

Automated via script:

export CLOUDFLARE_API_TOKEN=your_token
export CLOUDFLARE_ACCOUNT_ID=your_account_id  # optional, auto-detected
./scripts/setup-r2.sh

Or manual steps:

  1. Create R2 bucket:

    wrangler r2 bucket create acb-data
    
  2. Add custom domain:

    • Go to: R2 > acb-data > Settings > Custom Domains
    • Add domain: r2.aicodebattle.com
    • DNS CNAME will be auto-configured
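Once the bucket exists, a single test upload through R2's S3-compatible API confirms that credentials and the endpoint are right before the index builder depends on them. The sketch below assumes the AWS CLI is installed and that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY hold the S3 credentials of an R2 API token. The function names `r2_endpoint` and `r2_put` are hypothetical and are not part of scripts/setup-r2.sh.

```shell
# R2's S3-compatible endpoint is derived from the Cloudflare account ID.
r2_endpoint() {
  printf 'https://%s.r2.cloudflarestorage.com\n' "$1"
}

# Hypothetical smoke test: upload one local file to the acb-data bucket.
# Args: LOCAL_FILE OBJECT_KEY
r2_put() {
  aws s3 cp "$1" "s3://acb-data/$2" \
    --endpoint-url "$(r2_endpoint "$CLOUDFLARE_ACCOUNT_ID")" \
    --region auto
}
```

For example, `r2_put ./live.json evolution/live.json` followed by a request to the custom domain would exercise both the write path and the public read path.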

DNS Configuration

Automated via script:

export CLOUDFLARE_API_TOKEN=your_token
export TRAEFIK_IP=$(kubectl --server=http://kubectl-apexalgo-iad:8001 get svc -n traefik traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
./scripts/configure-dns.sh

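Or manual steps (records 1 and 2 are normally created automatically when the custom domains are added in the Pages and R2 dashboards, as noted above; add them by hand only if they are missing):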

  1. Main domain (Pages):

    • Type: CNAME
    • Name: @ (or aicodebattle.com)
    • Target: aicodebattle.pages.dev
    • Proxy: On (orange cloud)
  2. R2 subdomain:

    • Type: CNAME
    • Name: r2
    • Target: acb-data.r2.cloudflarestorage.com
    • Proxy: Off (gray cloud) - DNS only
  3. API subdomain:

    • Type: A
    • Name: api
    • Target: <Traefik LoadBalancer IP>
    • Proxy: On (orange cloud)

Get the Traefik IP (shown in the EXTERNAL-IP column of the traefik service):

kubectl --server=http://kubectl-apexalgo-iad:8001 get svc -n traefik
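The manual records above map directly onto Cloudflare's DNS records API. The following sketch shows one way to create a record with curl. It is not the contents of scripts/configure-dns.sh, the variable name CLOUDFLARE_ZONE_ID is an assumption, and the JSON builder does no escaping, so it is only suitable for plain hostnames and IP addresses.

```shell
# Build the JSON body for a Cloudflare DNS record (ttl 1 means "automatic").
# Args: TYPE NAME CONTENT PROXIED(true|false)
dns_record_json() {
  printf '{"type":"%s","name":"%s","content":"%s","proxied":%s,"ttl":1}' \
    "$1" "$2" "$3" "$4"
}

# Create one record in the zone. CLOUDFLARE_ZONE_ID is an assumed variable.
create_dns_record() {
  curl -sS -X POST \
    "https://api.cloudflare.com/client/v4/zones/${CLOUDFLARE_ZONE_ID}/dns_records" \
    -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$(dns_record_json "$@")"
}
```

The API record from step 3 would then be `create_dns_record A api "$TRAEFIK_IP" true`.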

Verification

After completing the setup, run the verification script:

./scripts/verify-deployment.sh

Or manually check:

# SPA should be accessible
curl -I https://aicodebattle.com

# R2 custom domain should respond (the bucket root may return 404 because no object key is given)
curl -I https://r2.aicodebattle.com

# API health (once K8s is running)
curl https://api.aicodebattle.com/health
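The manual checks can be turned into a pass/fail loop by looking only at the HTTP status code. This is a sketch of that idea and not the contents of scripts/verify-deployment.sh. The function names are hypothetical, and treating every 2xx or 3xx response as reachable is an assumption about what counts as success.

```shell
# Classify an HTTP status code: 2xx and 3xx count as reachable.
status_ok() {
  case "$1" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Request one URL and print "ok" or "FAIL" with the status code.
# curl reports 000 when no HTTP response was received at all.
check_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1") || code=000
  if status_ok "$code"; then
    echo "ok   $code $1"
  else
    echo "FAIL $code $1"
    return 1
  fi
}
```

A run over the public endpoints could look like `check_url https://aicodebattle.com && check_url https://api.aicodebattle.com/health`.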

Expected URLs After Deployment

| Service | URL |
|---|---|
| SPA (Pages) | https://aicodebattle.com |
| SPA (Pages default) | https://aicodebattle.pages.dev |
| Replays (R2) | https://r2.aicodebattle.com/replays/{match_id}.json.gz |
| Match metadata (R2) | https://r2.aicodebattle.com/matches/{match_id}.json |
| Evolution feed (R2) | https://r2.aicodebattle.com/evolution/live.json |
| API (K8s) | https://api.aicodebattle.com/health |
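The R2 URL patterns above can be filled in with small helpers, which is handy when spot-checking a specific match. This is a sketch only: the function names and the R2_BASE_URL variable are hypothetical, and `fetch_replay` assumes the object is served as raw gzip bytes. If the bucket instead serves it with a `Content-Encoding: gzip` header, `curl --compressed` without `gunzip` would be the equivalent.

```shell
# Base URL of the R2 custom domain; override for testing.
R2_BASE_URL="${R2_BASE_URL:-https://r2.aicodebattle.com}"

# Print the public URL of a replay or of match metadata for a match ID.
replay_url() { printf '%s/replays/%s.json.gz\n' "$R2_BASE_URL" "$1"; }
match_url()  { printf '%s/matches/%s.json\n' "$R2_BASE_URL" "$1"; }

# Download a replay and write the decompressed JSON to stdout.
fetch_replay() {
  curl -fsS "$(replay_url "$1")" | gunzip -c
}
```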

Data Flow

┌─────────────────────────────────────────────────────────────────┐
│                         Public Internet                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────┐    ┌─────────────────────────────────┐ │
│  │  Cloudflare Pages   │    │  Cloudflare R2                  │ │
│  │  aicodebattle.com   │    │  r2.aicodebattle.com            │ │
│  │                     │    │                                 │ │
│  │  SPA shell (HTML/   │    │  replays/*.json.gz              │ │
│  │  JS/CSS)            │    │  matches/*.json                 │ │
│  │  data/*.json        │    │  evolution/live.json            │ │
│  │                     │    │                                 │ │
│  └─────────────────────┘    └─────────────────────────────────┘ │
│           ▲                            ▲                        │
└───────────┼────────────────────────────┼────────────────────────┘
            │                            │
┌───────────┼────────────────────────────┼────────────────────────┐
│           │  apexalgo-iad cluster      │                        │
│           │                            │                        │
│  ┌────────┴────────────────────────────┴─────────────────────┐  │
│  │  Index Builder Deployment                                 │  │
│  │  - Reads PostgreSQL                                       │  │
│  │  - Generates JSON indexes                                 │  │
│  │  - Deploys to Pages (wrangler)                            │  │
│  │  - Promotes replays to R2                                 │  │
│  │  - Prunes R2 warm cache                                   │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Match Workers (Deployment)                               │  │
│  │  - Execute matches                                        │  │
│  │  - Upload replays to B2                                   │  │
│  │  - Write results to PostgreSQL                            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Matchmaker Deployment                                    │  │
│  │  - Creates match jobs                                     │  │
│  │  - Enqueues to Valkey                                     │  │
│  │  - Health checks bots                                     │  │
│  │  - Reaps stale jobs                                       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Evolver Deployment                                       │  │
│  │  - LLM evolution pipeline                                 │  │
│  │  - Writes evolution/live.json to R2                       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Strategy Bot Deployments (x6)                            │  │
│  │  - HTTP servers on cluster-internal Services              │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  PostgreSQL (cnpg-apexalgo)                               │  │
│  │  - Bots, matches, jobs, ratings, etc.                     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Valkey StatefulSet                                       │  │
│  │  - Job queue (acb:jobs:pending)                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Backblaze B2 (cold archive)                              │  │
│  │  - ALL replays, permanently                               │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
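A quick way to see whether the matchmaker-to-worker leg of this flow is moving is to watch the depth of the acb:jobs:pending key. The diagram does not say which Valkey data structure backs the queue, so the sketch below asks Valkey for the key's type first instead of assuming a list. The function name `queue_depth` and the VALKEY_CLI override are hypothetical.

```shell
# Print the number of entries under a queue key (default: acb:jobs:pending).
# VALKEY_CLI may carry connection flags, e.g. "valkey-cli -h some-host".
queue_depth() {
  cli="${VALKEY_CLI:-valkey-cli}"
  key="${1:-acb:jobs:pending}"
  # $cli is intentionally unquoted so that extra flags are split into words.
  case "$($cli TYPE "$key")" in
    list)   $cli LLEN "$key" ;;
    zset)   $cli ZCARD "$key" ;;
    stream) $cli XLEN "$key" ;;
    none)   echo 0 ;;          # key does not exist: queue is empty
    *)      echo "unsupported key type for $key" >&2; return 1 ;;
  esac
}
```

A depth that keeps growing while workers are running would suggest the workers are not consuming jobs; a depth that stays at 0 would suggest the matchmaker is not enqueueing.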

Post-Deployment Tasks

Once Cloudflare resources are created:

  1. Update environment variables in the index builder:

    • CLOUDFLARE_API_TOKEN - For Pages deployment
    • R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET, R2_ENDPOINT - For R2 operations
    • B2_KEY_ID, B2_APPLICATION_KEY, B2_BUCKET, B2_ENDPOINT - For B2 operations
  2. Deploy to Kubernetes:

    • K8s manifests are already in the ardenone-cluster repo
    • ArgoCD will sync them automatically
  3. Verify data flow:

    • Index builder should start deploying to Pages
    • Match workers should upload replays to B2
    • R2 warm cache should populate with recent replays
  4. Monitor:

    • Check ArgoCD for sync status
    • Check pod logs for any errors
    • Run ./scripts/verify-deployment.sh
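Before relying on ArgoCD to roll the services out, it can help to confirm that every variable named in step 1 is actually set in the environment the index builder starts with. The following is a minimal sketch of such a check. The function name `missing_vars` and the REQUIRED_VARS list variable are hypothetical; the variable names themselves are the ones listed above.

```shell
# Variable names taken from step 1 above.
REQUIRED_VARS="CLOUDFLARE_API_TOKEN R2_ACCESS_KEY_ID R2_SECRET_ACCESS_KEY R2_BUCKET R2_ENDPOINT B2_KEY_ID B2_APPLICATION_KEY B2_BUCKET B2_ENDPOINT"

# Print each required variable that is unset or empty.
# Returns 0 when all are present and 1 when at least one is missing.
missing_vars() {
  missing=0
  for name in $REQUIRED_VARS; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `missing_vars` prints nothing and succeeds when the configuration is complete; otherwise it lists the names that still need a value.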

Exit Criteria

Phase 6 is complete when:

  • All container images built and pushed
  • All K8s manifests committed to ardenone-cluster repo
  • CI/CD pipeline working
  • Monitoring and alerting configured
  • Cloudflare Pages project created and deployed
  • Cloudflare R2 bucket created with custom domain
  • DNS configured (aicodebattle.com, r2.aicodebattle.com, api.aicodebattle.com)
  • Platform publicly accessible

The Cloudflare Pages, R2, and DNS items require Cloudflare account access and must be completed by someone with admin access to the Cloudflare account. The platform becomes publicly accessible once those three are done.