diff --git a/.needle-predispatch-sha b/.needle-predispatch-sha index 7c0fcbb..7418065 100644 --- a/.needle-predispatch-sha +++ b/.needle-predispatch-sha @@ -1 +1 @@ -508dc0c2e89849e9c383ec27150cdfd446368c52 +42e9561e462943ba99c6060c5158944083976f08 diff --git a/.wrangler/cache/wrangler-account.json b/.wrangler/cache/wrangler-account.json new file mode 100644 index 0000000..0fa26e5 --- /dev/null +++ b/.wrangler/cache/wrangler-account.json @@ -0,0 +1,6 @@ +{ + "account": { + "id": "e26f015c7ba47a6ad6219385e77072b7", + "name": "" + } +} \ No newline at end of file diff --git a/IAD-ACB-OPENBAO-FIX.md b/IAD-ACB-OPENBAO-FIX.md new file mode 100644 index 0000000..36ea802 --- /dev/null +++ b/IAD-ACB-OPENBAO-FIX.md @@ -0,0 +1,190 @@ +# iad-acb Cluster Secret Issues - Comprehensive Fix + +## Summary + +Two separate issues affecting iad-acb cluster secrets: + +1. **Orphaned openbao namespace** (RESOLVED) - Was causing DNS conflicts for ESO +2. **Corrupted R2 credentials in OpenBao** (ACTIVE) - R2 operations failing + +--- + +## Issue 1: Orphaned openbao Namespace (RESOLVED) + +### Problem +An orphaned `openbao` namespace existed on iad-acb containing a sealed local OpenBao deployment. This caused DNS conflicts where ESO would sometimes resolve to the local service (HTTP 503) instead of the correct Tailscale egress proxy. + +### Status +**RESOLVED** - The orphaned namespace has been deleted. + +### Verification +```bash +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig get namespace openbao +# Output: Error from server (NotFound) - namespace is gone +``` + +--- + +## Issue 2: Corrupted R2 Credentials in OpenBao (ACTIVE) + +### Problem +The `acb-r2-credentials` ExternalSecret on iad-acb is syncing corrupted values from OpenBao. 
+ +**Current Secret Values (corrupted):** +| Secret Key | Current Value | Expected Value | +|------------|---------------|----------------| +| `endpoint` | `bdaf818e893d8691d2ff24bf1c120d34458a00be8d12b5b74037f930b20cabcd` | `https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com` | +| `bucket` | `acb-data` | `acb-data` ✓ | +| `access-key` | `66aabf3cc401c74755910422a903a8af` | (R2 Access Key ID - 32 chars) | +| `secret-key` | `https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com` | (R2 Secret Access Key - 64 chars) | + +**Note:** The values are swapped - the endpoint URL is stored in the `secret-key` field! + +### Impact + +All R2 operations fail with "Custom endpoint was not a valid URI": +- Replay uploads to R2 fail (index-builder, worker) +- Thumbnail uploads to R2 fail +- Bot card uploads to R2 fail +- Website replay viewer cannot load real matches + +**Evidence from index-builder logs:** +``` +"error":"upload to R2: upload object data/meta/archetypes.json: operation error S3: PutObject, resolve auth scheme: resolve endpoint: endpoint rule error, Custom endpoint `bdaf818e893d8691d2ff24bf1c120d34458a00be8d12b5b74037f930b20cabcd` was not a valid URI" +``` + +### Root Cause +The values stored in OpenBao at `secret/rs-manager/ai-code-battle/r2` are corrupted. This is **not an ESO sync issue** - ESO is correctly syncing whatever values are stored in OpenBao. + +--- + +## Fix Options + +### Option 1: Fix the OpenBao Secret (Recommended) + +1. Access OpenBao on rs-manager cluster +2. Update the secret at `secret/rs-manager/ai-code-battle/r2` with correct values: + +```bash +# Via OpenBao CLI +vault login +vault kv put secret/rs-manager/ai-code-battle/r2 \ + endpoint="https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com" \ + bucket="acb-data" \ + access-key="" \ + secret-key="" +``` + +3. 
Force ESO to re-sync: +```bash +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig annotate externalsecret acb-r2-credentials -n ai-code-battle force-sync=$(date +%s) +``` + +### Option 2: Replace with SealedSecret (Bypass ESO) + +1. Generate R2 API credentials in Cloudflare dashboard (R2 > acb-data > Settings > R2 API) +2. Create SealedSecret with correct values: + +```bash +kubectl create secret generic acb-r2-credentials -n ai-code-battle \ + --from-literal=endpoint="https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com" \ + --from-literal=bucket="acb-data" \ + --from-literal=access-key="" \ + --from-literal=secret-key="" \ + --dry-run=client -o yaml | \ +kubeseal --controller-name=sealed-secrets -n ai-code-battle \ + > /home/coding/declarative-config/k8s/iad-acb/ai-code-battle/acb-r2-credentials-sealedsecret.yml +``` + +3. Remove the ExternalSecret from declarative-config: +```bash +# Remove from /home/coding/declarative-config/k8s/iad-acb/ai-code-battle/acb-externalsecrets.yml +# Delete the acb-r2-credentials ExternalSecret section +``` + +4. Delete the ExternalSecret from the cluster: +```bash +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig delete externalsecret acb-r2-credentials -n ai-code-battle +``` + +### Option 3: Automated Fix Script + +Run the provided fix script: +```bash +/home/coding/ai-code-battle/fix-iad-acb-r2-credentials.sh +``` + +The script supports: +- Updating OpenBao directly (with OpenBao root token) +- Creating a SealedSecret (bypasses OpenBao) + +--- + +## Required R2 Credentials + +To fix this, you need: + +1. **R2 Access Key ID** (32 characters, starts with digits) +2. **R2 Secret Access Key** (64 characters) + +**Get these from Cloudflare Dashboard:** +1. Go to: R2 > acb-data > Settings > R2 API +2. Click "Create API Token" or use existing token +3. 
Copy Access Key ID and Secret Access Key + +--- + +## Verification + +After applying the fix, verify: + +```bash +# Check secret values +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig get secret acb-r2-credentials -n ai-code-battle -o json | jq -r '.data | map_values(@base64d)' + +# Expected output: +# { +# "access-key": "<32-char access key>", +# "bucket": "acb-data", +# "endpoint": "https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com", +# "secret-key": "<64-char secret key>" +# } + +# Check index-builder logs for R2 errors (should be gone) +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig logs -n ai-code-battle -l app.kubernetes.io/name=acb-index-builder --tail=50 | grep -i r2 + +# Check pod is healthy +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig get pods -n ai-code-battle -l app.kubernetes.io/name=acb-index-builder +``` + +--- + +## ClusterSecretStore Configuration + +The ClusterSecretStore in `/home/coding/declarative-config/k8s/iad-acb/external-secrets/cluster-secret-store.yml` is correctly configured: + +```yaml +spec: + provider: + vault: + server: "http://openbao.external-secrets.svc.cluster.local:8200" + path: "secret" + version: "v2" + auth: + kubernetes: + mountPath: "k8s-iad-acb" + role: "eso" + serviceAccountRef: + name: external-secrets-iad-acb + namespace: external-secrets +``` + +**Status:** Ready and validated + +--- + +## Files + +- `/home/coding/ai-code-battle/fix-iad-acb-r2-credentials.sh` - Automated fix script +- `/home/coding/ai-code-battle/IAD-ACB-R2-CREDENTIALS-FIX.md` - R2-specific fix documentation +- `/home/coding/declarative-config/k8s/iad-acb/ai-code-battle/acb-externalsecrets.yml` - ExternalSecret definitions diff --git a/IAD-ACB-R2-CREDENTIALS-FIX.md b/IAD-ACB-R2-CREDENTIALS-FIX.md new file mode 100644 index 0000000..5be898f --- /dev/null +++ b/IAD-ACB-R2-CREDENTIALS-FIX.md @@ -0,0 +1,100 @@ +# iad-acb R2 Credentials Fix + +## Problem + +The `acb-r2-credentials` ExternalSecret on 
iad-acb is syncing values from OpenBao, but the stored values are **corrupted/swapped**: + +| Secret Key | Current Value | Expected Value | +|------------|---------------|----------------| +| `endpoint` | `bdaf818e893d8691d2ff24bf1c120d34458a00be8d12b5b74037f930b20cabcd` | `https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com` | +| `bucket` | `acb-data` | `acb-data` ✓ | +| `access-key` | `66aabf3cc401c74755910422a903a8af` | (R2 Access Key ID - 32 chars) | +| `secret-key` | `https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com` | (R2 Secret Access Key - 64 chars) | + +## Root Cause + +The values stored in OpenBao at `secret/rs-manager/ai-code-battle/r2` are corrupted: +- The `endpoint` property contains a SHA256 hash +- The `secret-key` property contains the actual endpoint URL +- The `access-key` property contains what looks like a hash instead of the R2 access key ID + +This is **not an ESO sync issue** - ESO is correctly syncing whatever values are stored in OpenBao. + +## Impact + +All R2 operations fail with "Custom endpoint was not a valid URI": +- Replay uploads to R2 fail (index-builder, worker) +- Thumbnail uploads to R2 fail +- Bot card uploads to R2 fail +- Website replay viewer cannot load real matches + +## Fix Options + +### Option 1: Fix the OpenBao Secret (Recommended) + +1. Access OpenBao on rs-manager +2. Update the secret at `secret/rs-manager/ai-code-battle/r2` with correct values: + ```bash + # Via OpenBao UI or CLI + vault kv put secret/rs-manager/ai-code-battle/r2 \ + endpoint="https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com" \ + bucket="acb-data" \ + access-key="" \ + secret-key="" + ``` +3. Force ESO to re-sync: + ```bash + kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig annotate externalsecret acb-r2-credentials -n ai-code-battle force-sync=$(date +%s) + ``` + +### Option 2: Replace with SealedSecret (Bypass ESO) + +1. 
Generate R2 API credentials in Cloudflare dashboard (R2 > API Tokens) +2. Create SealedSecret with correct values: + ```bash + kubectl create secret generic acb-r2-credentials -n ai-code-battle \ + --from-literal=endpoint="https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com" \ + --from-literal=bucket="acb-data" \ + --from-literal=access-key="" \ + --from-literal=secret-key="" \ + --dry-run=client -o yaml | \ + kubeseal --controller-name=sealed-secrets -n ai-code-battle + ``` +3. Remove ExternalSecret from declarative-config +4. Commit SealedSecret to declarative-config + +### Option 3: Fix Script (Automated Option 1) + +Run `/home/coding/ai-code-battle/fix-iad-acb-r2-credentials.sh` with: +- OpenBao root token OR +- R2 credentials (will update OpenBao directly) + +## Required R2 Credentials + +To fix this, you need: +1. **R2 Access Key ID** (32 characters, starts with digits like `1234567890abcdef...`) +2. **R2 Secret Access Key** (64 characters, base64-like) + +Get these from Cloudflare Dashboard: +1. Go to: R2 > acb-data > Settings > R2 API +2. Click "Create API Token" or use existing token +3. 
Copy Access Key ID and Secret Access Key + +## Verification + +After fix, verify: +```bash +# Check secret values +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig get secret acb-r2-credentials -n ai-code-battle -o json | jq -r '.data | map_values(@base64d)' + +# Check index-builder pod can start +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig get pods -n ai-code-battle -l app.kubernetes.io/name=acb-index-builder + +# Check logs for R2 errors +kubectl --kubeconfig=/home/coding/.kube/iad-acb.kubeconfig logs -n ai-code-battle -l app.kubernetes.io/name=acb-index-builder --tail=50 +``` + +## Files Modified + +- `/home/coding/ai-code-battle/fix-iad-acb-r2-credentials.sh` - Fix script (to be created) +- `/home/coding/ai-code-battle/IAD-ACB-R2-CREDENTIALS-FIX.md` - This document diff --git a/acb-map-evolver b/acb-map-evolver new file mode 100755 index 0000000..d8b9c7b Binary files /dev/null and b/acb-map-evolver differ diff --git a/cmd/acb-worker/db.go b/cmd/acb-worker/db.go index 672ac83..c7b3c0a 100644 --- a/cmd/acb-worker/db.go +++ b/cmd/acb-worker/db.go @@ -652,3 +652,40 @@ func updateSeriesResult(ctx context.Context, tx *sql.Tx, matchID string, winnerB log.Printf("series: game %d result recorded — series %d, winner=%s", gameNum, seriesID, winnerBotID) return nil } + +// UpdateMapEngagement updates the engagement score for a map using a rolling average. 
+// The new engagement score is computed as: (old_engagement * match_count + new_engagement) / (match_count + 1) +func (c *DBClient) UpdateMapEngagement(ctx context.Context, mapID string, engagementScore float64, turns int) error { + // Use a transaction to safely read and update the engagement score + tx, err := c.db.BeginTx(ctx, nil) + if err != nil { + return fmt.Errorf("failed to begin transaction: %w", err) + } + defer tx.Rollback() + + // Get current engagement and match count + var currentEngagement float64 + var matchCount int + err = tx.QueryRowContext(ctx, ` + SELECT COALESCE(engagement, 0.0), COALESCE(match_count, 0) + FROM maps WHERE map_id = $1 + `, mapID).Scan(&currentEngagement, &matchCount) + if err != nil { + return fmt.Errorf("failed to get current map stats: %w", err) + } + + // Compute rolling average + newEngagement := (currentEngagement*float64(matchCount) + engagementScore) / float64(matchCount+1) + + // Update engagement and match count + _, err = tx.ExecContext(ctx, ` + UPDATE maps + SET engagement = $1, match_count = match_count + 1, last_used_at = NOW() + WHERE map_id = $2 + `, newEngagement, mapID) + if err != nil { + return fmt.Errorf("failed to update map engagement: %w", err) + } + + return tx.Commit() +} diff --git a/cmd/acb-worker/main.go b/cmd/acb-worker/main.go index 2db7acc..ada6601 100644 --- a/cmd/acb-worker/main.go +++ b/cmd/acb-worker/main.go @@ -392,6 +392,17 @@ func (w *Worker) executeMatch(ctx context.Context, claimData *JobClaimData) (*Ma // Compute combat_turns: count distinct turns where ≥1 bot died from "combat" (enemy kill) result.CombatTurns = computeCombatTurns(replay) + // Calculate map engagement score from replay + engagement := engine.CalculateMapEngagement(replay) + w.logger.Printf("Map engagement: crossings=%.0f, critical_moments=%d, coverage=%.2f%%, closeness=%.2f, score=%.2f", + engagement.WinProbCrossings, engagement.CriticalMoments, engagement.MapCoveragePct*100, engagement.Closeness, engagement.Engagement) + +
// Update map engagement in database + if err := w.db.UpdateMapEngagement(ctx, claimData.Match.MapID, engagement.Engagement, result.Turns); err != nil { + // Log but don't fail the match — map engagement is non-critical + w.logger.Printf("Warning: failed to update map engagement: %v", err) + } + return result, replay, nil } diff --git a/engine/integration_test.go b/engine/integration_test.go index 3b95d64..6d2f646 100644 --- a/engine/integration_test.go +++ b/engine/integration_test.go @@ -68,6 +68,31 @@ func TestIntegration_HTTPMatch(t *testing.T) { } t.Logf("Match completed: Winner=%d, Turns=%d", result.Winner, result.Turns) + + // Verify win_prob array is populated (task: bf-qps) + if len(replay.WinProb) == 0 { + t.Error("Replay WinProb array is empty - ComputeWinProbability was not called") + } + + // Verify WinProb entries have correct length (should equal number of players) + if len(replay.WinProb) > 0 && len(replay.WinProb[0]) != len(replay.Players) { + t.Errorf("WinProb entries have %d values, want %d (number of players)", len(replay.WinProb[0]), len(replay.Players)) + } + + // Verify WinProb values are in valid range [0, 1] + for i, entry := range replay.WinProb { + for j, prob := range entry { + if prob < 0 || prob > 1 { + t.Errorf("WinProb entry %d player %d has invalid probability %.2f (want 0-1)", i, j, prob) + } + } + } + + // Verify critical moments are populated + t.Logf("Critical moments detected: %d", len(replay.CriticalMoments)) + for _, m := range replay.CriticalMoments { + t.Logf(" Turn %d: delta=%.2f, player=%d, desc=%s", m.Turn, m.Delta, m.Player, m.Description) + } } // TestIntegration_HMACAuthentication verifies HMAC signing works end-to-end. diff --git a/engine/map_engagement.go b/engine/map_engagement.go new file mode 100644 index 0000000..950eceb --- /dev/null +++ b/engine/map_engagement.go @@ -0,0 +1,180 @@ +package engine + +import "math" + +// MapEngagementScore represents the engagement metrics for a map from a single match. 
+type MapEngagementScore struct { + WinProbCrossings float64 // Number of times win prob crossed 50% + CriticalMoments int // Count of critical moments + MapCoveragePct float64 // Percentage of map tiles visited + Closeness float64 // 1.0 - (score_diff / max_possible_score) + TurnPct float64 // Actual turns / max_turns + Engagement float64 // Combined engagement score +} + +// CalculateMapEngagement computes the engagement score for a map based on replay data. +// The engagement formula is: +// engagement = win_prob_crossings * 3.0 + critical_moments * 2.0 + map_coverage_pct * 1.0 + closeness * 2.0 + turn_pct * 1.0 +func CalculateMapEngagement(replay *Replay) MapEngagementScore { + if replay == nil || len(replay.Turns) == 0 { + return MapEngagementScore{} + } + + // Count win probability crossings (times the leader changed) + winProbCrossings := countWinProbCrossings(replay.WinProb) + + // Count critical moments + criticalMoments := len(replay.CriticalMoments) + + // Calculate map coverage (percentage of unique tiles visited) + mapCoveragePct := calculateMapCoverage(replay) + + // Calculate closeness (how close the final score was) + closeness := calculateCloseness(replay) + + // Calculate turn percentage + turnPct := float64(replay.Result.Turns) / float64(replay.Config.MaxTurns) + + // Calculate combined engagement score + engagement := float64(winProbCrossings)*3.0 + + float64(criticalMoments)*2.0 + + mapCoveragePct*1.0 + + closeness*2.0 + + turnPct*1.0 + + return MapEngagementScore{ + WinProbCrossings: winProbCrossings, + CriticalMoments: criticalMoments, + MapCoveragePct: mapCoveragePct, + Closeness: closeness, + TurnPct: turnPct, + Engagement: engagement, + } +} + +// countWinProbCrossings counts how many times the win probability crossed 50% for any player. +// This indicates lead changes and momentum shifts. 
+func countWinProbCrossings(winProbs []WinProbEntry) float64 { + if len(winProbs) < 2 { + return 0 + } + + crossings := 0 + + // Track which player was leading (had highest win prob) at each turn + for i := 1; i < len(winProbs); i++ { + prevLeader := findLeader(winProbs[i-1]) + currLeader := findLeader(winProbs[i]) + + if prevLeader != currLeader { + crossings++ + } + } + + return float64(crossings) +} + +// findLeader returns the index of the player with the highest win probability. +// Returns -1 if there's a tie or no clear leader. +func findLeader(entry WinProbEntry) int { + if len(entry) == 0 { + return -1 + } + + maxProb := entry[0] + leaderIdx := 0 + + // Check if there's a clear leader (no ties) + for i := 1; i < len(entry); i++ { + if entry[i] > maxProb { + maxProb = entry[i] + leaderIdx = i + } + } + + // Verify the leader is significantly ahead (not a tie) + isTie := false + for i := 0; i < len(entry); i++ { + if i != leaderIdx && math.Abs(entry[i]-maxProb) < 0.01 { + isTie = true + break + } + } + + if isTie { + return -1 + } + + return leaderIdx +} + +// calculateMapCoverage computes the fraction (0.0 to 1.0) of visitable map tiles that were visited by any bot. +func calculateMapCoverage(replay *Replay) float64 { + if replay == nil || len(replay.Turns) == 0 { + return 0 + } + + totalTiles := replay.Config.Rows * replay.Config.Cols + if totalTiles == 0 { + return 0 + } + + // Count unique tiles visited across all turns, keyed by (row, col) + visited := make(map[[2]int]struct{}) + for _, turn := range replay.Turns { + for _, bot := range turn.Bots { + if bot.Alive { + key := [2]int{int(bot.Position.Row), int(bot.Position.Col)} + visited[key] = struct{}{} + } + } + } + + // Subtract wall tiles from total (they're not visitable) + wallTiles := len(replay.Map.Walls) + visitableTiles := totalTiles - wallTiles + if visitableTiles <= 0 { + return 0 + } + + return float64(len(visited)) / float64(visitableTiles) +} + +// calculateCloseness computes how close the final score was. 
+// Returns 1.0 for a draw/tie, decreasing to 0.0 for a blowout. +func calculateCloseness(replay *Replay) float64 { + if replay == nil || replay.Result == nil || len(replay.Result.Scores) == 0 { + return 0 + } + + // Find the max and min scores + maxScore := replay.Result.Scores[0] + minScore := replay.Result.Scores[0] + for _, score := range replay.Result.Scores { + if score > maxScore { + maxScore = score + } + if score < minScore { + minScore = score + } + } + + scoreDiff := maxScore - minScore + if scoreDiff == 0 { + return 1.0 // Perfect tie + } + + // Normalize: closeness = 1 - (score_diff / max_possible_score) + // Assume max possible score is roughly 3x the number of turns (3 points per capture) + maxPossibleScore := float64(replay.Config.MaxTurns) * 3.0 + if maxPossibleScore <= 0 { + return 1.0 + } + + normalizedDiff := float64(scoreDiff) / maxPossibleScore + if normalizedDiff > 1.0 { + normalizedDiff = 1.0 + } + + return 1.0 - normalizedDiff +} diff --git a/fix-iad-acb-openbao.sh b/fix-iad-acb-openbao.sh new file mode 100755 index 0000000..6edd0b5 --- /dev/null +++ b/fix-iad-acb-openbao.sh @@ -0,0 +1,57 @@ +#!/bin/bash +# Fix script for iad-acb ClusterSecretStore issue +# Problem: Orphaned openbao namespace/service causing DNS conflicts + +set -e + +KUBECONFIG="${KUBECONFIG:-/home/coding/.kube/iad-acb.kubeconfig}" + +echo "=== Checking iad-acb cluster access ===" +if ! kubectl --kubeconfig="$KUBECONFIG" get namespace openbao >/dev/null 2>&1; then + echo "✓ No openbao namespace found - nothing to clean up!" + echo "The ClusterSecretStore should work correctly now." + exit 0 +fi + +echo "⚠️ Found openbao namespace - checking if it's managed by ArgoCD..." 
+ +# Check if there's an ArgoCD Application for openbao in iad-acb +ARGOCD_APPS=$(kubectl --kubeconfig=/home/coding/.kube/ardenone-manager.kubeconfig \ + get applications -n argocd -o json 2>/dev/null | \ + jq -r '.items[] | select(.spec.destination.server | contains("iad-acb")) | select(.spec.destination.namespace == "openbao") | .metadata.name') + +if [ -n "$ARGOCD_APPS" ]; then + echo "⚠️ openbao namespace is managed by ArgoCD (apps: $ARGOCD_APPS)" + echo "Do NOT delete manually - update ArgoCD apps instead." + echo "Check declarative-config for iad-acb openbao resources." + exit 1 +fi + +echo "✓ openbao namespace is orphaned (not managed by ArgoCD)" +echo "" +echo "=== Orphaned openbao resources found ===" +kubectl --kubeconfig="$KUBECONFIG" get all -n openbao 2>/dev/null || echo "No resources found" +echo "" + +read -p "Delete orphaned openbao namespace? (y/N) " -n 1 -r +echo +if [[ $REPLY =~ ^[Yy]$ ]]; then + echo "Deleting openbao namespace..." + kubectl --kubeconfig="$KUBECONFIG" delete namespace openbao + echo "✓ Deleted openbao namespace" +fi + +echo "" +echo "=== Verifying ClusterSecretStore ===" +kubectl --kubeconfig="$KUBECONFIG" get clustersecretstore openbao -o yaml + +echo "" +echo "=== Checking ExternalSecrets status ===" +for es in acb-evolver-secrets acb-armor acb-docker-hub; do + echo -n "$es: " + kubectl --kubeconfig="$KUBECONFIG" get externalsecret "$es" -n ai-code-battle -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null || echo "Not found" +done + +echo "" +echo "Done! 
Check acb-evolver pod status:" +kubectl --kubeconfig="$KUBECONFIG" get pods -n ai-code-battle -l app=acb-evolver diff --git a/fix-iad-acb-r2-credentials.sh b/fix-iad-acb-r2-credentials.sh new file mode 100755 index 0000000..4bc6477 --- /dev/null +++ b/fix-iad-acb-r2-credentials.sh @@ -0,0 +1,221 @@ +#!/bin/bash +# Fix script for iad-acb R2 credentials corruption +# Problem: Values in OpenBao at secret/rs-manager/ai-code-battle/r2 are swapped/corrupted +# This script updates OpenBao with correct R2 credentials + +set -e + +KUBECONFIG="${KUBECONFIG:-/home/coding/.kube/iad-acb.kubeconfig}" +NAMESPACE="ai-code-battle" +SECRET_NAME="acb-r2-credentials" + +# Default values (can be overridden via environment or prompts) +R2_ENDPOINT="${ACB_R2_ENDPOINT:-https://e26f015c7ba47a6ad6219385e77072b7.r2.cloudflarestorage.com}" +R2_BUCKET="${ACB_R2_BUCKET:-acb-data}" + +echo "=== iad-acb R2 Credentials Fix ===" +echo "" +echo "This script fixes the corrupted R2 credentials in OpenBao." +echo "" + +# Check if OpenBao is accessible +echo "Checking OpenBao connection..." +OPENBAO_ADDR="http://openbao.external-secrets.svc.cluster.local:8200" +if ! curl -s --connect-timeout 5 "$OPENBAO_ADDR/v1/sys/health" > /dev/null 2>&1; then + echo "❌ Cannot reach OpenBao at $OPENBAO_ADDR" + echo "" + echo "Options:" + echo "1. Create a SealedSecret instead (bypass OpenBao)" + echo "2. Fix OpenBao connectivity first" + echo "" + read -p "Create SealedSecret? (y/N) " -n 1 -r + echo + if [[ $REPLY =~ ^[Yy]$ ]]; then + CREATE_SEALED_SECRET=true + else + echo "Exiting. Please fix OpenBao connectivity or provide R2 credentials for SealedSecret." 
+ exit 1 + fi +else + echo "✓ OpenBao is reachable" + CREATE_SEALED_SECRET=false +fi + +# Prompt for R2 credentials +echo "" +echo "Enter R2 credentials (from Cloudflare Dashboard > R2 > acb-data > Settings > R2 API):" +echo "" + +if [ -z "$ACB_R2_ACCESS_KEY" ]; then + read -p "R2 Access Key ID (32 chars): " ACB_R2_ACCESS_KEY +else + echo "Using ACB_R2_ACCESS_KEY from environment" +fi + +if [ -z "$ACB_R2_SECRET_KEY" ]; then + read -sp "R2 Secret Access Key (64 chars): " ACB_R2_SECRET_KEY + echo +else + echo "Using ACB_R2_SECRET_KEY from environment" +fi + +# Validate inputs +if [ ${#ACB_R2_ACCESS_KEY} -lt 20 ]; then + echo "❌ Access Key too short (expected ~32 chars)" + exit 1 +fi + +if [ ${#ACB_R2_SECRET_KEY} -lt 40 ]; then + echo "❌ Secret Key too short (expected ~64 chars)" + exit 1 +fi + +echo "" +echo "=== Configuration ===" +echo "Endpoint: $R2_ENDPOINT" +echo "Bucket: $R2_BUCKET" +echo "Access Key: ${ACB_R2_ACCESS_KEY:0:8}..." +echo "Secret Key: ${ACB_R2_SECRET_KEY:0:8}..." +echo "" + +if [ "$CREATE_SEALED_SECRET" = true ]; then + echo "=== Creating SealedSecret ===" + echo "" + echo "Creating SealedSecret to bypass ESO..." + + # Create a temporary secret file + TEMP_SECRET=$(mktemp) + cat > "$TEMP_SECRET" < /dev/null; then + echo "❌ kubeseal not found. Installing..." 
+ # Try to install from common locations + if [ "$(uname -m)" = "x86_64" ]; then + KUBESEAL_VERSION="0.24.0" + wget -q "https://github.com/bitnami-labs/sealed-secrets/releases/download/v${KUBESEAL_VERSION}/kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz" -O /tmp/kubeseal.tar.gz + tar -xzf /tmp/kubeseal.tar.gz -C /tmp kubeseal + sudo install -m 755 /tmp/kubeseal /usr/local/bin/kubeseal + rm /tmp/kubeseal.tar.gz /tmp/kubeseal + else + echo "Please install kubeseal manually:" + echo " https://github.com/bitnami-labs/sealed-secrets/releases" + exit 1 + fi + fi + + SEALED_SECRET=$(kubeseal --format=yaml < "$TEMP_SECRET") + rm "$TEMP_SECRET" + + echo "" + echo "=== SealedSecret Generated ===" + echo "" + echo "$SEALED_SECRET" + echo "" + echo "Apply this SealedSecret to the cluster:" + echo " echo '$SEALED_SECRET' | kubectl --kubeconfig=$KUBECONFIG apply -f -" + echo "" + echo "Then remove the ExternalSecret from declarative-config:" + echo " rm /home/coding/declarative-config/k8s/iad-acb/ai-code-battle/acb-r2-credentials-externalsecret.yml" + +else + echo "=== Updating OpenBao Secret ===" + echo "" + echo "The script needs OpenBao admin access to update the secret." + echo "" + echo "Option A: Provide OpenBao root token" + read -sp "OpenBao root token (leave empty to skip): " OPENBAO_TOKEN + echo + + if [ -n "$OPENBAO_TOKEN" ]; then + echo "Updating OpenBao secret at: secret/rs-manager/ai-code-battle/r2" + + # Use kubectl exec to access OpenBao + OPENBAO_POD=$(kubectl --kubeconfig="$KUBECONFIG" get pods -n openbao -l app.kubernetes.io/name=openbao -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") + + if [ -z "$OPENBAO_POD" ]; then + echo "❌ Cannot find OpenBao pod in openbao namespace" + echo "Trying direct API access..." 
+ + # Try direct API access (requires network reachability) + curl -s -X POST "$OPENBAO_ADDR/v1/auth/token/create" \ + -H "X-Vault-Token: $OPENBAO_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"policies": ["root"]}' > /dev/null 2>&1 || { + echo "❌ Cannot authenticate with OpenBao" + exit 1 + } + fi + + # Update the secret via API + RESPONSE=$(curl -s -X POST "$OPENBAO_ADDR/v1/secret/data/rs-manager/ai-code-battle/r2" \ + -H "X-Vault-Token: $OPENBAO_TOKEN" \ + -H "Content-Type: application/json" \ + -d "{ + \"data\": { + \"endpoint\": \"$R2_ENDPOINT\", + \"bucket\": \"$R2_BUCKET\", + \"access-key\": \"$ACB_R2_ACCESS_KEY\", + \"secret-key\": \"$ACB_R2_SECRET_KEY\" + } + }") + + if echo "$RESPONSE" | jq -e '.errors' > /dev/null 2>&1; then + echo "❌ Failed to update OpenBao secret:" + echo "$RESPONSE" | jq -r '.errors[]' + exit 1 + else + echo "✓ OpenBao secret updated successfully" + fi + + # Force ESO to re-sync + echo "Forcing ESO to re-sync..." + kubectl --kubeconfig="$KUBECONFIG" annotate externalsecret $SECRET_NAME -n $NAMESPACE force-sync=$(date +%s) --overwrite + + echo "✓ ExternalSecret annotation added" + else + echo "" + echo "=== Option B: Manual OpenBao Update ===" + echo "" + echo "Update the secret manually in OpenBao:" + echo "" + echo " vault login " + echo " vault kv put secret/rs-manager/ai-code-battle/r2 \\" + echo " endpoint=\"$R2_ENDPOINT\" \\" + echo " bucket=\"$R2_BUCKET\" \\" + echo " access-key=\"$ACB_R2_ACCESS_KEY\" \\" + echo " secret-key=\"$ACB_R2_SECRET_KEY\"" + echo "" + echo "Then force ESO re-sync:" + echo " kubectl --kubeconfig=$KUBECONFIG annotate externalsecret $SECRET_NAME -n $NAMESPACE force-sync=\$(date +%s)" + fi +fi + +echo "" +echo "=== Verification ===" +echo "" +echo "After applying the fix, verify the secret:" +echo " kubectl --kubeconfig=$KUBECONFIG get secret $SECRET_NAME -n $NAMESPACE -o json | jq -r '.data | map_values(@base64d)'" +echo "" +echo "Expected values:" +echo " endpoint: $R2_ENDPOINT" +echo " bucket: 
$R2_BUCKET" +echo " access-key: $ACB_R2_ACCESS_KEY" +echo " secret-key: <64-char secret key>" +echo ""
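
The engagement formula in `engine/map_engagement.go` and the rolling-average update in `UpdateMapEngagement` can be checked in isolation. The following is a minimal sketch, not the engine's actual API: `engagementScore` and `rollingAverage` are hypothetical helper names introduced here for illustration, and the sample inputs are made up. It applies the same weights as the diff (3.0, 2.0, 1.0, 2.0, 1.0) and folds per-match scores into a running mean the same way the SQL transaction does.

```go
package main

import "fmt"

// engagementScore applies the weighted sum documented in the diff:
// crossings*3 + critical_moments*2 + coverage*1 + closeness*2 + turn_pct*1.
// The function name and parameter list are assumptions for this sketch.
func engagementScore(crossings float64, criticalMoments int, coverage, closeness, turnPct float64) float64 {
	return crossings*3.0 +
		float64(criticalMoments)*2.0 +
		coverage*1.0 +
		closeness*2.0 +
		turnPct*1.0
}

// rollingAverage folds one new sample into a mean that already covers
// matchCount samples: (oldAvg*matchCount + sample) / (matchCount + 1).
func rollingAverage(oldAvg float64, matchCount int, sample float64) float64 {
	return (oldAvg*float64(matchCount) + sample) / float64(matchCount+1)
}

func main() {
	// Feed three hypothetical per-match scores through the running mean.
	avg, n := 0.0, 0
	for _, s := range []float64{5, 7, 9} {
		avg = rollingAverage(avg, n, s)
		n++
	}
	fmt.Println("rolling average:", avg, "over", n, "matches")

	// One hypothetical match: 2 lead changes, 3 critical moments,
	// 50% coverage, closeness 0.75, half of the turn budget used.
	fmt.Println("engagement:", engagementScore(2, 3, 0.5, 0.75, 0.5))
}
```

Because lead changes and critical moments are unbounded counts while the other three terms are capped at 1.0 each, the first two terms are likely to dominate the score on long, volatile matches.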