- Code fixes completed and committed (b35a2aa, 1b399a1, 7e9d1af)
- Pod currently Pending due to cluster capacity (not CrashLoopBackOff)
- Additional fixes in HEAD not yet deployed
- Verification blocked by cluster resource constraints
acb-index-builder OOMKill Fix Task Summary
Task
Fix acb-index-builder CrashLoopBackOff (silent crash after web asset copy)
Root Cause Identified
OOMKill caused by N+1 query problems and unbounded database queries:
- fetchBots N+1 query loop: 10,000+ separate database calls for bot match stats
- fetchSeries N+1 query loop: 1,000+ separate queries for series games
- fetchChampionshipBracket N+1 query loop: 500+ separate queries for championship games
- Unbounded queries: Multiple queries without LIMIT clauses
Fixes Applied (committed to codebase)
Commit b35a2aa (DEPLOYED)
- Fixed N+1 query loop in fetchBots
- Single batch query for bot match stats
- Added LIMIT 20000
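The shape of this fix can be sketched as follows. This is a minimal illustration, not the committed code: the function name `buildBatchStatsQuery`, the table `match_participants`, its columns, and the PostgreSQL-style `$n` placeholders are all assumptions made for the example. It shows how one statement with an `IN (...)` list and a `GROUP BY` can stand in for a per-bot query loop, with the row cap passed as the last argument.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchStatsQuery returns a single SQL statement and its arguments that
// fetch match counts for every bot ID at once, instead of one query per bot.
// Table name, column names, and placeholder style are hypothetical.
func buildBatchStatsQuery(botIDs []int64, limit int) (string, []any) {
	if len(botIDs) == 0 {
		return "", nil // nothing to fetch, so issue no query at all
	}
	placeholders := make([]string, len(botIDs))
	args := make([]any, 0, len(botIDs)+1)
	for i, id := range botIDs {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		args = append(args, id)
	}
	args = append(args, limit) // the LIMIT value is the final argument
	query := fmt.Sprintf(
		"SELECT bot_id, COUNT(*) AS matches FROM match_participants WHERE bot_id IN (%s) GROUP BY bot_id LIMIT $%d",
		strings.Join(placeholders, ", "), len(botIDs)+1)
	return query, args
}

func main() {
	query, args := buildBatchStatsQuery([]int64{7, 8, 9}, 20000)
	fmt.Println(query)
	fmt.Println(args...)
}
```

With this shape the number of round trips no longer grows with the number of bots, and the result set stays bounded by the LIMIT.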
Commits 1b399a1, 7e9d1af (code fixed, NOT deployed)
- Fixed N+1 query loops in fetchSeries and fetchChampionshipBracket
- Batch queries replacing per-item loops
- Reduced LIMITs across all queries:
  - fetchRatingHistory: LIMIT 5000
  - fetchSeries: LIMIT 1000
  - fetchSeasons: LIMIT 100
  - fetchPredictions: LIMIT 1000
  - fetchMaps: LIMIT 1000
  - series games batch: LIMIT 10000
  - championship games batch: LIMIT 500
  - pair frequency: LIMIT 1000
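One way to keep such caps in a single place is a lookup table keyed by query name. The sketch below is hypothetical: the `queryLimits` map, the `withLimit` helper, and the `defaultLimit` fallback are illustrative names that do not come from the codebase. Only the numeric caps are taken from the list above.

```go
package main

import "fmt"

// defaultLimit is an assumed fallback so that no query is left unbounded.
const defaultLimit = 1000

// queryLimits mirrors a few of the caps listed above; the keys are
// hypothetical labels, not identifiers confirmed to exist in the code.
var queryLimits = map[string]int{
	"fetchRatingHistory": 5000,
	"fetchSeries":        1000,
	"fetchSeasons":       100,
	"fetchPredictions":   1000,
	"fetchMaps":          1000,
}

// withLimit appends a LIMIT clause for the named query, falling back to
// defaultLimit when the name has no entry in queryLimits.
func withLimit(name, query string) string {
	limit, ok := queryLimits[name]
	if !ok {
		limit = defaultLimit
	}
	return fmt.Sprintf("%s LIMIT %d", query, limit)
}

func main() {
	fmt.Println(withLimit("fetchSeasons", "SELECT id, name FROM seasons ORDER BY id DESC"))
}
```

Centralizing the caps makes it harder for a new query to ship without a bound, at the cost of one extra lookup per query.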
main.go panic recovery (lines 165-172)
- A deferred recover() catches panics and logs them via slog
- Prevents silent crashes where stderr is lost
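A deferred recover of this kind might look like the sketch below. It is an illustration under assumptions, not the code at lines 165-172: the wrapper name `runGuarded`, the log message, and the attribute keys are hypothetical. It uses the standard `log/slog` package (Go 1.21+) and records the stack trace alongside the panic value.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"runtime/debug"
)

// runGuarded runs fn and converts any panic into a logged error, so the
// failure is recorded in structured logs instead of being lost with stderr.
// The function name and log fields are hypothetical.
func runGuarded(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			slog.Error("panic recovered", "panic", r, "stack", string(debug.Stack()))
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	// Route the default logger to JSON on stderr for this example.
	slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stderr, nil)))

	if err := runGuarded(func() { panic("boom") }); err != nil {
		slog.Error("build cycle failed", "err", err)
	}
}
```

Note that a deferred recover only handles Go panics. If the kernel OOM-kills the container, the process is terminated without running deferred functions, so this logging complements the memory fixes rather than replacing them.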
Current Status
Deployment State
- Deployed image: ronaldraygun/acb-index-builder:b35a2aa
- Code HEAD: 96d7fb8 (includes ALL fixes)
- Gap: Additional fixes in HEAD not yet deployed
Cluster Status
- Pod: acb-index-builder-7fc99df58b-5zjpp
- Status: Pending (not CrashLoopBackOff)
- Reason: Cluster overcommitted (94% memory, 98% CPU)
- Blocker: Cannot free resources or deploy new image with read-only access
Acceptance Criteria Status
| Criteria | Status |
|---|---|
| acb-index-builder runs through 2+ build cycles | ⏳ Blocked (cluster capacity) |
| "Build cycle completed" in logs | ⏳ Blocked (pod Pending) |
| No CrashLoopBackOff | ✅ Not currently observed (pod is Pending, so not yet verified at runtime) |
Conclusion
- Code fixes: ✅ Complete and committed
- Deployment: ⏳ Partial (only first fix deployed)
- Verification: ⏳ Blocked (cluster capacity constraints)
The root cause has been identified and fixed in the codebase. Full deployment and verification require:
- Building new image with HEAD (96d7fb8)
- Freeing cluster resources or scaling cluster
- Deploying and monitoring pod for 2+ build cycles