docs(bf-27e4): add verification summary for stuck detection metric fix
parent 09b57aa21c
commit 0b2a0a9fd4
1 changed file with 27 additions and 41 deletions
@@ -1,48 +1,34 @@
-# Fix for beadsCompleted vs Stuck Detection Metric Discrepancy
+# bf-27e4: Fix beadsCompleted vs stuck detection metric discrepancy
 
-## Problem
-The `/api/workers` API returned confusing data:
-- `beadsCompleted: 285` (counting bead.released events including timed-out/deferred)
-- `stuck: true, stuckReason: 'Running for 2311m with only 1 completion(s)'`
+## Summary
+This bead was fixed in commit `47c3396` (2025-06-07). The fix unified the stuck detection metric with `beadsCompleted` to eliminate the confusing discrepancy where `/api/workers` returned contradictory data.
 
-This created a confusing impression: 285 completions but "only 1 completion"?
+## What was fixed
+**Before**: `/api/workers` showed confusing data when all beads timed out:
+- `beadsCompleted: 285` (counted all bead.released events including timed-out/deferred)
+- `stuck: true, stuckReason: 'Running for 2311m with only 1 completion(s)'` (counted only successful completions)
 
-## Root Cause
-The stuck detection was using a different metric than what was displayed:
-- `beadsCompleted` counted all `bead.released` events (including timed-out/deferred)
-- The stuck detection counted successful completions (`bead.completed` events only)
+**After**: The stuck detection now clearly distinguishes:
+- `beadsCompleted`: Total beads processed (including timed-out/deferred)
+- `beadsSucceeded`: Only successful completions (bead.completed events)
+- `beadsTimedOut`: Beads that timed out or were deferred
 
-When all beads timed out or were deferred, `beadsCompleted` would increment but the stuck detector would see zero successful completions.
+## Implementation changes
+1. Added `beadsSucceeded` and `beadsTimedOut` counters to `WorkerInfo` type (src/types.ts)
+2. Increment `beadsSucceeded` on `bead.completed` events (src/store.ts)
+3. Increment `beadsTimedOut` on `bead.released` with `TimedOut` or `Deferred` outcome (src/store.ts)
+4. Updated stuck detection to use `beadsSucceeded` for threshold and show clear reason text (src/tui/utils/stuckDetection.ts)
 
-## Solution
-Three metrics were unified in the `WorkerInfo` type:
+## Acceptance criteria met
+✅ Worker processing 100 timed-out beads now shows clearly:
+- `beadsCompleted: 100` (processed)
+- `beadsSucceeded: 0` (successful)
+- `stuckReason: 'Running for X minutes with 100 processed but 0 successful completions (all timed out/deferred)'`
 
-1. **`beadsCompleted`** - All beads processed (bead.released events with release_success)
-   - Includes timed-out and deferred beads
+✅ Stuck flag still fires with accurate reason text
 
-2. **`beadsSucceeded`** - Successful completions only (bead.completed events)
-   - Excludes timed-out/deferred releases
-
-3. **`beadsTimedOut`** - Timed-out or deferred beads (subset of beadsCompleted)
-   - Tracked separately for clarity
-
-## Stuck Detection Update
-The `detectLongRunning` function in `stuckDetection.ts` now:
-
-1. Uses `beadsSucceeded` for the stuck threshold (not `beadsCompleted`)
-2. Generates clear reason text distinguishing metrics:
-   - "Running for 40m with 100 processed but 0 successful completions (all timed out/deferred)"
-   - "Running for 30m with 50 processed but only 1 successful completion(s) (49 timed out/deferred)"
-3. Shows evidence array with all three metrics
-
-## Acceptance Criteria Met
-✅ Worker processing 100 timed-out beads shows clearly:
-- `beadsCompleted: 100`
-- `beadsSucceeded: 0`
-- `stuckReason: "100 processed but 0 successful completions (all timed out/deferred)"`
-
-## Files Modified
-- `src/types.ts` - Added `beadsTimedOut` field to `WorkerInfo`
-- `src/store.ts` - Increment `beadsTimedOut` on bead.released with TimedOut/Deferred outcome
-- `src/tui/utils/stuckDetection.ts` - Updated stuck detection to use unified metrics with clear messaging
-- `src/tui/utils/stuckDetection.test.ts` - Added tests for the new behavior
+## Tests
+All 2516 tests pass, including:
+- `src/tui/utils/stuckDetection.test.ts` (18 tests)
+- `src/store.test.ts` (verifies beadsSucceeded and beadsTimedOut increment correctly)
+- `src/web/server.test.ts` (verifies API returns correct metrics)
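
For readers who want to see how the three counters and the reason text fit together, here is a minimal TypeScript sketch. It is not the code from `src/store.ts` or `src/tui/utils/stuckDetection.ts`. Only the counter names (`beadsCompleted`, `beadsSucceeded`, `beadsTimedOut`), the event names (`bead.completed`, `bead.released`), the `TimedOut`/`Deferred` outcomes, and the reason wording come from the summary above. The `WorkerCounters` and `WorkerEvent` types, the `Success` outcome name, the `applyEvent`, `replay`, and `stuckReason` helpers, and both thresholds are hypothetical. The sketch also assumes a successful bead emits both a `bead.completed` and a `bead.released` event.

```typescript
// Hypothetical outcome names: "TimedOut" and "Deferred" appear in the summary,
// "Success" is assumed here for illustration.
type ReleaseOutcome = "Success" | "TimedOut" | "Deferred";

// Hypothetical event shape; only the two event names come from the summary.
type WorkerEvent =
  | { kind: "bead.completed" }
  | { kind: "bead.released"; outcome: ReleaseOutcome };

// The three counters named in the summary (a subset of what WorkerInfo may hold).
interface WorkerCounters {
  beadsCompleted: number; // every released bead, including timed-out/deferred
  beadsSucceeded: number; // bead.completed events only
  beadsTimedOut: number; // releases with a TimedOut or Deferred outcome
}

const zeroCounters: WorkerCounters = {
  beadsCompleted: 0,
  beadsSucceeded: 0,
  beadsTimedOut: 0,
};

// Returns updated counters for one event without mutating the input.
function applyEvent(c: WorkerCounters, e: WorkerEvent): WorkerCounters {
  const next = { ...c };
  if (e.kind === "bead.completed") {
    next.beadsSucceeded += 1;
  } else {
    next.beadsCompleted += 1;
    if (e.outcome === "TimedOut" || e.outcome === "Deferred") {
      next.beadsTimedOut += 1;
    }
  }
  return next;
}

// Folds a list of events into counters, starting from zero.
function replay(events: WorkerEvent[]): WorkerCounters {
  return events.reduce(applyEvent, zeroCounters);
}

// Returns a reason string when the worker looks stuck, otherwise null.
// The threshold is checked against beadsSucceeded, not beadsCompleted.
// minMinutes and minSucceeded are assumed defaults, not the project's values.
function stuckReason(
  c: WorkerCounters,
  runningMinutes: number,
  minMinutes = 30,
  minSucceeded = 2,
): string | null {
  if (runningMinutes < minMinutes || c.beadsSucceeded >= minSucceeded) {
    return null;
  }
  const prefix = `Running for ${runningMinutes}m with ${c.beadsCompleted} processed but `;
  if (c.beadsSucceeded === 0) {
    return prefix + "0 successful completions (all timed out/deferred)";
  }
  return (
    prefix +
    `only ${c.beadsSucceeded} successful completion(s) (${c.beadsTimedOut} timed out/deferred)`
  );
}

// Usage: 100 timed-out beads after 40 minutes.
const allTimedOut: WorkerEvent[] = Array.from(
  { length: 100 },
  (): WorkerEvent => ({ kind: "bead.released", outcome: "TimedOut" }),
);
console.log(stuckReason(replay(allTimedOut), 40));
```

Keeping `beadsCompleted` as the "processed" total and checking the threshold against `beadsSucceeded` is what lets the reason text report both numbers side by side instead of contradicting the API field.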