FABRIC/notes/bf-27e4.md
jedarden b7dc765f48 docs(bf-27e4): document fix for beadsCompleted vs stuck detection metric
The fix is already in place from previous commits (47c3396, c047131).
This commit documents the solution for future reference.

The stuck detection now correctly distinguishes between:
- beadsCompleted: all beads processed (including timed-out/deferred)
- beadsSucceeded: successful completions only
- beadsTimedOut: timed-out/deferred beads

Stuck reason text now clearly shows metrics:
'100 processed but 0 successful completions (all timed out/deferred)'

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:19:16 -04:00

Fix for beadsCompleted vs Stuck Detection Metric Discrepancy

Problem

The /api/workers endpoint returned seemingly contradictory data:

  • beadsCompleted: 285 (counting bead.released events including timed-out/deferred)
  • stuck: true, stuckReason: 'Running for 2311m with only 1 completion(s)'

The two fields appear to disagree: 285 completions are reported, yet the stuck reason claims "only 1 completion(s)".

Root Cause

The stuck detection was using a different metric than what was displayed:

  • beadsCompleted counted all bead.released events (including timed-out/deferred)
  • The stuck detection counted successful completions (bead.completed events only)

When all beads timed out or were deferred, beadsCompleted would increment but the stuck detector would see zero successful completions.
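
To make the divergence concrete, here is a minimal sketch of the two counts computed over the same event stream. Only the event names (bead.released, bead.completed) and the TimedOut/Deferred outcomes come from this note. The `BeadEvent` shape, the "Success" outcome value, the helper names, and the assumption that a successful bead emits both events are hypothetical.

```typescript
// Hypothetical event shape; the real payloads may differ.
type BeadEvent =
  | { type: "bead.released"; outcome: "Success" | "TimedOut" | "Deferred" }
  | { type: "bead.completed" };

// What the API displayed as beadsCompleted: every release, whatever its outcome.
const countReleased = (events: BeadEvent[]): number =>
  events.filter((e) => e.type === "bead.released").length;

// What stuck detection counted: successful completions only.
const countSucceeded = (events: BeadEvent[]): number =>
  events.filter((e) => e.type === "bead.completed").length;

// Three beads time out or are deferred; one succeeds
// (assumed here to emit both a release and a completion).
const sample: BeadEvent[] = [
  { type: "bead.released", outcome: "TimedOut" },
  { type: "bead.released", outcome: "Deferred" },
  { type: "bead.released", outcome: "TimedOut" },
  { type: "bead.released", outcome: "Success" },
  { type: "bead.completed" },
];

const displayed = countReleased(sample); // 4
const usedByDetector = countSucceeded(sample); // 1
```

The same worker therefore shows 4 "completions" in one field while the detector sees only 1, which is the mismatch described above at a smaller scale.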

Solution

The WorkerInfo type now carries three distinct metrics:

  1. beadsCompleted - All beads processed (bead.released events with release_success)
    • Includes timed-out and deferred beads
  2. beadsSucceeded - Successful completions only (bead.completed events)
    • Excludes timed-out/deferred releases
  3. beadsTimedOut - Timed-out or deferred beads (subset of beadsCompleted)
    • Tracked separately for clarity
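
The three counters could be maintained roughly as follows. This is a sketch, not the code in src/store.ts: only the counter names, the event names, and the TimedOut/Deferred outcomes are taken from this note. `WorkerCounters`, `WorkerEvent`, `applyEvent`, and the "Success" outcome value are hypothetical.

```typescript
// Only the three counter names come from the note; the rest is illustrative.
interface WorkerCounters {
  beadsCompleted: number; // all beads processed
  beadsSucceeded: number; // successful completions only
  beadsTimedOut: number; // timed-out or deferred (subset of beadsCompleted)
}

type ReleaseOutcome = "Success" | "TimedOut" | "Deferred";

type WorkerEvent =
  | { type: "bead.released"; outcome: ReleaseOutcome }
  | { type: "bead.completed" };

// Returns updated counters without mutating the input.
function applyEvent(w: WorkerCounters, e: WorkerEvent): WorkerCounters {
  switch (e.type) {
    case "bead.released": {
      const timedOut = e.outcome === "TimedOut" || e.outcome === "Deferred";
      return {
        ...w,
        beadsCompleted: w.beadsCompleted + 1,
        beadsTimedOut: w.beadsTimedOut + (timedOut ? 1 : 0),
      };
    }
    case "bead.completed":
      return { ...w, beadsSucceeded: w.beadsSucceeded + 1 };
  }
}

// A worker whose 100 beads all time out.
const zero: WorkerCounters = { beadsCompleted: 0, beadsSucceeded: 0, beadsTimedOut: 0 };
const hundredTimeouts: WorkerEvent[] = Array.from({ length: 100 }, () => ({
  type: "bead.released" as const,
  outcome: "TimedOut" as const,
}));
const totals = hundredTimeouts.reduce<WorkerCounters>(applyEvent, zero);
```

Under these assumptions `totals` ends with beadsCompleted and beadsTimedOut both at 100 and beadsSucceeded at 0, so each counter can be read on its own terms.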

Stuck Detection Update

The detectLongRunning function in stuckDetection.ts now:

  1. Uses beadsSucceeded for the stuck threshold (not beadsCompleted)
  2. Generates clear reason text distinguishing metrics:
    • "Running for 40m with 100 processed but 0 successful completions (all timed out/deferred)"
    • "Running for 30m with 50 processed but only 1 successful completion(s) (49 timed out/deferred)"
  3. Includes all three metrics in the evidence array
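
The behavior listed above can be sketched as follows. This is not the actual detectLongRunning signature: `describeLongRunning`, the `StuckThresholds` shape, the threshold values, and the evidence string format are all assumptions made for illustration. Only the reason text follows the examples quoted in this note.

```typescript
interface BeadCounts {
  beadsCompleted: number;
  beadsSucceeded: number;
  beadsTimedOut: number;
}

// Hypothetical thresholds; the real values are not stated in this note.
interface StuckThresholds {
  minMinutes: number;
  minSucceeded: number;
}

interface StuckResult {
  reason: string;
  evidence: string[];
}

function describeLongRunning(
  runningMinutes: number,
  counts: BeadCounts,
  t: StuckThresholds,
): StuckResult | null {
  // The threshold is checked against beadsSucceeded, not beadsCompleted.
  if (runningMinutes < t.minMinutes || counts.beadsSucceeded >= t.minSucceeded) {
    return null;
  }
  const prefix = `Running for ${runningMinutes}m with ${counts.beadsCompleted} processed but`;
  const reason =
    counts.beadsSucceeded === 0
      ? `${prefix} 0 successful completions (all timed out/deferred)`
      : `${prefix} only ${counts.beadsSucceeded} successful completion(s) (${counts.beadsTimedOut} timed out/deferred)`;
  return {
    reason,
    // Evidence format is an assumption; the note only says all three metrics appear.
    evidence: [
      `beadsCompleted: ${counts.beadsCompleted}`,
      `beadsSucceeded: ${counts.beadsSucceeded}`,
      `beadsTimedOut: ${counts.beadsTimedOut}`,
    ],
  };
}

const limits: StuckThresholds = { minMinutes: 30, minSucceeded: 2 };
const allTimedOut = describeLongRunning(
  40, { beadsCompleted: 100, beadsSucceeded: 0, beadsTimedOut: 100 }, limits);
const mostlyTimedOut = describeLongRunning(
  30, { beadsCompleted: 50, beadsSucceeded: 1, beadsTimedOut: 49 }, limits);
const healthy = describeLongRunning(
  40, { beadsCompleted: 100, beadsSucceeded: 90, beadsTimedOut: 10 }, limits);
```

With these illustrative thresholds, `allTimedOut` and `mostlyTimedOut` carry the two reason strings quoted in the list above, and `healthy` is null because the worker has enough successful completions.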

Acceptance Criteria Met

A worker that processes 100 timed-out beads now reports:

  • beadsCompleted: 100
  • beadsSucceeded: 0
  • stuckReason: "100 processed but 0 successful completions (all timed out/deferred)"

Files Modified

  • src/types.ts - Added beadsTimedOut field to WorkerInfo
  • src/store.ts - Increment beadsTimedOut on bead.released with TimedOut/Deferred outcome
  • src/tui/utils/stuckDetection.ts - Updated stuck detection to use unified metrics with clear messaging
  • src/tui/utils/stuckDetection.test.ts - Added tests for the new behavior