Commit graph

30 commits

Author SHA1 Message Date
jedarden
81b57e66b5 refactor(bf-53q6): add SystemMemoryIndicator to fleet header and clean up cgroup monitor
- Add SystemMemoryIndicator component showing sparkline and usage in fleet header
- Refactor systemCgroupMonitor.ts for cleaner implementation
- Update index.css with fleet-header layout styles
- Add fleet-header with separator between FleetSummaryBar and SystemMemoryIndicator

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:14:45 -04:00
jedarden
83baf06edd feat(bf-53q6): integrate SystemMemoryPanel into FABRIC web dashboard
- Add SystemMemoryPanel rendering in App.tsx main content area
- Add 'show:memory' command palette action for opening memory panel
- Fix import of SystemMemoryPanel (named export)
- Backend features already in place: /api/system/memory, /api/system/memory/history, OOM tracking, 5-min sparkline

This completes the integration of the system cgroup memory panel that shows:
- Current cgroup memory usage vs MemoryHigh (color-coded progress bar)
- 5-minute sparkline of memory usage sampled every 10s
- oom_kill counter from /sys/fs/cgroup/user.slice/memory.events
- Swap usage when enabled
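A 5-minute sparkline sampled every 10s implies a window of 300 / 10 = 30 samples. Below is a minimal TypeScript sketch of that window and a text sparkline. `SampleWindow`, `sparkline`, and scaling the bars against MemoryHigh are illustrative assumptions, not code from the dashboard.

```typescript
// 5 minutes sampled every 10 s gives 300 / 10 = 30 slots.
const WINDOW_MS = 5 * 60 * 1000;
const SAMPLE_INTERVAL_MS = 10 * 1000;
const WINDOW_SLOTS = WINDOW_MS / SAMPLE_INTERVAL_MS;

// Hypothetical helper: keeps only the most recent `capacity` samples.
class SampleWindow {
  private readonly samples: number[] = [];

  constructor(private readonly capacity: number = WINDOW_SLOTS) {}

  push(bytes: number): void {
    this.samples.push(bytes);
    // Drop the oldest sample once the window is full.
    if (this.samples.length > this.capacity) this.samples.shift();
  }

  values(): readonly number[] {
    return this.samples;
  }
}

const BARS = "▁▂▃▄▅▆▇█";

// Map each sample onto one of eight bar glyphs, scaled against `ceiling`
// (for example MemoryHigh); values above the ceiling clamp to a full bar.
function sparkline(values: readonly number[], ceiling: number): string {
  if (ceiling <= 0) return "";
  return values
    .map((v) => {
      const ratio = Math.min(Math.max(v / ceiling, 0), 1);
      return BARS[Math.round(ratio * (BARS.length - 1))];
    })
    .join("");
}
```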

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:06:21 -04:00
jedarden
77b1cd72c3 feat(bf-5cdj): sample per-worker process RSS from /proc and expose via API
Add MemorySampler that polls active worker PIDs every 10s to sample
/proc/<pid>/status for VmRSS, VmPeak, and VmSwap memory metrics.

Changes:
- Add MemorySampler class with periodic sampling (10s interval)
- Attach rssKb, peakRssKb, swapKb to WorkerState in types.ts
- Integrate with InMemoryEventStore to register PIDs from events
- Expose memory fields on GET /api/workers response
- Broadcast updated memory fields via WebSocket
- Add comprehensive test suite
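The sampler reads memory lines from /proc/<pid>/status. Below is a minimal parsing sketch that assumes the standard `Key:\t<value> kB` line format. One caveat is worth checking against proc(5): VmPeak is the peak virtual memory size, and peak resident set size is reported as VmHWM. For that reason this sketch fills `peakRssKb` from VmHWM. `parseProcStatus` and `ProcMemory` are hypothetical names.

```typescript
interface ProcMemory {
  rssKb: number | null;
  peakRssKb: number | null;
  swapKb: number | null;
}

// Parse the text of /proc/<pid>/status. Memory lines look like
// "VmRSS:\t  123456 kB"; kernel threads have no Vm* lines at all.
function parseProcStatus(text: string): ProcMemory {
  const fieldsKb = new Map<string, number>();
  for (const line of text.split("\n")) {
    const m = /^(\w+):\s+(\d+)\s+kB$/.exec(line.trim());
    if (m) fieldsKb.set(m[1], Number(m[2]));
  }
  return {
    rssKb: fieldsKb.get("VmRSS") ?? null,
    // VmHWM is the peak resident set size; VmPeak is the peak virtual size.
    peakRssKb: fieldsKb.get("VmHWM") ?? null,
    swapKb: fieldsKb.get("VmSwap") ?? null,
  };
}
```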

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 09:58:04 -04:00
jedarden
87af357907 feat(bf-4a5b): complete resource consumption management
Phase 1: Infra hardening
- Per-worker MemoryMax ceiling (4 GB) via workerMemoryLimiter

Phase 2: FABRIC visibility
- System cgroup monitoring (systemCgroupMonitor.ts)
  - Tracks user.slice cgroup memory usage/limit/high/swap
  - OOM risk detection (none/low/medium/high/critical)
  - System memory stats from /proc/meminfo
- Per-worker RSS tracking in WorkerInfo (throttled to every 200 events)
- System Memory Panel UI component
  - Real-time cgroup/system/swap/FABRIC memory display
  - OOM risk banner with color-coded alerts
  - 5-second polling refresh
- API endpoints: /api/system/memory, /api/alerts/oom
- UI toggle button in header
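The OOM risk levels (none/low/medium/high/critical) suggest bucketing cgroup usage against its limit. The sketch below makes two assumptions:

- Risk is measured as the ratio of usage to MemoryHigh (or MemoryMax).
- The cut-offs are 50/70/85/95%. These are made up; the real cut-offs in systemCgroupMonitor.ts are not stated here.

It also handles the cgroup v2 convention that memory.high and memory.max read `max` when no limit is set.

```typescript
type OomRisk = "none" | "low" | "medium" | "high" | "critical";

// Made-up cut-offs on usage / limit, checked from most to least severe.
const RISK_THRESHOLDS: ReadonlyArray<readonly [number, OomRisk]> = [
  [0.95, "critical"],
  [0.85, "high"],
  [0.7, "medium"],
  [0.5, "low"],
];

// memory.high and memory.max hold a byte count, or "max" when unlimited.
function parseCgroupLimit(raw: string): number | null {
  const s = raw.trim();
  if (s === "max") return null;
  const n = Number(s);
  return Number.isFinite(n) ? n : null;
}

function classifyOomRisk(usageBytes: number, limitBytes: number | null): OomRisk {
  // This sketch reports "none" when there is no limit to measure against.
  if (limitBytes === null || limitBytes <= 0) return "none";
  const ratio = usageBytes / limitBytes;
  for (const [min, level] of RISK_THRESHOLDS) {
    if (ratio >= min) return level;
  }
  return "none";
}
```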

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 09:34:59 -04:00
jedarden
7df43a353b feat(web): add /api/spans/dag endpoint for OTLP span visualization
Implements the missing /api/spans/dag endpoint that was blocking the
SpanDag component. The endpoint queries span events from the store and
builds a hierarchical tree structure for visualization.

Changes:
- Added GET /api/spans/dag endpoint in src/web/server.ts
- Added SpanDagResponse interface to src/types.ts for JSON serialization
- Updated SpanNode interface to use nullable fields (null instead of undefined)
- Fixed src/dagUtils.ts to use nullable SpanNode fields

The endpoint accepts an optional trace_id query parameter to filter
spans by trace, and returns a SpanDagResponse with root spans, total
span count, and trace summary.
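Building the hierarchy amounts to linking each span to its parent by id. The sketch below uses assumed field names (`spanId`, `parentSpanId`, `traceId`); the real SpanNode shape in src/types.ts may differ. As a design choice of the sketch, a span whose parent never arrived is treated as a root, so partial traces still form a tree.

```typescript
interface SpanInput {
  spanId: string;
  parentSpanId: string | null;
  traceId: string;
  name: string;
}

interface SpanTreeNode extends SpanInput {
  children: SpanTreeNode[];
}

// Link spans to their parents. A span whose parent is null, or was never
// received, becomes a root.
function buildSpanTree(spans: SpanInput[], traceId?: string): SpanTreeNode[] {
  const selected = traceId === undefined ? spans : spans.filter((s) => s.traceId === traceId);
  const nodes = new Map<string, SpanTreeNode>();
  for (const s of selected) nodes.set(s.spanId, { ...s, children: [] });

  const roots: SpanTreeNode[] = [];
  for (const node of nodes.values()) {
    const parent = node.parentSpanId === null ? undefined : nodes.get(node.parentSpanId);
    if (parent) parent.children.push(node);
    else roots.push(node);
  }
  return roots;
}
```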

Closes: bf-82u8
2026-05-26 19:41:13 -04:00
jedarden
600b114b91 feat(web): add Worker Comparison Analytics panel and API
Backend API endpoints (src/web/server.ts):
- GET /api/workers/compare?worker1=&worker2= — returns WorkerComparison via analyticsManager.compareWorkers()
- GET /api/analytics/workers — returns per-worker WorkerMetrics for leaderboard table
- GET /api/analytics/sessions — exposes historicalStore.getSessions() for cross-session comparisons

Frontend component (src/web/frontend/src/components/WorkerAnalyticsPanel.tsx):
- Comparison view mirroring TUI WorkerAnalyticsPanel behavior
- Leaderboard table with sortable columns
- Historical sessions list
- Worker selection for comparison with diff/percent/winner indicators
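The diff/percent/winner indicators need a direction per metric. More beads completed is better, while lower cost or duration is better. The helper below is a hypothetical sketch, not the actual analyticsManager.compareWorkers() logic.

```typescript
type Winner = "worker1" | "worker2" | "tie";

interface MetricComparison {
  worker1: number;
  worker2: number;
  diff: number; // worker2 - worker1
  percent: number | null; // diff relative to worker1; null when worker1 is 0
  winner: Winner;
}

function compareMetric(a: number, b: number, higherIsBetter: boolean): MetricComparison {
  const diff = b - a;
  const percent = a === 0 ? null : (diff / a) * 100;
  let winner: Winner = "tie";
  if (a !== b) {
    const secondIsBetter = higherIsBetter ? b > a : b < a;
    winner = secondIsBetter ? "worker2" : "worker1";
  }
  return { worker1: a, worker2: b, diff, percent, winner };
}
```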

Wired into App.tsx with a new "Workers" button (⚔️ icon) and a command palette action (show:worker-analytics)

Closes: bf-4cqq

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 17:29:26 -04:00
jedarden
55611370bb feat(web): add Historical Session Index API and browser UI
Implements the Phase 6 historical session index for comparisons.

Backend (src/web/server.ts):
- GET /api/sessions — list sessions (paginated, with start/end filter)
- GET /api/sessions/:id — get single session detail with per-worker summaries
- GET /api/sessions/:id/workers — get worker summaries for a session
All endpoints use the existing HistoricalStore infrastructure.

Frontend (src/web/frontend/src/components/HistoricalSessionsPanel.tsx):
- Sessions list table with duration, workers, tasks, cost, tokens, time range
- Click-to-expand session detail with worker performance breakdown
- Metrics source badge (otlp-metric, otlp-span, log-derived)
- Empty state with helpful hint when no sessions exist
- Refresh button for manual reload

Integration:
- Added to App.tsx with Sessions toggle button in header
- Command palette action: show:sessions
- Follows existing panel patterns (ProductivityPanel, AnalyticsDashboard)

Closes: bf-5xch

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 17:19:58 -04:00
jedarden
5e029c142c feat(bf-3xp): add bead workspace scanner + project breakdown in /api/productivity
Phase 9 implementation: Bead workspace scanner and project breakdown.

- Add beadWorkspaceScanner.ts to scan .beads/issues.jsonl files
- Count CLOSED beads per project, deriving project from bead id prefix
- Use close_reason/closed_at/assignee for productivity tracking
- Add configurable workspace list in config.ts (WorkspaceConfig interface)
- Extend GET /api/productivity with a byProject array
- Add By Project section to ProductivityPanel React component
- Add tests for bead workspace scanner
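Below is a sketch of the per-project count. It assumes each issues.jsonl line is a JSON object with string `id` and `status` fields, and that closed beads have status `closed` (compared case-insensitively here). The project is the text before the first `-` in the id, so `bf-53q6` maps to `bf`. The function names are hypothetical.

```typescript
interface ProjectCount {
  project: string;
  closed: number;
}

// The project is the id prefix before the first "-", e.g. "bf-53q6" -> "bf".
function projectFromBeadId(id: string): string {
  const dash = id.indexOf("-");
  return dash > 0 ? id.slice(0, dash) : id;
}

// Count closed beads per project in the text of one .beads/issues.jsonl file.
function countClosedByProject(jsonl: string): ProjectCount[] {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip a torn or partially written line
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const bead = parsed as { id?: unknown; status?: unknown };
    if (typeof bead.id !== "string" || typeof bead.status !== "string") continue;
    if (bead.status.toLowerCase() !== "closed") continue;
    const project = projectFromBeadId(bead.id);
    counts.set(project, (counts.get(project) ?? 0) + 1);
  }
  // Projects with the most closed beads first.
  return Array.from(counts, ([project, closed]) => ({ project, closed })).sort(
    (x, y) => y.closed - x.closed,
  );
}
```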

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 15:34:20 -04:00
jedarden
93b3e9e038 feat(bf-6bx7): add /api/productivity endpoint and Productivity panel
Adds GET /api/productivity returning daily bead completion counts (last 30
days) from bead.released/release_success events and a worker leaderboard
sorted by beadsCompleted. Adds a Productivity tab in the web UI with a 14-day
SVG bar chart and a worker leaderboard table.
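One way to build the daily series is to bucket completion timestamps into calendar days and zero-fill days with no activity, so a 30-day response always has 30 entries. This sketch makes two assumptions:

- The caller has already filtered to bead.released/release_success events and passes their ISO-8601 `ts` values.
- Days are bucketed in UTC rather than local time.

```typescript
interface DayCount {
  date: string; // YYYY-MM-DD in UTC
  count: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Bucket ISO-8601 timestamps into the last `days` UTC calendar days
// (oldest first, today last), zero-filling days with no completions.
function dailyCounts(timestamps: string[], now: Date, days = 30): DayCount[] {
  const todayStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const buckets = new Map<string, number>();
  for (let i = days - 1; i >= 0; i--) {
    buckets.set(new Date(todayStart - i * DAY_MS).toISOString().slice(0, 10), 0);
  }
  for (const ts of timestamps) {
    const t = Date.parse(ts);
    if (Number.isNaN(t)) continue; // ignore malformed timestamps
    const key = new Date(t).toISOString().slice(0, 10);
    const current = buckets.get(key);
    if (current !== undefined) buckets.set(key, current + 1);
  }
  return Array.from(buckets, ([date, count]) => ({ date, count }));
}
```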

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 16:58:35 -04:00
jedarden
c627976dbc feat(bf-52d6): integrate conversation transcript parser with store and web API
Integrate the conversationParser module with the InMemoryEventStore and web server
to provide complete conversation transcript functionality.

Store integration:
- Add conversation session caching with a 5-second TTL
- Invalidate cache when conversation events are added
- Add methods: getConversationSessions, getWorkerConversationSessions,
  getBeadConversationSession, getConversationSession,
  getConversationEvents, getWorkerConversationEvents,
  getBeadConversationEvents
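The session cache combines a time-based expiry with explicit invalidation on writes. Below is a generic sketch with an injectable clock for testing. `TtlCache` is a hypothetical name, not a class in the store.

```typescript
// Caches one derived value for `ttlMs`, recomputing lazily on the next read
// after expiry or after an explicit invalidate().
class TtlCache<T> {
  private entry: { value: T; expiresAt: number } | null = null;

  constructor(
    private readonly ttlMs: number,
    private readonly compute: () => T,
    private readonly now: () => number = Date.now,
  ) {}

  get(): T {
    const t = this.now();
    let entry = this.entry;
    if (entry === null || t >= entry.expiresAt) {
      entry = { value: this.compute(), expiresAt: t + this.ttlMs };
      this.entry = entry;
    }
    return entry.value;
  }

  // Call this whenever new conversation events are added.
  invalidate(): void {
    this.entry = null;
  }
}
```

With invalidation on every add, the TTL only caps how long a computed result is reused when nothing new arrives.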

Web API integration:
- GET /api/conversations/sessions - Get all conversation sessions
- GET /api/conversations/workers/:workerId - Get sessions for worker
- GET /api/conversations/beads/:beadId - Get session for bead
- GET /api/conversations/:sessionId - Get session by ID
- GET /api/conversations/events - Get all conversation events
- GET /api/conversations/workers/:workerId/events - Get worker events
- GET /api/conversations/beads/:beadId/events - Get bead events

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 14:29:12 -04:00
jedarden
455da572a8 feat(retention): add systemd timer for automatic NEEDLE log pruning
Add systemd timer and service for daily log pruning at 03:00 UTC. Includes
manual prune API endpoint, setup script, and updated documentation.

## Changes
- Add `fabric-prune.service` - systemd oneshot service for log pruning
- Add `fabric-prune.timer` - daily timer (03:00 UTC) with Persistent=true
- Add `POST /api/retention/prune` - manual prune trigger with auth
- Add `scripts/setup-fabric-prune.sh` - one-shot timer installer
- Update `CLAUDE.md` - document retention policy and usage

## Retention Policy
- `archiveAfterDays: 3` - files older than 3d → archive/
- `maxAgeDays: 7` - files older than 7d → delete (safety net)
- `archiveRetentionDays: 30` - archives older than 30d → delete
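The three thresholds combine into a per-file decision. This sketch makes two assumptions:

- Age is measured in days from each file's modification time.
- Files already under archive/ are judged only by `archiveRetentionDays`.

`retentionAction` is a hypothetical name.

```typescript
type RetentionAction = "keep" | "archive" | "delete";

interface RetentionPolicy {
  archiveAfterDays: number;
  maxAgeDays: number;
  archiveRetentionDays: number;
}

const DEFAULT_POLICY: RetentionPolicy = { archiveAfterDays: 3, maxAgeDays: 7, archiveRetentionDays: 30 };

// Live files are archived after archiveAfterDays and deleted past maxAgeDays
// (the safety net); archived files are deleted past archiveRetentionDays.
function retentionAction(ageDays: number, inArchive: boolean, p: RetentionPolicy = DEFAULT_POLICY): RetentionAction {
  if (inArchive) return ageDays > p.archiveRetentionDays ? "delete" : "keep";
  if (ageDays > p.maxAgeDays) return "delete";
  if (ageDays > p.archiveAfterDays) return "archive";
  return "keep";
}
```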

## Integration
- Emits `mend.logs_pruned` events to `fabric-mend.jsonl`
- FABRIC DirectoryTailer auto-discovers events
- `/api/retention` endpoint shows current state and last prune

Resolves bd-ch6.2

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 16:22:16 -04:00
jedarden
6b39dae283 feat(memory): add heap diff analysis and leak detection utilities
- Add src/heapDiff.ts: utilities for comparing heap snapshots and analyzing trends
- Add API endpoints: /api/memory/diff-analysis, /api/memory/trend, /api/memory/trend.md
- Add docs/memory-audit-bd-ch6.7.md: comprehensive audit findings

Audit findings:
- Event store well-bounded with proper cleanup (1h stale worker, 5min collision timeout)
- WebSocket broadcast has backpressure handling (1MB buffer limit)
- Parser uses native JSON.parse(), no regex issues
- Heap snapshots already configured (30min intervals, 1GB heap limit)
- No unbounded growth identified in core data structures
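A single snapshot diff is noisy because garbage collection timing varies. Trend analysis therefore usually fits a line through several samples. The sketch below computes an ordinary least-squares slope in bytes per minute. Whether src/heapDiff.ts uses regression at all is an assumption, and so is the `HeapSample` shape.

```typescript
interface HeapSample {
  timeMs: number;
  heapUsedBytes: number;
}

// Least-squares slope of heap usage over time, converted to bytes per minute.
// A persistently positive slope across many samples suggests a leak.
function heapGrowthPerMinute(samples: HeapSample[]): number {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = samples.reduce((s, p) => s + p.timeMs, 0) / n;
  const meanY = samples.reduce((s, p) => s + p.heapUsedBytes, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.timeMs - meanX) * (p.heapUsedBytes - meanY);
    den += (p.timeMs - meanX) ** 2;
  }
  return den === 0 ? 0 : (num / den) * 60_000;
}
```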

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 14:05:39 -04:00
jedarden
c8a6b16080 fix(cli): resolve TypeScript build error in sdNotify unix socket usage
The dgram module's unix_dgram socket type is not part of the SocketType
union in Node's TypeScript definitions. Added @ts-expect-error directives
so the working runtime code compiles.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 22:25:55 -04:00
jedarden
34aee6474f feat(web): add SemanticNarrativePanel React component
Port TUI SemanticNarrativePanel to React. Provides:
- Standalone overlay panel showing narrative cards per active worker
- Phase detection (Research/Planning/Implementation/Testing/Debugging/Finalizing)
- Phase progress bar, sentiment indicator, accomplishments/challenges
- Expandable activity segments with entity details (files, tools)
- WorkerNarrativeInline component embedded in WorkerDetail narrative tab
- /api/narrative and /api/narrative/:workerId server endpoints
- CSS for all narrative UI elements
- Command palette and header button wired to show:narrative action

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 11:59:44 -04:00
jedarden
240957c8e0 feat(web): add SessionDigestPanel React component
Port src/tui/components/SessionDigest.ts to React. The panel exposes:
- 5-tab view (Summary, Beads, Files, Errors, Workers) matching TUI output
- Generate Digest button calling /api/digest (GET, no auth required)
- Export to JSON, Markdown, and plain text via browser download
- CSS styles for all digest UI classes in index.css
- Integration in App.tsx via digest-toggle header button and show:digest command

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 06:57:03 -04:00
jedarden
9938630bdd feat(web): add ErrorGroupPanel with grouped error cards and similar past errors
Port TUI ErrorGroupPanel to React — groups errors by signature with
occurrence count, affected workers, time span, severity badges, and
expandable detail cards. Links to similar past errors from fabric.db
error_history via /api/errors/history/similar endpoint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 06:16:46 -04:00
jedarden
038cc9348d feat(bd-ch6.6): wire sd_notify + add untracked serverMetrics and health-check files
- Add src/serverMetrics.ts (ServerMetrics class for /api/health + /api/metrics)
- Add scripts/fabric-health-check.sh (curl-based liveness probe)
- Wire sd_notify READY=1 on server start and WATCHDOG=1 keepalives in server.ts
  so the Type=notify systemd service correctly reports start and keeps the
  watchdog alive without an external npm package

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:58:37 -04:00
jedarden
87c7888351 feat(bd-ch6.6): add /api/health + /api/metrics self-observability
- /api/health returns {status, uptime_sec, version, event_count,
  ingest_rate_per_sec, ws_clients, tailer_files_watched, dedup_dropped,
  process_resident_memory_bytes}; returns HTTP 503 with status='overloaded'
  when maxEventCount is exceeded
- /api/metrics exposes the same counters in Prometheus text format;
  fabric_status=0 when overloaded
- Add ServerMetrics.eventCount setter so both endpoints sync from store.size
  (fixes fabric_event_count in /api/metrics showing 0 when events were added directly)
- Wire --max-events CLI option into `fabric web`; pass maxEventCount and
  deduplicator to createWebServer so the memory-bomb guard and dedup_dropped
  reporting are actually activated
- Track tailerFilesWatched: set after tailer.start() and update on each event
  for DirectoryTailer (uses activeFiles.length getter)
- Add import for Node net module used by systemd watchdog notify
- Add tests: overload guard returns 503, within-limit returns 200, Prometheus
  reflects fabric_status=0 when overloaded
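In the Prometheus text exposition format, each metric consists of three lines:

- a `# HELP` line,
- a `# TYPE` line,
- a `name value` sample.

HELP text must escape backslashes and newlines. Below is a minimal renderer sketch. The `Metric` shape is an assumption, and the commit does not say whether each FABRIC counter is exposed as a gauge or a counter.

```typescript
interface Metric {
  name: string;
  help: string;
  type: "gauge" | "counter";
  value: number;
}

// Prometheus spells non-finite values NaN, +Inf and -Inf.
function formatValue(v: number): string {
  if (Number.isNaN(v)) return "NaN";
  if (v === Infinity) return "+Inf";
  if (v === -Infinity) return "-Inf";
  return String(v);
}

// HELP text must escape backslashes and line feeds.
function escapeHelp(text: string): string {
  return text.replace(/\\/g, "\\\\").replace(/\n/g, "\\n");
}

function renderPrometheus(metrics: Metric[]): string {
  const lines: string[] = [];
  for (const m of metrics) {
    lines.push(`# HELP ${m.name} ${escapeHelp(m.help)}`);
    lines.push(`# TYPE ${m.name} ${m.type}`);
    lines.push(`${m.name} ${formatValue(m.value)}`);
  }
  // The format is line-oriented; end with a trailing newline.
  return lines.join("\n") + "\n";
}
```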

systemd service already has Restart=on-failure + WatchdogSec=30 (scripts/fabric-web.service);
liveness guard in server.ts calls process.exit(1) after 3 consecutive overload
checks, triggering systemd restart.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:54:38 -04:00
jedarden
c73fe67e81 feat(bd-ch6.4): add startup warning and token rotation docs
- Warn at startup when FABRIC_AUTH_TOKEN is unset so operators know
  POST /api/events is open to any local process; surfaced before
  "Press Ctrl+C to stop" so it's visible in systemd journal
- Add "Token rotation" section to README with step-by-step procedure:
  generate new secret, update secrets.env (0600), restart service,
  verify 401 enforcement; notes that NEEDLE workers reload on next task
  start when auth_token uses ${FABRIC_AUTH_TOKEN} substitution

The full auth chain is now in place end-to-end:
  ~/.config/fabric/secrets.env (0600) → EnvironmentFile →
  FABRIC_AUTH_TOKEN env var → server auth middleware → 401/403 on
  unauthenticated POST; NEEDLE config auth_token: "${FABRIC_AUTH_TOKEN}"
  routes worker events through the same token.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:31:24 -04:00
jedarden
8a4514d20a feat(bd-n8y): apply auth middleware globally to all POST routes with tests
Move auth middleware before OTLP router mount and apply it as app-level
middleware for all POST requests. This protects event ingestion endpoints
(/api/events, /api/events/batch), OTLP endpoints (/v1/logs, /v1/traces,
/v1/metrics), and cost alert acknowledgement. GET endpoints remain open.
Adds comprehensive auth tests covering 401/403/201 responses.
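The rule described here reduces to a small decision function. Non-POST requests pass, and POST requests must carry `Authorization: Bearer <token>`. This sketch makes three assumptions:

- A missing or malformed header returns 401.
- A wrong token returns 403.
- POST stays open when no token is configured, which matches the later startup warning for an unset FABRIC_AUTH_TOKEN.

It is framework-agnostic rather than the actual Express middleware. In Node, crypto.timingSafeEqual is the usual choice for the comparison; a plain-TypeScript stand-in keeps the sketch dependency-free.

```typescript
type AuthResult = { ok: true } | { ok: false; status: 401 | 403 };

// After the length check, compare every character instead of stopping
// at the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

function checkAuth(method: string, authorization: string | undefined, token: string | undefined): AuthResult {
  // GET and other methods stay open; with no token configured, POST is open too.
  if (method.toUpperCase() !== "POST" || !token) return { ok: true };
  if (authorization === undefined || !authorization.startsWith("Bearer ")) {
    return { ok: false, status: 401 };
  }
  const presented = authorization.slice("Bearer ".length);
  return constantTimeEqual(presented, token) ? { ok: true } : { ok: false, status: 403 };
}
```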

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 15:59:29 -04:00
jedarden
7210fdf323 feat(bd-593): add OTLP/HTTP receiver on :4318 (protobuf + JSON)
Mount OTLP/HTTP handlers on the existing Express web server via a second
HTTP listener so OTLP endpoints are reachable at the standard :4318
address without a separate process. Accepts both application/x-protobuf
and application/json content types, routing decoded records through the
same Normalizer pipeline as the gRPC receiver.
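Dispatching on Content-Type can be sketched as follows. The sketch compares only the media type, so parameters such as `; charset=utf-8` are ignored. The helper name is hypothetical. Rejecting unknown types with 415 Unsupported Media Type is an assumption about how the receiver should respond.

```typescript
type OtlpEncoding = "protobuf" | "json";

// Pick a decoder from the Content-Type header; null means "reject".
function otlpEncoding(contentType: string | undefined): OtlpEncoding | null {
  const mediaType = (contentType ?? "").split(";")[0].trim().toLowerCase();
  if (mediaType === "application/x-protobuf") return "protobuf";
  if (mediaType === "application/json") return "json";
  return null;
}
```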

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 13:33:36 -04:00
jedarden
3f5ddb96e0 feat(bd-5ny): Add fleet analytics dashboard with model/strand/quality metrics
Parse NEEDLE worker log JSONL files to compute fleet-wide analytics:
- Model performance: beads completed, avg/median duration, distribution histogram
- Strand utilization: invocations, success rates, time spent per strand
- Completion quality: shallow detection (<10s), claim races, flagged beads
- Fleet overview: hourly time series with sparklines, workspace coverage, relaunch count

Adds /api/analytics endpoint and AnalyticsDashboard React component with
tabbed UI (Models/Strands/Quality/Fleet). No persistent DB needed — reads
logs fresh on each request.
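Two of the quality metrics are simple to pin down:

- the median duration, which averages the middle pair when the count is even;
- the shallow-completion count, using the <10s threshold.

Below is a sketch with hypothetical function names that takes durations in seconds.

```typescript
// Median of durations; null for an empty list.
function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Completions faster than this are flagged as "shallow".
const SHALLOW_THRESHOLD_SEC = 10;

function shallowCompletions(durationsSec: number[]): number {
  return durationsSec.filter((d) => d < SHALLOW_THRESHOLD_SEC).length;
}
```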

Co-Authored-By: Claude Code (glm-5-turbo) <noreply@anthropic.com>
2026-03-20 07:19:53 -04:00
default
13e3090ca1 fix(bd-jec): Fix TypeScript build error and resolve worker starvation alert
- Fixed corrupted auth middleware code in server.ts that was causing build failure
- Verified TUI color rendering works correctly (bd-2b3)
- Resolved bd-jec: found 8 beads in the ready queue; the worker starvation was caused by assignee filtering

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-11 05:49:33 +00:00
default
a2e3161134 feat(bd-n8y): Add authentication/authorization to FABRIC event ingestion endpoint
- Added authToken option to WebServerOptions interface
- Created createAuthMiddleware function for Bearer token auth
- Applied auth middleware to POST /api/events and /api/events/batch
- Updated CLI to read FABRIC_AUTH_TOKEN env var
- Added comprehensive tests for authentication scenarios

- Updated POST /api/events test cases to use the auth token
- All tests passing successfully
2026-03-11 05:13:12 +00:00
default
b21df31ea4 feat(bd-3ip): Add POST /api/events/batch endpoint for batched NEEDLE telemetry
- Add MAX_BATCH_SIZE constant (100 events limit)
- Implement POST /api/events/batch endpoint that accepts JSON array of events
- Validate array format, empty batches, and batch size limits
- Validate each event has required fields (ts, event)
- Store all valid events via store.add()
- Broadcast all ingested events via WebSocket
- Return 201 with ingested count, total count, and errors array
- Handle partial success (valid events processed, errors reported)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude Worker <noreply@anthropic.com>
2026-03-11 04:43:19 +00:00
default
ee90eb05a3 feat(bd-2bt): add POST /api/events endpoint to ingest NEEDLE telemetry
Add HTTP POST endpoint to receive NEEDLE telemetry events from the
fabric.sh forwarder. This bridges NEEDLE and FABRIC, enabling real-time
event ingestion via HTTP.

Changes:
- Add parseEventObject() to parser.ts for parsing JSON objects directly
- Add POST /api/events endpoint with JSON body parser (64KB limit)
- Validate required fields (ts, event) before processing
- Store events and broadcast to WebSocket clients in real-time
- Return 201 Created on success, 400 for invalid payloads

Acceptance criteria met:
- NEEDLE events sent via curl POST arrive in FABRIC's event store
- Events are broadcast to WebSocket clients in real-time
- Invalid payloads return appropriate error codes (400)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:06:55 +00:00
jeda
e1f8c570a0 feat(bd-mza): P4-002: Implement cross-reference hyperlinking
Integrated CrossReferenceManager with EventStore to enable cross-reference
hyperlinking across events, workers, files, and beads. This allows navigation
between related activities in the FABRIC dashboard.

Changes:
- Integrated CrossReferenceManager into InMemoryEventStore
- Added batch processing for cross-reference relationship detection
- Added 11 new API methods to the store for cross-reference queries
- Updated web server to use store's cross-reference methods
- Added comprehensive test coverage (11 new tests)
- All 55 tests passing

Features:
- Automatic link detection between events, workers, files, and beads
- Relationship detection (same_bead, same_file, same_worker, temporal_proximity, etc.)
- Navigation path finding between entities
- Cross-reference statistics and queries
- Web API endpoints for cross-reference data

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-04 03:00:36 +00:00
jeda
ccbe8e7a36 feat(bd-1mh): Add DependencyDag component to web frontend
- Create DependencyDag.tsx with interactive task dependency visualization
- Add DAG types to web frontend types
- Add /api/dag endpoint to server.ts
- Add CSS styles for DAG panel
- Add unit tests

Co-Authored-By: Claude Worker <noreply@anthropic.com>
2026-03-03 15:13:13 +00:00
jeda
5fab75708f feat(bd-xig): Implement worker collision detection
- Add BeadCollision, TaskCollision, CollisionAlert types
- Extend WorkerInfo to track activeBead and activeDirectories
- Implement bead collision detection (detectBeadCollision, getBeadCollisions, getWorkerBeadCollisions)
- Implement task collision detection (detectTaskCollision, getTaskCollisions, getWorkerTaskCollisions)
- Generate collision alerts with suggestions
- Add getCollisionStats for statistics
- Add cleanupStaleCollisions for bead and task collisions
- Create CollisionAlert TUI component

- Add unit tests for collision detection
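Bead collision detection reduces to grouping workers by `activeBead` and keeping groups with more than one member. Below is a sketch that uses a simplified, hypothetical `WorkerSnapshot` type rather than the real WorkerInfo.

```typescript
interface WorkerSnapshot {
  workerId: string;
  activeBead: string | null;
}

interface BeadCollision {
  beadId: string;
  workers: string[];
}

// Two or more workers holding the same activeBead count as a collision.
function detectBeadCollisions(workers: WorkerSnapshot[]): BeadCollision[] {
  const byBead = new Map<string, string[]>();
  for (const w of workers) {
    if (!w.activeBead) continue;
    const ids = byBead.get(w.activeBead) ?? [];
    ids.push(w.workerId);
    byBead.set(w.activeBead, ids);
  }
  return Array.from(byBead, ([beadId, ids]) => ({ beadId, workers: ids })).filter(
    (c) => c.workers.length > 1,
  );
}
```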

🚀 Generated with Claude Worker <noreply@anthropic.com>

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-03 13:50:02 +00:00
jeda
57e8193f7b feat(bd-2kf): Add comprehensive test coverage for parser and store
- Add 36 parser tests covering:
  - parseLogLine with valid/invalid inputs
  - parseLogLines for multi-line parsing
  - formatEvent with all options
  - Edge cases: malformed JSON, missing fields, colorization

- Add 35 store tests covering:
  - InMemoryEventStore add/query operations
  - Worker status tracking (active/idle/error)
  - Event filtering by worker, level, bead, timestamp
  - maxEvents limit and LRU trimming
  - getStore/resetStore singleton management

- Close phase beads (bd-2pa, bd-n8l, bd-2nu) as infrastructure complete
- Close test beads (bd-5eh, bd-2en) with comprehensive coverage
- Total: 91 tests passing across parser, store, and tailer

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-03 10:43:24 +00:00