Enable fabric.enabled in ~/.needle/config.yaml and add README section
explaining HTTP POST /api/events as a simpler local-dev alternative to OTLP.
NEEDLE's FabricConfig already includes endpoint in src/config/mod.rs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mark all Phase 8 checklist items complete (bd-0nd.1–4 closed).
Clarify README: FABRIC watches the directory and tails every *.jsonl,
not a single workers.log file.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add e2e test proving DirectoryTailer works with real NEEDLE per-worker
JSONL format: copies anonymized fixtures to temp dir, asserts events from
multiple workers, hot-adds a gamma worker mid-test, all under 1.4s.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- resolveSource() now returns { kind: 'directory'|'file', path } instead
of appending workers.log to directories
- New resolveFromOptions() handles CLI flag precedence: --source > --file > default
- Default (no flags) now tails ~/.needle/logs/ as a directory via DirectoryTailer
- --source <nonexistent-path> prints clear error and exits
- -f/--file keeps single-file LogTailer behavior for backwards compat
- tui, web, and tail/logs commands all updated
- replay and digest commands unchanged (file-only, no --source)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New DirectoryTailer class that watches a directory for *.jsonl files,
spawning a LogTailer per file and hot-adding new files via fs.watch.
Forwards child events with source file path and deduplicates across
sources using a shared EventDeduplicator.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wire shared EventDeduplicator across all ingestion paths (JSONL tailer,
OTLP/gRPC receiver, OTLP/HTTP receiver) so duplicate events from dual
ingestion are silently dropped on (session_id, worker_id, sequence).
Also adds docs/needle-exporter-wiring.md (OTLP configuration guide for
NEEDLE), SpanDag React component, EventFilter.eventType field, and
various test/layout fixes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- getSessionWorkerSummaries now maps SQLite snake_case columns to the
TypeScript camelCase return type, fixing the type mismatch
- Fix getSessionsInRange typo (was getSingsInRange)
- Fix flaky duration test using deterministic timestamps
- Update tests to use camelCase property names
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add OTLP metric ingestion pipeline that persists Sum/Histogram/Gauge
data points to fabric.db with canonical instrument names:
- MetricAccumulator in workerAnalytics.ts accumulates per-worker
running totals for tokens, cost, durations, bead counts
- Schema v2 in historicalStore.ts adds metric_samples and
session_worker_summaries tables with source preference tracking
- flushMetricSamples() in store.ts drains accumulator → SQLite on
every event, upserting session summaries and live session rows
- docs/schema.md documents instrument names and resolution order
Fix source preference ordering (otlp-metric > otlp-span > log-derived)
using CASE-based SQL sort instead of alphabetical ASC which was
inverted. Fix metricsSource detection to not require costUsd > 0.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fix histogram data point value extraction in normalizer — OTLP histogram
points carry sum/count instead of asDouble/asInt, so needle.bead.duration
was silently dropped. Add MetricAccumulator tests and end-to-end tests
validating OTLP metrics flow through to session_worker_summaries in
fabric.db with otlp-metric source preference.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add --source flag to tui/web/tail subcommands for specifying log source
as file or directory (directories get workers.log appended). Add 'logs'
as alias for 'tail' subcommand per plan.md CLI spec. Update README.md
with fabric logs examples and --source usage.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When FABRIC ingests events from both JSONL and OTLP sources, the same
logical event can arrive twice. EventDeduplicator keeps the first
arrival and silently drops duplicates with a counter for observability.
Events with sequence < 0 (legacy formats) always pass through.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- normalizeOtlpSpanStart: emit {name}.started with span_id/parent_span_id/trace_id
- normalizeOtlpSpanEnd: emit {name}.finished with duration_ms and span attributes
- needleEventToLogEvent: promote span_id, parent_span_id, trace_id, span_name
- dagUtils: add buildSpanDag() using parent_span_id for parent-child linkage
- dagUtils: add findSpansForBead() for bead-to-span lookup
- Add integration test confirming bead lifecycle renders as DAG node with children
- Add namespaced OTLP attribute resolution (needle.worker.id etc.)
- Add OTLP body (AnyValue) extraction for logs
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace coarse NeedleWorkerStatus ('idle'|'executing'|'draining'|'starting')
with the real NEEDLE state machine: BOOTING→SELECTING→CLAIMING→WORKING→CLOSING→STOPPED.
- Add NeedleState type, VALID_TRANSITIONS map, needleStateToStatus() helper in types.ts
- Track needleState + lastStateTransition per worker by consuming worker.state_transition events
- Surface all six states in TUI worker cards (WorkerGrid, WorkerDetail) with per-state icons/colors
- Surface all six states in web WorkerGrid.tsx with NEEDLE_STATE_LABELS and NEEDLE_STATE_COLORS
- Add getNeedleStateColor/getNeedleStateIcon to colors.ts
- Rewire stuck detection to fire on state-transition gaps (state_gap pattern in stuckDetection.ts)
- Add sequence-based event ordering via compareEventsBySequence and queryOrdered()
- Legacy event-type fallback preserved for workers not emitting state_transition events
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WorkerDetail now renders statuses in uppercase (ACTIVE/IDLE/ERROR) to
match NeedleState labels. Store only recognizes canonical event types
(bead.completed, not "Task completed") for state transitions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The web frontend SessionReplay component was sorting events by
timestamp, inconsistent with all other consumers (store, TUI replay,
export, analytics, narrative) which use compareEventsBySequence.
Switched to the canonical (worker, sequence) ordering so multi-host
OTLP ingestion produces correct replay order despite clock skew.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mount OTLP/HTTP handlers on the existing Express web server via a second
HTTP listener so OTLP endpoints are reachable at the standard :4318
address without a separate process. Accepts both application/x-protobuf
and application/json content types, routing decoded records through the
same Normalizer pipeline as the gRPC receiver.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add state_gap stuck detection using lastStateTransition — fires when
a worker in an active NeedleState hasn't transitioned within the
configurable threshold (default 5min, critical at 2x).
- Update web WorkerDetail.tsx to display needleState with icons and
colors instead of coarse active/idle/error status.
- Add stuckDetection.test.ts with 16 tests covering gap-based detection,
legacy patterns, and edge cases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add full OTLP/gRPC receiver terminating LogsService, TraceService, and
MetricsService Export RPCs. Decoded protobuf records are normalized via
the shared Normalizer pipeline and emitted as LogEvents on the event bus.
- gRPC server via @grpc/grpc-js with protobufjs codec
- Protobuf definitions for all three OTLP collector services
- enrichRecord() merges scope/resource attributes for normalizer
- extractDataPoints() handles gauge, sum, and histogram metrics
- Integration tests with real gRPC client/server round-trip
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Tighten parseNeedleEvent signature to accept string (JSON line) and
preserve all canonical fields (timestamp, event_type, worker_id,
session_id, sequence, bead_id, data)
- Make parseLogLine a thin adapter that calls parseNeedleEvent then
projects to legacy LogEvent via needleEventToLogEvent
- Add comprehensive parseNeedleEvent unit tests covering canonical format,
session_id/sequence/data round-trip, all 47 NeedleEventType values,
schema version validation, and legacy format conversion
- Rewrite parser.real-logs.integration.test.ts to assert NeedleEvent
shape against real ~/.needle/logs/*.jsonl fixtures
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Create src/normalizer.ts with normalize(raw, source): NeedleEvent supporting
all 5 sources: jsonl, otlp-log, otlp-span-start, otlp-span-end, otlp-metric
- Move JSONL parsing logic (canonical, legacy NEEDLE, flat legacy) from
parser.ts into normalizer.ts
- Refactor parser.ts to thin facade delegating to normalizer
- Tailer now imports directly from normalizer (pure source, no parse coupling)
- Add 63 unit tests covering all source paths, edge cases, and cross-source
parity between JSONL and OTLP-log
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add NeedleEvent interface as the canonical internal shape versioned at
schema-version 1. Parser validates wire format and asserts schema version
on incoming events. Legacy LogEvent retained as backward-compatible adapter.
docs/schema.md documents all fields, the (worker_id, sequence) ordering
contract, and the full event taxonomy cross-referenced with NeedleEventType.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add FABRIC-side receiver setup (--otlp-grpc, --otlp-http) with port
table covering both gRPC (4317) and HTTP (4318) protocols.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Covers the OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL
env vars, the default-on otlp feature, and the local/read-only nature
of the data flow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds Budget panel (B key) to TUI with per-bead/per-worker cost tracking,
EMA-smoothed burn rate, time-series storage, and CostDashboard web component.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add types for fleet analytics dashboard:
- DurationBucket for bead completion histograms
- ModelPerformanceMetrics with duration distribution
- StrandMetrics for strand utilization tracking
- CompletionQualityIndicator for shallow completions
- FleetAnalytics aggregating all metrics
- FleetAnalyticsOptions for query configuration
Add CSS styles for analytics dashboard:
- Stats bar with quick metrics
- Tab navigation for Models/Strands/Quality/Fleet views
- Mini bar charts for duration distributions
- Sparkline charts for worker time series
- Quality cards for shallow completion tracking
- Responsive design for mobile/tablet views
Co-Authored-By: Claude Code (glm-5) <noreply@anthropic.com>
Feed replay events into the store's analytics pipeline via store.add()
in the onEvent callback, enabling worker analytics, error grouping,
cross-reference tracking, collision detection, and semantic narrative
generation during replay sessions. Print analytics summary on replay end.
Co-Authored-By: Claude Code (glm-5-turbo) <noreply@anthropic.com>
The header now displays a live badge showing worker counts by status:
- Green for active workers
- Blue for idle workers
- Red for error workers
Example: "FABRIC - Worker Activity Monitor [2 active | 1 idle]"
Changes:
- getWorkerStats() calculates total/active/idle/error counts
- getHeaderContent() builds colored badge with status breakdown
- updateHeader() refreshes the badge in default view
- Added updateHeader() call in addEvent() for real-time updates
- Fixed setViewMode() to use getHeaderContent() when returning to default view
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify WorkerDetail displays correct worker information including status
(active/idle/error), uptime formatting, beads completed count, last
activity details (bead, tool, duration, error), and recent events with
proper color tags and truncation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create vitest E2E test that verifies WorkerGrid component renders
worker entries with correct status colors:
- Green (light-green-fg) for active workers
- Yellow (light-yellow-fg) for idle workers
- Red (light-red-fg) for error workers
The test mocks blessed screen and verifies the content contains
appropriate color tags matching the theme system's status colors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap os.homedir() in try/catch and guard against undefined return value
in ThemeManager.getConfigPath(). In CI environments, os.homedir() may be
undefined, causing path.join() to throw and preventing SessionReplay
from being constructed. When homedir is unavailable, theme persistence
is gracefully skipped and the default dark theme is used.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
persistSession() still used msg.includes('completed'/'finished'/'closed') to
find bead completion events. With NEEDLE format, msg is the event type string
(e.g. 'bead.completed'), so substring matching can misfire on
'bead.agent_completed'. Match on exact event types 'bead.completed' and
'bead.failed' instead — consistent with the fix already applied to
updateWorkerInfo() and trackTaskForHistory() in bd-vqd.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deploy FABRIC web dashboard as a systemd user service that starts on
boot and auto-restarts on crash. Includes service.sh management script
for start/stop/restart/deploy operations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8 open beads with 0 claims available for work - this was a race condition
in the worker discovery loop, not actual starvation.
Available work:
- bd-muv (P0): Gap analysis
- bd-1p8-bd-2x9 (P1): TUI and testing tasks
- bd-o0x (P2): Worker count badge
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Worker <noreply@anthropic.com>
Investigation revealed this was a false positive:
- br ready showed 12 ready issues available
- bd-muv P0 and multiple P1/P2 beads were claimable
- All dependencies on bd-muv were closed
- No blockers existed
- Lock file was empty
The worker starvation detection triggered incorrectly.
Recommend investigating worker's claim/priority logic.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Investigation found 8 open beads available in workspace with no
dependencies. The worker's discovery mechanism failed to detect
existing open beads - ready-queue.json was stale (Mar 7).
Available work for other workers:
- bd-1p8 (P1) - Hot reload for TUI
- bd-129 (P1) - Integration test for log parsing
- bd-1j9 (P1) - E2E test WorkerDetail
- bd-29t (P1) - E2E test ActivityStream
- bd-2x9 (P1) - E2E test WorkerGrid
- bd-o0x (P2) - Worker count badge
- bd-3rf (P1) - Regression test suite
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Worker <noreply@anthropic.com>
Closed bd-3tz, bd-h0u, bd-1n3 as false positives - task beads
were available in workspace (bd-muv, bd-1p8, bd-3rf, etc.) when
these starvation alerts were incorrectly triggered.
Co-Authored-By: Claude Worker <noreply@anthropic.com>
Investigation revealed:
- 8 open beads exist in FABRIC workspace
- All are assigned to 'coder', preventing workers from claiming them
- This is a false positive - work exists but is not claimable
- Available unblocked work: bd-1p8 (P1), bd-o0x (P2)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fixed corrupted auth middleware code in server.ts that was causing build failure
- Verified TUI color rendering works correctly (bd-2b3)
- Resolved bd-jec: Found 8 beads in ready queue, worker starvation was due to assignee filtering
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Added authToken option to WebServerOptions interface
- Created createAuthMiddleware function for Bearer token auth
- Applied auth middleware to POST /api/events and /api/events/batch
- Updated CLI to read FABRIC_AUTH_TOKEN env var
- Added comprehensive tests for authentication scenarios
- Updated POST /api/events test cases to use auth token
- All tests passing successfully
POST /api/events endpoint implemented and tested:
- Accepts NEEDLE telemetry events via HTTP POST
- Validates required fields (ts, event)
- Stores events and broadcasts to WebSocket clients
- Comprehensive test coverage (66 tests passing)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
The NEEDLE stuck alert was triggered incorrectly. Investigation showed:
- 11 open beads available in the workspace
- 3 in-progress beads being worked on
- System has recovered and work is available
Resolution: Closed as false positive - no action needed.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add MAX_BATCH_SIZE constant (100 events limit)
- Implement POST /api/events/batch endpoint that accepts JSON array of events
- Validate array format, empty batches, and batch size limits
- Validate each event has required fields (ts, event)
- Store all valid events via store.add()
- Broadcast all ingested events via WebSocket
- Return 201 with ingested count, total count, and errors array
- Handle partial success (valid events processed, errors reported)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Worker <noreply@anthropic.com>