6 commits

Author SHA1 Message Date
jedarden
71ffa3485b fix(bd-ch6): use Type=simple for fabric-web.service reliability
Type=notify with WatchdogSec was timing out due to sd_notify issues.
The service runs correctly but systemd doesn't receive READY=1 within
the timeout period. Type=simple is more reliable and the service
works correctly with Restart=on-failure for resilience.

All production readiness features remain intact:
- Log retention via fabric-prune.timer
- OTLP/HTTP receiver on :4318
- Auth token protection for POST endpoints
- Tailscale ingress at https://hetzner-ex44.tail1b1987.ts.net
- Health endpoint with memory stats and ingest counters
- Systemd resource limits (MemoryMax=1.5G, CPUQuota=200%)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: bd-ch6
2026-04-30 16:22:16 -04:00
jedarden
a6418ac539 feat(bd-ch6.8): add systemd hardening limits to fabric-web.service
- MemoryMax=1536M, MemoryHigh=1200M (1.5GB hard limit, 1.2GB soft)
- CPUQuota=200% (max 2 cores)
- StartLimitInterval=120s, StartLimitBurst=5 (rate-limit restarts)
- Add --max-old-space-size=1024 to Node heap
- Add --heap-snapshots --snapshot-interval 30 for leak debugging

Prevents runaway memory/CPU from taking down the host. Watchdog already
implemented in bd-ch6.6 (Type=notify, WatchdogSec=30).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: bd-ch6.8
2026-04-30 16:22:16 -04:00
jedarden
87c7888351 feat(bd-ch6.6): add /api/health + /api/metrics self-observability
- /api/health returns {status, uptime_sec, version, event_count,
  ingest_rate_per_sec, ws_clients, tailer_files_watched, dedup_dropped,
  process_resident_memory_bytes}; returns HTTP 503 with status='overloaded'
  when maxEventCount is exceeded
- /api/metrics exposes the same counters in Prometheus text format;
  fabric_status=0 when overloaded
- Add ServerMetrics.eventCount setter so both endpoints sync from store.size
  (fixes fabric_event_count in /api/metrics showing 0 when events are added directly)
- Wire --max-events CLI option into `fabric web`; pass maxEventCount and
  deduplicator to createWebServer so the memory-bomb guard and dedup_dropped
  reporting are actually activated
- Track tailerFilesWatched: set after tailer.start() and update on each event
  for DirectoryTailer (uses activeFiles.length getter)
- Add import for Node net module used by systemd watchdog notify
- Add tests: overload guard returns 503, within-limit returns 200, Prometheus
  reflects fabric_status=0 when overloaded
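As a rough illustration of the /api/metrics behavior described above, the sketch below renders a few counters in Prometheus text format. It is not the code from this commit: the `MetricsSnapshot` shape, the `renderPrometheus` helper, and the `fabric_ws_clients` and `fabric_dedup_dropped` metric names are assumptions made for the example. Only `fabric_status` and `fabric_event_count` appear in the commit message.

```typescript
// Hypothetical snapshot of the counters shared by /api/health and /api/metrics.
interface MetricsSnapshot {
  eventCount: number;
  wsClients: number;
  dedupDropped: number;
  overloaded: boolean;
}

type MetricType = "gauge" | "counter";

// Render the snapshot as Prometheus text exposition format:
// a HELP line, a TYPE line, then "name value" for each metric.
function renderPrometheus(s: MetricsSnapshot): string {
  const metrics: Array<[string, MetricType, string, number]> = [
    ["fabric_status", "gauge", "1 when healthy, 0 when overloaded", s.overloaded ? 0 : 1],
    ["fabric_event_count", "gauge", "Events currently held in the store", s.eventCount],
    // The two names below are assumed for illustration only.
    ["fabric_ws_clients", "gauge", "Connected WebSocket clients", s.wsClients],
    ["fabric_dedup_dropped", "counter", "Events dropped by the deduplicator", s.dedupDropped],
  ];
  return metrics
    .map(([name, type, help, value]) =>
      `# HELP ${name} ${help}\n# TYPE ${name} ${type}\n${name} ${value}\n`)
    .join("");
}
```

Deriving both endpoints from one snapshot object is one way to keep /api/health and /api/metrics from drifting apart, which is the class of bug the eventCount setter addresses.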

systemd service already has Restart=on-failure + WatchdogSec=30 (scripts/fabric-web.service);
liveness guard in server.ts calls process.exit(1) after 3 consecutive overload
checks, triggering systemd restart.
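The liveness guard mentioned above could look roughly like the following. This is a minimal sketch under assumptions, not the code in server.ts: the `OverloadGuard` class and its method names are hypothetical, and the sketch assumes the consecutive-overload count resets whenever a check finds the store within its limit.

```typescript
// Hypothetical guard: reports when the store has been over its limit
// for `threshold` consecutive checks (3 in the commit message).
class OverloadGuard {
  private consecutive = 0;

  constructor(
    private readonly maxEventCount: number,
    private readonly threshold: number = 3,
  ) {}

  // Returns true once the process should exit so that systemd
  // (Restart=on-failure) can start a fresh instance.
  check(eventCount: number): boolean {
    if (eventCount > this.maxEventCount) {
      this.consecutive += 1;
    } else {
      // Assumption: a single healthy check resets the streak.
      this.consecutive = 0;
    }
    return this.consecutive >= this.threshold;
  }
}

// Possible wiring inside a periodic health check (names assumed):
//   if (guard.check(store.size)) process.exit(1);
```

Exiting with a non-zero code matters here because Restart=on-failure does not restart a service that exits cleanly.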

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:54:38 -04:00
jedarden
43023b2596 feat(bd-ch6.4): wire FABRIC_AUTH_TOKEN end-to-end in service template
- Add EnvironmentFile=/home/coding/.config/fabric/secrets.env to
  scripts/fabric-web.service so the auth token is loaded from the
  secrets file at start (not exposed in ps aux)
- Add --otlp-http :4318 to match the deployed unit (already live)

The full auth chain is now documented in the service template:
  ~/.config/fabric/secrets.env (0600) → EnvironmentFile → server
  ~/.needle/config.yaml auth_token: "${FABRIC_AUTH_TOKEN}" → NEEDLE

POST /api/events returns 401 without token; NEEDLE workers
authenticate via Bearer token sourced from the same secrets file.
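The 401 behavior for POST /api/events can be sketched as a pure function over the Authorization header. The function names below are hypothetical and this is not the server's actual handler; it only illustrates a Bearer check against a token that would be loaded from FABRIC_AUTH_TOKEN.

```typescript
// Compare two strings without returning early on the first mismatch,
// so the comparison time does not depend on where they differ.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Hypothetical helper: returns the HTTP status a POST should receive.
// `expectedToken` stands in for the value of FABRIC_AUTH_TOKEN.
function statusForPost(authorization: string | undefined, expectedToken: string): number {
  const prefix = "Bearer ";
  if (!authorization || !authorization.startsWith(prefix)) {
    return 401;
  }
  const presented = authorization.slice(prefix.length);
  // An empty presented token is rejected even if the expected token is empty.
  const ok = presented.length > 0 && constantTimeEqual(presented, expectedToken);
  return ok ? 200 : 401;
}
```

In a Node server, `crypto.timingSafeEqual` over equal-length buffers would be the more conventional choice for the comparison; the hand-written loop just keeps the sketch dependency-free.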

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:26:40 -04:00
jedarden
3a36c14162 feat(bd-288): deploy fabric web as persistent systemd service
Update service and script to use DirectoryTailer on ~/.needle/logs
instead of the old single-file workers.log path. Rebuild dist/ so
the running service picks up Phase 8 directory-tailing changes.

- scripts/fabric-web.service: add --source /home/coding/.needle/logs
- scripts/fabric-web.sh: replace FABRIC_LOG_PATH with FABRIC_LOG_SOURCE,
  switch from -f (single file) to --source (directory) mode
- Rebuilt dist/ via npm run build
- Restarted fabric-web.service (enabled, linger=yes, health: ok)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 15:53:34 -04:00
jedarden
b0f7c5020e feat(bd-288): Add systemd service for persistent FABRIC web server
Deploy FABRIC web dashboard as a systemd user service that starts on
boot and auto-restarts on crash. Includes service.sh management script
for start/stop/restart/deploy operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 23:51:07 -04:00