Commit graph

12 commits

Author SHA1 Message Date
jedarden
71ffa3485b fix(bd-ch6): use Type=simple for fabric-web.service reliability
Type=notify with WatchdogSec was timing out due to sd_notify issues.
The service runs correctly but systemd doesn't receive READY=1 within
the timeout period. Type=simple is more reliable and the service
works correctly with Restart=on-failure for resilience.

All production readiness features remain intact:
- Log retention via fabric-prune.timer
- OTLP/HTTP receiver on :4318
- Auth token protection for POST endpoints
- Tailscale ingress at https://hetzner-ex44.tail1b1987.ts.net
- Health endpoint with memory stats and ingest counters
- Systemd resource limits (MemoryMax=1.5G, CPUQuota=200%)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: bd-ch6
2026-04-30 16:22:16 -04:00
jedarden
455da572a8 feat(retention): add systemd timer for automatic NEEDLE log pruning
Add systemd timer and service for daily log pruning at 03:00 UTC. Includes
manual prune API endpoint, setup script, and updated documentation.

## Changes
- Add `fabric-prune.service` - systemd oneshot service for log pruning
- Add `fabric-prune.timer` - daily timer (03:00 UTC) with persistent=true
- Add `POST /api/retention/prune` - manual prune trigger with auth
- Add `scripts/setup-fabric-prune.sh` - one-shot timer installer
- Update `CLAUDE.md` - document retention policy and usage

## Retention Policy
- `archiveAfterDays: 3` - files older than 3d → archive/
- `maxAgeDays: 7` - files older than 7d → delete (safety net)
- `archiveRetentionDays: 30` - archives older than 30d → delete

## Integration
- Emits `mend.logs_pruned` events to `fabric-mend.jsonl`
- FABRIC DirectoryTailer auto-discovers events
- `/api/retention` endpoint shows current state and last prune

Resolves bd-ch6.2

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 16:22:16 -04:00
jedarden
a6418ac539 feat(bd-ch6.8): add systemd hardening limits to fabric-web.service
- MemoryMax=1536M, MemoryHigh=1200M (1.5GB hard limit, 1.2GB soft)
- CPUQuota=200% (max 2 cores)
- StartLimitInterval=120s, StartLimitBurst=5 (rate-limit restarts)
- Add --max-old-space-size=1024 to Node heap
- Add --heap-snapshots --snapshot-interval 30 for leak debugging

Prevents runaway memory/CPU from taking down the host. Watchdog already
implemented in bd-ch6.6 (Type=notify, WatchdogSec=30).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: bd-ch6.8
2026-04-30 16:22:16 -04:00
jedarden
19450d3047 feat(infra): expose FABRIC dashboard over Tailscale with TLS
Configure tailscale serve to proxy https://hetzner-ex44.tail1b1987.ts.net/
to localhost:3000. Tailnet-only — no public internet exposure.

- scripts/setup-tailscale-serve.sh: one-time setup script (idempotent)
- README.md: add Remote Access section with URL, access model, and setup steps
- CLAUDE.md: new project-level reference for service location, URLs, auth model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:05:39 -04:00
jedarden
038cc9348d feat(bd-ch6.6): wire sd_notify + add untracked serverMetrics and health-check files
- Add src/serverMetrics.ts (ServerMetrics class for /api/health + /api/metrics)
- Add scripts/fabric-health-check.sh (curl-based liveness probe)
- Wire sd_notify READY=1 on server start and WATCHDOG=1 keepalives in server.ts
  so the Type=notify systemd service correctly reports start and keeps the
  watchdog alive without an external npm package

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:58:37 -04:00
jedarden
87c7888351 feat(bd-ch6.6): add /api/health + /api/metrics self-observability
- /api/health returns {status, uptime_sec, version, event_count,
  ingest_rate_per_sec, ws_clients, tailer_files_watched, dedup_dropped,
  process_resident_memory_bytes}; returns HTTP 503 with status='overloaded'
  when maxEventCount is exceeded
- /api/metrics exposes the same counters in Prometheus text format;
  fabric_status=0 when overloaded
- Add ServerMetrics.eventCount setter so both endpoints sync from store.size
  (fixes fabric_event_count in /api/metrics showing 0 when events added directly)
- Wire --max-events CLI option into `fabric web`; pass maxEventCount and
  deduplicator to createWebServer so the memory-bomb guard and dedup_dropped
  reporting are actually activated
- Track tailerFilesWatched: set after tailer.start() and update on each event
  for DirectoryTailer (uses activeFiles.length getter)
- Add import for Node net module used by systemd watchdog notify
- Add tests: overload guard returns 503, within-limit returns 200, Prometheus
  reflects fabric_status=0 when overloaded

systemd service already has Restart=on-failure + WatchdogSec=30 (scripts/fabric-web.service);
liveness guard in server.ts calls process.exit(1) after 3 consecutive overload
checks, triggering systemd restart.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:54:38 -04:00
jedarden
43023b2596 feat(bd-ch6.4): wire FABRIC_AUTH_TOKEN end-to-end in service template
- Add EnvironmentFile=/home/coding/.config/fabric/secrets.env to
  scripts/fabric-web.service so the auth token is loaded from the
  secrets file at start (not exposed in ps aux)
- Add --otlp-http :4318 to match the deployed unit (already live)

The full auth chain is now documented in the service template:
  ~/.config/fabric/secrets.env (0600) → EnvironmentFile → server
  ~/.needle/config.yaml auth_token: "${FABRIC_AUTH_TOKEN}" → NEEDLE

POST /api/events returns 401 without token; NEEDLE workers
authenticate via Bearer token sourced from the same secrets file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:26:40 -04:00
jedarden
8d1af46705 docs(bd-n8y): add auth token documentation to README and startup script 2026-04-23 16:14:51 -04:00
jedarden
3a36c14162 feat(bd-288): deploy fabric web as persistent systemd service
Update service and script to use DirectoryTailer on ~/.needle/logs
instead of the old single-file workers.log path. Rebuild dist/ so
the running service picks up Phase 8 directory-tailing changes.

- scripts/fabric-web.service: add --source /home/coding/.needle/logs
- scripts/fabric-web.sh: replace FABRIC_LOG_PATH with FABRIC_LOG_SOURCE,
  switch from -f (single file) to --source (directory) mode
- Rebuilt dist/ via npm run build
- Restarted fabric-web.service (enabled, linger=yes, health: ok)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 15:53:34 -04:00
jedarden
b0f7c5020e feat(bd-288): Add systemd service for persistent FABRIC web server
Deploy FABRIC web dashboard as a systemd user service that starts on
boot and auto-restarts on crash. Includes service.sh management script
for start/stop/restart/deploy operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 23:51:07 -04:00
default
3d6f19ec43 feat(bd-288): add FABRIC web server persistent service script
- Add scripts/fabric-web.sh for managing FABRIC web server as tmux session
- Supports start/stop/restart/status/logs commands
- Auto-restarts on crash via tmux session management
- Configurable via FABRIC_PORT and FABRIC_LOG_PATH env vars
- Fix TypeScript build error in CommandPalette.test.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:01:48 +00:00
jeda
57e8193f7b feat(bd-2kf): Add comprehensive test coverage for parser and store
- Add 36 parser tests covering:
  - parseLogLine with valid/invalid inputs
  - parseLogLines for multi-line parsing
  - formatEvent with all options
  - Edge cases: malformed JSON, missing fields, colorization

- Add 35 store tests covering:
  - InMemoryEventStore add/query operations
  - Worker status tracking (active/idle/error)
  - Event filtering by worker, level, bead, timestamp
  - maxEvents limit and LRU trimming
  - getStore/resetStore singleton management

- Close phase beads (bd-2pa, bd-n8l, bd-2nu) as infrastructure complete
- Close test beads (bd-5eh, bd-2en) with comprehensive coverage
- Total: 91 tests passing across parser, store, and tailer

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-03 10:43:24 +00:00