Commit graph

185 commits

Author SHA1 Message Date
jedarden
f851ede69e feat: wire anomaly detection & security mode API endpoints
- AnomalyDetector initialized and running in main() with periodic updates
- Anomaly events pushed to dashboard WS feed as 'alert' messages
- GET /api/anomalies?since=24h lists recent anomaly events
- POST /api/security/arm + /api/security/disarm endpoints
- GET /api/security/status returns armed, learning_until, anomaly_count_24h
- Arm/disarm state persists via learning_state SQLite table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:20:54 -04:00
Argo Workflows CI
0f8645332a ci: auto-bump version to 0.1.66 2026-04-07 17:19:31 +00:00
jedarden
a42c5e7ea1 feat: wire NTP client into firmware build and initialization
- Add ntp.c to CMakeLists.txt SRCS so it's compiled and linked
- Load ntp_server from NVS in load_nvs_config() (default: pool.ntp.org)
- Add ntp_server field to spaxel_state_t
- Initialize NTP after WiFi connects with 10s sync timeout, WARN on failure
- Re-sync NTP after WiFi reconnect (WIFI_LOST state)
- Start periodic 10-minute resync timer via esp_timer
- Add ntp_synced boolean to health JSON message
- Handle ntp_server field in downstream config message
- Fix periodic resync callback to properly stop/restart SNTP

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:19:07 -04:00
Argo Workflows CI
83a86faee4 ci: auto-bump version to 0.1.65 2026-04-07 17:15:18 +00:00
jedarden
c44065e927 feat: wire load-shedding level to health endpoint and dashboard WS alerts
- Connect pm.GetShedLevel to health checker (exposes load_level in /healthz)
- Wire OnShedLevelChange callback to broadcast dashboard WS alert on Level 3
- Log rate reduction push and recovery messages for Level 3 transitions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:14:54 -04:00
jedarden
6263ce1554 feat: register Zones CRUD REST API endpoints
- Register zones and portals API handler in main.go
- GET /api/zones - list all zones with occupancy
- POST /api/zones - create a new zone
- PUT /api/zones/{id} - update existing zone
- DELETE /api/zones/{id} - delete a zone
- OpenAPI-style godoc comments already in place
- Zone changes are reflected in the live 3D view within one WebSocket cycle (10 Hz polling)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:14:54 -04:00
Argo Workflows CI
139f5954f4 ci: auto-bump version to 0.1.64 2026-04-07 16:53:25 +00:00
jedarden
a9fa6f6f25 feat: add per-iteration timing and load-shedding to ProcessorManager
Add a 5-iteration rolling average timer to Process() with automatic
load-shedding levels (0-3) based on pipeline duration thresholds
(80ms/90ms/95ms). Recovery steps down when avg drops below 60ms for
10 consecutive iterations. Includes GetShedLevel() getter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:53:06 -04:00
Argo Workflows CI
57c27de729 ci: auto-bump version to 0.1.63 2026-04-07 16:52:06 +00:00
jedarden
41d6d09561 feat: implement internal pub/sub event bus
- Add TimestampMs field to eventbus.Event
- Add event type constants (detection, zone_entry, zone_exit, etc.)
- Add severity level constants
- Add global Default() bus instance for shared access
- Add convenience functions: PublishDefault, PublishDefaultSync, SubscribeDefault
- Integrate with events.InsertEvent to publish to eventbus
- Add comprehensive table-driven tests
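
A minimal in-process pub/sub sketch in the spirit of the bus described above. `Bus`, `Subscribe`, `Publish`, and the `Event` fields other than `TimestampMs` are hypothetical; the real eventbus package API may differ. Delivery is non-blocking, so a slow subscriber with a full buffer misses events instead of stalling the publisher.

```go
package main

import (
	"sync"
	"time"
)

// Event loosely mirrors the eventbus.Event shape; fields other than
// TimestampMs are assumptions.
type Event struct {
	Type        string
	TimestampMs int64
	Payload     any
}

// Bus is a hypothetical minimal pub/sub bus keyed by event type.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

func NewBus() *Bus { return &Bus{subs: make(map[string][]chan Event)} }

// Subscribe returns a buffered channel that receives events of eventType.
func (b *Bus) Subscribe(eventType string, buf int) <-chan Event {
	ch := make(chan Event, buf)
	b.mu.Lock()
	b.subs[eventType] = append(b.subs[eventType], ch)
	b.mu.Unlock()
	return ch
}

// Publish stamps the event if needed and delivers it without blocking.
// It returns how many subscribers actually received it.
func (b *Bus) Publish(e Event) int {
	if e.TimestampMs == 0 {
		e.TimestampMs = time.Now().UnixMilli()
	}
	b.mu.RLock()
	defer b.mu.RUnlock()
	delivered := 0
	for _, ch := range b.subs[e.Type] {
		select {
		case ch <- e:
			delivered++
		default: // subscriber buffer full: drop rather than block
		}
	}
	return delivered
}
```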

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:51:41 -04:00
Argo Workflows CI
3db48cd61d ci: auto-bump version to 0.1.62 2026-04-07 16:40:33 +00:00
jedarden
76ac2710c9 feat: startup phase sequencing with 30s timeout enforcement
Implement explicit 7-phase startup logging and timeout enforcement:
- Phases 1-4 (data dir, SQLite, migrations, secrets) in db.OpenDB
- Phase 5 (subsystems) with 5s per-subsystem timeout via SubsystemStart
- Phase 6 (HTTP + mDNS) and Phase 7 (health check + ready file)
- FatalFunc injection for testable timeout handling
- Each phase logs [PHASE N/7 — Description] on start, [PHASE N/7 OK] (Xms) on completion
- 30s total startup deadline via context.WithTimeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
jedarden
984f9ef262 feat: add nightly archive scheduler for events (02:00 local time)
- Add StartArchiveScheduler function that runs RunArchiveJob nightly at 02:00 local time
- Scheduler respects local timezone and gracefully stops on done channel signal
- Add table-driven tests for scheduler start and stop behavior
- Add AnomalyType, SystemMode, and sleep session event types to types.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
jedarden
60a21bacb6 feat: add end-to-end integration test harness
Implements a comprehensive e2e test system that:
- Starts mothership container/binary
- Waits for /healthz with 15s timeout
- Handles PIN auth setup if needed
- Runs CSI simulator against mothership
- Asserts during run (health, nodes online, blob detection)
- Validates that the frame rate doesn't drop by more than 20%
- Asserts detection events recorded

Components added:
- mothership/cmd/sim: CSI simulator that generates synthetic frames
- mothership/tests/e2e: Go test suite with WebSocket assertions
- tests/e2e/run.sh: Shell script with comprehensive assertions
- .github/workflows/e2e.yml: CI workflow for automated testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
Argo Workflows CI
9810da2ee6 ci: auto-bump version to 0.1.61 2026-04-07 16:34:52 +00:00
jedarden
ff3428fee6 feat: robust WebSocket reconnection with backoff, extrapolation, and visual states
Implements exponential backoff (1s→10s cap) with ±500ms jitter,
blob position extrapolation during disconnects (capped at 2s),
three visual states (silent <5s, dimming 5-30s, modal >30s),
and automatic scene restoration on reconnect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:34:35 -04:00
jedarden
31659a5ccc fix: resolve TestTimeoutDoesNotDisable hang in trigger tests
Replace httptest.Server with raw net.Listen to avoid Close() blocking
on active connections after a timeout, which caused the test to hang
indefinitely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:34:35 -04:00
Argo Workflows CI
f951c029a3 ci: auto-bump version to 0.1.60 2026-04-07 16:22:24 +00:00
jedarden
d41cfe3e4f feat: add CSI frame validation with DEBUG logging and performance benchmark
Implement strict CSI binary frame validation with per-connection malformed
frame counters and automatic connection closure on persistent malformed input.

Validation rules implemented:
- Minimum frame length: 24 bytes (header only)
- Maximum frame length: 280 bytes (24 header + 128 subcarriers × 2 bytes)
- n_sub field: must be ≤128
- Payload length: must equal n_sub × 2 bytes exactly
- channel: must be in [1,14] for 2.4 GHz; drop if 0 or >14
- rssi: 0 treated as invalid/missing (logged at DEBUG, but frame allowed)
- timestamp_us: any uint64 value accepted
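
A sketch of the validation rules applied to already-decoded header fields. The commit does not describe the byte layout of the 24-byte header, so decoding is omitted. `frameHeader`, `validateFrame`, and the error values are hypothetical.

```go
package main

import "errors"

const (
	headerLen = 24
	maxSub    = 128
	maxFrame  = headerLen + maxSub*2 // 280 bytes
)

// frameHeader holds only the fields the rules inspect; decoding them from the
// raw header is assumed to happen elsewhere.
type frameHeader struct {
	NSub    int
	Channel int
	RSSI    int
}

var (
	errTooShort   = errors.New("frame shorter than 24-byte header")
	errTooLong    = errors.New("frame longer than 280 bytes")
	errBadNSub    = errors.New("n_sub exceeds 128")
	errBadPayload = errors.New("payload length != n_sub*2")
	errBadChannel = errors.New("channel outside [1,14]")
)

// validateFrame applies the length, n_sub, payload and channel rules.
// rssiMissing reports RSSI==0, which is allowed but should be logged at DEBUG.
func validateFrame(frameLen int, h frameHeader) (rssiMissing bool, err error) {
	switch {
	case frameLen < headerLen:
		return false, errTooShort
	case frameLen > maxFrame:
		return false, errTooLong
	case h.NSub > maxSub:
		return false, errBadNSub
	case frameLen-headerLen != h.NSub*2:
		return false, errBadPayload
	case h.Channel < 1 || h.Channel > 14:
		return false, errBadChannel
	}
	return h.RSSI == 0, nil
}
```

Checks are ordered from cheapest and most fundamental (total length) to field-level ones. Because every rule is an integer comparison with no allocation, a sub-microsecond target is plausible.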

Per-connection malformed counter (fixed 60-second window):
- On each validation failure: increment malformed_count; log at DEBUG
- If malformed_count > 100 within 60s: log WARN
- If malformed_count > 1000 within 60s: close WebSocket with message
  'Excessive malformed frames — possible firmware bug'
- Counter resets every 60s

Acceptance criteria met:
- Valid frame: passes all checks in < 1 μs (benchmark test added)
- Frame with n_sub=200: rejected (n_sub > 128)
- Frame with len=10: rejected (< 24 bytes)
- Frame with channel=0: rejected with DEBUG log
- 1001 malformed frames in 60s: connection closed with correct message
- 101 malformed frames: WARN logged, connection kept open
- RSSI=0: allowed but logged at DEBUG for AGC skip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:22:17 -04:00
Argo Workflows CI
08394fc90f ci: auto-bump version to 0.1.59 2026-04-07 16:15:13 +00:00
jedarden
da116c546b feat: add environment variable validation with documented defaults
- Create internal/config package with Load() function for all env vars
- Validate types (string, bool, int, enum, URL) and ranges
- Collect all validation errors before returning, rather than stopping at the first one
- Log non-sensitive values at INFO on startup (MQTT_PASSWORD masked)
- Return error slice; main() logs each error and exits with status 1
- Unit tests for valid/invalid cases

Env vars validated:
- SPAXEL_BIND_ADDR (string, default '0.0.0.0:8080')
- SPAXEL_DATA_DIR (string, default '/data')
- SPAXEL_STATIC_DIR (string, default '/dashboard')
- SPAXEL_MDNS_ENABLED (bool, default true)
- SPAXEL_MDNS_NAME (string, default 'spaxel')
- SPAXEL_LOG_LEVEL (enum: debug|info|warn|error, default 'info')
- SPAXEL_FUSION_RATE_HZ (int, range [1,20], default 10)
- SPAXEL_REPLAY_MAX_MB (int, range [10,10000], default 360)
- SPAXEL_INSTALL_SECRET (string, optional, 32+ chars if set)
- SPAXEL_NTP_SERVER (string, default 'pool.ntp.org')
- SPAXEL_MQTT_BROKER (string, optional, must be valid URL if set)
- SPAXEL_MQTT_USERNAME (string, optional)
- SPAXEL_MQTT_PASSWORD (string, optional, never logged)
- TZ (string, default 'UTC', validated via time.LoadLocation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:15:06 -04:00
Argo Workflows CI
529d6108d3 ci: auto-bump version to 0.1.58 2026-04-07 16:10:33 +00:00
jedarden
0377426926 fix: wire anomaly detection & security mode API endpoints
- Add missing CountAnomaliesSince method to mockDetectorProvider
  in security_test.go to satisfy the DetectorProvider interface
- Fix variable shadowing bug in anomaly.go QueryAnomalyEvents
  where incomplete rename from 'events' to 'result' caused
  append(events, &e) to reference the package instead of the slice

All security mode endpoints verified:
- GET /api/anomalies?since=24h — lists recent anomaly events
- POST /api/security/arm + /api/security/disarm — arm/disarm
- GET /api/security/status — {armed, learning_until, anomaly_count_24h}
- Anomaly events push to dashboard WS as 'alert' messages
- Arm/disarm state persists across restarts via learning_state table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:10:01 -04:00
Argo Workflows CI
c15e03a9d2 ci: auto-bump version to 0.1.57 2026-04-07 15:52:03 +00:00
jedarden
001c17bd85 feat: implement 10-step SIGTERM graceful shutdown sequence
Implements the full ordered shutdown sequence so the mothership drains
cleanly without data loss on SIGTERM (Docker stop, Kubernetes termination).

Shutdown sequence (30s hard deadline):
1. Set shutting_down=true; ingestion server returns HTTP 503 to new WebSocket upgrade requests
2. Broadcast {type:'shutdown', reconnect_in_ms:30000} to all dashboard WebSocket clients
3. Cancel fusion loop context (stops fusion goroutine)
4. Drain signal processing pipeline: wait for in-flight CSI frames (max 2s)
5. Flush in-memory baselines to SQLite in a single transaction
6. Sync CSI recording buffer to disk (close writer, fsync)
7. Close all node WebSocket connections with normal close frame (1000)
8. Write {type:'system', description:'Mothership stopped'} event to events table
9. PRAGMA wal_checkpoint(FULL) to collapse WAL into main DB file
10. sqlite3.Close()

Each step gets its own log line: '[SHUTDOWN] Step N/10 — ...'
Steps that fail log ERROR but do not abort remaining steps.
Exit code 0 if all steps completed within deadline; exit code 1 if deadline exceeded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:51:38 -04:00
Argo Workflows CI
1305890f49 ci: auto-bump version to 0.1.56 2026-04-07 15:39:03 +00:00
jedarden
5db3110a2a feat: implement automation triggers CRUD REST endpoints
Add full CRUD endpoints for triggers with OpenAPI-style godoc comments:
- GET/POST /api/triggers (list all, create new)
- PUT/DELETE /api/triggers/{id} (update, delete)
- POST /api/triggers/{id}/test (fire trigger once for testing)

Both TriggersHandler (simple) and VolumeTriggersHandler (3D geometry)
implement all endpoints with table-driven tests covering validation,
persistence, and round-trip lifecycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:38:34 -04:00
Argo Workflows CI
173490ba98 ci: auto-bump version to 0.1.55 2026-04-07 15:09:47 +00:00
jedarden
e44dd345f6 feat: implement comprehensive /healthz endpoint
Add complete health check implementation for Docker HEALTHCHECK and
Traefik health routing with:

Response fields:
- status: "ok" or "degraded"
- uptime_s: seconds since mothership boot
- version: mothership version string
- nodes_online: count of connected nodes
- db: "ok" or "failing" (SELECT 1 with 100ms timeout)
- load_level: 0-3 from load shedding state
- reason: human-readable explanation (only when degraded)

HTTP status codes:
- 200 for healthy (status="ok")
- 503 for degraded (status="degraded")

Degraded conditions:
- Database unreachable
- Load level 3 sustained for >60 seconds
- No nodes connected after 5 minutes uptime

Docker HEALTHCHECK updated to verify status="ok" response.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:09:36 -04:00
Argo Workflows CI
4c3e6e3dd5 ci: auto-bump version to 0.1.54 2026-04-07 15:05:12 +00:00
jedarden
97f1eafc6f feat: process buffered events from delta WebSocket updates
Events (zone entries/exits, portal crossings, presence transitions)
were already broadcast immediately via BroadcastEvent, but the
buffered copies included in the 10 Hz delta tick were silently
dropped by handleIncrementalUpdate. Now delta events are processed
through the same handleEventMessage path, with dedup to avoid
double-processing when both immediate and delta copies arrive.
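
A sketch of the dedup idea with a bounded memory of recently handled event IDs. It is written in Go for consistency, although the described change is in the dashboard JavaScript. It assumes each event carries a unique ID; if it does not, a composite key such as type plus timestamp plus zone could serve instead. `deduper` and `handleEvents` are hypothetical.

```go
package main

// eventMsg is a minimal event shape; a unique ID per event is assumed.
type eventMsg struct {
	ID   string
	Type string
}

// deduper remembers the most recent `limit` event IDs in FIFO order so the
// immediate and delta copies of one event are processed only once.
type deduper struct {
	seen  map[string]struct{}
	order []string
	limit int
}

func newDeduper(limit int) *deduper {
	return &deduper{seen: make(map[string]struct{}), limit: limit}
}

// firstTime reports whether id is new, and records it if so.
func (d *deduper) firstTime(id string) bool {
	if _, ok := d.seen[id]; ok {
		return false
	}
	d.seen[id] = struct{}{}
	d.order = append(d.order, id)
	if len(d.order) > d.limit {
		oldest := d.order[0]
		d.order = d.order[1:]
		delete(d.seen, oldest) // bound memory by evicting the oldest ID
	}
	return true
}

// handleEvents processes each event once, whichever path delivered it.
func handleEvents(d *deduper, batch []eventMsg, handle func(eventMsg)) {
	for _, e := range batch {
		if d.firstTime(e.ID) {
			handle(e)
		}
	}
}
```

The bound only needs to cover the gap between the immediate broadcast and the next 10 Hz delta tick, so a small limit should suffice.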

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:04:49 -04:00
Argo Workflows CI
c77bf42178 ci: auto-bump version to 0.1.53 2026-04-07 14:52:40 +00:00
jedarden
4eada81a96 chore: add missing go-chi/chi/v5 dependency for floorplan package
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 10:52:08 -04:00
Argo Workflows CI
d30f886efc ci: auto-bump version to 0.1.52 2026-04-07 14:27:50 +00:00
jedarden
391ed884e4 feat: implement NVS schema migration on boot
Implement versioned NVS key migration on ESP32-S3 firmware so
OTA-updated firmware gracefully handles NVS written by older versions.

- Add nvs_migration.c/h with migration framework
- On boot, read schema_ver from NVS; initialize to 1 if missing
- Run migrations sequentially if schema_ver < COMPILED_NVS_VERSION
- Each migration commits after each write for durability
- Log all migration steps to UART for debugging
- Example migration v1→v2: rename 'ms_ip' to 'mothership_ip',
  add 'ntp_server' with default 'pool.ntp.org'
- Migration failure leaves NVS in consistent state
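
A sketch of the sequential migration loop, written in Go for consistency with the other examples; the actual firmware is C on ESP-IDF. A string map stands in for the NVS namespace. `kvStore`, `compiledNVSVersion`, `migrations`, and `migrate` are hypothetical names. Only the v1 to v2 step from the commit is modeled, and it is written to be safe to rerun after a partial failure.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
)

// kvStore stands in for an NVS namespace (hypothetical).
type kvStore map[string]string

const compiledNVSVersion = 2

// migrations[v] upgrades schema v to v+1. Each step is idempotent so a
// retry after a partial failure converges on the same result.
var migrations = map[int]func(kvStore) error{
	1: func(s kvStore) error {
		if ip, ok := s["ms_ip"]; ok {
			s["mothership_ip"] = ip // write the new key before deleting the old one
			delete(s, "ms_ip")
		}
		if _, ok := s["ntp_server"]; !ok {
			s["ntp_server"] = "pool.ntp.org"
		}
		return nil
	},
}

// migrate brings s up to compiledNVSVersion, bumping schema_ver after each step.
func migrate(s kvStore) error {
	ver := 1
	if raw, ok := s["schema_ver"]; ok {
		v, err := strconv.Atoi(raw)
		if err != nil {
			return fmt.Errorf("bad schema_ver %q: %w", raw, err)
		}
		ver = v
	} else {
		s["schema_ver"] = "1" // missing: initialize to 1
	}
	for ver < compiledNVSVersion {
		stepFn, ok := migrations[ver]
		if !ok {
			return fmt.Errorf("no migration from v%d", ver)
		}
		log.Printf("NVS migration v%d -> v%d", ver, ver+1)
		if err := stepFn(s); err != nil {
			return err // schema_ver not bumped, so this step reruns next boot
		}
		ver++
		s["schema_ver"] = strconv.Itoa(ver)
	}
	return nil
}
```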

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 10:27:38 -04:00
Argo Workflows CI
80bca356cd ci: auto-bump version to 0.1.51 2026-04-07 14:22:10 +00:00
jedarden
cac25e86e8 feat: implement security mode dashboard UI
- Update learning progress display to show "X of Y days complete" format
- Add last anomaly location info to security dialog stats
- Add CSS styling for anomaly event type in timeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 10:22:00 -04:00
Argo Workflows CI
f5c71b5113 ci: auto-bump version to 0.1.50 2026-04-07 14:21:33 +00:00
jedarden
9f53218f16 fix: correct no-op trigger update test expectation
The "no-op update returns current" test case was missing wantEnable: true,
causing a spurious failure, since the seeded trigger has Enabled: true.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 10:21:02 -04:00
jedarden
3d3cb41d25 feat: add security mode persistence and tracker blob lifecycle events
- Add GetArmedAt() method to persist armed timestamp across restarts
- Add blob appear/disappear callbacks to tracker for security events
- Add security handler for arm/disarm API endpoints
- Update /api/security endpoint to return armed_at timestamp
- Add tracker tests for blob lifecycle callbacks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 10:14:51 -04:00
jedarden
80ac99ca7c feat: implement security mode dashboard UI
- Add security status indicator in status bar with mode badge
  (DISARMED / LEARNING / ARMED / ALERT)
- Add arm/disarm toggle button with confirmation dialog
- Add learning period progress bar display
- Add alert banner for anomalies when armed
- Add acknowledge functionality for anomalies
- Integrate with WebSocket for real-time updates
- Add security.css with responsive styles

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 10:14:51 -04:00
Argo Workflows CI
3e561593cf ci: auto-bump version to 0.1.49 2026-04-07 13:55:07 +00:00
jedarden
01547269cc feat: verify dashboard WebSocket feed supports events, alerts, BLE, triggers, health
All 5 new message types (event, alert, ble_scan, trigger_state,
system_health) were already implemented in hub.go with broadcast methods,
called from main.go/ingestion/volume_triggers/events, and handled in
app.js. Also includes security mode persistence from anomaly DB and
OpenAPI docs for triggers endpoints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 09:54:14 -04:00
Argo Workflows CI
6a57997ec5 ci: auto-bump version to 0.1.48 2026-04-07 13:36:58 +00:00
jedarden
fe68bb5fdc feat: implement BLE Devices REST endpoints with OpenAPI docs
- GET /api/ble/devices: list all BLE devices with filtering
- PUT /api/ble/devices/{mac}: update device label and assign to person
- Add comprehensive OpenAPI-style godoc comments
- Supports filtering by registered/discovered/archived status
- Includes device history and aliases endpoints
2026-04-07 09:36:32 -04:00
Argo Workflows CI
dcd0b4e71c ci: auto-bump version to 0.1.47 2026-04-07 13:31:19 +00:00
jedarden
56c28bce63 feat: implement BLE Devices REST endpoints with OpenAPI docs
- Add GET /api/ble/devices to list known devices with filtering
- Add PUT /api/ble/devices/{mac} to set label and assign to person
- Add comprehensive OpenAPI-style godoc comments for all BLE endpoints
- Add table-driven tests for BLE handler endpoints

Endpoints support:
  - Filtering by registration status (registered/discovered)
  - Time window filtering (hours parameter)
  - Device labels and person assignment
  - Sighting history per device

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 09:30:55 -04:00
Argo Workflows CI
52aeb7a6ef ci: auto-bump version to 0.1.46 2026-04-07 13:19:49 +00:00
jedarden
a873663cfc feat: implement Replay/Time-Travel REST endpoints
Add OpenAPI-style godoc comments and comprehensive table-driven tests
for replay endpoints:
- GET /api/replay/sessions - list recording sessions and replay store info
- POST /api/replay/start - start replay at timestamp (speed 1/2/5)
- POST /api/replay/stop - stop replay, return to live
- POST /api/replay/seek - seek within session
- POST /api/replay/tune - update pipeline parameters mid-replay

Improvements:
- Fix writeJSON calls to use proper 3-argument signature
- Add detailed request/response type documentation
- Add mockRecordingStore for isolated unit testing
- Add 12 table-driven test cases covering all endpoints

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 09:19:36 -04:00
Argo Workflows CI
12efb9a097 ci: auto-bump version to 0.1.45 2026-04-07 13:11:22 +00:00