Commit graph

199 commits

Author SHA1 Message Date
Argo Workflows CI
d11fef8bb6 ci: auto-bump version to 0.1.72 2026-04-07 18:37:21 +00:00
jedarden
bf40673b72 feat: wire anomaly detection & security mode API endpoints
AnomalyDetector is initialized in main() with periodic model updates.
Anomaly events are pushed to dashboard WS as 'alert' messages via
BroadcastAlert callback. Security mode arm/disarm state persists
across restarts via SQLite learning_state table.

Endpoints:
- GET /api/anomalies?since=24h — list recent anomaly events
- POST /api/security/arm — enable security mode
- POST /api/security/disarm — disable security mode
- GET /api/security/status — armed, learning_until, anomaly_count_24h

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 14:36:59 -04:00
Argo Workflows CI
7347a5295b ci: auto-bump version to 0.1.71 2026-04-07 18:20:45 +00:00
jedarden
008d3caa60 feat: fix floorplan table schema and create /data/floorplan directory
- Fix migration 001 floorplan schema: use distance_m instead of cal_distance_m,
  and rotation_deg instead of room_bounds_json
- Update migration 010 to ALTER existing floorplan tables for databases
  that already ran migration 001
- Create /data/floorplan directory in db.OpenDB for storing floor plan images
2026-04-07 14:20:38 -04:00
Argo Workflows CI
f2ff68481c ci: auto-bump version to 0.1.70 2026-04-07 18:13:17 +00:00
jedarden
04129addd3 feat: implement Zones CRUD REST endpoints with OpenAPI docs
- GET/POST /api/zones - list and create zones with JSON responses
- PUT /api/zones/{id} - update existing zone
- DELETE /api/zones/{id} - delete zone
- All endpoints return JSON with proper HTTP status codes
- OpenAPI/Swagger annotations present (@Summary, @Description, @Tags, @Router, etc.)
- Table-driven tests for all CRUD operations

Acceptance criteria met:
- Endpoints respond correctly to HTTP requests
- Godoc annotations present for API documentation
2026-04-07 14:13:08 -04:00
jedarden
adf01975a0 fix: correct config field name from InstallSecretHex to InstallSecret
The provisioning.NewServer call was using cfg.InstallSecretHex which
doesn't exist. The correct field name in config.Config is InstallSecret.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 14:13:08 -04:00
Argo Workflows CI
539d436e31 ci: auto-bump version to 0.1.69 2026-04-07 17:38:21 +00:00
jedarden
98c43b3734 feat: wire NTP client into firmware build and initialization 2026-04-07 13:38:13 -04:00
jedarden
8a809fee2f feat: wire anomaly detection & security mode API endpoints
All acceptance criteria verified:
- AnomalyDetector initialized in main() with providers wired
- Anomaly events pushed to dashboard WS feed as 'alert' messages
- GET /api/anomalies?since=24h returns active + history anomalies
- POST /api/security/arm + /api/security/disarm endpoints
- GET /api/security/status with armed, learning_until, anomaly_count_24h
- Security mode persists across restarts via learning_state table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:38:13 -04:00
Argo Workflows CI
9d6808de01 ci: auto-bump version to 0.1.68 2026-04-07 17:32:31 +00:00
jedarden
c256a02490 feat: wire NTP client into firmware build and initialization
Firmware (already implemented):
- ntp.c: Call esp_sntp_setservername() before esp_sntp_init()
- ntp.c: 10-minute periodic resync via esp_timer
- main.c: Read ntp_server from NVS (default: pool.ntp.org)
- main.c: 10-second sync attempt after WiFi connect with WARN on failure
- websocket.c: Include ntp_synced status in health JSON

Mothership (added):
- message.go: Add NTPSynced field to HealthMessage struct
- message.go: Add NTPServer field to ConfigMessage struct
- server.go: Add SendNTPServerToMAC() method for runtime NTP config
- server.go: Update sendConfig() to accept NTP server parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:32:24 -04:00
jedarden
733b30f0bd feat: wire load-shedding level to health endpoint and dashboard WS alerts
- Rename health endpoint JSON field from 'load_level' to 'shedding_level'
- Add GetShedLevel callback to health checker for direct ProcessorManager access
- Dashboard WebSocket alerts now broadcast on Level 3 trigger and recovery
- Level 3 actively pushes 10Hz rate cap to all connected nodes
- Recovery from Level 3 restores adaptive rate control automatically

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:32:24 -04:00
Argo Workflows CI
4b92fac7f2 ci: auto-bump version to 0.1.67 2026-04-07 17:21:20 +00:00
jedarden
f851ede69e feat: wire anomaly detection & security mode API endpoints
- AnomalyDetector initialized and running in main() with periodic updates
- Anomaly events pushed to dashboard WS feed as 'alert' messages
- GET /api/anomalies?since=24h lists recent anomaly events
- POST /api/security/arm + /api/security/disarm endpoints
- GET /api/security/status returns armed, learning_until, anomaly_count_24h
- Arm/disarm state persists via learning_state SQLite table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:20:54 -04:00
Argo Workflows CI
0f8645332a ci: auto-bump version to 0.1.66 2026-04-07 17:19:31 +00:00
jedarden
a42c5e7ea1 feat: wire NTP client into firmware build and initialization
- Add ntp.c to CMakeLists.txt SRCS so it's compiled and linked
- Load ntp_server from NVS in load_nvs_config() (default: pool.ntp.org)
- Add ntp_server field to spaxel_state_t
- Initialize NTP after WiFi connects with 10s sync timeout, WARN on failure
- Re-sync NTP after WiFi reconnect (WIFI_LOST state)
- Start periodic 10-minute resync timer via esp_timer
- Add ntp_synced boolean to health JSON message
- Handle ntp_server field in downstream config message
- Fix periodic resync callback to properly stop/restart SNTP

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:19:07 -04:00
Argo Workflows CI
83a86faee4 ci: auto-bump version to 0.1.65 2026-04-07 17:15:18 +00:00
jedarden
c44065e927 feat: wire load-shedding level to health endpoint and dashboard WS alerts
- Connect pm.GetShedLevel to health checker (exposes load_level in /healthz)
- Wire OnShedLevelChange callback to broadcast dashboard WS alert on Level 3
- Log rate reduction push and recovery messages for Level 3 transitions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:14:54 -04:00
jedarden
6263ce1554 feat: register Zones CRUD REST API endpoints
- Register zones and portals API handler in main.go
- GET /api/zones - list all zones with occupancy
- POST /api/zones - create a new zone
- PUT /api/zones/{id} - update existing zone
- DELETE /api/zones/{id} - delete a zone
- OpenAPI-style godoc comments already in place
- Zone changes reflect in live 3D view within one WebSocket cycle (10 Hz polling)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:14:54 -04:00
Argo Workflows CI
139f5954f4 ci: auto-bump version to 0.1.64 2026-04-07 16:53:25 +00:00
jedarden
a9fa6f6f25 feat: add per-iteration timing and load-shedding to ProcessorManager
Add a 5-iteration rolling average timer to Process() with automatic
load-shedding levels (0-3) based on pipeline duration thresholds
(80ms/90ms/95ms). Recovery steps down when avg drops below 60ms for
10 consecutive iterations. Includes GetShedLevel() getter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:53:06 -04:00
Argo Workflows CI
57c27de729 ci: auto-bump version to 0.1.63 2026-04-07 16:52:06 +00:00
jedarden
41d6d09561 feat: implement internal pub/sub event bus
- Add TimestampMs field to eventbus.Event
- Add event type constants (detection, zone_entry, zone_exit, etc.)
- Add severity level constants
- Add global Default() bus instance for shared access
- Add convenience functions: PublishDefault, PublishDefaultSync, SubscribeDefault
- Integrate with events.InsertEvent to publish to eventbus
- Add comprehensive table-driven tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:51:41 -04:00
Argo Workflows CI
3db48cd61d ci: auto-bump version to 0.1.62 2026-04-07 16:40:33 +00:00
jedarden
76ac2710c9 feat: startup phase sequencing with 30s timeout enforcement
Implement explicit 7-phase startup logging and timeout enforcement:
- Phases 1-4 (data dir, SQLite, migrations, secrets) in db.OpenDB
- Phase 5 (subsystems) with 5s per-subsystem timeout via SubsystemStart
- Phase 6 (HTTP + mDNS) and Phase 7 (health check + ready file)
- FatalFunc injection for testable timeout handling
- Each phase logs [PHASE N/7 — Description] on start, [PHASE N/7 OK] (Xms) on completion
- 30s total startup deadline via context.WithTimeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
jedarden
984f9ef262 feat: add nightly archive scheduler for events (02:00 local time)
- Add StartArchiveScheduler function that runs RunArchiveJob nightly at 02:00 local time
- Scheduler respects local timezone and gracefully stops on done channel signal
- Add table-driven tests for scheduler start and stop behavior
- Add AnomalyType, SystemMode, and sleep session event types to types.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
jedarden
60a21bacb6 feat: add end-to-end integration test harness
Implements a comprehensive e2e test system that:
- Starts mothership container/binary
- Waits for /healthz with 15s timeout
- Handles PIN auth setup if needed
- Runs CSI simulator against mothership
- Asserts during run (health, nodes online, blob detection)
- Validates frame rate doesn't drop >20%
- Asserts detection events recorded

Components added:
- mothership/cmd/sim: CSI simulator that generates synthetic frames
- mothership/tests/e2e: Go test suite with WebSocket assertions
- tests/e2e/run.sh: Shell script with comprehensive assertions
- .github/workflows/e2e.yml: CI workflow for automated testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
Argo Workflows CI
9810da2ee6 ci: auto-bump version to 0.1.61 2026-04-07 16:34:52 +00:00
jedarden
ff3428fee6 feat: robust WebSocket reconnection with backoff, extrapolation, and visual states
Implements exponential backoff (1s→10s cap) with ±500ms jitter,
blob position extrapolation during disconnects (capped at 2s),
three visual states (silent <5s, dimming 5-30s, modal >30s),
and automatic scene restoration on reconnect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:34:35 -04:00
jedarden
31659a5ccc fix: resolve TestTimeoutDoesNotDisable hang in trigger tests
Replace httptest.Server with raw net.Listen to avoid Close() blocking
on active connections after a timeout, which caused the test to hang
indefinitely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:34:35 -04:00
Argo Workflows CI
f951c029a3 ci: auto-bump version to 0.1.60 2026-04-07 16:22:24 +00:00
jedarden
d41cfe3e4f feat: add CSI frame validation with DEBUG logging and performance benchmark
Implement strict CSI binary frame validation with per-connection malformed
frame counters and automatic connection closure on persistent malformed input.

Validation rules implemented:
- Minimum frame length: 24 bytes (header only)
- Maximum frame length: 280 bytes (24 header + 128 subcarriers × 2 bytes)
- n_sub field: must be ≤128
- Payload length: must equal n_sub × 2 bytes exactly
- channel: must be in [1,14] for 2.4 GHz; drop if 0 or >14
- rssi: 0 treated as invalid/missing (logged at DEBUG, but frame allowed)
- timestamp_us: any uint64 value accepted

Per-connection malformed counter (sliding 60-second window):
- On each validation failure: increment malformed_count; log at DEBUG
- If malformed_count > 100 within 60s: log WARN
- If malformed_count > 1000 within 60s: close WebSocket with message
  'Excessive malformed frames — possible firmware bug'
- Counter resets every 60s

Acceptance criteria met:
- Valid frame: passes all checks in < 1 μs (benchmark test added)
- Frame with n_sub=200: rejected (n_sub > 128)
- Frame with len=10: rejected (< 24 bytes)
- Frame with channel=0: rejected with DEBUG log
- 1001 malformed frames in 60s: connection closed with correct message
- 101 malformed frames: WARN logged, connection kept open
- RSSI=0: allowed but logged at DEBUG for AGC skip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:22:17 -04:00
Argo Workflows CI
08394fc90f ci: auto-bump version to 0.1.59 2026-04-07 16:15:13 +00:00
jedarden
da116c546b feat: add environment variable validation with documented defaults
- Create internal/config package with Load() function for all env vars
- Validate types (string, bool, int, enum, URL) and ranges
- Collect all validation errors before returning, so startup fails once and reports every problem together
- Log non-sensitive values at INFO on startup (MQTT_PASSWORD masked)
- Return error slice; main() logs each error and exits(1)
- Unit tests for valid/invalid cases

Env vars validated:
- SPAXEL_BIND_ADDR (string, default '0.0.0.0:8080')
- SPAXEL_DATA_DIR (string, default '/data')
- SPAXEL_STATIC_DIR (string, default '/dashboard')
- SPAXEL_MDNS_ENABLED (bool, default true)
- SPAXEL_MDNS_NAME (string, default 'spaxel')
- SPAXEL_LOG_LEVEL (enum: debug|info|warn|error, default 'info')
- SPAXEL_FUSION_RATE_HZ (int, range [1,20], default 10)
- SPAXEL_REPLAY_MAX_MB (int, range [10,10000], default 360)
- SPAXEL_INSTALL_SECRET (string, optional, 32+ chars if set)
- SPAXEL_NTP_SERVER (string, default 'pool.ntp.org')
- SPAXEL_MQTT_BROKER (string, optional, must be valid URL if set)
- SPAXEL_MQTT_USERNAME (string, optional)
- SPAXEL_MQTT_PASSWORD (string, optional, never logged)
- TZ (string, default 'UTC', validated via time.LoadLocation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:15:06 -04:00
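Note on da116c546b: the "collect every error, then exit" pattern from this message can be sketched for three of the listed variables. The real `Load()` signature is not shown in the commit, so passing a `getenv` function (for testability) and treating an empty value as unset are assumptions made here. The `Config` fields are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a small subset of the settings listed in the commit message.
type Config struct {
	BindAddr     string
	LogLevel     string
	FusionRateHz int
}

// Load reads and validates settings, returning every validation error found
// rather than stopping at the first one.
func Load(getenv func(string) string) (Config, []error) {
	cfg := Config{BindAddr: "0.0.0.0:8080", LogLevel: "info", FusionRateHz: 10}
	var errs []error

	if v := getenv("SPAXEL_BIND_ADDR"); v != "" {
		cfg.BindAddr = v
	}
	if v := getenv("SPAXEL_LOG_LEVEL"); v != "" {
		switch v {
		case "debug", "info", "warn", "error":
			cfg.LogLevel = v
		default:
			errs = append(errs, fmt.Errorf("SPAXEL_LOG_LEVEL: %q is not one of debug|info|warn|error", v))
		}
	}
	if v := getenv("SPAXEL_FUSION_RATE_HZ"); v != "" {
		n, err := strconv.Atoi(v)
		switch {
		case err != nil:
			errs = append(errs, fmt.Errorf("SPAXEL_FUSION_RATE_HZ: %q is not an integer", v))
		case n < 1 || n > 20:
			errs = append(errs, fmt.Errorf("SPAXEL_FUSION_RATE_HZ: %d is outside [1,20]", n))
		default:
			cfg.FusionRateHz = n
		}
	}
	return cfg, errs
}

func main() {
	cfg, errs := Load(os.Getenv)
	for _, e := range errs {
		fmt.Fprintln(os.Stderr, e)
	}
	if len(errs) > 0 {
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```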
Argo Workflows CI
529d6108d3 ci: auto-bump version to 0.1.58 2026-04-07 16:10:33 +00:00
jedarden
0377426926 fix: wire anomaly detection & security mode API endpoints
- Add missing CountAnomaliesSince method to mockDetectorProvider
  in security_test.go to satisfy the DetectorProvider interface
- Fix variable shadowing bug in anomaly.go QueryAnomalyEvents
  where incomplete rename from 'events' to 'result' caused
  append(events, &e) to reference the package instead of the slice

All security mode endpoints verified:
- GET /api/anomalies?since=24h — lists recent anomaly events
- POST /api/security/arm + /api/security/disarm — arm/disarm
- GET /api/security/status — {armed, learning_until, anomaly_count_24h}
- Anomaly events push to dashboard WS as 'alert' messages
- Arm/disarm state persists across restarts via learning_state table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:10:01 -04:00
Argo Workflows CI
c15e03a9d2 ci: auto-bump version to 0.1.57 2026-04-07 15:52:03 +00:00
jedarden
001c17bd85 feat: implement 10-step SIGTERM graceful shutdown sequence
Implements the full ordered shutdown sequence so the mothership drains
cleanly without data loss on SIGTERM (Docker stop, Kubernetes termination).

Shutdown sequence (30s hard deadline):
1. Set shutting_down=true; ingestion server returns HTTP 503 to new WebSocket upgrade requests
2. Broadcast {type:'shutdown', reconnect_in_ms:30000} to all dashboard WebSocket clients
3. Cancel fusion loop context (stops fusion goroutine)
4. Drain signal processing pipeline: wait for in-flight CSI frames (max 2s)
5. Flush in-memory baselines to SQLite in a single transaction
6. Sync CSI recording buffer to disk (close writer, fsync)
7. Close all node WebSocket connections with normal close frame (1000)
8. Write {type:'system', description:'Mothership stopped'} event to events table
9. PRAGMA wal_checkpoint(FULL) to collapse WAL into main DB file
10. sqlite3.Close()

Each step gets its own log line: '[SHUTDOWN] Step N/10 — ...'
Steps that fail log ERROR but do not abort remaining steps.
Exit code 0 if all steps completed within deadline; exit code 1 if deadline exceeded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:51:38 -04:00
Argo Workflows CI
1305890f49 ci: auto-bump version to 0.1.56 2026-04-07 15:39:03 +00:00
jedarden
5db3110a2a feat: implement automation triggers CRUD REST endpoints
Add full CRUD endpoints for triggers with OpenAPI-style godoc comments:
- GET/POST /api/triggers (list all, create new)
- PUT/DELETE /api/triggers/{id} (update, delete)
- POST /api/triggers/{id}/test (fire trigger once for testing)

Both TriggersHandler (simple) and VolumeTriggersHandler (3D geometry)
implement all endpoints with table-driven tests covering validation,
persistence, and round-trip lifecycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:38:34 -04:00
Argo Workflows CI
173490ba98 ci: auto-bump version to 0.1.55 2026-04-07 15:09:47 +00:00
jedarden
e44dd345f6 feat: implement comprehensive /healthz endpoint
Add complete health check implementation for Docker HEALTHCHECK and
Traefik health routing with:

Response fields:
- status: "ok" or "degraded"
- uptime_s: seconds since mothership boot
- version: mothership version string
- nodes_online: count of connected nodes
- db: "ok" or "failing" (SELECT 1 with 100ms timeout)
- load_level: 0-3 from load shedding state
- reason: human-readable explanation (only when degraded)

HTTP status codes:
- 200 for healthy (status="ok")
- 503 for degraded (status="degraded")

Degraded conditions:
- Database unreachable
- Load level 3 sustained for >60 seconds
- No nodes connected after 5 minutes uptime

Docker HEALTHCHECK updated to verify status="ok" response.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:09:36 -04:00
Argo Workflows CI
4c3e6e3dd5 ci: auto-bump version to 0.1.54 2026-04-07 15:05:12 +00:00
jedarden
97f1eafc6f feat: process buffered events from delta WebSocket updates
Events (zone entries/exits, portal crossings, presence transitions)
were already broadcast immediately via BroadcastEvent, but the
buffered copies included in the 10 Hz delta tick were silently
dropped by handleIncrementalUpdate. Now delta events are processed
through the same handleEventMessage path, with dedup to avoid
double-processing when both immediate and delta copies arrive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:04:49 -04:00
Argo Workflows CI
c77bf42178 ci: auto-bump version to 0.1.53 2026-04-07 14:52:40 +00:00
jedarden
4eada81a96 chore: add missing go-chi/chi/v5 dependency for floorplan package
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 10:52:08 -04:00
Argo Workflows CI
d30f886efc ci: auto-bump version to 0.1.52 2026-04-07 14:27:50 +00:00
jedarden
391ed884e4 feat: implement NVS schema migration on boot
Implement versioned NVS key migration on ESP32-S3 firmware so
OTA-updated firmware gracefully handles NVS written by older versions.

- Add nvs_migration.c/h with migration framework
- On boot, read schema_ver from NVS; initialize to 1 if missing
- Run migrations sequentially if schema_ver < COMPILED_NVS_VERSION
- Each migration commits after each write for durability
- Log all migration steps to UART for debugging
- Example migration v1→v2: rename 'ms_ip' to 'mothership_ip',
  add 'ntp_server' with default 'pool.ntp.org'
- Migration failure leaves NVS in consistent state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 10:27:38 -04:00
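Note on 391ed884e4: the firmware migration framework is written in C against ESP-IDF NVS, but the sequencing logic (read `schema_ver`, default to 1, apply migrations in order, persist the version after each step) is language independent. The Go sketch below models NVS as an in-memory map purely for illustration. `Store`, `Migrate` and the `migrations` table are hypothetical, and only the v1 to v2 step comes from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// compiledVersion stands in for COMPILED_NVS_VERSION.
const compiledVersion = 2

// Store models the NVS key/value namespace as a plain map.
type Store map[string]string

// migrations[n] upgrades a store from schema version n to n+1.
var migrations = map[int]func(Store){
	1: func(s Store) { // v1 -> v2, as described in the commit message
		if v, ok := s["ms_ip"]; ok {
			s["mothership_ip"] = v
			delete(s, "ms_ip")
		}
		if _, ok := s["ntp_server"]; !ok {
			s["ntp_server"] = "pool.ntp.org"
		}
	},
}

// Migrate brings the store up to compiledVersion, one version at a time,
// recording the new version after each successful step.
func Migrate(s Store) error {
	ver := 1
	if raw, ok := s["schema_ver"]; ok {
		n, err := strconv.Atoi(raw)
		if err != nil {
			return fmt.Errorf("schema_ver %q is not an integer", raw)
		}
		ver = n
	} else {
		s["schema_ver"] = "1" // initialize when missing
	}
	for ver < compiledVersion {
		step, ok := migrations[ver]
		if !ok {
			return fmt.Errorf("no migration registered from v%d", ver)
		}
		step(s)
		ver++
		s["schema_ver"] = strconv.Itoa(ver)
	}
	return nil
}

func main() {
	s := Store{"ms_ip": "192.168.1.10"}
	fmt.Println(Migrate(s), s)
}
```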
Argo Workflows CI
80bca356cd ci: auto-bump version to 0.1.51 2026-04-07 14:22:10 +00:00