Commit graph

206 commits

Author SHA1 Message Date
jedarden
45f00e184a feat: add missing name-required validation test for zone creation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 15:00:32 -04:00
Argo Workflows CI
336972a326 ci: auto-bump version to 0.1.75 2026-04-07 18:52:32 +00:00
jedarden
267372fcfc fix(floorplan): use actual timestamp instead of constant value
- Fixed currentTimestamp() to return time.Now().UnixMilli() instead of constant 1e9
- Added 'time' package import to support the fix
- This ensures calibration data gets correct timestamps when persisted to SQLite
2026-04-07 14:52:26 -04:00
Argo Workflows CI
c512053867 ci: auto-bump version to 0.1.74 2026-04-07 18:48:06 +00:00
jedarden
aca74c05c4 feat: wire anomaly detection & security mode API endpoints
Confirm AnomalyDetector initialization in main(), wire anomaly event
broadcasts to dashboard WS as alert messages, and verify all security
mode endpoints (arm/disarm/status) return correct JSON with persistent
state across restarts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 14:47:44 -04:00
Argo Workflows CI
72f702ae7a ci: auto-bump version to 0.1.73
2026-04-07 18:42:53 +00:00
jedarden
e954f6f78e feat: improve floorplan image upload endpoint logging
- Add error logging for failed file reads
- Add debug logging for uploaded file size
- Remove unused img variable from DecodeConfig
2026-04-07 14:42:35 -04:00
Argo Workflows CI
d11fef8bb6 ci: auto-bump version to 0.1.72 2026-04-07 18:37:21 +00:00
jedarden
bf40673b72 feat: wire anomaly detection & security mode API endpoints
AnomalyDetector is initialized in main() with periodic model updates.
Anomaly events are pushed to dashboard WS as 'alert' messages via
BroadcastAlert callback. Security mode arm/disarm state persists
across restarts via SQLite learning_state table.

Endpoints:
- GET /api/anomalies?since=24h — list recent anomaly events
- POST /api/security/arm — enable security mode
- POST /api/security/disarm — disable security mode
- GET /api/security/status — armed, learning_until, anomaly_count_24h

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 14:36:59 -04:00
Argo Workflows CI
7347a5295b ci: auto-bump version to 0.1.71 2026-04-07 18:20:45 +00:00
jedarden
008d3caa60 feat: fix floorplan table schema and create /data/floorplan directory
- Fix migration 001 floorplan schema: use distance_m instead of cal_distance_m,
  and rotation_deg instead of room_bounds_json
- Update migration 010 to ALTER existing floorplan tables for databases
  that already ran migration 001
- Create /data/floorplan directory in db.OpenDB for storing floor plan images
2026-04-07 14:20:38 -04:00
Argo Workflows CI
f2ff68481c ci: auto-bump version to 0.1.70 2026-04-07 18:13:17 +00:00
jedarden
04129addd3 feat: implement Zones CRUD REST endpoints with OpenAPI docs
- GET/POST /api/zones - list and create zones with JSON responses
- PUT /api/zones/{id} - update existing zone
- DELETE /api/zones/{id} - delete zone
- All endpoints return JSON with proper HTTP status codes
- OpenAPI/Swagger annotations present (@Summary, @Description, @Tags, @Router, etc.)
- Table-driven tests for all CRUD operations

Acceptance criteria met:
- Endpoints respond correctly to HTTP requests
- Godoc annotations present for API documentation
2026-04-07 14:13:08 -04:00
jedarden
adf01975a0 fix: correct config field name from InstallSecretHex to InstallSecret
The provisioning.NewServer call was using cfg.InstallSecretHex which
doesn't exist. The correct field name in config.Config is InstallSecret.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 14:13:08 -04:00
Argo Workflows CI
539d436e31 ci: auto-bump version to 0.1.69 2026-04-07 17:38:21 +00:00
jedarden
98c43b3734 feat: wire NTP client into firmware build and initialization 2026-04-07 13:38:13 -04:00
jedarden
8a809fee2f feat: wire anomaly detection & security mode API endpoints
All acceptance criteria verified:
- AnomalyDetector initialized in main() with providers wired
- Anomaly events pushed to dashboard WS feed as 'alert' messages
- GET /api/anomalies?since=24h returns active + history anomalies
- POST /api/security/arm + /api/security/disarm endpoints
- GET /api/security/status with armed, learning_until, anomaly_count_24h
- Security mode persists across restarts via learning_state table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:38:13 -04:00
Argo Workflows CI
9d6808de01 ci: auto-bump version to 0.1.68 2026-04-07 17:32:31 +00:00
jedarden
c256a02490 feat: wire NTP client into firmware build and initialization
Firmware (already implemented):
- ntp.c: Call esp_sntp_setservername() before esp_sntp_init()
- ntp.c: 10-minute periodic resync via esp_timer
- main.c: Read ntp_server from NVS (default: pool.ntp.org)
- main.c: 10-second sync attempt after WiFi connect with WARN on failure
- websocket.c: Include ntp_synced status in health JSON

Mothership (added):
- message.go: Add NTPSynced field to HealthMessage struct
- message.go: Add NTPServer field to ConfigMessage struct
- server.go: Add SendNTPServerToMAC() method for runtime NTP config
- server.go: Update sendConfig() to accept NTP server parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:32:24 -04:00
jedarden
733b30f0bd feat: wire load-shedding level to health endpoint and dashboard WS alerts
- Rename health endpoint JSON field from 'load_level' to 'shedding_level'
- Add GetShedLevel callback to health checker for direct ProcessorManager access
- Dashboard WebSocket alerts now broadcast on Level 3 trigger and recovery
- Level 3 actively pushes 10Hz rate cap to all connected nodes
- Recovery from Level 3 restores adaptive rate control automatically

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:32:24 -04:00
Argo Workflows CI
4b92fac7f2 ci: auto-bump version to 0.1.67 2026-04-07 17:21:20 +00:00
jedarden
f851ede69e feat: wire anomaly detection & security mode API endpoints
- AnomalyDetector initialized and running in main() with periodic updates
- Anomaly events pushed to dashboard WS feed as 'alert' messages
- GET /api/anomalies?since=24h lists recent anomaly events
- POST /api/security/arm + /api/security/disarm endpoints
- GET /api/security/status returns armed, learning_until, anomaly_count_24h
- Arm/disarm state persists via learning_state SQLite table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:20:54 -04:00
Argo Workflows CI
0f8645332a ci: auto-bump version to 0.1.66 2026-04-07 17:19:31 +00:00
jedarden
a42c5e7ea1 feat: wire NTP client into firmware build and initialization
- Add ntp.c to CMakeLists.txt SRCS so it's compiled and linked
- Load ntp_server from NVS in load_nvs_config() (default: pool.ntp.org)
- Add ntp_server field to spaxel_state_t
- Initialize NTP after WiFi connects with 10s sync timeout, WARN on failure
- Re-sync NTP after WiFi reconnect (WIFI_LOST state)
- Start periodic 10-minute resync timer via esp_timer
- Add ntp_synced boolean to health JSON message
- Handle ntp_server field in downstream config message
- Fix periodic resync callback to properly stop/restart SNTP

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:19:07 -04:00
Argo Workflows CI
83a86faee4 ci: auto-bump version to 0.1.65 2026-04-07 17:15:18 +00:00
jedarden
c44065e927 feat: wire load-shedding level to health endpoint and dashboard WS alerts
- Connect pm.GetShedLevel to health checker (exposes load_level in /healthz)
- Wire OnShedLevelChange callback to broadcast dashboard WS alert on Level 3
- Log rate reduction push and recovery messages for Level 3 transitions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:14:54 -04:00
jedarden
6263ce1554 feat: register Zones CRUD REST API endpoints
- Register zones and portals API handler in main.go
- GET /api/zones - list all zones with occupancy
- POST /api/zones - create a new zone
- PUT /api/zones/{id} - update existing zone
- DELETE /api/zones/{id} - delete a zone
- OpenAPI-style godoc comments already in place
- Zone changes reflect in live 3D view within one WebSocket cycle (10 Hz polling)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 13:14:54 -04:00
Argo Workflows CI
139f5954f4 ci: auto-bump version to 0.1.64 2026-04-07 16:53:25 +00:00
jedarden
a9fa6f6f25 feat: add per-iteration timing and load-shedding to ProcessorManager
Add a 5-iteration rolling average timer to Process() with automatic
load-shedding levels (0-3) based on pipeline duration thresholds
(80ms/90ms/95ms). Recovery steps down when avg drops below 60ms for
10 consecutive iterations. Includes GetShedLevel() getter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:53:06 -04:00
Argo Workflows CI
57c27de729 ci: auto-bump version to 0.1.63 2026-04-07 16:52:06 +00:00
jedarden
41d6d09561 feat: implement internal pub/sub event bus
- Add TimestampMs field to eventbus.Event
- Add event type constants (detection, zone_entry, zone_exit, etc.)
- Add severity level constants
- Add global Default() bus instance for shared access
- Add convenience functions: PublishDefault, PublishDefaultSync, SubscribeDefault
- Integrate with events.InsertEvent to publish to eventbus
- Add comprehensive table-driven tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:51:41 -04:00
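For readers unfamiliar with the pattern, an in-process pub/sub bus of the kind the commit above describes might look like the sketch below. Only the `Event` type name and its `TimestampMs` field come from the commit message; `Bus`, `NewBus`, `Subscribe`, `PublishSync`, and the other fields are hypothetical and are not the project's actual API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Event carries a type tag and a millisecond timestamp. Payload is an
// assumed field added for illustration.
type Event struct {
	Type        string
	TimestampMs int64
	Payload     any
}

// Bus is a hypothetical minimal event bus keyed by event type.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]func(Event)
}

func NewBus() *Bus {
	return &Bus{subs: make(map[string][]func(Event))}
}

// Subscribe registers fn for events whose Type equals typ.
func (b *Bus) Subscribe(typ string, fn func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[typ] = append(b.subs[typ], fn)
}

// PublishSync delivers e to every matching subscriber in the caller's
// goroutine. Handlers are copied under the read lock and invoked after
// it is released, so a handler may itself call Subscribe.
func (b *Bus) PublishSync(e Event) {
	b.mu.RLock()
	handlers := append([]func(Event){}, b.subs[e.Type]...)
	b.mu.RUnlock()
	for _, h := range handlers {
		h(e)
	}
}

func main() {
	bus := NewBus()
	bus.Subscribe("zone_entry", func(e Event) {
		fmt.Printf("got %s at %d\n", e.Type, e.TimestampMs)
	})
	bus.PublishSync(Event{Type: "zone_entry", TimestampMs: time.Now().UnixMilli()})
}
```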
Argo Workflows CI
3db48cd61d ci: auto-bump version to 0.1.62 2026-04-07 16:40:33 +00:00
jedarden
76ac2710c9 feat: startup phase sequencing with 30s timeout enforcement
Implement explicit 7-phase startup logging and timeout enforcement:
- Phases 1-4 (data dir, SQLite, migrations, secrets) in db.OpenDB
- Phase 5 (subsystems) with 5s per-subsystem timeout via SubsystemStart
- Phase 6 (HTTP + mDNS) and Phase 7 (health check + ready file)
- FatalFunc injection for testable timeout handling
- Each phase logs [PHASE N/7 — Description] on start, [PHASE N/7 OK] (Xms) on completion
- 30s total startup deadline via context.WithTimeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
jedarden
984f9ef262 feat: add nightly archive scheduler for events (02:00 local time)
- Add StartArchiveScheduler function that runs RunArchiveJob nightly at 02:00 local time
- Scheduler respects local timezone and gracefully stops on done channel signal
- Add table-driven tests for scheduler start and stop behavior
- Add AnomalyType, SystemMode, and sleep session event types to types.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
jedarden
60a21bacb6 feat: add end-to-end integration test harness
Implements a comprehensive e2e test system that:
- Starts mothership container/binary
- Waits for /healthz with 15s timeout
- Handles PIN auth setup if needed
- Runs CSI simulator against mothership
- Asserts during run (health, nodes online, blob detection)
- Validates frame rate doesn't drop >20%
- Asserts detection events recorded

Components added:
- mothership/cmd/sim: CSI simulator that generates synthetic frames
- mothership/tests/e2e: Go test suite with WebSocket assertions
- tests/e2e/run.sh: Shell script with comprehensive assertions
- .github/workflows/e2e.yml: CI workflow for automated testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:40:25 -04:00
Argo Workflows CI
9810da2ee6 ci: auto-bump version to 0.1.61 2026-04-07 16:34:52 +00:00
jedarden
ff3428fee6 feat: robust WebSocket reconnection with backoff, extrapolation, and visual states
Implements exponential backoff (1s→10s cap) with ±500ms jitter,
blob position extrapolation during disconnects (capped at 2s),
three visual states (silent <5s, dimming 5-30s, modal >30s),
and automatic scene restoration on reconnect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:34:35 -04:00
jedarden
31659a5ccc fix: resolve TestTimeoutDoesNotDisable hang in trigger tests
Replace httptest.Server with raw net.Listen to avoid Close() blocking
on active connections after a timeout, which caused the test to hang
indefinitely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:34:35 -04:00
Argo Workflows CI
f951c029a3 ci: auto-bump version to 0.1.60 2026-04-07 16:22:24 +00:00
jedarden
d41cfe3e4f feat: add CSI frame validation with DEBUG logging and performance benchmark
Implement strict CSI binary frame validation with per-connection malformed
frame counters and automatic connection closure on persistent malformed input.

Validation rules implemented:
- Minimum frame length: 24 bytes (header only)
- Maximum frame length: 280 bytes (24 header + 128 subcarriers × 2 bytes)
- n_sub field: must be ≤128
- Payload length: must equal n_sub × 2 bytes exactly
- channel: must be in [1,14] for 2.4 GHz; drop if 0 or >14
- rssi: 0 treated as invalid/missing (logged at DEBUG, but frame allowed)
- timestamp_us: any uint64 value accepted

Per-connection malformed counter (sliding 60-second window):
- On each validation failure: increment malformed_count; log at DEBUG
- If malformed_count > 100 within 60s: log WARN
- If malformed_count > 1000 within 60s: close WebSocket with message
  'Excessive malformed frames — possible firmware bug'
- Counter resets every 60s

Acceptance criteria met:
- Valid frame: passes all checks in < 1 μs (benchmark test added)
- Frame with n_sub=200: rejected (n_sub > 128)
- Frame with len=10: rejected (< 24 bytes)
- Frame with channel=0: rejected with DEBUG log
- 1001 malformed frames in 60s: connection closed with correct message
- 101 malformed frames: WARN logged, connection kept open
- RSSI=0: allowed but logged at DEBUG for AGC skip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:22:17 -04:00
Argo Workflows CI
08394fc90f ci: auto-bump version to 0.1.59 2026-04-07 16:15:13 +00:00
jedarden
da116c546b feat: add environment variable validation with documented defaults
- Create internal/config package with Load() function for all env vars
- Validate types (string, bool, int, enum, URL) and ranges
- Collect all validation errors before returning, so every problem is reported in one pass instead of stopping at the first
- Log non-sensitive values at INFO on startup (MQTT_PASSWORD masked)
- Return error slice; main() logs each error and exits(1)
- Unit tests for valid/invalid cases

Env vars validated:
- SPAXEL_BIND_ADDR (string, default '0.0.0.0:8080')
- SPAXEL_DATA_DIR (string, default '/data')
- SPAXEL_STATIC_DIR (string, default '/dashboard')
- SPAXEL_MDNS_ENABLED (bool, default true)
- SPAXEL_MDNS_NAME (string, default 'spaxel')
- SPAXEL_LOG_LEVEL (enum: debug|info|warn|error, default 'info')
- SPAXEL_FUSION_RATE_HZ (int, range [1,20], default 10)
- SPAXEL_REPLAY_MAX_MB (int, range [10,10000], default 360)
- SPAXEL_INSTALL_SECRET (string, optional, 32+ chars if set)
- SPAXEL_NTP_SERVER (string, default 'pool.ntp.org')
- SPAXEL_MQTT_BROKER (string, optional, must be valid URL if set)
- SPAXEL_MQTT_USERNAME (string, optional)
- SPAXEL_MQTT_PASSWORD (string, optional, never logged)
- TZ (string, default 'UTC', validated via time.LoadLocation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:15:06 -04:00
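A reduced sketch of the collect-all-errors approach described in the commit above, covering three of the listed variables. The variable names, defaults, and ranges are taken from the commit message. The `Config` fields, the `Load` signature with an injected lookup function, and treating an empty value as unset are assumptions; the project's `config.Load()` may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a subset of the settings listed in the commit.
type Config struct {
	BindAddr     string
	LogLevel     string
	FusionRateHz int
}

// Load validates every variable and returns all errors found, rather
// than stopping at the first. lookup has the same shape as os.LookupEnv
// so tests can supply a map instead of the real environment.
func Load(lookup func(string) (string, bool)) (Config, []error) {
	var errs []error
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}

	cfg := Config{BindAddr: get("SPAXEL_BIND_ADDR", "0.0.0.0:8080")}

	cfg.LogLevel = get("SPAXEL_LOG_LEVEL", "info")
	switch cfg.LogLevel {
	case "debug", "info", "warn", "error":
	default:
		errs = append(errs, fmt.Errorf(
			"SPAXEL_LOG_LEVEL: %q is not one of debug|info|warn|error", cfg.LogLevel))
	}

	raw := get("SPAXEL_FUSION_RATE_HZ", "10")
	if n, err := strconv.Atoi(raw); err != nil {
		errs = append(errs, fmt.Errorf("SPAXEL_FUSION_RATE_HZ: %q is not an integer", raw))
	} else if n < 1 || n > 20 {
		errs = append(errs, fmt.Errorf("SPAXEL_FUSION_RATE_HZ: %d is outside [1,20]", n))
	} else {
		cfg.FusionRateHz = n
	}

	return cfg, errs
}

func main() {
	cfg, errs := Load(os.LookupEnv)
	for _, err := range errs {
		fmt.Fprintln(os.Stderr, "config error:", err)
	}
	if len(errs) > 0 {
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```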
Argo Workflows CI
529d6108d3 ci: auto-bump version to 0.1.58 2026-04-07 16:10:33 +00:00
jedarden
0377426926 fix: wire anomaly detection & security mode API endpoints
- Add missing CountAnomaliesSince method to mockDetectorProvider
  in security_test.go to satisfy the DetectorProvider interface
- Fix variable shadowing bug in anomaly.go QueryAnomalyEvents
  where incomplete rename from 'events' to 'result' caused
  append(events, &e) to reference the package instead of the slice

All security mode endpoints verified:
- GET /api/anomalies?since=24h — lists recent anomaly events
- POST /api/security/arm + /api/security/disarm — arm/disarm
- GET /api/security/status — {armed, learning_until, anomaly_count_24h}
- Anomaly events push to dashboard WS as 'alert' messages
- Arm/disarm state persists across restarts via learning_state table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 12:10:01 -04:00
Argo Workflows CI
c15e03a9d2 ci: auto-bump version to 0.1.57 2026-04-07 15:52:03 +00:00
jedarden
001c17bd85 feat: implement 10-step SIGTERM graceful shutdown sequence
Implements the full ordered shutdown sequence so the mothership drains
cleanly without data loss on SIGTERM (Docker stop, Kubernetes termination).

Shutdown sequence (30s hard deadline):
1. Set shutting_down=true; ingestion server returns HTTP 503 to new WebSocket upgrade requests
2. Broadcast {type:'shutdown', reconnect_in_ms:30000} to all dashboard WebSocket clients
3. Cancel fusion loop context (stops fusion goroutine)
4. Drain signal processing pipeline: wait for in-flight CSI frames (max 2s)
5. Flush in-memory baselines to SQLite in a single transaction
6. Sync CSI recording buffer to disk (close writer, fsync)
7. Close all node WebSocket connections with normal close frame (1000)
8. Write {type:'system', description:'Mothership stopped'} event to events table
9. PRAGMA wal_checkpoint(FULL) to collapse WAL into main DB file
10. sqlite3.Close()

Each step gets its own log line: '[SHUTDOWN] Step N/10 — ...'
Steps that fail log ERROR but do not abort remaining steps.
Exit code 0 if all steps completed within deadline; exit code 1 if deadline exceeded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:51:38 -04:00
Argo Workflows CI
1305890f49 ci: auto-bump version to 0.1.56 2026-04-07 15:39:03 +00:00
jedarden
5db3110a2a feat: implement automation triggers CRUD REST endpoints
Add full CRUD endpoints for triggers with OpenAPI-style godoc comments:
- GET/POST /api/triggers (list all, create new)
- PUT/DELETE /api/triggers/{id} (update, delete)
- POST /api/triggers/{id}/test (fire trigger once for testing)

Both TriggersHandler (simple) and VolumeTriggersHandler (3D geometry)
implement all endpoints with table-driven tests covering validation,
persistence, and round-trip lifecycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:38:34 -04:00
Argo Workflows CI
173490ba98 ci: auto-bump version to 0.1.55 2026-04-07 15:09:47 +00:00
jedarden
e44dd345f6 feat: implement comprehensive /healthz endpoint
Add complete health check implementation for Docker HEALTHCHECK and
Traefik health routing with:

Response fields:
- status: "ok" or "degraded"
- uptime_s: seconds since mothership boot
- version: mothership version string
- nodes_online: count of connected nodes
- db: "ok" or "failing" (SELECT 1 with 100ms timeout)
- load_level: 0-3 from load shedding state
- reason: human-readable explanation (only when degraded)

HTTP status codes:
- 200 for healthy (status="ok")
- 503 for degraded (status="degraded")

Degraded conditions:
- Database unreachable
- Load level 3 sustained for >60 seconds
- No nodes connected after 5 minutes uptime

Docker HEALTHCHECK updated to verify status="ok" response.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 11:09:36 -04:00