AnomalyDetector is initialized in main() with periodic model updates.
Anomaly events are pushed to the dashboard WebSocket as 'alert' messages
via the BroadcastAlert callback. Security mode arm/disarm state persists
across restarts via the SQLite learning_state table.
Endpoints:
- GET /api/anomalies?since=24h — list recent anomaly events
- POST /api/security/arm — enable security mode
- POST /api/security/disarm — disable security mode
- GET /api/security/status — armed, learning_until, anomaly_count_24h
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix migration 001 floorplan schema: use distance_m instead of cal_distance_m,
and rotation_deg instead of room_bounds_json
- Update migration 010 to ALTER existing floorplan tables for databases
that already ran migration 001
- Create /data/floorplan directory in db.OpenDB for storing floor plan images
- GET/POST /api/zones - list and create zones with JSON responses
- PUT /api/zones/{id} - update existing zone
- DELETE /api/zones/{id} - delete zone
- All endpoints return JSON with proper HTTP status codes
- OpenAPI/Swagger annotations present (@Summary, @Description, @Tags, @Router, etc.)
- Table-driven tests for all CRUD operations
Acceptance criteria met:
- Endpoints respond correctly to HTTP requests
- Godoc annotations present for API documentation
The provisioning.NewServer call referenced cfg.InstallSecretHex, which
does not exist. The correct field name in config.Config is InstallSecret.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All acceptance criteria verified:
- AnomalyDetector initialized in main() with providers wired
- Anomaly events pushed to dashboard WS feed as 'alert' messages
- GET /api/anomalies?since=24h returns active + history anomalies
- POST /api/security/arm + /api/security/disarm endpoints
- GET /api/security/status with armed, learning_until, anomaly_count_24h
- Security mode persists across restarts via learning_state table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Firmware (already implemented):
- ntp.c: Call esp_sntp_setservername() before esp_sntp_init()
- ntp.c: 10-minute periodic resync via esp_timer
- main.c: Read ntp_server from NVS (default: pool.ntp.org)
- main.c: 10-second sync attempt after WiFi connect with WARN on failure
- websocket.c: Include ntp_synced status in health JSON
Mothership (added):
- message.go: Add NTPSynced field to HealthMessage struct
- message.go: Add NTPServer field to ConfigMessage struct
- server.go: Add SendNTPServerToMAC() method for runtime NTP config
- server.go: Update sendConfig() to accept NTP server parameter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename health endpoint JSON field from 'load_level' to 'shedding_level'
- Add GetShedLevel callback to health checker for direct ProcessorManager access
- Dashboard WebSocket alerts now broadcast on Level 3 trigger and recovery
- Level 3 actively pushes 10Hz rate cap to all connected nodes
- Recovery from Level 3 restores adaptive rate control automatically
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- AnomalyDetector initialized and running in main() with periodic updates
- Anomaly events pushed to dashboard WS feed as 'alert' messages
- GET /api/anomalies?since=24h lists recent anomaly events
- POST /api/security/arm + /api/security/disarm endpoints
- GET /api/security/status returns armed, learning_until, anomaly_count_24h
- Arm/disarm state persists via learning_state SQLite table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ntp.c to CMakeLists.txt SRCS so it's compiled and linked
- Load ntp_server from NVS in load_nvs_config() (default: pool.ntp.org)
- Add ntp_server field to spaxel_state_t
- Initialize NTP after WiFi connects with 10s sync timeout, WARN on failure
- Re-sync NTP after WiFi reconnect (WIFI_LOST state)
- Start periodic 10-minute resync timer via esp_timer
- Add ntp_synced boolean to health JSON message
- Handle ntp_server field in downstream config message
- Fix periodic resync callback to properly stop/restart SNTP
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Connect pm.GetShedLevel to health checker (exposes load_level in /healthz)
- Wire OnShedLevelChange callback to broadcast dashboard WS alert on Level 3
- Log rate reduction push and recovery messages for Level 3 transitions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Register zones and portals API handler in main.go
- GET /api/zones - list all zones with occupancy
- POST /api/zones - create a new zone
- PUT /api/zones/{id} - update existing zone
- DELETE /api/zones/{id} - delete a zone
- OpenAPI-style godoc comments already in place
- Zone changes reflect in live 3D view within one WebSocket cycle (10 Hz polling)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a 5-iteration rolling average timer to Process() with automatic
load-shedding levels (0-3) based on pipeline duration thresholds
(80ms/90ms/95ms). Recovery steps down when avg drops below 60ms for
10 consecutive iterations. Includes GetShedLevel() getter.
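
The level logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in Process(): the `shedder` type and `Observe` method are hypothetical names, and it assumes that the thresholds are inclusive, that escalation happens immediately, and that recovery steps down one level per 10 calm iterations.

```go
package main

import "fmt"

// Values taken from the commit text; names are illustrative only.
const (
	windowSize    = 5    // rolling-average window, in iterations
	level1Ms      = 80.0 // average at or above this -> level 1
	level2Ms      = 90.0 // average at or above this -> level 2
	level3Ms      = 95.0 // average at or above this -> level 3
	recoverMs     = 60.0 // average below this counts as a calm iteration
	recoverStreak = 10   // calm iterations needed to step down one level
)

type shedder struct {
	samples []float64 // most recent durations, at most windowSize entries
	level   int       // current shed level, 0-3
	calm    int       // consecutive iterations with average below recoverMs
}

// Observe records one pipeline duration (in ms) and returns the shed level.
func (s *shedder) Observe(ms float64) int {
	s.samples = append(s.samples, ms)
	if len(s.samples) > windowSize {
		s.samples = s.samples[1:]
	}
	sum := 0.0
	for _, v := range s.samples {
		sum += v
	}
	avg := sum / float64(len(s.samples))

	target := 0
	switch {
	case avg >= level3Ms:
		target = 3
	case avg >= level2Ms:
		target = 2
	case avg >= level1Ms:
		target = 1
	}
	if target > s.level {
		// Assumption: escalate as soon as the average crosses a threshold.
		s.level = target
		s.calm = 0
		return s.level
	}
	if avg < recoverMs {
		s.calm++
		if s.calm >= recoverStreak && s.level > 0 {
			s.level-- // assumption: recover one level at a time
			s.calm = 0
		}
	} else {
		s.calm = 0
	}
	return s.level
}

// GetShedLevel mirrors the getter named in the commit message.
func (s *shedder) GetShedLevel() int { return s.level }

func main() {
	s := &shedder{}
	for _, ms := range []float64{70, 85, 99, 99, 99, 99, 99} {
		fmt.Printf("duration=%.0fms level=%d\n", ms, s.Observe(ms))
	}
}
```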
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add TimestampMs field to eventbus.Event
- Add event type constants (detection, zone_entry, zone_exit, etc.)
- Add severity level constants
- Add global Default() bus instance for shared access
- Add convenience functions: PublishDefault, PublishDefaultSync, SubscribeDefault
- Integrate with events.InsertEvent to publish to eventbus
- Add comprehensive table-driven tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add StartArchiveScheduler function that runs RunArchiveJob nightly at 02:00 local time
- Scheduler respects local timezone and gracefully stops on done channel signal
- Add table-driven tests for scheduler start and stop behavior
- Add AnomalyType, SystemMode, and sleep session event types to types.go
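
One way to schedule a nightly 02:00 local-time job with a done channel is sketched below. It is an assumption about the general shape, not the actual StartArchiveScheduler; `nextRun`, `runNightly`, and `atLocal` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"time"
)

// nextRun returns the first instant strictly after now at which the wall
// clock in now's location reads hour:00.
func nextRun(now time.Time, hour int) time.Time {
	next := time.Date(now.Year(), now.Month(), now.Day(), hour, 0, 0, 0, now.Location())
	if !next.After(now) {
		// time.Date normalizes day overflow, so Day()+1 rolls into the next month.
		next = time.Date(now.Year(), now.Month(), now.Day()+1, hour, 0, 0, 0, now.Location())
	}
	return next
}

// runNightly calls job at each scheduled time until done is closed.
func runNightly(hour int, job func(), done <-chan struct{}) {
	for {
		timer := time.NewTimer(time.Until(nextRun(time.Now(), hour)))
		select {
		case <-timer.C:
			job()
		case <-done:
			timer.Stop()
			return
		}
	}
}

// atLocal builds a local-time instant; it exists only for the demo below.
func atLocal(y, m, d, h, min int) time.Time {
	return time.Date(y, time.Month(m), d, h, min, 0, 0, time.Local)
}

func main() {
	done := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		runNightly(2, func() { fmt.Println("archive job") }, done)
		close(finished)
	}()
	close(done) // signal stop; the scheduler goroutine returns promptly
	<-finished

	fmt.Println("next run:", nextRun(atLocal(2024, 1, 15, 1, 30), 2))
}
```

Recomputing the next run from the current time after every firing, rather than adding 24 hours, keeps the job aligned to the local wall clock across daylight-saving changes.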
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements a comprehensive e2e test system that:
- Starts mothership container/binary
- Waits for /healthz with 15s timeout
- Handles PIN auth setup if needed
- Runs CSI simulator against mothership
- Asserts during run (health, nodes online, blob detection)
- Validates that the frame rate does not drop by more than 20%
- Asserts detection events recorded
Components added:
- mothership/cmd/sim: CSI simulator that generates synthetic frames
- mothership/tests/e2e: Go test suite with WebSocket assertions
- tests/e2e/run.sh: Shell script with comprehensive assertions
- .github/workflows/e2e.yml: CI workflow for automated testing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements exponential backoff (1s→10s cap) with ±500ms jitter,
blob position extrapolation during disconnects (capped at 2s),
three visual states (silent <5s, dimming 5-30s, modal >30s),
and automatic scene restoration on reconnect.
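
The timing rules above can be illustrated with a short sketch. It is written in Go for consistency with the mothership code and is not the dashboard client itself; `backoffBase`, `backoff`, and `visualState` are hypothetical names, and it assumes the jitter is applied after the 10s cap and that exactly 30s still counts as "dimming".

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay = 1 * time.Second
	maxDelay  = 10 * time.Second
	jitter    = 500 * time.Millisecond
)

// backoffBase returns the un-jittered delay for a zero-based attempt:
// 1s, 2s, 4s, 8s, then capped at 10s.
func backoffBase(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// backoff adds jitter in [-500ms, +500ms) to the base delay. r is a uniform
// random value in [0, 1), supplied by the caller so the function stays pure.
func backoff(attempt int, r float64) time.Duration {
	offset := time.Duration((2*r - 1) * float64(jitter))
	return backoffBase(attempt) + offset
}

// visualState maps time spent disconnected to the three UI states.
func visualState(disconnected time.Duration) string {
	switch {
	case disconnected < 5*time.Second:
		return "silent"
	case disconnected <= 30*time.Second:
		return "dimming"
	default:
		return "modal"
	}
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, backoff(attempt, rand.Float64()))
	}
	fmt.Println(visualState(12 * time.Second))
}
```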
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace httptest.Server with raw net.Listen to avoid Close() blocking
on active connections after a timeout, which caused the test to hang
indefinitely.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement strict CSI binary frame validation with per-connection malformed
frame counters and automatic connection closure on persistent malformed input.
Validation rules implemented:
- Minimum frame length: 24 bytes (header only)
- Maximum frame length: 280 bytes (24 header + 128 subcarriers × 2 bytes)
- n_sub field: must be ≤128
- Payload length: must equal n_sub × 2 bytes exactly
- channel: must be in [1,14] for 2.4 GHz; drop if 0 or >14
- rssi: 0 treated as invalid/missing (logged at DEBUG, but frame allowed)
- timestamp_us: any uint64 value accepted
Per-connection malformed counter (sliding 60-second window):
- On each validation failure: increment malformed_count; log at DEBUG
- If malformed_count > 100 within 60s: log WARN
- If malformed_count > 1000 within 60s: close WebSocket with message
'Excessive malformed frames — possible firmware bug'
- Counter resets every 60s
Acceptance criteria met:
- Valid frame: passes all checks in < 1 μs (benchmark test added)
- Frame with n_sub=200: rejected (n_sub > 128)
- Frame with len=10: rejected (< 24 bytes)
- Frame with channel=0: rejected with DEBUG log
- 1001 malformed frames in 60s: connection closed with correct message
- 101 malformed frames: WARN logged, connection kept open
- RSSI=0: allowed but logged at DEBUG for AGC skip
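
The rules and thresholds above can be sketched as follows. This is a hedged illustration rather than the ingestion code: `validateFrame` takes already-parsed header fields because the commit does not give the header layout, the `malformedCounter` type is hypothetical, and the counter is modeled as a fixed window that resets every 60s (one reading of "sliding 60-second window" combined with "Counter resets every 60s").

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	headerLen      = 24
	maxSubcarriers = 128
	bytesPerSub    = 2
	maxFrameLen    = headerLen + maxSubcarriers*bytesPerSub // 280 bytes
	warnThreshold  = 100
	closeThreshold = 1000
	window         = 60 * time.Second
)

var (
	errTooShort = errors.New("frame shorter than 24-byte header")
	errTooLong  = errors.New("frame longer than 280 bytes")
	errNSub     = errors.New("n_sub exceeds 128")
	errPayload  = errors.New("payload length != n_sub * 2")
	errChannel  = errors.New("channel outside [1,14]")
)

// validateFrame checks total length, n_sub, payload size, and channel.
// rssi == 0 is deliberately not an error here; the caller would log it at DEBUG.
func validateFrame(frameLen, nSub, channel int) error {
	switch {
	case frameLen < headerLen:
		return errTooShort
	case frameLen > maxFrameLen:
		return errTooLong
	case nSub > maxSubcarriers:
		return errNSub
	case frameLen-headerLen != nSub*bytesPerSub:
		return errPayload
	case channel < 1 || channel > 14:
		return errChannel
	}
	return nil
}

type action int

const (
	actNone  action = iota // keep going
	actWarn                // count just crossed 100: log WARN once
	actClose               // count exceeded 1000: close the connection
)

// malformedCounter tracks validation failures for one connection.
type malformedCounter struct {
	windowStart time.Duration // offset since connection open
	count       int
}

// record registers one failure at the given time since the connection opened.
func (c *malformedCounter) record(elapsed time.Duration) action {
	if elapsed-c.windowStart >= window {
		c.windowStart = elapsed
		c.count = 0
	}
	c.count++
	switch {
	case c.count > closeThreshold:
		return actClose
	case c.count == warnThreshold+1:
		return actWarn
	}
	return actNone
}

func main() {
	fmt.Println(validateFrame(10, 0, 6))   // too short
	fmt.Println(validateFrame(280, 128, 0)) // bad channel
	fmt.Println(validateFrame(280, 128, 6)) // valid
	c := &malformedCounter{}
	for i := 0; i < 101; i++ {
		if c.record(0) == actWarn {
			fmt.Println("WARN after", c.count, "malformed frames")
		}
	}
}
```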
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing CountAnomaliesSince method to mockDetectorProvider
in security_test.go to satisfy the DetectorProvider interface
- Fix variable shadowing bug in anomaly.go QueryAnomalyEvents
where an incomplete rename from 'events' to 'result' left
append(events, &e) referring to the events package instead of the result slice
All security mode endpoints verified:
- GET /api/anomalies?since=24h — lists recent anomaly events
- POST /api/security/arm + /api/security/disarm — arm/disarm
- GET /api/security/status — {armed, learning_until, anomaly_count_24h}
- Anomaly events push to dashboard WS as 'alert' messages
- Arm/disarm state persists across restarts via learning_state table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements the full ordered shutdown sequence so the mothership drains
cleanly without data loss on SIGTERM (Docker stop, Kubernetes termination).
Shutdown sequence (30s hard deadline):
1. Set shutting_down=true; ingestion server returns HTTP 503 to new WebSocket upgrade requests
2. Broadcast {type:'shutdown', reconnect_in_ms:30000} to all dashboard WebSocket clients
3. Cancel fusion loop context (stops fusion goroutine)
4. Drain signal processing pipeline: wait for in-flight CSI frames (max 2s)
5. Flush in-memory baselines to SQLite in a single transaction
6. Sync CSI recording buffer to disk (close writer, fsync)
7. Close all node WebSocket connections with normal close frame (1000)
8. Write {type:'system', description:'Mothership stopped'} event to events table
9. PRAGMA wal_checkpoint(FULL) to collapse WAL into main DB file
10. sqlite3.Close()
Each step gets its own log line: '[SHUTDOWN] Step N/10 — ...'
Steps that fail log ERROR but do not abort remaining steps.
Exit code 0 if all steps completed within deadline; exit code 1 if deadline exceeded.
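
The "log errors, keep going, respect the deadline" behavior can be sketched as a small step runner. This is an assumed shape, not the mothership's implementation: the `step` type, `runShutdown`, and `errDemo` are hypothetical, the deadline is only checked between steps, and a failed step is assumed not to change the exit code by itself.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// step is one named shutdown action.
type step struct {
	name string
	run  func() error
}

// errDemo stands in for a real failure in the demo below.
var errDemo = errors.New("demo failure")

// runShutdown executes steps in order under one overall deadline. A failing
// step is logged at ERROR and the remaining steps still run. It returns 0 if
// all steps ran within the deadline and 1 if the deadline was exceeded.
func runShutdown(steps []step, deadline time.Duration) int {
	start := time.Now()
	for i, s := range steps {
		if time.Since(start) > deadline {
			log.Printf("[SHUTDOWN] deadline exceeded before step %d/%d", i+1, len(steps))
			return 1
		}
		log.Printf("[SHUTDOWN] Step %d/%d — %s", i+1, len(steps), s.name)
		if err := s.run(); err != nil {
			log.Printf("ERROR [SHUTDOWN] step %d/%d failed: %v", i+1, len(steps), err)
		}
	}
	if time.Since(start) > deadline {
		return 1
	}
	return 0
}

func main() {
	steps := []step{
		{"reject new upgrades", func() error { return nil }},
		{"flush baselines", func() error { return errDemo }},
		{"close database", func() error { return nil }},
	}
	code := runShutdown(steps, 30*time.Second)
	log.Printf("exit code %d", code)
}
```

A real implementation would likely pass a context into each step so that a slow step (such as the 2s pipeline drain) can be interrupted rather than only checked between steps.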
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add full CRUD endpoints for triggers with OpenAPI-style godoc comments:
- GET/POST /api/triggers (list all, create new)
- PUT/DELETE /api/triggers/{id} (update, delete)
- POST /api/triggers/{id}/test (fire trigger once for testing)
Both TriggersHandler (simple) and VolumeTriggersHandler (3D geometry)
implement all endpoints with table-driven tests covering validation,
persistence, and round-trip lifecycle.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a complete health check implementation for Docker HEALTHCHECK and
Traefik health routing.
Response fields:
- status: "ok" or "degraded"
- uptime_s: seconds since mothership boot
- version: mothership version string
- nodes_online: count of connected nodes
- db: "ok" or "failing" (SELECT 1 with 100ms timeout)
- load_level: 0-3 from load shedding state
- reason: human-readable explanation (only when degraded)
HTTP status codes:
- 200 for healthy (status="ok")
- 503 for degraded (status="degraded")
Degraded conditions:
- Database unreachable
- Load level 3 sustained for >60 seconds
- No nodes connected after 5 minutes uptime
Docker HEALTHCHECK updated to verify status="ok" response.
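
The mapping from inputs to response fields and status code can be sketched as a pure function. This is illustrative only: `healthInputs` and `evaluate` are hypothetical names, the reason strings are made up, and `DBOK` stands for the result of the SELECT 1 probe, which is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// healthInputs is what the checker would gather before answering.
type healthInputs struct {
	Uptime      time.Duration
	NodesOnline int
	DBOK        bool          // outcome of the SELECT 1 probe (100ms timeout)
	LoadLevel   int           // 0-3 from load shedding state
	Level3For   time.Duration // how long the load level has stayed at 3
}

// healthResponse uses the field names listed in the commit message.
type healthResponse struct {
	Status      string `json:"status"`
	UptimeS     int64  `json:"uptime_s"`
	Version     string `json:"version"`
	NodesOnline int    `json:"nodes_online"`
	DB          string `json:"db"`
	LoadLevel   int    `json:"load_level"`
	Reason      string `json:"reason,omitempty"` // only set when degraded
}

// evaluate builds the response body and picks the HTTP status code.
func evaluate(in healthInputs, version string) (healthResponse, int) {
	resp := healthResponse{
		Status:      "ok",
		UptimeS:     int64(in.Uptime / time.Second),
		Version:     version,
		NodesOnline: in.NodesOnline,
		DB:          "ok",
		LoadLevel:   in.LoadLevel,
	}
	switch {
	case !in.DBOK:
		resp.DB = "failing"
		resp.Reason = "database unreachable"
	case in.LoadLevel == 3 && in.Level3For > 60*time.Second:
		resp.Reason = "load level 3 sustained for more than 60s"
	case in.NodesOnline == 0 && in.Uptime > 5*time.Minute:
		resp.Reason = "no nodes connected after 5 minutes uptime"
	}
	if resp.Reason != "" {
		resp.Status = "degraded"
		return resp, http.StatusServiceUnavailable
	}
	return resp, http.StatusOK
}

func main() {
	resp, code := evaluate(healthInputs{
		Uptime:      10 * time.Minute,
		NodesOnline: 0,
		DBOK:        true,
	}, "dev")
	body, _ := json.Marshal(resp)
	fmt.Println(code, string(body))
}
```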
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Events (zone entries/exits, portal crossings, presence transitions)
were already broadcast immediately via BroadcastEvent, but the
buffered copies included in the 10 Hz delta tick were silently
dropped by handleIncrementalUpdate. Now delta events are processed
through the same handleEventMessage path, with dedup to avoid
double-processing when both immediate and delta copies arrive.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement versioned NVS key migration on ESP32-S3 firmware so
OTA-updated firmware gracefully handles NVS written by older versions.
- Add nvs_migration.c/h with migration framework
- On boot, read schema_ver from NVS; initialize to 1 if missing
- Run migrations sequentially if schema_ver < COMPILED_NVS_VERSION
- Each migration commits after each write for durability
- Log all migration steps to UART for debugging
- Example migration v1→v2: rename 'ms_ip' to 'mothership_ip',
add 'ntp_server' with default 'pool.ntp.org'
- Migration failure leaves NVS in consistent state
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>