All 9 phases implemented, 83 beads closed. Project reached completion status with comprehensive acceptance tests covering all major functionality.
Spaxel — Implementation Plan
Last updated: 2026-05-24 Status: COMPLETE — All 9 phases implemented, 83 beads closed
WiFi CSI-based indoor positioning for self-hosted home environments.
System Overview
A single Docker container ("the mothership") runs on a home server. It manages a fleet of ESP32-S3 devices that transmit and receive WiFi packets, extract Channel State Information, and stream it back. The mothership fuses CSI from all links to detect and localize people as spatial blobs, rendered on a floor-plan dashboard.
What Spaxel Can Realistically Achieve
Based on physics and literature (see docs/research/06-accuracy-and-limits.md):
- Presence detection — reliably, with 2+ nodes on opposite sides of a space
- Approximate 2D position — ±0.5–1.0 m with 4+ nodes
- Motion tracking — trajectory following of moving people
- Rough person count — distinguish 1 vs 2+ people (degrades at 3+)
- Rough Z-axis — ±1–2 m with mixed-height node placement
- Stationary person detection — via breathing micro-motion (0.1–0.5 Hz), requires stable setup
Not achievable: sub-10 cm accuracy, skeletal pose, reliable 5+ person tracking.
Glossary
| Term | Definition |
|---|---|
| CSI | Channel State Information — complex amplitude and phase per WiFi subcarrier extracted by the ESP32 from received 802.11 frames. The raw signal that drives all detection. |
| deltaRMS | Root-mean-square deviation of RSSI-normalized CSI amplitudes from the per-link baseline. Primary motion indicator: ~0.02 empty room, ~0.10 walking. |
| NBVI | Normalized Bandwidth Variance Index — Var(amplitude) / Mean(amplitude)² per subcarrier, used to select the 16 most motion-sensitive subcarriers from the 47 available. |
| Fresnel zone | Ellipsoidal region around a TX→RX link path where reflected signals undergo constructive or destructive interference. Zone 1 is most sensitive to motion. |
| Fusion tick | One iteration of the localization loop, running at 10 Hz (every 100 ms). One grid accumulation + peak extraction + UKF update + WebSocket publish. |
| GDOP | Geometric Dilution of Precision — quantifies how well a set of link geometries can localize a point. Low GDOP = good coverage. >4 = poor, Infinity = no coverage. |
| UKF | Unscented Kalman Filter — per-blob state estimator tracking position [px,py,pz] and velocity [vx,vy,vz] with biomechanical constraints. |
| OTA | Over-the-Air firmware update — ESP-IDF dual-partition scheme where new firmware is downloaded to an inactive partition, verified by SHA-256, and activated on reboot. |
| NVS | Non-Volatile Storage — ESP32 key-value flash partition used to persist WiFi credentials, node ID, HMAC token, and runtime state across reboots. |
| mDNS | Multicast DNS — zero-configuration service discovery on the local LAN. Mothership advertises _spaxel._tcp.local; nodes discover it without manual IP configuration. |
| Blob | A detected spatial presence — position + velocity estimate for a person (or other moving entity) tracked by the UKF. Has a stable ID within a mothership session. |
| Link | A directional TX→RX pair. In a 4-node fleet with all nodes TX/RX, there are N×(N-1) = 12 unidirectional links. Stored in canonical form min(MAC_a, MAC_b):max(MAC_a, MAC_b). |
| Baseline | Per-link per-subcarrier amplitude reference representing the empty-room RF environment. deltaRMS is computed relative to this. Maintained as an EMA with τ=30s. |
| Mothership | The single Docker container running the Go backend — ingestion, pipeline, localization, fleet manager, dashboard server, and all storage. |
| Phase sanitization | Processing step that removes hardware-induced phase artifacts (STO slope, CFO intercept) via spatial phase unwrapping + OLS regression. |
| STO/CFO | Sampling Time Offset / Carrier Frequency Offset — hardware timing and frequency errors that impose a deterministic phase ramp across subcarriers, removed before feature extraction. |
Non-Goals
These are conscious scope decisions, not physics limitations. Each has a rationale.
| Non-Goal | Rationale |
|---|---|
| No embedded MQTT broker | Users already have Home Assistant / Mosquitto. A broker inside the container adds operational complexity (ports, persistence, HA config) with zero detection benefit. Integration layer only. |
| No 5 GHz support | ESP32-S3 is 2.4 GHz hardware. Adding 5 GHz would require different, dual-band hardware (such as the ESP32-C5) with a different CSI API. That is a separate hardware platform decision, not an incremental feature. |
| No cloud relay or remote access | Spaxel is intentionally self-hosted. Cloud relay would require a relay server, account management, and TLS termination — all better served by a Tailscale/VPN overlay on the user's side. |
| No camera or audio fallback | CSI-only is the privacy design principle. Adding camera or microphone input would require consent UI, storage policy, and a fundamentally different threat model. |
| No sub-10 cm localization accuracy | Physics limit of 2.4 GHz CSI. Pursuing higher accuracy would require UWB hardware (completely different platform) or dense node deployments (20+ nodes per room). |
| No multi-site / multi-home support | One mothership per physical location. Multi-site coordination requires remote configuration sync and distributed state — a different product tier, not a feature. |
| No building/floor management | Floor plans are per-installation. Multi-floor or multi-building topologies would require coordinate space unification that the current 3D grid does not support cleanly. |
| No user accounts / multi-user auth | A single PIN protects the mothership dashboard. Multi-user auth with roles is out of scope; home deployments have one admin. |
| No real-time WebSocket API to external consumers | The /ws/dashboard feed is for the dashboard UI. External consumers use the REST API. A public WebSocket API would require versioning, auth tokens, and rate limiting not designed here. |
Architecture
┌───────────────────────────────────────────────────────────────────────────┐
│ Docker Container (Mothership) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Ingestion│ │ Signal │ │ Fusion, │ │ BLE │ │Dashboard │ │
│ │ Server │──│Processing│──│ Localizer &│──│ Identity │──│ (Web UI) │ │
│ │ (WS) │ │ Pipeline │ │ Biomech UKF│ │ Matcher │ │ │ │
│ └──────────┘ └──────────┘ └────────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌───────────────────────┐ │
│ │ Fleet │ │ OTA │ │ Onboarding │ │ Automation Engine │ │
│ │ Manager │ │ Server │ │ (Web Serial│ │ (triggers, fall, │ │
│ └──────────┘ └──────────┘ │ + Captive) │ │ anomaly, prediction, │ │
│ └────────────┘ │ sleep, crowd flow) │ │
│ └───────────────────────┘ │
└───────────────────────────────────────────────────────────────────────────┘
▲ WebSocket /ws/node (binary CSI + JSON config/BLE, bidirectional)
│
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ ESP32-S3│ │ ESP32-S3│ │ ESP32-S3│ │ WiFi AP │
│ (RX) │ │ (TX) │ │ (TX/RX) │ │ (passive│
│ WiFi+BLE│ │ WiFi+BLE│ │ WiFi+BLE│ │ radar) │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
Technology Choices
| Component | Technology | Rationale |
|---|---|---|
| Mothership backend | Go | Low-latency ingestion, single binary, easy Docker packaging |
| Dashboard frontend | Vanilla JS + Three.js | No build toolchain; Three.js provides hardware-accelerated 3D scene with orbit controls, raycasting, and transparent rendering — all needed for spatial visualization |
| ESP32 firmware | ESP-IDF (C) | Full CSI API access, OTA support, NVS for config persistence |
| Node ↔ Mothership transport | WebSocket (single bidirectional connection per node) | Binary frames upstream (CSI data), JSON frames downstream (config, role, OTA triggers). Single HTTP port via Traefik. Connection state = node liveness — no separate heartbeat protocol needed |
| OTA delivery | HTTP (served by mothership) | Standard ESP-IDF OTA mechanism, firmware binaries served from container |
| Onboarding | Web Serial API (browser → USB) | Zero-install provisioning from the dashboard |
| BLE identity | ESP32-S3 BLE (passive scan) | Concurrent with WiFi on second core. Scans for phone/watch/tag advertisements. Enables person identification of anonymous CSI blobs |
| Persistence | SQLite (modernc.org/sqlite, pure Go) | Node registry, BLE device registry, floor plans, calibration data, baseline snapshots, CSI recording buffer, Fresnel weights, prediction models, sleep data |
| HA integration | MQTT client (optional), github.com/eclipse/paho.mqtt.golang | Mothership connects as a client to the user's existing MQTT broker for Home Assistant auto-discovery. No broker runs inside the container |
| Go WebSocket (server) | github.com/gorilla/websocket | Mature, widely used. Supports SetReadDeadline, binary frame send, and ping/pong handler registration. conn.SetPongHandler() is used for pong tracking |
| HTTP routing | net/http stdlib + github.com/go-chi/chi | Chi provides URL parameters ({mac}, {id}) without adding a framework. Lightweight, compatible with stdlib handlers |
| Matrix ops (UKF) | gonum.org/v1/gonum/mat | Standard Go scientific computing library for UKF sigma point matrix operations |
| Notification image render | github.com/fogleman/gg | Pure Go 2D drawing, no cgo required |
| mDNS | github.com/hashicorp/mdns | Pure Go, no OS daemon dependency |
| HMAC/auth | crypto/hmac, golang.org/x/crypto/bcrypt | Standard library HMAC; bcrypt for PIN hashing |
Component Design
1. ESP32-S3 Firmware
Single firmware binary that runs on every node. Behavior (TX, RX, or both) is determined by config from the mothership.
Core responsibilities:
- Connect to configured WiFi network
- Discover mothership via mDNS (_spaxel._tcp.local), so no manual IP configuration is required. Falls back to the NVS-stored IP if mDNS fails
- Open a single WebSocket connection to the mothership (ws://mothership:8080/ws/node). This connection carries all communication in both directions: CSI data upstream, config/commands downstream
Node connection lifecycle (state machine):
BOOT
└─▶ WiFi connect (exponential backoff: 1s, 2s, 4s, 8s, 16s, 30s, 30s steady)
│ WiFi connected
▼
MOTHERSHIP_DISCOVERY
1. Query mdns_query_srv(ms_mdns, "_spaxel", "_tcp", 5000 ms timeout) → get host + port
2. If mDNS fails: use cached ms_ip NVS key if set
3. Attempt WebSocket connect to resolved address (5 s timeout)
4. On success: update ms_ip NVS key with current IP, go to CONNECTED
5. On fail: retry discovery, alternating between steps 1 and 2 (mDNS, then cached IP), with 5 s between attempts
6. After 10 consecutive discovery failures: go to MOTHERSHIP_UNAVAILABLE
│ WebSocket connected
▼
CONNECTED — normal operation (CSI streaming, BLE scanning, health reporting)
- On WebSocket disconnect: go to MOTHERSHIP_DISCOVERY (WiFi may still be fine)
- On WiFi disconnect: go to WIFI_LOST
│
WIFI_LOST — WiFi reconnect loop (exponential backoff, same as above)
- If WiFi reconnects: go to MOTHERSHIP_DISCOVERY
- After 10 consecutive WiFi failures: go to CAPTIVE_PORTAL
│
MOTHERSHIP_UNAVAILABLE — mothership is unreachable but WiFi is fine
- Continue operating at last known role (TX/RX/passive) — CSI is not streamed, just discarded
- BLE scanning continues (results queued locally, max 60 entries)
- Retry mothership discovery every 30 s indefinitely
- Dashboard shows node as STALE (not OFFLINE — WiFi is still up)
- NEVER trigger captive portal — the mothership may simply be rebooting
- On mothership reconnect: deliver queued BLE results, resume normal operation
│
CAPTIVE_PORTAL — WiFi credentials invalid or network gone
- Start AP: "spaxel-XXXX" (last 4 of MAC)
- Serve config page at 192.168.4.1 for new WiFi SSID/password + optional mothership IP
- On credentials saved: write to NVS, reboot → BOOT
Key invariants:
- Captive portal ONLY triggers on WiFi failure, never on mothership unreachability
- The node is always operational (at last known role) even when disconnected from the mothership
- ms_ip NVS key is auto-updated on every successful connection, making the fallback self-healing
- mDNS and direct IP are both tried on every reconnect (mDNS first, cached IP second)
- Send a registration JSON message on connect: {type: "hello", mac, firmware_version, capabilities}
- Listen for role assignment (TX, RX, TX/RX, PASSIVE) and packet rate config as JSON messages from the mothership on the same WebSocket
- TX mode: Send probe packets at configured rate (default 20 Hz)
- RX mode: Enable promiscuous mode, capture CSI from all TX nodes, stream raw I/Q pairs as WebSocket binary frames
- Passive mode: RX-only, filtering for the home WiFi AP's BSSID — uses existing router beacon/data frames as the TX source (see Passive Radar Mode)
- TX/RX mode: Alternate between transmitting and receiving on a time-division schedule
- BLE scanning (second core): Continuously scan for BLE advertisements (iBeacon, Eddystone, generic GAP) on the ESP32-S3's second core. Report per-device RSSI as periodic JSON: {type: "ble", devices: [{addr: "AA:BB:...", rssi_dbm: -62, name: "iPhone"}, ...]}. Scanning runs concurrently with WiFi CSI. The dual-core CPU runs both stacks in parallel, but WiFi and BLE share the single 2.4 GHz radio, so ESP-IDF's coexistence arbiter time-slices RF access between them
- Adaptive sensing rate: Support mothership-controlled packet rate changes. Also perform an on-device amplitude variance check at a low rate (2 Hz). If local variance exceeds the threshold, burst to full rate and notify the mothership
- Report health metrics (free heap, WiFi RSSI, uptime, temperature) as periodic JSON messages on the WebSocket (every 10 s)
- Support OTA firmware updates triggered by mothership command on the WebSocket, pulled via HTTP
- Store config in NVS: WiFi credentials, mothership IP (fallback), node ID, last known role
NVS Layout:
All keys live in NVS namespace "spaxel" (key names max 15 characters per ESP-IDF limit). All multi-key updates call nvs_commit() after every individual write AND once after the full batch — ensuring each key is durable even if power is lost mid-write. A "provisioned" flag is written last; firmware checks it on boot to determine state.
| NVS Key | Type | Max | Default | Written By | Description |
|---|---|---|---|---|---|
| schema_ver | uint8 | 1 B | 1 | firmware | NVS schema version; firmware migrates if found version < current |
| provisioned | uint8 | 1 B | 0 | provisioning | 0 = not provisioned (captive portal); 1 = provisioned |
| wifi_ssid | str | 32 B | none | provisioning/captive | WiFi network SSID |
| wifi_pass | str | 64 B | none | provisioning/captive | WiFi passphrase |
| node_id | str | 37 B | none | provisioning | UUID4 string assigned by mothership |
| node_token | str | 65 B | none | provisioning | 64-char hex HMAC-SHA256 token for WebSocket auth |
| ms_mdns | str | 64 B | "spaxel" | provisioning | mDNS service name; full: <ms_mdns>._spaxel._tcp.local |
| ms_ip | str | 46 B | none | runtime/captive | Fallback mothership IP (set by captive portal or runtime push); used if mDNS fails |
| ms_port | uint16 | 2 B | 8080 | provisioning | Mothership HTTP port |
| passive_bss | blob | 6 B | none | runtime | AP BSSID bytes for passive radar mode; all-zeros = disabled |
| role | uint8 | 1 B | 2 (TX_RX) | runtime | Last assigned role: 0=TX, 1=RX, 2=TX_RX, 3=PASSIVE, 4=IDLE |
| pkt_rate | uint8 | 1 B | 20 | runtime | Current packet rate (Hz) |
| ap_mode | uint8 | 1 B | 0 | firmware | 1 = force captive portal on next boot |
| debug | uint8 | 1 B | 0 | provisioning | 1 = verbose USB serial logging |
NVS write sequence during provisioning (Web Serial → esptool-js → NVS):
- Erase namespace "spaxel" (clean slate)
- Write schema_ver, wifi_ssid, wifi_pass, node_id, node_token, ms_mdns, ms_port, debug, each followed by nvs_commit()
- Write provisioned = 1 last, only once all other keys are durable
- Final nvs_commit()
Schema migration: On boot, firmware reads schema_ver. If less than the compiled-in version, firmware runs migration code (add/rename/remove keys), then updates schema_ver. Ensures OTA-updated firmware handles NVS written by older versions.
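The provisioning order above can be modeled against a minimal key-value interface. This is a hypothetical Go sketch: kv, memStore, and provision are illustrative names, not ESP-IDF APIs. The in-memory store exists only to make the write and commit order observable.

```go
package main

import (
	"errors"
	"fmt"
)

// kv is a hypothetical stand-in for an NVS namespace handle; on the device
// these calls would map to nvs_erase_all, nvs_set_*, and nvs_commit in C.
type kv interface {
	EraseAll() error
	Set(key string, val []byte) error
	Commit() error
}

const maxKeyLen = 15 // ESP-IDF NVS key names are limited to 15 characters

type entry struct {
	key string
	val []byte
}

// provision writes entries in order, committing after each one, then sets
// provisioned=1 last and issues the final commit. A power loss at any earlier
// point leaves provisioned unset, which the firmware treats as unprovisioned.
func provision(store kv, entries []entry) error {
	if err := store.EraseAll(); err != nil {
		return err
	}
	for _, e := range entries {
		if len(e.key) > maxKeyLen {
			return fmt.Errorf("NVS key %q exceeds %d characters", e.key, maxKeyLen)
		}
		if e.key == "provisioned" {
			return errors.New("provisioned is written last by provision itself")
		}
		if err := store.Set(e.key, e.val); err != nil {
			return err
		}
		if err := store.Commit(); err != nil {
			return err
		}
	}
	if err := store.Set("provisioned", []byte{1}); err != nil {
		return err
	}
	return store.Commit() // final commit
}

// memStore is an in-memory kv that logs every operation.
type memStore struct {
	data map[string][]byte
	log  []string
}

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

func (m *memStore) EraseAll() error {
	m.data = map[string][]byte{}
	m.log = append(m.log, "erase")
	return nil
}

func (m *memStore) Set(k string, v []byte) error {
	m.data[k] = v
	m.log = append(m.log, "set:"+k)
	return nil
}

func (m *memStore) Commit() error {
	m.log = append(m.log, "commit")
	return nil
}

func main() {
	s := newMemStore()
	err := provision(s, []entry{{"schema_ver", []byte{1}}, {"wifi_ssid", []byte("home")}})
	fmt.Println(err, s.log)
}
```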
CSI packet format (WebSocket binary frame, upstream):
Each CSI sample is sent as a single WebSocket binary frame on the node's persistent connection. All multi-byte integers are little-endian.
Header (fixed 24 bytes):
node_mac: 6 bytes — source node MAC (6 uint8)
peer_mac: 6 bytes — transmitting node MAC in RX mode; own MAC in TX mode (6 uint8)
timestamp_us: 8 bytes — microseconds since node boot, from esp_timer_get_time() (uint64, little-endian)
Never wraps in practice (~580,000 year overflow). Monotonic since last boot.
Resets to near-zero on reboot — mothership detects reboot by checking if the
new timestamp is significantly less than the previous one on the same connection.
rssi: 1 byte — signed RSSI in dBm (int8)
noise_floor: 1 byte — signed noise floor in dBm (int8)
channel: 1 byte — WiFi channel number (uint8)
n_sub: 1 byte — number of subcarriers in this frame (uint8, typically 64)
Payload (n_sub * 2 bytes):
Per subcarrier: int8 I, int8 Q (in-phase and quadrature, subcarrier index 0..n_sub-1)
Timestamp semantics:
- Node timestamps are used for: (a) inter-packet interval computation within a localization window, (b) phase synchronization across links from the same TX burst.
- Mothership receive time (time.Now().UnixNano()) is stored alongside each CSI frame in the ring buffer and replay store. This is the authoritative time for replay, timeline events, and baseline timestamps. The node timestamp is stored as a secondary field for phase-synchronization use only.
- Inter-link synchronization tolerance: up to 50 ms clock skew between nodes is acceptable (the localization algorithm uses the Fresnel zone geometry, not precise time differences between nodes).
The binary format uses WebSocket binary frames (opcode 0x2) to avoid base64/JSON encoding overhead. Frame size: 24 + n_sub×2 bytes = 152 bytes for 64 subcarriers.
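A minimal Go codec for the 24-byte header and I/Q payload might look like the following. The CSIFrame type and the Encode/Decode names are hypothetical. The offsets follow the layout above: timestamp at bytes 12–19, RSSI at 20, noise floor at 21, channel at 22, and n_sub at 23.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const headerLen = 24

// CSIFrame is the decoded form of one upstream binary frame.
type CSIFrame struct {
	NodeMAC     [6]byte
	PeerMAC     [6]byte
	TimestampUS uint64
	RSSI        int8
	NoiseFloor  int8
	Channel     uint8
	I, Q        []int8 // one entry per subcarrier
}

// Encode lays out the little-endian header followed by interleaved I/Q pairs.
// n_sub is a single byte, so more than 255 subcarriers cannot be encoded.
func (f *CSIFrame) Encode() []byte {
	n := len(f.I)
	buf := make([]byte, headerLen+2*n)
	copy(buf[0:6], f.NodeMAC[:])
	copy(buf[6:12], f.PeerMAC[:])
	binary.LittleEndian.PutUint64(buf[12:20], f.TimestampUS)
	buf[20] = byte(f.RSSI)
	buf[21] = byte(f.NoiseFloor)
	buf[22] = f.Channel
	buf[23] = byte(n)
	for k := 0; k < n; k++ {
		buf[headerLen+2*k] = byte(f.I[k])
		buf[headerLen+2*k+1] = byte(f.Q[k])
	}
	return buf
}

// Decode parses a frame, checking only that the payload length matches n_sub.
func Decode(b []byte) (*CSIFrame, error) {
	if len(b) < headerLen {
		return nil, errors.New("frame shorter than header")
	}
	n := int(b[23])
	if len(b) != headerLen+2*n {
		return nil, errors.New("payload length does not match n_sub")
	}
	f := &CSIFrame{
		TimestampUS: binary.LittleEndian.Uint64(b[12:20]),
		RSSI:        int8(b[20]),
		NoiseFloor:  int8(b[21]),
		Channel:     b[22],
		I:           make([]int8, n),
		Q:           make([]int8, n),
	}
	copy(f.NodeMAC[:], b[0:6])
	copy(f.PeerMAC[:], b[6:12])
	for k := 0; k < n; k++ {
		f.I[k] = int8(b[headerLen+2*k])
		f.Q[k] = int8(b[headerLen+2*k+1])
	}
	return f, nil
}

func main() {
	f := &CSIFrame{TimestampUS: 1_000_000, RSSI: -52, NoiseFloor: -95, Channel: 6,
		I: make([]int8, 64), Q: make([]int8, 64)}
	b := f.Encode()
	g, err := Decode(b)
	fmt.Println(len(b), err, g.RSSI, g.Channel)
}
```

Decode deliberately checks only the length invariant. The channel and n_sub plausibility checks belong to the ingestion server's validation step.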
Recovery layers (per docs/notes/recovery-mechanisms.md):
- Automatic: WiFi reconnect loop with exponential backoff; OTA rollback on boot failure (two-partition scheme)
- Captive portal: After 10 failed WiFi attempts, start AP mode as spaxel-XXXX and serve a config page for new WiFi credentials / mothership IP
- Web Serial: Dashboard provides browser-based flashing via esptool-js for full recovery
- USB fallback: Standard esptool.py for manufacturing / batch flashing
2. Ingestion Server
WebSocket endpoint at /ws/node on the mothership's HTTP port. Each ESP32 node maintains a single persistent bidirectional connection.
Upstream (node → mothership):
- Binary frames: CSI samples (parsed into (link_id, timestamp, csi_vector) tuples)
- JSON frames: registration (hello), health metrics, OTA status reports, BLE scan results
Downstream (mothership → node):
- JSON frames: role assignment, packet rate config (including adaptive rate commands), OTA commands, reboot commands, identify (blink LED)
Node↔Mothership JSON Message Schemas:
All JSON messages include a "type" discriminator. Unknown type values are silently ignored by the receiver (forward-compatible). MAC addresses are uppercase colon-separated hex ("AA:BB:CC:DD:EE:FF"). Timestamps are Unix milliseconds (uint64). Field naming is snake_case throughout. Maximum JSON frame size: 4 KB (ESP32 heap constraint).
WebSocket keepalive (ping/pong):
- Mothership → Node: The mothership sends WebSocket ping frames every 30 s on each node connection. Read deadline is set to 60 s (= 2× ping interval). If no data (including pong) is received within the read deadline, the connection is closed and the node is marked OFFLINE.
- Implementation: conn.SetReadDeadline(time.Now().Add(60 * time.Second)) is reset on every received frame (data or pong). A goroutine sends pings on a 30 s ticker.
- Node → Mothership: The ESP32 esp_websocket_client has a built-in keepalive option: config.ping_interval_sec = 30. The ESP32 sends ping frames autonomously, and the mothership replies with a pong (handled automatically by the Go WebSocket library's default ping handler).
- Mothership → Dashboard: Dashboard WebSocket connections use the same 30 s ping / 60 s read deadline. The browser WebSocket API responds to pings automatically.
- Purpose: Keeps NAT state tables alive (typical residential NAT timeout = 60–120 s); detects silently-dropped connections (e.g., WiFi power management dropping packets) within 60 s.
// UPSTREAM: hello — first message on every connect
{"type":"hello","mac":"AA:BB:CC:DD:EE:FF","node_id":"f47ac10b-...","firmware_version":"1.2.3",
"capabilities":["csi","ble","tx","rx"],"chip":"ESP32-S3","flash_mb":16,"uptime_ms":4200}
// UPSTREAM: health — every 10 s
{"type":"health","mac":"AA:BB:CC:DD:EE:FF","timestamp_ms":1711234567890,
"free_heap_bytes":204800,"wifi_rssi_dbm":-52,"uptime_ms":3600000,
"temperature_c":42.1,"csi_rate_hz":20,"wifi_channel":6,"ip":"192.168.1.123"}
// UPSTREAM: ble — every 5 s
{"type":"ble","mac":"AA:BB:CC:DD:EE:FF","timestamp_ms":1711234567890,
"devices":[{"addr":"AA:BB:CC:DD:EE:FF","addr_type":"public","rssi_dbm":-62,
"name":"iPhone","mfr_id":76,"mfr_data_hex":"0215..."}]}
// UPSTREAM: motion_hint — when on-device variance exceeds threshold
{"type":"motion_hint","mac":"AA:BB:CC:DD:EE:FF","timestamp_ms":1711234567890,"variance":0.043}
// UPSTREAM: ota_status — during OTA progress
{"type":"ota_status","mac":"AA:BB:CC:DD:EE:FF","state":"downloading","progress_pct":45}
// state values: "downloading" | "verifying" | "writing" | "rebooting" | "failed"
// "failed" adds: "error":"sha256_mismatch" | "download_failed" | "write_failed"
// DOWNSTREAM: role — assign operational role
{"type":"role","role":"rx"}
// role values: "tx" | "rx" | "tx_rx" | "passive" | "idle"
// passive adds: "passive_bssid":"AA:BB:CC:DD:EE:FF"
// DOWNSTREAM: config — change operational parameters
{"type":"config","rate_hz":50,"tx_slot_us":5000,"variance_threshold":0.02}
// all fields optional; omit to leave unchanged
// DOWNSTREAM: ota — trigger firmware update
{"type":"ota","url":"http://spaxel.local:8080/firmware/spaxel-1.3.0.bin",
"sha256":"e3b0c44298fc1c149afb...","version":"1.3.0"}
// DOWNSTREAM: reboot
{"type":"reboot","delay_ms":1000}
// DOWNSTREAM: identify — blink LED
{"type":"identify","duration_ms":5000}
// DOWNSTREAM: baseline_request — node sends a health frame with extra CSI stats
{"type":"baseline_request"}
// DOWNSTREAM: shutdown — mothership is shutting down
{"type":"shutdown","reconnect_in_ms":30000}
// DOWNSTREAM: reject — authentication or policy failure (connection closes after)
{"type":"reject","reason":"invalid_token"}
// reason values: "invalid_token" | "unknown_node" | "rate_limited"
Protocol rules:
- Node sends hello as the first message. Mothership responds with role + config within 2 s, or with reject (then closes the connection).
- OTA is two-phase: mothership sends ota → node sends ota_status frames → mothership monitors until "rebooting" or "failed".
- Both sides ignore unknown type values.
typevalues. - Node does not need to ACK non-OTA downstream messages (fire-and-forget). Role and config changes take effect immediately on receipt.
- TCP/WebSocket ordering guarantees in-order delivery; no sequence numbers needed.
Authentication:
- On first run, the mothership auto-generates a random 256-bit installation secret (SPAXEL_INSTALL_SECRET), stores it in SQLite, and prints it once to stdout: [SPAXEL] Installation secret: <hex>. Shown once — saved to /data/spaxel.db.
- If SPAXEL_INSTALL_SECRET is set in the environment, it takes precedence (useful for scripted deployments).
- During provisioning, the mothership derives a per-node token: HMAC-SHA256(install_secret, node_mac). This token is embedded in the provisioning NVS payload written via Web Serial.
- On WebSocket connect, the node includes its token as the X-Spaxel-Token HTTP header during the upgrade request. The mothership verifies it before completing the upgrade. Nodes without a valid token are rejected with HTTP 401 and the connection is closed. An invalid-token counter per IP triggers a 60-second block after 5 consecutive failures.
- Dashboard access is protected by a PIN. On first run, if no PIN is configured, the dashboard shows a one-time setup page to set a PIN (stored as a bcrypt hash in SQLite). Subsequent visits require the PIN, which issues a session cookie (HttpOnly, 7-day TTL). If the mothership is behind Traefik with TLS, the cookie is also marked Secure and SameSite=Strict. Browsers withhold Secure cookies on plain-HTTP connections, so the Secure flag only works behind TLS.
- The /ws/dashboard endpoint verifies the session cookie before upgrading. Unauthenticated dashboard connections receive HTTP 401.
- LAN-only binding (defense in depth): By default, the mothership binds only to the interface associated with the container's LAN-facing network. The SPAXEL_BIND_ADDR environment variable (default 0.0.0.0) can restrict this further. Users are advised not to expose port 8080 directly to the WAN without a reverse proxy with TLS.
- Nodes provisioned before auth was added (e.g., during development) can be re-provisioned via Web Serial to obtain a valid token. The dashboard shows an "Unpaired" badge on nodes connecting without a token during a one-time migration window (configurable, default 24 h after auth is first enabled), then enforces strict rejection thereafter.
Connection management:
- One goroutine per connection handles both directions
- Node identity established by the hello message on connect (MAC + firmware version)
- Maintain per-link ring buffers (last 256 samples, ~5–13 s at 20–50 Hz)
- Detect new links automatically — no pre-registration required
- Connection state is authoritative: disconnect = node offline (immediate, no timeout). Reconnect = node online, re-send current config
- Write CSI frames to the recording buffer for time-travel replay (see Component 14)
- Pass completed sample windows to the signal processing pipeline
Binary CSI frame validation (ingestion server):
Before the frame is enqueued for processing, the following checks are applied. Frames failing any check are silently dropped (not logged at INFO — only at DEBUG to avoid flooding logs at 600 frames/s):
Minimum frame length: 24 bytes (header only; n_sub=0 is valid as a header-only probe)
Maximum frame length: 24 + 128×2 = 280 bytes (n_sub max = 128; more = malformed)
(ESP32-S3 CSI is 64 subcarriers; 128 is a safety margin for future hardware)
Validation rules (in order):
1. len(frame) < 24: drop — frame too short to contain header
2. n_sub = frame[23]: read from byte 23
3. 24 + n_sub×2 != len(frame): drop — payload length mismatch
4. n_sub > 128: drop — implausible subcarrier count
5. rssi (frame[20]) == 0: allowed (0 = invalid RSSI per firmware spec); flag for AGC skip in pipeline
6. channel (frame[22]) == 0: drop — channel 0 is invalid
7. channel (frame[22]) > 14: drop — invalid 2.4 GHz channel
On drop: increment per-connection malformed_frame_count counter.
If malformed_frame_count > 100 within 1 minute: log WARN "Node [mac] sending malformed CSI frames".
If > 1000 within 1 minute: close connection (likely firmware bug or protocol mismatch).
3. Signal Processing Pipeline
Runs per-link, converting raw I/Q into motion features. Based on docs/research/04-signal-processing.md.
Pipeline stages:
Raw I/Q → Complex CSI → Phase Sanitisation → Feature Extraction → Motion Score
- Complex CSI computation: Convert int8 I/Q pairs to float64 complex numbers, compute amplitude and phase per subcarrier
- Phase sanitisation (per CSI frame, per link):
Input: n_sub int8 pairs (I_k, Q_k), rssi_dbm int8
Step 1: Complex CSI
  for k in 0..n_sub-1:
    csi[k] = complex(float64(I_k), float64(Q_k))
    amplitude[k] = abs(csi[k])      // = sqrt(I² + Q²)
    phase[k] = atan2(Q_k, I_k)      // radians, range [-π, π]
Step 2: RSSI normalization (AGC compensation)
  rssi_ref = -30.0                  // dBm
  if rssi_dbm != 0:                 // 0 = invalid; skip normalization
    norm = pow(10.0, (rssi_ref - float64(rssi_dbm)) / 20.0)
    amplitude[k] *= norm for all k
Step 3: Spatial phase unwrapping (across subcarriers, per frame)
  // Detect and correct 2π jumps between adjacent subcarrier phases
  unwrapped[0] = phase[0]
  for k in 1..n_sub-1:
    delta = phase[k] - phase[k-1]
    while delta > π: delta -= 2π
    while delta < -π: delta += 2π
    unwrapped[k] = unwrapped[k-1] + delta
Step 4: Linear regression (OLS) over data subcarriers
  // Fit unwrapped[k] = a·k + b, where k = subcarrier index
  // X axis = subcarrier index k, Y axis = unwrapped[k]
  // Use only data subcarrier indices (not null/guard/pilot)
  n = len(data_indices)
  sum_k  = sum(k for k in data_indices)
  sum_kk = sum(k² for k in data_indices)
  sum_y  = sum(unwrapped[k] for k in data_indices)
  sum_ky = sum(k·unwrapped[k] for k in data_indices)
  denom = n·sum_kk - sum_k²
  a = (n·sum_ky - sum_k·sum_y) / denom   // STO slope (radians/subcarrier)
  b = (sum_y - a·sum_k) / n              // CFO intercept (radians)
Step 5: Residual phase
  residual[k] = unwrapped[k] - (a·k + b) for all k
Output: amplitude[k] (RSSI-normalized) and residual[k] (phase), both float64 arrays of length n_sub. The normalized amplitude feeds NBVI selection and deltaRMS; the residual phase feeds the phase-variance and breathing-band features. The raw amplitude (before normalization) is NOT stored; only the normalized version is.
If any step produces NaN or Inf (e.g., rssi_dbm causes overflow, zero I/Q pair), the frame is skipped and a warning is logged. The regression denominator is checked for near-zero before division.
Note on subcarrier spacing: The STO slope a (radians/subcarrier) corresponds to a timing offset of magnitude |a| / (2π × Δf), where Δf = 312.5 kHz for HT20. This estimate is not used further; it is removed as a nuisance parameter.
Pipeline ordering: CSI frames are first written to the replay store (raw binary, before any processing), then phase sanitization runs on a copy. This ensures the replay store contains raw I/Q and any algorithm changes can be applied to it later.
- Subcarrier selection:
HT20 (802.11n 20 MHz) subcarrier map (64 total):
- Null subcarriers (excluded): indices 0 (DC), 1, 63 (guard)
- Guard band (excluded): indices 27–37 (center guard + upper null carriers)
- Pilot subcarriers (excluded from NBVI selection): indices 7, 21, 43, 57
- Data subcarriers (eligible): all remaining = 47 subcarriers
NBVI (Normalized Bandwidth Variance Index) selection algorithm:
NBVI for subcarrier i over window W:
NBVI_i = Var(amplitude_i) / (Mean(amplitude_i))²
This normalizes variance by the square of the mean amplitude, making the metric scale-invariant across subcarriers with different mean gains.
- Variance and mean computed using Welford's online algorithm for numerical stability over a sliding window of W=100 samples (~5 s at 20 Hz)
- Update period: NBVI scores recalculated every 2 s (every 40 samples at 20 Hz)
- Minimum samples required before applying selection: 50 samples (~2.5 s). Before that, use all 47 data subcarriers
- Selection: take the top 16 subcarriers by NBVI score (the 16 with highest normalized variance)
- Threshold floor: exclude any subcarrier with NBVI < 0.001 even if it would be in the top 16 (indicates a degenerate link)
- Fallback: if fewer than 8 subcarriers pass the threshold, use all data subcarriers (link quality may be poor)
- Selected subcarrier indices are stored as a [64]bool mask per link in memory (recomputed on restart from buffered samples; no SQLite persistence needed)
- NBVI is computed on the RSSI-normalized amplitudes output by the sanitisation stage, not on raw amplitudes (STO/CFO removal itself only alters phase)
- deltaRMS and phase variance features use only selected subcarriers. Breathing band uses all 47 data subcarriers (the low-frequency signal is spread across all subcarriers; selection would discard useful signal)
- Diagnostic view shows a per-subcarrier NBVI bar chart with selected subcarriers highlighted in green
-
Feature extraction (computed per fusion tick, 10 Hz, on a window of recent samples from the link's ring buffer):
deltaRMS (primary motion indicator):
selected = NBVI-selected subcarrier indices (up to 16) deltaRMS = sqrt( mean( (amplitude_norm[k] - baseline[k])^2 for k in selected ) )amplitude_norm[k]: RSSI-normalized amplitude from phase sanitizationbaseline[k]: current EMA baseline for subcarrier k (see baseline management below)- Result is dimensionless. Typical values: ~0.02 (empty room), ~0.10 (walking), ~0.30 (vigorous motion)
- An exponential moving average with α = 0.3 (an effective span of roughly 5 samples) is applied to deltaRMS before thresholding:
smooth_deltaRMS = 0.3·deltaRMS + 0.7·prev_smooth
Phase variance (sub-wavelength displacement indicator):
phase_variance = variance( residual_phase[k] for k in selected )
- Computed per frame over the selected subcarriers. High variance = person at a non-null position in the Fresnel zone.
- Reported in diagnostics panel. Not directly used in Fresnel accumulation (deltaRMS is the primary).
Breathing band (stationary person detection):
- IIR Butterworth bandpass filter, order 4, passband 0.1–0.5 Hz, sampling rate 20 Hz (at active rate)
- Filter applied to the time series of mean(residual_phase[k]) over all 47 data subcarriers
- Implemented as two cascaded biquad sections (the standard Butterworth IIR representation). Biquad coefficients are precomputed at build time for Fs = 20 Hz using the bilinear transform and embedded as constants.
- Filter state (4 state variables per biquad section = 8 floats) maintained per link in memory
breathing_rms = sqrt( mean( filtered_phase[t]^2 over the last 60 s = 1200 samples ) )
- Detection threshold: breathing_rms > 0.005 radians (sustained for >30 s) → stationary person present
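- Breathing rate estimation: FFT over a 512-sample window (25.6 s at 20 Hz), zero-padded to 1024 points. Bin spacing: 20/1024 ≈ 0.02 Hz (≈1.2 bpm). Zero-padding interpolates the spectrum, but the true frequency resolution remains 20/512 ≈ 0.04 Hz. Find the dominant peak among the bins in the 0.1–0.5 Hz range, convert its bin index to Hz (f = bin × 20/1024), then to breaths per minute (× 60). Apply 60-second EMA smoothing to the rate estimate.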
- Only computed when smooth_deltaRMS < 0.03 (person is still). When in motion, breathing detection is disabled.
Baseline management:
- EMA baseline per link per subcarrier with time constant τ = 30 s (configurable)
- Update rule:
α = dt / (τ + dt), where dt = 1/fusion_rate_hz ≈ 0.1 s → α ≈ 0.0033
baseline[k] = α·amplitude_norm[k] + (1-α)·baseline[k]
- Motion-gated updates: Only update when smooth_deltaRMS < motion_threshold (default 0.05). This prevents the baseline from adapting to a stationary person.
- Initialization: On first data for a link, baseline = first amplitude sample. On restart, restored from the most recent SQLite baseline snapshot.
- If the loaded snapshot is older than 7 days: use it as the starting point but mark baseline confidence as 0.3 (show as low in the diagnostic view). The confidence increases as the EMA accumulates new quiet-room samples.
- Baseline snapshots stored to SQLite every 60 s and on graceful shutdown. Also stored on demand (manual re-baseline, node position change).
- Re-baseline triggered on: node position change, manual request from dashboard, significant environment change detected (drift >2σ from calibration snapshot)
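The motion-gated EMA update can be sketched as shown below. The Baseline type and its Update method are hypothetical names. Snapshot persistence and confidence tracking are omitted.

```go
package main

import "fmt"

// Baseline holds a per-subcarrier EMA baseline for one link.
type Baseline struct {
	Vals   [64]float64
	Tau    float64 // time constant in seconds (default 30)
	inited bool
}

// Update applies baseline[k] = a*amp[k] + (1-a)*baseline[k] with a = dt/(tau+dt),
// but only while the link is quiet (smoothDelta < motionThreshold). The first
// call seeds the baseline from the sample. It reports whether the baseline changed.
func (b *Baseline) Update(amp [64]float64, smoothDelta, motionThreshold, dt float64) bool {
	if !b.inited {
		b.Vals, b.inited = amp, true
		return true
	}
	if smoothDelta >= motionThreshold {
		return false // motion gate: do not absorb a person into the baseline
	}
	a := dt / (b.Tau + dt)
	for k := range b.Vals {
		b.Vals[k] = a*amp[k] + (1-a)*b.Vals[k]
	}
	return true
}

func main() {
	b := &Baseline{Tau: 30}
	var quiet, shifted [64]float64
	for k := range quiet {
		quiet[k], shifted[k] = 1.0, 2.0
	}
	b.Update(quiet, 0, 0.05, 0.1)               // seed from the first sample
	moved := b.Update(shifted, 0.10, 0.05, 0.1) // gated: person moving
	b.Update(shifted, 0.01, 0.05, 0.1)          // quiet: one EMA step
	fmt.Printf("gated update applied: %v, baseline[0]=%.4f\n", moved, b.Vals[0])
}
```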
4. Fusion & Localizer
Combines per-link motion scores into spatial blob positions. Based on Fresnel zone geometry (docs/research/03-algorithms.md).
Algorithm — Fresnel Zone Weighted Localization:
Physical constants:
- WiFi wavelength: λ = c/f = 3×10⁸ / 2.437×10⁹ ≈ 0.123 m (for 2.437 GHz, channel 6)
- Half-wavelength excess per zone: λ/2 ≈ 0.0615 m
Full algorithm:
1. Divide the floor plan into a 3D grid:
   - XY resolution: SPAXEL_GRID_CELL_M (default 0.20 m). XY extent: derived from the union of zone bounds (the bounding box of all defined zones + 0.5 m margin on each side)
   - Z resolution: 0.10 m (fixed; not configurable, since Z accuracy is already limited by node height diversity). Z extent: 0 to max(zone.z + zone.h for all zones) (default 0 to 3.0 m if no zones are defined yet)
   - Grid origin: (min_x - 0.5, min_y - 0.5, 0), where min_x/min_y are the minimum zone coordinates
   - The grid is reallocated (zeroed and recreated) whenever zone bounds or SPAXEL_GRID_CELL_M change
   - Maximum grid size: 100×100×30 cells = 300,000 cells (prevents memory explosion; warn if zone bounds exceed this at the current cell size)
2. Per-link zone number cache (computed once per link, cached in memory, invalidated when any node is repositioned):
   - For link (TX at position T, RX at position R) and grid cell center P:
     ΔL = |P-T| + |P-R| - |T-R| (path length excess over the direct path, in meters)
     zone_number = floor(ΔL / (λ/2)) + 1 (zone 1 if ΔL < λ/2, zone 2 if λ/2 ≤ ΔL < λ, etc.; cells on the direct path, where ΔL = 0, are zone 1)
   - Cells with zone_number > 5 get zone_number = OUTSIDE (weight = 0, excluded from accumulation)
   - Cache format: sparse map {link_id: [(cell_idx, zone_number), ...]}; only cells with zone_number ≤ 5 are stored
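Here is a sketch of the zone-number cache in Go. Two assumptions are made: cell centers are supplied as a flat slice, and 0 stands in for OUTSIDE. The zone is computed as floor(ΔL/(λ/2)) + 1. This matches the band definitions (zone 1 if ΔL < λ/2, zone 2 if ΔL < λ) and places cells on the direct path (ΔL = 0) in zone 1. All names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

type Vec3 struct{ X, Y, Z float64 }

func dist(a, b Vec3) float64 {
	return math.Sqrt((a.X-b.X)*(a.X-b.X) + (a.Y-b.Y)*(a.Y-b.Y) + (a.Z-b.Z)*(a.Z-b.Z))
}

// outside marks cells beyond zone 5 (weight 0, not stored in the cache).
const outside = 0

// zoneNumber returns the Fresnel zone index of cell center p for link t->r.
func zoneNumber(p, t, r Vec3, lambda float64) int {
	dl := dist(p, t) + dist(p, r) - dist(t, r) // path-length excess, meters
	if dl < 0 {
		dl = 0 // guard against tiny negative rounding on the direct path
	}
	n := int(math.Floor(dl/(lambda/2))) + 1
	if n > 5 {
		return outside
	}
	return n
}

// cellZone is one sparse cache entry: grid cell index and its zone number.
type cellZone struct{ Cell, Zone int }

// zoneCache builds the sparse per-link list, keeping only zones 1..5.
func zoneCache(centers []Vec3, t, r Vec3, lambda float64) []cellZone {
	var out []cellZone
	for i, p := range centers {
		if z := zoneNumber(p, t, r, lambda); z != outside {
			out = append(out, cellZone{i, z})
		}
	}
	return out
}

func main() {
	lambda := 3e8 / 2.437e9 // ~0.123 m, channel 6
	tx, rx := Vec3{0, 0, 1}, Vec3{4, 0, 1}
	cells := []Vec3{{2, 0, 1}, {2, 0.4, 1}, {2, 2, 1}}
	fmt.Println(zoneCache(cells, tx, rx, lambda))
}
```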
3. Per-frame accumulation:
   - For each active link (deltaRMS > threshold, default 0.02):
     cell_weight = deltaRMS × link_weight[link_id] × zone_decay(zone_number)
   - Accumulate into a float64 3D grid (same dimensions as the zone cache)
   Zone decay function (parameterizable via the time-travel tuning slider "Fresnel weight decay rate"):
   zone_decay(n) = 1.0 / pow(float64(n), decay_rate)
   - Default decay_rate = 2.0 (inverse square, consistent with ~10 dB sensitivity gradient between zones)
   - Slider range: 1.0 (flat/no decay) to 4.0 (strong decay, zone 1 dominates completely)
   - Zone 1: weight = 1.0; Zone 2: 0.25; Zone 3: 0.11; Zone 4: 0.0625; Zone 5: 0.04 (at default decay_rate=2)
   Combined cell weight:
   accumulated[cell] += deltaRMS × link_weight[link_id] × zone_decay(zone_number)
   - link_weight[link_id] is the per-link learned weight (Component 22), initialized to 1.0
4. Extract peaks from the accumulated grid (3D local maxima using a 6-connected neighborhood):
   - Minimum peak height threshold: configurable, default 0.10 (in accumulated weight units)
   - Non-maximum suppression: only keep cells with value > all 6 direct neighbors
5. Each peak becomes a blob {x, y, z, confidence}, where:
   max_possible_weight = sum(deltaRMS[link] × link_weight[link] for all active links with deltaRMS > threshold)
   confidence = min(1.0, peak_value / max_possible_weight)
   This normalizes confidence so that a cell at the zone-1 intersection of all active links gets confidence ≈ 1.0. If max_possible_weight = 0 (no active links), no peaks are emitted (confidence would be undefined).
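Here is a sketch of peak extraction with 6-connected non-maximum suppression and the confidence normalization above. Two details are assumptions: the flat x + nx·(y + ny·z) grid layout and the index-space Peak type. Converting indices to meters would use the grid origin and cell size. The names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Peak is a local maximum in grid-index space.
type Peak struct {
	X, Y, Z    int
	Confidence float64
}

// extractPeaks returns cells that reach minHeight and are strictly greater
// than all in-bounds 6-connected neighbors. Cells are laid out as
// x + nx*(y + ny*z). No peaks are emitted when maxPossible <= 0.
func extractPeaks(g []float64, nx, ny, nz int, minHeight, maxPossible float64) []Peak {
	if maxPossible <= 0 {
		return nil
	}
	at := func(x, y, z int) float64 { return g[x+nx*(y+ny*z)] }
	offsets := [6][3]int{{1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}}
	var peaks []Peak
	for z := 0; z < nz; z++ {
		for y := 0; y < ny; y++ {
			for x := 0; x < nx; x++ {
				v := at(x, y, z)
				if v < minHeight {
					continue
				}
				isMax := true
				for _, o := range offsets {
					qx, qy, qz := x+o[0], y+o[1], z+o[2]
					if qx < 0 || qy < 0 || qz < 0 || qx >= nx || qy >= ny || qz >= nz {
						continue // neighbors outside the grid are ignored
					}
					if at(qx, qy, qz) >= v {
						isMax = false
						break
					}
				}
				if isMax {
					peaks = append(peaks, Peak{x, y, z, math.Min(1.0, v/maxPossible)})
				}
			}
		}
	}
	return peaks
}

func main() {
	g := []float64{ // 3 x 3 x 1 grid
		0.1, 0.1, 0.1,
		0.1, 0.5, 0.1,
		0.1, 0.1, 0.1,
	}
	fmt.Println(extractPeaks(g, 3, 3, 1, 0.10, 1.0))
}
```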
Active link threshold: deltaRMS > 0.02 (configurable). Links below threshold contribute zero weight to the grid (no partial contribution — prevents noise accumulation).
Grid reset: The accumulated grid is zeroed at the start of every fusion loop iteration (10 Hz). It is not persisted between frames.
6. Blob tracking: Assign persistent IDs via greedy nearest-neighbor matching against the previous frame's tracked blobs. Association threshold: 1.0 m (if the nearest old blob is more than 1.0 m away, the new peak is treated as a new blob).
Assignment algorithm (greedy; N×M candidate pairs, O(N×M·log(N×M)) including the sort):
Input: new_peaks = list of {x,y,z,confidence} from Fresnel grid extraction
old_blobs = list of active UKF-tracked blobs with predicted positions at current time
1. Build distance matrix D[i][j] = |new_peaks[i].pos - old_blobs[j].predicted_pos|
2. Sort all (i, j) pairs by D[i][j] ascending
3. Greedy assignment: for each (i, j) in sorted order:
if new_peaks[i] not yet assigned AND old_blobs[j] not yet matched AND D[i][j] <= 1.0m:
assign new_peaks[i] → old_blobs[j] (update UKF with new_peaks[i] as measurement)
4. Unmatched new_peaks: create new blob with fresh ID (monotonically increasing uint64 counter)
5. Unmatched old_blobs: run predict-only UKF step (no measurement); decay confidence
Tie-breaking: if two new peaks are equidistant from an old blob, the one with higher confidence
wins. The other gets its own new ID.
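The greedy association can be sketched in Go as shown below. The function and parameter names are hypothetical, and the sketch returns indices only. UKF updates and ID allocation would happen in the caller.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Vec3 struct{ X, Y, Z float64 }

func dist(a, b Vec3) float64 {
	return math.Sqrt((a.X-b.X)*(a.X-b.X) + (a.Y-b.Y)*(a.Y-b.Y) + (a.Z-b.Z)*(a.Z-b.Z))
}

// greedyAssign matches new peaks to predicted blob positions. It returns
// peak index -> blob index, the unmatched peaks (new blobs), and the
// unmatched blobs (predict-only this frame).
func greedyAssign(peaks []Vec3, conf []float64, preds []Vec3, maxDist float64) (map[int]int, []int, []int) {
	type pair struct {
		i, j int
		d    float64
	}
	var pairs []pair
	for i := range peaks {
		for j := range preds {
			pairs = append(pairs, pair{i, j, dist(peaks[i], preds[j])})
		}
	}
	// Ascending distance; equal distances go to the higher-confidence peak.
	sort.Slice(pairs, func(a, b int) bool {
		pa, pb := pairs[a], pairs[b]
		if pa.d != pb.d {
			return pa.d < pb.d
		}
		if conf[pa.i] != conf[pb.i] {
			return conf[pa.i] > conf[pb.i]
		}
		return pa.i < pb.i
	})
	assign := map[int]int{}
	blobUsed := make([]bool, len(preds))
	for _, p := range pairs {
		if p.d > maxDist {
			break // sorted, so no later pair can qualify
		}
		if _, taken := assign[p.i]; taken || blobUsed[p.j] {
			continue
		}
		assign[p.i] = p.j
		blobUsed[p.j] = true
	}
	var newPeaks, lostBlobs []int
	for i := range peaks {
		if _, ok := assign[i]; !ok {
			newPeaks = append(newPeaks, i)
		}
	}
	for j, used := range blobUsed {
		if !used {
			lostBlobs = append(lostBlobs, j)
		}
	}
	return assign, newPeaks, lostBlobs
}

func main() {
	peaks := []Vec3{{0.2, 0, 1}, {5.1, 0, 1}, {10, 0, 1}}
	preds := []Vec3{{0, 0, 1}, {5, 0, 1}, {20, 0, 1}}
	a, fresh, lost := greedyAssign(peaks, []float64{0.9, 0.8, 0.7}, preds, 1.0)
	fmt.Println(a, fresh, lost) // map[0:0 1:1] [2] [2]
}
```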
Blob ID lifecycle:
- IDs are assigned at creation and never reused within a mothership session
- ID counter is in-memory only (resets to 1 on restart); dashboard handles re-mapping on reconnect
- Maximum concurrent tracked blobs: 20 (configurable via the "max_tracked_blobs" setting); additional peaks above this limit are discarded (log WARN if exceeded). This prevents runaway blob proliferation from noise spikes.
New peaks get new IDs; unmatched old blobs decay over 3 s before removal
7. Biomechanical UKF: Per-blob Unscented Kalman Filter (UKF). Library: gonum/floats + gonum/mat for matrix operations. All UKF state is in memory per blob (not persisted to SQLite).
State vector (n=6): x = [px, py, pz, vx, vy, vz] (position in meters, velocity in m/s)
Process model (constant velocity with noise, dt = 0.1 s at 10 Hz):
px' = px + vx·dt
py' = py + vy·dt
pz' = pz + vz·dt
vx' = vx (+ process noise)
vy' = vy (+ process noise)
vz' = vz (+ process noise)
Process noise covariance Q (diagonal, tuned empirically; the σ values below denote variances):
- Position: σ_p = 0.01 m² (= (0.1 m)²). Small, because position changes are smooth
- Velocity: σ_v = 0.25 m²/s² (= (0.5 m/s)²). Allows velocity changes of about ±0.5 m/s (1σ) per step
Q = diag([σ_p, σ_p, σ_p, σ_v, σ_v, σ_v·0.25]). The Z velocity variance is scaled by 0.25, so its standard deviation is halved (humans are more constrained vertically)
Measurement model: observation z_obs = [px, py, pz] (position only from Fresnel peak)
Measurement noise covariance R (3×3 diagonal, adaptive):
- Base: σ_obs = 0.3 m standard deviation (variance σ_obs² = 0.09 m²), the typical Fresnel grid accuracy
- Adaptive: R = diag([σ_obs²/confidence, σ_obs²/confidence, σ_obs_z²/confidence]), where confidence is the Fresnel peak confidence (0–1, always > 0 for an emitted peak) and σ_obs_z = 1.0 m (Z is less accurate)
Initial state (new blob from Fresnel peak):
- State: [peak_x, peak_y, peak_z, 0, 0, 0] (zero initial velocity)
- P0 = diag([1.0, 1.0, 1.0, 4.0, 4.0, 4.0]) (large initial uncertainty)
UKF sigma point parameters (Wan & van der Merwe 2000):
- alpha = 0.001, beta = 2.0, kappa = 0, lambda = alpha²·(n+kappa) - n
- 2n+1 = 13 sigma points
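These parameters fix the sigma-point weights of the scaled unscented transform. The sketch below computes λ and the mean/covariance weights from the standard formulas: W0m = λ/(n+λ), W0c = W0m + 1 − α² + β, and Wi = 1/(2(n+λ)). It uses only the standard library and omits the matrix square root that gonum would supply. ukfWeights is a hypothetical name.

```go
package main

import "fmt"

// ukfWeights returns lambda and the mean/covariance weights for the
// 2n+1 sigma points of the scaled unscented transform.
func ukfWeights(n int, alpha, beta, kappa float64) (lambda float64, wm, wc []float64) {
	nf := float64(n)
	lambda = alpha*alpha*(nf+kappa) - nf
	c := nf + lambda // sigma points sit at mean +/- columns of sqrt(c * P)
	wm = make([]float64, 2*n+1)
	wc = make([]float64, 2*n+1)
	wm[0] = lambda / c
	wc[0] = lambda/c + (1 - alpha*alpha + beta)
	for i := 1; i <= 2*n; i++ {
		wm[i] = 1 / (2 * c)
		wc[i] = wm[i]
	}
	return lambda, wm, wc
}

func main() {
	lambda, wm, _ := ukfWeights(6, 0.001, 2.0, 0)
	sum := 0.0
	for _, w := range wm {
		sum += w
	}
	fmt.Printf("lambda=%.6f points=%d sum(wm)=%.6f\n", lambda, len(wm), sum)
}
```

With alpha = 0.001, n + λ is only 6e-6. The central mean weight is therefore about −1e6, balanced by twelve weights of about 8.3e4 each. The weights still sum to 1, but the spread is tiny, so double precision matters.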
Biomechanical constraint application (applied to each sigma point after prediction, before covariance computation):
- Maximum XY velocity: clamp sqrt(vx²+vy²) to 2.0 m/s (scale both components proportionally)
- Maximum acceleration: if |dv/dt| > 3.0 m/s², scale back the velocity delta
- Minimum turning radius: if velocity direction changes by >30° in one step, limit the angular change
- Z velocity: clamp to [-9.8·dt, +1.5·dt] m/step (gravity floor, biological ceiling)
- Collision avoidance soft repulsion: if two blobs are within 0.4 m, add a repulsion delta to each sigma point's XY position (force = 0.1·(0.4-d)/0.4 m per step)
Persistence (no Fresnel peak within 1.0 m association threshold):
- Run predict-only (no measurement update)
- Confidence decays: confidence *= exp(-dt/τ), where τ = 1.0 s
- After 3.0 s of no association (confidence < 0.05), mark the blob for removal
- On removal: log a blob_disappeared event; if the blob was BLE-identified, also log a person_left_detection event
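To check the decay arithmetic, take dt = 0.1 s and τ = 1.0 s. Confidence starting at 1.0 first drops below 0.05 after 30 predict-only ticks (3.0 s), since exp(−3) ≈ 0.0498 while exp(−2.9) ≈ 0.055. The helper below, a hypothetical name, counts those ticks.

```go
package main

import (
	"fmt"
	"math"
)

// stepsUntilRemoval counts predict-only fusion ticks until a blob's
// confidence, decayed by exp(-dt/tau) per tick, drops below cutoff.
func stepsUntilRemoval(conf, dt, tau, cutoff float64) int {
	steps := 0
	for conf >= cutoff {
		conf *= math.Exp(-dt / tau)
		steps++
	}
	return steps
}

func main() {
	n := stepsUntilRemoval(1.0, 0.1, 1.0, 0.05)
	fmt.Printf("removed after %d ticks (%.1f s)\n", n, float64(n)*0.1)
}
```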
Warm start: if a blob is re-associated after a brief gap (<3 s, predict-only), it resumes from its last predicted state (not reset). This prevents ID swapping when two blobs are close and briefly merge in the grid.
8. BLE identity matching (see Component 21): Fuse BLE RSSI reports with blob positions to assign person/device labels to tracked blobs
9. Self-improving weights (see Component 22): Use BLE proximity as continuous ground truth to refine per-link Fresnel zone weights over time
10. Publish the blob list (with identity, posture, velocity) to the dashboard via WebSocket at 10 Hz
Multi-person handling:
- Works naturally: multiple people create multiple peaks in the Fresnel accumulation grid
- Degrades gracefully: overlapping Fresnel zones merge blobs when people are close together
- Practical limit: 2–3 people reliably, 4+ increasingly unreliable
Z-axis:
- Requires nodes at mixed heights (e.g., 0.3 m and 2.0 m)
- Fresnel zones computed in full 3D — blob Z-coordinates are first-class, rendered as true vertical positions in the 3D scene with pillar anchors to the ground plane
- Resolution ±1–2 m — enough for "standing vs. lying down", and critical for fall detection (Component 16)
5. Fleet Manager
Manages the lifecycle and role assignment of all ESP32 nodes.
Node registry (SQLite):
- MAC address (primary key)
- Friendly name (user-assigned)
- Position (x, y, z in floor plan coordinates — set during onboarding)
- Current role (TX / RX / TX_RX / PASSIVE / IDLE)
- Firmware version
- Last heartbeat timestamp
- Status (ONLINE / STALE / OFFLINE)
Role assignment engine:
The mothership decides which nodes transmit, receive, or do both. Goals:
- Maximize spatial coverage (link diversity across the floor plan)
- Minimize RF contention (too many simultaneous TXs cause packet collisions)
- Ensure every node participates in at least one link
Strategy:
- With ≤4 nodes: All nodes TX/RX (time-division). Every pair forms a bidirectional link
- With 5–8 nodes: Select ~40% as dedicated TX, rest as RX. Optimize TX selection for angular diversity and GDOP minimization
- With 9+ nodes: Cluster nodes by room/zone. Within each zone, apply the 5–8 strategy. Cross-zone links use perimeter nodes
- Passive radar option: If a home WiFi AP BSSID is configured, all nodes default to PASSIVE (RX-only using router traffic). Dedicated TX nodes can be added for higher resolution but aren't required
- Role changes pushed as JSON messages on the node's existing WebSocket connection
Stagger schedule:
TX nodes transmit in time-division slots to avoid packet collisions. WiFi CSMA/CA handles residual collisions transparently, but staggering reduces collision probability dramatically.
Slot computation:
- Period: period_us = 1,000,000 / packet_rate_hz (e.g., 50,000 µs at 20 Hz)
- Slot width: 40% of the period (e.g., 20,000 µs at 20 Hz); the remaining 60% is guard time to tolerate clock drift
- Node i (of N TX nodes) is assigned: tx_slot_offset_us = i × period_us / N
- Sent to each TX node as tx_slot_us in the config downstream message
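Here is a sketch of the slot computation, assuming integer microseconds throughout. SlotPlan and planSlots are hypothetical names.

```go
package main

import "fmt"

// SlotPlan holds the stagger parameters sent to TX nodes.
type SlotPlan struct {
	PeriodUs    int64   // 1,000,000 / packet rate
	SlotWidthUs int64   // 40% of the period
	OffsetsUs   []int64 // tx_slot_offset_us per TX node, by index
}

func planSlots(packetRateHz int64, nTx int) SlotPlan {
	period := int64(1_000_000) / packetRateHz
	p := SlotPlan{PeriodUs: period, SlotWidthUs: period * 40 / 100}
	for i := 0; i < nTx; i++ {
		p.OffsetsUs = append(p.OffsetsUs, int64(i)*period/int64(nTx))
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", planSlots(20, 4))
}
```

Consecutive offsets are period/N apart, so the 40% slot windows stay disjoint only for N ≤ 2. At 20 Hz with N = 4, for example, the offsets are 12,500 µs apart while each slot is 20,000 µs wide.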
Clock synchronization:
- Each ESP32 runs esp_sntp_init() on boot, syncing to pool.ntp.org (default) or a configurable server
- NTP server configurable via: (a) the SPAXEL_NTP_SERVER env var on the mothership → embedded in the provisioning payload; (b) the config downstream message field ntp_server: "192.168.1.1"
- NTP sync is attempted for up to 10 s on boot. If it fails, the node transmits at the configured rate without stagger (relying on CSMA/CA for collision avoidance) and logs a warning in the health message
- ESP32 crystal accuracy: ±20 ppm → ±1.2 ms drift per minute. This accumulates to as much as ±12 ms per node over the 10-minute window between NTP resyncs, a sizable fraction of the 20,000 µs slot width
- Resync: nodes resync NTP every 10 minutes (configurable)
Slot timer implementation (firmware):
- On receiving tx_slot_us in a config message: cancel any existing TX timer; compute next_tx_us = unix_us_now + (tx_slot_offset_us - (unix_us_now % period_us)); if next_tx_us < unix_us_now: next_tx_us += period_us
- Create an esp_timer recurring timer that fires at next_tx_us and then every period_us
- On timer fire: send one probe packet burst (the configured probe sequence for CSI capture)
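The next_tx_us arithmetic can be checked in isolation. nextTxUs below is a hypothetical Go transcription of the firmware formula, using int64 Unix microseconds.

```go
package main

import "fmt"

// nextTxUs returns the earliest timestamp >= nowUs whose phase within the
// period equals offsetUs, following the firmware formula.
func nextTxUs(nowUs, offsetUs, periodUs int64) int64 {
	next := nowUs + (offsetUs - nowUs%periodUs)
	if next < nowUs {
		next += periodUs // this period's slot has already passed
	}
	return next
}

func main() {
	now := int64(1_700_000_030_000)
	fmt.Println(nextTxUs(now, 12_500, 50_000) - now)
}
```

A periodic esp_timer first fires one period after it is started. One way to honor the computed first fire time is therefore a one-shot esp_timer for the initial delay (next_tx_us minus now), which then starts the periodic timer.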
Collision detection (mothership):
- If CSI frames from two different TX nodes arrive within 3 ms of each other, the mothership logs a "possible slot collision" metric for that link pair
- If the collision rate exceeds 5% over a 60-second window, the mothership re-randomizes the stagger assignments (shifts one node's slot by half a slot width) and pushes updated config messages
In passive radar mode: No stagger is needed — the router is the only TX. The router's beacon interval (configurable by the user, typically 100 ms = 10 Hz) is the natural clock.
Health monitoring:
- Node WebSocket connection state is authoritative: connected = ONLINE, disconnected = OFFLINE
- Nodes send health JSON (heap, RSSI, uptime, temperature) every 10 s on their WebSocket
- Dashboard shows node status with color coding (green/yellow/red)
Self-healing (see Component 12): When a node goes offline, the fleet manager automatically re-optimizes roles among remaining nodes to maintain best possible coverage.
6. OTA Update System
Firmware updates pushed from mothership to fleet.
Update flow:
- New firmware binary placed in the mothership's /firmware/ volume (or uploaded via the dashboard)
- Mothership computes the SHA-256 hash and stores version metadata
- Dashboard shows an "Update available" badge on nodes with older firmware
- Update initiated manually (single node or "Update All") or automatically (see below)
- Mothership sends an OTA command on the node's WebSocket: {type: "ota", url: "http://mothership:8080/firmware/latest.bin", sha256: "...", version: "..."}
- Node downloads the firmware via HTTP, writes it to the inactive OTA partition while hashing it, and verifies the SHA-256 before activating it
- Node reboots, reconnects its WebSocket, and sends hello with the new firmware version; the mothership confirms the upgrade
- If the new firmware fails to connect within 60 s, the ESP-IDF rollback mechanism reverts to the previous partition
Auto-update mode (configurable, default: off):
- When enabled, the mothership automatically begins a rolling update when new firmware is detected in /firmware/
- Canary strategy: First, update a single node (the one with the lowest coverage impact if lost). Monitor its detection quality contribution for 10 minutes against the fleet baseline. If quality holds (no degradation >5%), proceed with a rolling update of the remaining nodes. If quality degrades, automatically roll back the canary and alert the user: "Auto-update paused: canary node showed degraded performance. Review before retrying."
- Scheduling: Auto-updates only run during a configurable quiet window (default: 02:00–05:00 local time) to minimize disruption. If no quiet window is configured, updates run when all zones have been vacant for >10 minutes
- Dashboard settings: Toggle auto-update on/off, set quiet window, set canary duration, view update history
- Notifications: Timeline event + push notification on auto-update start, canary result, completion, or failure
Firmware format and partition layout:
- Firmware binaries are raw ESP-IDF OTA application images (the output of idf.py build → build/spaxel.bin)
- The version string is embedded in the binary via esp_app_desc_t (set via CONFIG_APP_PROJECT_VER in sdkconfig). The mothership reads the version from the firmware metadata at upload time by parsing the esp_app_desc_t structure at offset 32 bytes from the image start (after the 24-byte image header and the first 8-byte segment header).
- Partition layout (partitions.csv): factory (4 MB), ota_0 (4 MB), ota_1 (4 MB), nvs (24 KB), otadata (8 KB). The three 4 MB app partitions alone need more than 8 MB, so this layout requires a 16 MB flash module. The dual OTA partition scheme (ota_0 + ota_1) is required for rollback.
- Maximum firmware image size: 4 MB (limited by OTA partition size). Firmware must not exceed 3.8 MB to leave a safety margin.
- Firmware file naming: spaxel-<semver>.bin (e.g., spaxel-1.2.3.bin). The is_latest flag in the firmware table marks the newest uploaded version.
Node OTA procedure (firmware side):
- Receive {type:"ota", url, sha256, version} on the WebSocket
- Check: if the current firmware version equals version, skip the update and reply with a health message (already on latest); do NOT send ota_status: rebooting
- Check: free heap ≥ 20 KB before starting (reject if insufficient, sending ota_status: failed, error:"low_heap")
- Open an HTTP connection to the OTA URL (using esp_http_client); 30 s connect timeout
- Simultaneously feed received chunks (4 KB) to esp_ota_write() AND to a running SHA-256 hash
- Send ota_status: downloading, progress_pct: N every 10% of the download
- On download complete: finalize the SHA-256 hash and compare it to the expected value. On mismatch: abort the OTA (esp_ota_abort()), send ota_status: failed, error:"sha256_mismatch", and do NOT reboot
- If the SHA-256 matches: call esp_ota_end() and esp_ota_set_boot_partition() to make the new partition active
- Send ota_status: rebooting, then call esp_restart() after 1 s
- On boot from the new partition: the firmware calls esp_ota_mark_app_valid_cancel_rollback() ONLY after successfully sending hello AND receiving a role message from the mothership (confirms connectivity)
- If the new firmware has not marked itself valid within 60 s of boot, it must reset itself (e.g., from a timer that calls esp_restart()). Because the image is still pending verification, the ESP-IDF bootloader then rolls back to the previous partition (requires CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE)
Rollback detection: When a node reconnects with the same firmware version it had before an OTA attempt, the mothership checks if it was expecting a new version. If so, it marks the node's OTA status as ROLLBACK_OCCURRED in the dashboard (amber badge) and logs the event.
Safeguards:
- Never update all nodes simultaneously — rolling update with 30 s gap between nodes
- Canary node monitored before fleet-wide rollout (auto-update mode)
- If >50% of fleet goes OFFLINE during a rolling update, halt and alert
- Dashboard shows update progress per node (PENDING → CANARY → DOWNLOADING → REBOOTING → VERIFIED / FAILED / ROLLBACK)
- OTA URL does not require session authentication (it's served locally; IP-restricted to the container network)
- Old firmware versions are retained in /firmware/ (not auto-deleted); configurable retention count (default 3)
7. Onboarding Flow
Adding a new ESP32-S3 node to the fleet.
Zero-config first run (Web Serial — requires Chrome/Edge):
1. User connects the ESP32-S3 via USB to the machine running the dashboard
2. The dashboard's "Add Node" page uses the Web Serial API to connect to the device
3. Mothership generates a provisioning payload: WiFi SSID, WiFi password, unique node ID. No mothership IP is needed; the firmware discovers the mothership via mDNS (_spaxel._tcp.local)
4. Dashboard flashes the firmware and writes the provisioning config to NVS via esptool-js
5. Device reboots, connects to WiFi, discovers the mothership via mDNS, opens a WebSocket, and sends a hello registration
6. Mothership auto-detects the home WiFi AP's BSSID and begins passive radar mode; presence detection is working within 30 seconds with zero additional configuration
7. Dashboard shows a guided wizard: "New node detected! I'm already seeing signal from your router. Walk around your space so I can start detecting presence."
8. After 60 s of the user walking: "Great, I can see you. Let me help you place this node optimally." Coverage painting activates and the user drags the node into position
9. User sets the node's Z-height (dropdown: floor level / desk height / ceiling mount, or manual entry)
10. "Want better accuracy? Plug in another ESP32." Repeat from step 1
Working presence detection in under 5 minutes with zero manual network configuration.
Subsequent re-provisioning:
- If a node loses WiFi, a captive portal at spaxel-XXXX allows re-entering credentials
- If mDNS fails (some networks block it), the captive portal allows manual mothership IP entry
- Full reflash available via Web Serial from the dashboard
Payload generation and Web Serial protocol:
POST /api/provision (no auth required — called by the dashboard Web Serial flow) returns a JSON provisioning payload:
{
"version": 1, // provisioning format version
"wifi_ssid": "MyNetwork", // WiFi SSID
"wifi_pass": "secret", // WiFi passphrase
"node_id": "f47ac10b-...", // UUID4 generated by mothership
"node_token": "a1b2c3...", // 64-char hex HMAC-SHA256(install_secret, node_mac)
"ms_mdns": "spaxel", // mDNS service name
"ms_port": 8080,
"debug": false
}
The mac parameter in the POST request body is optional. If provided, node_token is derived from that MAC. If absent, the mothership generates a UUID4 node_id and a placeholder token; the token is finalized when the node sends its first hello with its actual MAC (the token is recomputed and validated then, with a 120-second grace window for the node to connect after provisioning).
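The token derivation can be sketched with Go's crypto/hmac. This sketch makes three assumptions: install_secret is the HMAC key, node_mac is the message, and the MAC is normalized to upper case. The function names are also hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// nodeToken computes hex(HMAC-SHA256(installSecret, mac)): 64 hex characters.
func nodeToken(installSecret []byte, mac string) string {
	m := hmac.New(sha256.New, installSecret)
	m.Write([]byte(strings.ToUpper(mac))) // assumed canonical MAC form
	return hex.EncodeToString(m.Sum(nil))
}

// validToken recomputes the token and compares in constant time.
func validToken(installSecret []byte, mac, token string) bool {
	want := nodeToken(installSecret, mac)
	return hmac.Equal([]byte(want), []byte(token))
}

func main() {
	secret := []byte("example-install-secret") // hypothetical value
	tok := nodeToken(secret, "aa:bb:cc:dd:ee:ff")
	fmt.Println(len(tok), validToken(secret, "AA:BB:CC:DD:EE:FF", tok))
}
```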
Web Serial provisioning protocol:
The firmware includes a serial provisioning listener active for the first 10 seconds after boot (or until provisioning completes). During this window, the firmware reads \n-terminated JSON from UART at 115200 baud:
Firmware listens for: {"provision": <provisioning_json>}\n
Firmware responds: {"ok": true, "mac": "AA:BB:CC:DD:EE:FF"}\n (on success)
{"ok": false, "error": "..."}\n (on failure)
esptool-js integration:
- Version: Pin to esptool-js@0.4.x (latest stable as of the IDF 5.2 era). Loaded as an ES module from the mothership's static files (bundled in the dashboard rather than loaded from a CDN, to avoid external dependencies): `<script type="module">import { ESPLoader, Transport } from "/static/esptool-js/bundle.js"</script>`
- The bundle.js is built from the esptool-js npm package at Docker image build time: npx esbuild esptool-js/bundle.js --bundle --outfile=dashboard/static/esptool-js/bundle.js
- Firmware binary spaxel-X.Y.Z.bin is fetched from GET /firmware/<filename> during the flash step (no auth required; uses the existing no-auth OTA serving path).
The dashboard's Web Serial flow (using esptool-js for firmware flashing + the Web Serial API for provisioning):
1. Flash the firmware binary to the device (erase + write to the factory partition at offset 0x10000):

    const transport = new Transport(serialPort);   // Web Serial API port
    const loader = new ESPLoader({transport, baudrate: 921600, terminal: {...}});
    await loader.main_fn();                        // connect + detect chip
    await loader.write_flash({                     // erase + write
      fileArray: [{data: firmwareArrayBuffer, address: 0x10000}],
      flashSize: "keep",
      eraseAll: false,
      compress: true
    });
    await loader.hard_reset();                     // reboot into firmware
    // Method names differ between esptool-js releases (newer ones use
    // camelCase); match them to the pinned version.

2. Open a serial port at 115200 baud (using the Web Serial API directly, not esptool, after releasing the port held by the esptool-js transport)
3. Wait for the device to reboot and enter the provisioning window (wait for the "SPAXEL READY\n" prompt or a 3 s timeout)
4. Send: {"provision": <JSON from POST /api/provision>}\n
5. Read the response. On {"ok": true}, extract the MAC and immediately call POST /api/provision with the confirmed MAC to finalize the node token.
Fallback when Web Serial is not available (Firefox, Safari, or non-HTTPS context): Show a download link for spaxel-X.Y.Z.bin and instructions for esptool.py --port /dev/ttyUSB0 write_flash 0x10000 spaxel-X.Y.Z.bin. The provisioning JSON is shown as text for manual entry via the captive portal.
6. Dashboard shows "Node provisioned! Waiting for it to connect..." and polls for a hello from the node with the confirmed MAC.
Firmware partition offsets (matching partitions.csv):
- Factory app: 0x10000 (4 MB)
- OTA_0: 0x410000
- OTA_1: 0x810000
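- NVS: 0x9000 (24 KB). Note that a 24 KB NVS starting at 0x9000 ends at 0xF000 and overlaps OTA data at 0xE000; at this offset, NVS can be at most 20 KB unless OTA data is moved.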
- OTA data: 0xE000 (8 KB)
Provisioning window timeout: The 10-second window prevents normal operations from being accidentally interrupted by serial noise. If no valid provisioning JSON is received in 10 s, the firmware proceeds with its normal boot sequence (using NVS credentials if provisioned, or entering captive portal mode if not).
8. Dashboard (Web UI)
Single-page application served by the mothership's HTTP server. Built on Three.js for full 3D spatial visualization. Five modes of interaction: Live View (3D), Fleet Status, Setup/Calibration, Simple Mode, and Ambient Mode. Cross-cutting UX systems: Activity Timeline (Component 27), Detection Explainability (28), Feedback Loop (29), Context Notifications (30), Quick Actions (32), Command Palette (34), Morning Briefing (35), and Guided Troubleshooting (36).
8a. Live View (default)
Full-screen 3D scene showing the monitored space with real-time blob visualization.
3D Scene:
- Renderer: Three.js WebGLRenderer, full viewport, adaptive pixel ratio
- Camera: PerspectiveCamera with OrbitControls — mouse drag to rotate, scroll to zoom, right-drag to pan. Touch support for mobile (pinch zoom, two-finger pan)
- Default view: Isometric-ish angle looking down at ~45° onto the space. One-click preset buttons: Top (plan view), Front, Side, Perspective
- Ground plane: Gridded floor at Y=0 with metric scale markings. Optional floor plan image mapped as a texture on the ground plane (user-uploaded PNG/JPG, calibrated via two-point distance measurement)
- Room bounds: Translucent wireframe box showing the defined space extents. Walls rendered as semi-transparent planes so interior is always visible
Blob rendering (3D) — Humanoid figures:
- Each detected person is rendered as a simplified humanoid figure (SkinnedMesh with 4-5 blend poses), deliberately abstract to respect privacy but immediately readable:
  - Z > 1.4m + velocity > 0.3 m/s: Standing, walking animation (leg/arm swing via AnimationMixer)
  - Z > 1.4m + velocity < 0.3 m/s: Standing idle, subtle breathing sway
  - Z ~ 0.8–1.2m: Seated posture (bent legs, upright torso)
  - Z < 0.5m: Lying down (horizontal figure)
  - Z drops rapidly: Falling animation (ties into fall detection alert)
- Color per person: when BLE identity is assigned, each figure gets a distinct color (user-configurable). Unidentified blobs use a neutral gray
- Vertical pillar: thin cylinder from ground plane to figure base — anchors XY position visually from non-top angles
- Trails: Last 60 positions rendered as fading footprint dots (small circles on ground plane) with per-vertex opacity. Color matches person color
- Person label: CSS2DRenderer overlay showing person name (or "Unknown #N") and zone. On hover: Z-height, velocity, confidence, BLE device name
Node overlay (3D):
- Each ESP32 rendered as a small box or custom mesh at its registered (x, y, z) position
- Color = status: green (ONLINE), yellow (STALE), red (OFFLINE)
- Links: LineSegments between TX→RX pairs, dashed material, opacity proportional to link quality. Toggle-able via the toolbar
- Fresnel zones (debug toggle): Render first Fresnel zone ellipsoids as wireframe meshes between active link pairs, which helps users understand coverage geometry
- Raycaster hover → tooltip with MAC, firmware version, RSSI, role. Click → detail panel slides in
Controls toolbar (floating):
- View presets: Top | Front | Side | Perspective
- Toggle layers: Links | Fresnel zones | Trails | Floor plan image | Crowd flow | Coverage quality
- Status bar: Node count (online/total), active blob count (with names if BLE-identified), system uptime, Detection Quality gauge
- Cmd+K shortcut hint (subtle, dismissible after first use)
Activity timeline sidebar (Component 27):
- Collapsible sidebar on the right edge of the 3D view
- Every event in one scrollable stream. Tap any event → 3D view jumps to that moment
- Inline thumbs up/down on every detection event (Component 29)
- "Why?" button on every detection to open explainability (Component 28)
- Search bar with natural language filtering
Spatial quick actions (Component 32):
- Right-click (desktop) or long-press (mobile) on any 3D element for context-sensitive actions
- Available on: blobs, nodes, empty space, zone labels, portals, trigger volumes
WebSocket feed: Mothership pushes updates at 10 Hz via /ws/dashboard. Blob IDs are in-memory integers; they are stable across the mothership's lifetime but reset on restart. The dashboard must handle a new set of blob IDs gracefully on reconnect.
The first message on every new WebSocket connection is always a snapshot message (type field = "snapshot"), sent within 100 ms of connection establishment. Subsequent messages omit the type field (treated as incremental updates). This enables instant dashboard usability on first connect and seamless reconnect.
// Snapshot message (first message on every connect/reconnect):
{
"type": "snapshot",
"blobs": [{"id": 1, "x": 3.2, "y": 1.1, "z": 0.8, "confidence": 0.85,
"vx": 0.3, "vy": -0.1, "vz": 0.0, "posture": "walking",
"person": "Alice", "ble_device": "iPhone (AB:CD:...)"}],
"nodes": [{"mac": "AA:BB:CC:DD:EE:FF", "status": "online", "role": "rx", "rssi": -45,
"name": "Kitchen North", "pos_x": 1.2, "pos_y": 0.5, "pos_z": 2.1,
"firmware_version": "1.2.3", "virtual": false}],
"zones": [{"id": 1, "name": "Kitchen", "count": 1, "people": ["Alice"],
"x": 0, "y": 0, "z": 0, "w": 4, "d": 3, "h": 2.5}],
"portals": [{"id": 1, "name": "Kitchen Door", "zone_a": "Hallway", "zone_b": "Kitchen"}],
"triggers": [{"id": 1, "name": "Couch Dwell", "state": "active", "elapsed": 142, "enabled": true}],
"confidence": 87,
"security_mode": false,
"predictions": [{"person": "Alice", "zone": "Kitchen", "probability": 0.87, "horizon_min": 15}],
"uptime_s": 3600
}
// Incremental update message (10 Hz after snapshot):
{
"blobs": [...],
"nodes": [...], // only nodes whose status changed since last frame
"zones": [...], // only zones whose occupancy changed since last frame
"triggers": [...], // only triggers that fired or changed state
"confidence": 87,
"events": [...] // new events since last frame (empty array if none)
}
WebSocket connection state management (dashboard client-side):
- Connection indicator: A small colored dot in the toolbar status bar: green (connected), amber (reconnecting), red (disconnected for >30 s).
- Brief disconnect (<5 s): 3D scene retains last known state. Blob positions are extrapolated using last known velocity. No visual indicator changes.
- Reconnecting (5–30 s): Scene dims slightly (50% opacity overlay). "Reconnecting..." spinner in status bar. On successful reconnect: full snapshot received, scene returns to normal immediately.
- Disconnected (>30 s): "Connection lost" modal appears with "Reload page" button. The modal is non-blocking — user can still view the last known state.
- Reconnect backoff: 1 s, 2 s, 4 s, 8 s, max 10 s. Jitter: ±500 ms on each attempt to prevent thundering herd.
- Post-reconnect: The 3D scene rebuilds from the snapshot. Blob trail history is cleared (trails only show post-reconnect positions). Timeline events fetched separately via REST API to restore history.
Performance:
- Scene updates driven by `requestAnimationFrame`, decoupled from the WebSocket rate
- Blob positions interpolated between WebSocket frames for smooth 60 fps motion
- `InstancedMesh` for trail segments if blob count × trail length exceeds ~500 objects
- LOD: reduce trail length and disable Fresnel zone rendering when >8 blobs are active
8b. Fleet Status
Table view of all registered nodes. Overlaid as a slide-out panel over the 3D view.
| Column | Content |
|---|---|
| Name | User-assigned friendly name |
| MAC | Hardware address |
| Role | TX / RX / TX_RX — editable dropdown |
| Position | (x, y, z) — click to highlight node in 3D view and fly camera to it |
| Firmware | Version string + "Update available" badge |
| RSSI | Last reported WiFi signal strength |
| Status | ONLINE / STALE / OFFLINE with colored indicator |
| Uptime | Time since last boot |
| Actions | Restart, Update, Remove, Identify (blink LED) |
Global actions: Update All, Re-baseline All, Export Config, Import Config.
8c. Setup / Calibration
Space definition and node placement, all within the 3D view using TransformControls.
- Space definition: Set room dimensions (width × depth × height) numerically or by dragging corner handles in the 3D scene. Multi-room: add adjacent boxes, each with its own dimensions.
- Floor plan image: Upload a PNG/JPG and set two calibration points (click on the image, enter the real-world distance between them). The image is then mapped as a ground-plane texture at the correct scale.
  Pixel-to-meter calibration transform:
    Given:
      Point A: image pixel (pax, pay), real-world floor plan coords (ax, ay) in m  [user places A at a known corner]
      Point B: image pixel (pbx, pby), real-world floor plan coords (bx, by), derived from A + distance
      real_distance_m: known real-world distance |A−B| in meters
    Step 1: compute pixel scale
      pixel_distance   = sqrt((pbx-pax)² + (pby-pay)²)   [pixels]
      meters_per_pixel = real_distance_m / pixel_distance
    Step 2: compute rotation angle (the image may not be axis-aligned with the floor plan)
      angle_rad = atan2(pby-pay, pbx-pax) - atan2(by-ay, bx-ax)
      B is placed along the +x axis from A, i.e. (bx, by) = (ax + real_distance_m, ay)
      (with the usual A = (0, 0), this is B = (real_distance_m, 0)), so the second term is 0:
      angle_rad = atan2(pby-pay, pbx-pax)
    Step 3: convert any pixel (px, py) to floor plan meters
      // Translate to A-relative pixel coords
      dx = px - pax;  dy = py - pay
      // Rotate by -angle_rad to align with the floor plan orientation
      mx = dx × cos(-angle_rad) - dy × sin(-angle_rad)
      my = dx × sin(-angle_rad) + dy × cos(-angle_rad)
      // Scale to meters and add A's floor plan offset
      floor_x = ax + mx × meters_per_pixel
      floor_y = ay + my × meters_per_pixel
  The calibration is stored as {cal_ax, cal_ay, cal_bx, cal_by, cal_distance_m} in the floorplan table. meters_per_pixel and angle_rad are computed at load time and cached (not stored). Three.js mapping: the floor plan image is applied as a THREE.PlaneGeometry texture in the XZ plane (Y = 0), scaled so that image_width × meters_per_pixel = geometry width.
- Node placement: Drag-and-drop nodes in 3D using TransformControls (translate mode). Snap-to-grid is optional. As a fallback, set positions numerically via the fleet table.
- Baseline management: View the current baseline state per link. Manual re-baseline trigger. History of baseline snapshots.
- Environment change detection: Alert when link characteristics shift significantly from baseline (suggests re-calibration).
- Diagnostic view: Raw CSI amplitude/phase plots per link (2D chart overlay). Useful for debugging node placement.
8d. Simple Mode (Progressive Disclosure)
A clean, card-based interface for household members who don't need the full 3D engineering view.
- No 3D scene. Replaces the WebGL canvas with a responsive card layout
- Room cards: One per defined zone. Shows occupancy count, person names (if BLE-identified), and status color (green = empty, blue = occupied, red = alert). Tap to expand activity history for that zone
- Activity feed: Chronological list of events: "Alice entered Kitchen (2 min ago)", "Living Room vacant (15 min ago)", "Fall alert dismissed (1 hour ago)"
- Alert banner: Fall detection, anomaly alerts, system warnings — prominent but not overwhelming
- Quick actions: Arm/disarm security mode, trigger re-baseline, silence alerts
- Sleep summary card: Morning card showing last night's sleep data (if sleep monitoring is configured)
- Mobile-first: Designed as the primary mobile experience. Touch-friendly, no gestures required
- Switching: Toggle button in toolbar. Per-user default stored in browser `localStorage`. Optionally: simple mode requires no auth, expert mode requires a PIN
9. Baseline & Calibration System
The baseline represents the "empty room" state of each link. All motion detection is relative to this baseline.
Establishing baseline:
- User triggers "Calibrate" from dashboard (or auto-triggered on first boot with all nodes online)
- Mothership collects 60 s of CSI data on all links with no people present
- Computes per-link baseline: mean amplitude and phase per subcarrier
- Stores baseline snapshot to SQLite with timestamp
Baseline drift handling:
- EMA continuously adapts baseline with long time constant (τ = 30 s)
- Motion-gated: EMA update paused when motion detected on the link — prevents adapting to a stationary person
- Environment change detection: If baseline drifts more than 2σ from the calibration snapshot across multiple links, dashboard shows "Environment changed — consider re-calibrating" alert
Triggers for re-calibration:
- Node physically moved (user indicates via dashboard)
- Significant furniture rearrangement
- Seasonal changes (temperature/humidity affect propagation)
- Manual request from dashboard
- Automatic suggestion when detection quality degrades (high false-positive rate or low confidence scores)
Baseline is per-link, not global. Moving one node only requires re-baselining links involving that node, not the entire fleet.
10. Passive Radar Mode (Router-as-TX)
Every home WiFi AP broadcasts beacon frames at ~10 Hz plus regular data traffic. In passive radar mode, ESP32 nodes operate as receive-only — they capture CSI from the existing router's transmissions without any dedicated TX nodes in the fleet.
How it works:
- The AP's BSSID is auto-detected during provisioning (no manual entry required). Each ESP32 node calls `esp_wifi_sta_get_ap_info()` on boot to get the BSSID and channel of its connected AP. It includes both in the `hello` message as `"ap_bssid": "AA:BB:CC:DD:EE:FF"` and `"ap_channel": 6`.
- The mothership collects `ap_bssid` from all connected nodes. If all nodes report the same BSSID (or ≥80% agree, for mesh networks), the AP is auto-confirmed. If multiple BSSIDs appear (mesh network with different BSSIDs per satellite), each unique BSSID is registered as a separate virtual node.
- The onboarding wizard shows: "I detected your router (AA:BB:CC:DD:EE:FF — ASUS Router). Using it as a signal source." The user confirms with one tap. If the auto-detected BSSID seems wrong, the user can enter it manually via a text field.
- The mothership creates a virtual node entry in the `nodes` table for the AP: same schema as a real node, but with a `virtual=1` flag and `role='ap'`. The virtual node participates in Fresnel zone computation at its placed position.
- The AP virtual node appears in the 3D editor as a router icon (distinct from the ESP32 box icon). It starts at position (0, 0, 0); the user repositions it in the 3D editor to match the physical router's location.
- All ESP32 nodes are assigned PASSIVE role — promiscuous mode, filtering for the AP's BSSID
- Each node extracts CSI from every beacon/data frame received from the AP
- CSI streams to the mothership as normal binary frames, with `peer_mac` set to the AP's BSSID
- If the AP BSSID changes (router replacement, MAC address rotation on some routers), the dashboard shows an alert: "No CSI received from passive BSSID for >5 minutes. Your router's MAC address may have changed. [Re-detect BSSID]". Re-detection triggers fresh `hello` reporting from all nodes.
- OUI lookup: the first 3 bytes of the BSSID are looked up against an embedded OUI table to show a friendly router manufacturer name. The table is bundled in the Go binary at build time from the IEEE OUI registry.
  - Source: https://standards-oui.ieee.org/oui/oui.txt (downloaded at build time in a `go generate` step)
  - Format: A sorted text file `internal/oui/oui.txt` with lines of the form `AA-BB-CC (hex) Manufacturer Name`. The `go generate` step transforms it into a compact Go source file `oui_data.go` containing `var ouiMap = map[uint32]string{0xAABBCC: "ASUS", ...}` (3 bytes packed as a big-endian uint32).
  - Lookup (guarding against MACs shorter than 3 bytes): `func LookupOUI(mac []byte) string { if len(mac) < 3 { return "" }; key := uint32(mac[0])<<16 | uint32(mac[1])<<8 | uint32(mac[2]); if name, ok := ouiMap[key]; ok { return name }; return "" }`
  - Embedding: a `//go:generate` directive in `internal/oui/gen.go`. The generated `oui_data.go` is committed to the repo and regenerated only when the OUI list is manually updated, not on every build.
Adding the virtual, node_type, ap_bssid, and ap_channel columns to the nodes table:
ALTER TABLE nodes ADD COLUMN virtual INTEGER NOT NULL DEFAULT 0;
ALTER TABLE nodes ADD COLUMN node_type TEXT NOT NULL DEFAULT 'esp32'
CHECK (node_type IN ('esp32','ap'));
ALTER TABLE nodes ADD COLUMN ap_bssid TEXT; -- for ap-type nodes: the BSSID being filtered
ALTER TABLE nodes ADD COLUMN ap_channel INTEGER; -- for ap-type nodes
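The BSSID auto-confirmation rule from the passive radar setup could be sketched as follows. `classifyAPs` and its return shape are hypothetical. The quorum is compared with integer arithmetic, so exactly 80% agreement (for example 4 of 5 nodes) qualifies without floating-point edge cases.

```go
package main

import "sort"

// classifyAPs takes the ap_bssid reported in each node's hello message.
// If one BSSID is reported by at least quorumPct percent of the nodes, it is
// returned as confirmed. Otherwise confirmed is "" and every distinct BSSID
// is returned so that each can be registered as its own virtual AP node.
func classifyAPs(reports []string, quorumPct int) (confirmed string, distinct []string) {
	counts := make(map[string]int)
	for _, b := range reports {
		if b != "" {
			counts[b]++
		}
	}
	total := 0
	for b, n := range counts {
		distinct = append(distinct, b)
		total += n
	}
	sort.Strings(distinct) // deterministic order for the onboarding UI
	best, bestN := "", 0
	for _, b := range distinct {
		if counts[b] > bestN {
			best, bestN = b, counts[b]
		}
	}
	if total > 0 && bestN*100 >= total*quorumPct {
		return best, distinct
	}
	return "", distinct
}
```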
Advantages:
- Minimum deployment drops to 2 ESP32 nodes + existing router — no dedicated TX hardware needed
- No TX stagger scheduling, no collision management
- Router transmits constantly and reliably — more stable than ESP32 probe packets
- Users can add dedicated TX nodes later for higher resolution, mixing passive + active links
Limitations:
- Router position is fixed — less geometric diversity than distributed TX nodes
- Beacon rate (~10 Hz) is lower than dedicated TX (20–50 Hz) — lower temporal resolution
- Some routers use beamforming that varies CSI per-frame — may need filtering
Firmware: Single passive_bssid NVS config field. When set, the CSI callback accepts only frames whose peer_mac == passive_bssid instead of accepting all peers.
11. Live Coverage Painting & GDOP Overlay
When placing or repositioning nodes in the 3D setup view, the ground plane dynamically displays a color-coded coverage quality map. As a node is dragged, the visualization updates in real-time.
GDOP (Geometric Dilution of Precision) computation:
GDOP quantifies how well a set of link geometries can localize a point. For CSI-based localization, the relevant metric is the angular diversity of links covering a cell — links from different directions provide more independent information.
2D GDOP formula per cell:
For a cell at position P:
1. Collect all links (TX_i → RX_i) where P is within the first 3 Fresnel zones of that link
(i.e., ΔL = |P-TX_i| + |P-RX_i| - |TX_i - RX_i| ≤ 3·λ/2)
2. If fewer than 2 qualifying links: GDOP = Infinity (gray cell, no coverage)
3. For each qualifying link i: θ_i = atan2(RX_i.y - TX_i.y, RX_i.x - TX_i.x) (projected to floor plane)
4. Build the 2×2 Fisher information matrix:
F = Σ_i [ [cos²(θ_i), cos(θ_i)·sin(θ_i)],
[cos(θ_i)·sin(θ_i), sin²(θ_i) ] ]
5. det_F = F[0][0]·F[1][1] - F[0][1]·F[1][0]
6. If det_F ≤ 1e-6: GDOP = Infinity (collinear links — degenerate geometry)
7. trace_Finv = (F[0][0] + F[1][1]) / det_F (trace of F^-1 using 2×2 inverse formula)
8. GDOP = sqrt(trace_Finv)
Thresholds: GDOP < 2 = excellent (green), 2–4 = good (yellow), >4 = poor (red), Infinity = gray.
Coverage score: Fraction of floor cells with GDOP < 4, expressed as a percentage (0–100%).
Implementation (Web Worker):
- Input: `{grid: {width, height, cell_m: 0.2, origin: [x, y]}, links: [{tx: [x,y,z], rx: [x,y,z]}, ...], lambda: 0.123}` (lambda is the 2.4 GHz wavelength in meters)
- Output: `Float32Array` of GDOP values indexed as `[col + row × width]`. Infinity is encoded as 9999.
- Computation: nested loop over grid cells × links, O(cells × links). For 50×50 cells and 28 links: 2,500 × 28 = ~70,000 iterations, <2 ms.
- The main thread creates `THREE.DataTexture(output, width, height, THREE.RedFormat, THREE.FloatType)` and applies a shader with color mapping (green→yellow→red gradient, gray for GDOP = 9999).
- Update trigger: send a message to the Worker on every `requestAnimationFrame` during node drag. The Worker responds within one frame. No throttling is needed given the <2 ms compute time.
Coverage painting during node placement:
- When a node is being dragged via TransformControls, the link geometry changes on every frame. The Web Worker recomputes GDOP on every animation frame during drag.
- The GDOP overlay updates live — dead zones visibly shrink as the node moves into a good position
- A "coverage score" percentage is displayed in a HUD element: "Coverage: 78% ↑3%" (arrow shows improvement since drag started)
- Color legend shown in corner: green excellent / yellow good / red poor / gray no coverage
Virtual node planning:
- "Add Virtual Node" button places a phantom node (wireframe, dashed links) that participates in GDOP computation but doesn't correspond to real hardware
- User drags the virtual node around to find the optimal position for their next purchase
- Virtual nodes are visually distinct (translucent, pulsing outline) and can be converted to real nodes via onboarding
12. Self-Healing Fleet
When a node goes offline, the fleet manager automatically re-optimizes roles among remaining nodes to maintain the best possible coverage, rather than simply degrading silently.
Healing sequence:
- Node WebSocket disconnects → fleet manager marks it OFFLINE
- Recompute GDOP across the floor grid with the reduced node set
- Select optimal TX/RX role assignments among remaining nodes to minimize worst-case GDOP
- If passive radar mode is active, check if remaining RX nodes still have adequate geometric diversity against the AP
- Adjust packet rates upward on remaining TX nodes to compensate for lost link density
- Push new role configs to remaining nodes via their WebSocket connections
- Dashboard shows a before/after coverage comparison overlay: "Node kitchen-ceiling went offline. Coverage in kitchen corridor degraded from excellent to fair. 4 links lost, 8 remaining."
Recovery:
- When a node reconnects (WebSocket `hello`), roles are re-optimized again to restore full coverage
- Dashboard shows: "Node kitchen-ceiling back online. Full coverage restored."
- Node automatically receives its new role assignment on reconnect
Graceful degradation guarantees:
- 1 node lost from a 6-node fleet: coverage degrades but system remains functional
- 2 nodes lost: system warns "significant coverage gaps" with affected areas highlighted in red in the 3D view
- More than 50% of the fleet offline: system enters degraded mode, disables spatial localization, falls back to per-link presence detection only
13. Room Transition Portals
Portals are vertical planes drawn across doorways in the 3D editor. They track directional blob crossings to maintain per-room occupancy counts.
Portal definition:
- In setup mode, user draws a vertical rectangle across a doorway by clicking two floor points (the portal spans floor to ceiling)
- Each portal connects two named zones (e.g., "Hallway" ↔ "Kitchen")
- Portals are rendered as translucent colored rectangles in the 3D view
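Portal plane representation: Each portal is stored as two floor points [P1, P2]. The portal plane's normal vector is n = normalize(cross(P2 - P1, up)). Here P2 - P1 is the horizontal portal direction (zero vertical component) and up is the vertical unit vector ([0, 1, 0] in the Three.js Y-up scene). The result is a horizontal normal; P1 and P2 are ordered so that it points from zone_a to zone_b. The plane equation is f(P) = dot(P - portal_midpoint, n).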
Crossing detection (two-phase, committed-only):
Phase 1 — Tentative crossing:
- For each tracked blob, evaluate `f(blob_pos)` every fusion tick (10 Hz)
- Sign change detected: `prev_sign != sign(f(blob_pos))`
- Minimum velocity check: `dot(blob_velocity, n)` must be > 0.1 m/s in the crossing direction (prevents jitter near the portal from static blob repositioning)
- On sign change + velocity check: record a tentative crossing in memory (direction, timestamp, blob_id). Do NOT yet update occupancy counts.
Phase 2 — Committed crossing:
- Committed when blob is > 0.3 m past the portal plane on the new side for >2 s (dwell confirmation)
- On commit: insert into the `portal_crossings` table; update zone occupancy counts
- Reversal window: if the blob returns through the portal within 5 s of a tentative crossing (before commit), cancel the tentative crossing. Log: "Blob #N passed tentatively through [portal] but returned."
- Counts are bounded: minimum 0 (committed crossing out of an empty room sets count to 0, not -1)
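The two-phase crossing rule can be sketched as a per-blob, per-portal state machine. The sketch works in 2D floor coordinates (x, y) with z vertical, where cross(P2 − P1, up) reduces to (dy, −dx)/|d|. `Portal`, `CrossingTracker` and `Update` are hypothetical names. As a simplification, any return through the plane before commit cancels the tentative crossing, regardless of the 5 s window.

```go
package main

import "math"

// Vec2 is a floor-plane vector in meters.
type Vec2 struct{ X, Y float64 }

func (a Vec2) Sub(b Vec2) Vec2    { return Vec2{a.X - b.X, a.Y - b.Y} }
func (a Vec2) Dot(b Vec2) float64 { return a.X*b.X + a.Y*b.Y }

// Portal is a doorway from P1 to P2; zone_b lies on the side the normal points to.
type Portal struct{ P1, P2 Vec2 }

// Normal is cross(P2-P1, +z) restricted to the floor plane: (dy, -dx)/|d|.
func (p Portal) Normal() Vec2 {
	d := p.P2.Sub(p.P1)
	l := math.Hypot(d.X, d.Y)
	return Vec2{d.Y / l, -d.X / l}
}

// F is the signed distance f(P) = dot(P - midpoint, n).
func (p Portal) F(q Vec2) float64 {
	mid := Vec2{(p.P1.X + p.P2.X) / 2, (p.P1.Y + p.P2.Y) / 2}
	return q.Sub(mid).Dot(p.Normal())
}

// CrossingTracker follows one blob through one portal.
type CrossingTracker struct {
	Portal    Portal
	prevSign  float64 // 0 until the first update
	pending   int     // tentative direction: +1 (a→b), -1 (b→a), 0 none
	pastSince int64   // ms when the blob got >0.3 m past the plane, -1 if not
}

// Update is called every fusion tick and returns +1 or -1 when a crossing is committed.
func (c *CrossingTracker) Update(pos, vel Vec2, nowMs int64) int {
	f := c.Portal.F(pos)
	sign := math.Copysign(1, f)
	if c.prevSign == 0 {
		c.prevSign = sign
		return 0
	}
	if sign != c.prevSign {
		dir := int(sign)
		switch {
		case c.pending != 0 && c.pending != dir:
			c.pending = 0 // came back before commit: cancel the tentative crossing
		case vel.Dot(c.Portal.Normal())*sign > 0.1:
			c.pending, c.pastSince = dir, -1 // Phase 1: tentative crossing
		}
		c.prevSign = sign
	}
	if c.pending != 0 { // Phase 2: dwell confirmation
		if f*float64(c.pending) > 0.3 {
			if c.pastSince < 0 {
				c.pastSince = nowMs
			}
			if nowMs-c.pastSince > 2000 {
				dir := c.pending
				c.pending = 0
				return dir
			}
		} else {
			c.pastSince = -1
		}
	}
	return 0
}
```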
Occupancy tracking and restart reconciliation:
- Per-zone occupancy count maintained in memory: `{zone_id: count}`
- Persisted to the `zones.last_known_occupancy` column in SQLite every 60 s and on graceful shutdown
- On mothership restart:
  - Load `last_known_occupancy` as the starting value for each zone (marked "uncertain")
  - Compute the net portal crossings since midnight from the `portal_crossings` table: `SELECT direction, zone_a_id, zone_b_id FROM portal_crossings WHERE timestamp_ms > midnight_ms ORDER BY timestamp_ms`
  - Apply net crossings to the loaded starting values → reconstructed occupancy
  - Mark occupancy as "reconciled" after 60 s of live operation
- Dashboard shows "Occupancy estimates restored after restart (may be stale)" until next reconciliation
- 60-second reconciliation: every 60 s, compare portal-based occupancy with blob-count-per-zone. If they differ by >1 for 2 consecutive checks, apply the blob-count-per-zone as ground truth and log the discrepancy
- Dashboard shows zone labels in the 3D view with occupancy badges: "Kitchen: 2", "Bedroom: 0"
- Zone occupancy published via the dashboard WebSocket and exposed via REST API
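Restart reconstruction and the 60-second check could look like the sketch below. `Crossing` reduces a `portal_crossings` row to the zone left and the zone entered; this is an assumed interpretation of its `direction` column. Counts are clamped at 0, as the plan requires. All names are hypothetical.

```go
package main

import "sort"

// Crossing is a committed crossing reduced to the zone left and the zone entered.
type Crossing struct{ From, To string }

// applyCrossings replays committed crossings onto starting counts, clamping at 0.
func applyCrossings(occ map[string]int, crossings []Crossing) {
	for _, c := range crossings {
		if occ[c.From] > 0 {
			occ[c.From]--
		}
		occ[c.To]++
	}
}

// Reconciler implements the 60-second check: a zone whose portal count and blob
// count differ by more than 1 on two consecutive checks is reset to the blob count.
type Reconciler struct{ strikes map[string]int }

// Check returns the zones (sorted) whose occupancy was overwritten this time.
func (r *Reconciler) Check(occ, blobCounts map[string]int) []string {
	if r.strikes == nil {
		r.strikes = make(map[string]int)
	}
	zones := make(map[string]bool)
	for z := range occ {
		zones[z] = true
	}
	for z := range blobCounts {
		zones[z] = true
	}
	var adjusted []string
	for z := range zones {
		diff := occ[z] - blobCounts[z]
		if diff < 0 {
			diff = -diff
		}
		if diff <= 1 {
			r.strikes[z] = 0
			continue
		}
		r.strikes[z]++
		if r.strikes[z] >= 2 {
			occ[z] = blobCounts[z] // blob count wins; the caller logs the discrepancy
			r.strikes[z] = 0
			adjusted = append(adjusted, z)
		}
	}
	sort.Strings(adjusted)
	return adjusted
}
```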
Portal flash animation: When a crossing is detected, the portal rectangle briefly flashes and an arrow appears showing the direction of travel.
Home Assistant integration: Zone occupancy is exposed as sensor entities through the optional MQTT client using Home Assistant MQTT auto-discovery, e.g. sensor.spaxel_kitchen_occupancy.
14. Time-Travel Debugging
The mothership continuously records raw CSI frames to a circular buffer, enabling historical replay with adjustable algorithm parameters.
Recording:
- All incoming CSI binary frames are written to a recording store (append-only file or SQLite blob table) with mothership timestamps
- Default retention: 48 hours. Configurable via dashboard settings
- Storage estimate: ~150 bytes/frame × 30 Hz × 20 links = 90 KB/s ≈ 324 MB/hour ≈ 15.6 GB/48h for an 8-node fleet
- Oldest data evicted automatically when retention limit reached
Replay engine architecture:
Replay runs as a separate, isolated pipeline instance — it does not interfere with live operation. The live fusion loop continues at 10 Hz regardless of replay state.
When POST /api/replay/start {from_iso8601, to_iso8601} is called:
- A new `replay_sessions` row is created with `state='paused'`
- A dedicated goroutine (`replayWorker`) is spawned for this session
- The worker seeks to `from_ms` in `csi_replay.bin` by scanning `recv_time_ms` fields linearly forward from `oldest_pos`. Replay seeks are infrequent enough that O(N) is acceptable for the initial seek. Within a session, forward seeks resume from the current cursor, while backward seeks rescan from `oldest_pos` (O(N/2) on average).
Seek algorithm in csi_replay.bin:
To seek to target_ms:
1. Start from oldest_pos (guaranteed to exist)
2. Read frames sequentially, comparing recv_time_ms to target_ms
3. When recv_time_ms >= target_ms: stop; current file position is the replay cursor
4. Store cursor in replay_sessions.current_ms
5. For frame-by-frame mode: advance cursor by exactly one frame
Playback at N× speed:
- The `replayWorker` reads frames from the file in `recv_time_ms` order
- Real-time delta between consecutive frames: `real_dt = frame[i+1].recv_time_ms - frame[i].recv_time_ms`
- The worker sleeps `real_dt / speed` ms between pushing each frame through the replay pipeline
- The replay pipeline is a copy of the live signal processing pipeline with the session's `params_json` applied (or live params if `params_json = NULL`)
- Output: the blob list from the replay pipeline is pushed to the `replay_results` channel (in-memory, not SQLite)
- The dashboard receives replay blobs via the dashboard WebSocket: when a session is playing, the mothership interleaves `{"replay":true, "blobs":[...], "timestamp_ms":N}` frames into the dashboard feed alongside live frames. The frontend distinguishes replay frames by the `replay:true` flag and renders them in the 3D scene's replay layer
Seek (POST /api/replay/seek {session_id, timestamp_iso8601}):
- Pauses playback, re-runs the seek algorithm, updates `replay_sessions.current_ms`, and sends a single frame to the dashboard at the seeked timestamp (one-shot replay tick)
PATCH /api/replay/params (parameter change):
- Updates `params_json` in the session row
- Triggers a replay "batch re-run": replays the current 60-second window around `current_ms` at maximum speed (no real-time delay), computes blobs, and sends the results to the dashboard as a burst. This gives the "instant preview" effect.
Dashboard toolbar: "Pause Live":
- Clicking "Pause Live" calls `POST /api/replay/start` with `from=now-60s` and `to=now`. The dashboard freezes the live blob render and switches to replay mode. The 3D scene shows the state 60 seconds ago. The user can then scrub backward via the timeline.
- Pausing freezes the 3D view and reveals a timeline scrubber
- Scrub backward/forward through recorded history. Playback at 1×, 2×, 5×, or frame-by-frame
- The 3D scene renders blobs exactly as they were detected at that point in time, including trails
Parameter tuning overlay:
- While in replay mode, a tuning panel exposes key pipeline parameters as sliders:
- Detection threshold (deltaRMS)
- Baseline time constant (τ)
- Fresnel weight decay rate
- Subcarrier selection count
- Breathing band sensitivity
- Adjusting any slider re-runs the pipeline on the recorded CSI data with the new parameters
- The 3D view immediately shows how detection would have differed — missed blobs appear, false positives disappear
- "Apply to Live" button writes the tuned parameters to the running pipeline
Use cases:
- "Why did it miss me standing in the kitchen at 2pm?" — scrub back, lower stillness threshold, see the blob appear
- "Why do I get false positives at 3am?" — scrub to the event, raise the detection threshold until it disappears, check if real events are still caught
- Debug new node placements by replaying the first hour of data with different parameters
15. Diurnal Adaptive Baseline
Instead of a single EMA baseline per link, maintain a 24-slot circular buffer — one baseline vector per hour of day. This captures predictable environmental periodicity that a simple EMA cannot distinguish from human presence.
Sources of diurnal variation:
- HVAC cycling (on/off at scheduled times)
- Sunlight heating window glass and walls (changes propagation characteristics)
- Appliance EMI patterns (refrigerator compressor, washing machine)
- Household RF environment (neighbor's devices, microwave ovens)
Learning phase:
- For the first 7 days, the system builds per-hour baselines by accumulating motion-free CSI data into hourly slots
- Dashboard shows a "baseline confidence" indicator per link that fills up as each hourly slot accumulates sufficient samples (minimum 300 samples = 5 minutes of quiet data per slot)
- Slots that haven't been calibrated fall back to the global EMA baseline
Steady state:
On each fusion tick (10 Hz), the active baseline for a link is computed as a weighted blend:
hour = current_hour_of_day (0–23, in configured TZ)
diurnal_slot = diurnal_baselines[link_id][hour]
// Use diurnal slot only if it has enough samples
if diurnal_slot.sample_count >= 300:
// Crossfade: blend over the first 15 minutes of the new hour
// minute_of_hour in [0, 60), crossfade completes at minute 15
t = min(1.0, minute_of_hour / 15.0) // linear 0→1 over 15 min
active_baseline[k] = (1-t) × ema_baseline[k] + t × diurnal_slot.amplitude[k]
else:
// Slot not ready — use global EMA
active_baseline[k] = ema_baseline[k]
The active_baseline[k] computed above replaces baseline[k] in the deltaRMS formula during this tick. The global EMA baseline continues to update in parallel (motion-gated, τ = 30 s) regardless of whether the diurnal slot is active — it serves as the fallback for any uncalibrated slot.
When crossfade weight t reaches 1.0 (after 15 minutes), the diurnal slot becomes the sole baseline for the current hour. Motion-gated EMA updates during this hour also update diurnal_slot.amplitude[k] (in addition to the global EMA), improving the slot over time.
- Motion-gated updates continue within each hourly slot — the diurnal baseline improves over time
- Dramatically reduces false positives from predictable environmental changes
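The per-tick blend above can be written as a Go function. `DiurnalSlot` and `activeBaseline` are hypothetical names. The 300-sample gate and the 15-minute linear crossfade come straight from the pseudocode. A slot whose length does not match the EMA vector is treated as not ready, which is an added safety assumption.

```go
package main

import "math"

// DiurnalSlot is one hour-of-day baseline.
type DiurnalSlot struct {
	Amplitude   []float64
	SampleCount int
}

const minSlotSamples = 300

// activeBaseline crossfades linearly from the global EMA baseline to the hour's
// slot over the first 15 minutes of the hour, or returns a copy of the EMA
// baseline if the slot is not ready.
func activeBaseline(ema []float64, slot DiurnalSlot, minuteOfHour float64) []float64 {
	out := make([]float64, len(ema))
	if slot.SampleCount < minSlotSamples || len(slot.Amplitude) != len(ema) {
		copy(out, ema)
		return out
	}
	t := math.Min(1.0, minuteOfHour/15.0) // 0 at :00, 1 from :15 onward
	for k := range ema {
		out[k] = (1-t)*ema[k] + t*slot.Amplitude[k]
	}
	return out
}
```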
Storage: 24 slots × N_subcarrier complex values × N_links. For a typical 8-node fleet with 28 links: 24 × 64 × 2 × 28 = 86,016 stored components in total (~86 KB at one byte per component, ~344 KB as float32). Negligible.
Dashboard visualization: A 24-hour polar chart per link showing baseline amplitude variance by hour — spikes indicate noisy hours. Helps users understand their environment.
16. Fall Detection
Detects falls by monitoring blob Z-axis trajectory and post-fall stillness. Designed for elderly or at-risk household members.
Detection algorithm:
- Track blob Z-coordinate velocity via the Kalman filter's state estimate
- Trigger condition: Z velocity below −1.5 m/s (rapid descent, i.e. downward speed above 1.5 m/s) AND blob Z drops below 0.5 m within 1 second
- Confirmation: Blob remains below 0.5 m with low motion (deltaRMS below stillness threshold) for >10 seconds — person hasn't gotten back up
- Alert fired after confirmation window
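The trigger and confirmation logic can be sketched as a per-blob state machine. The sketch uses the 3-second peak-velocity variant described under the false-negative mitigations instead of instantaneous velocity, and it omits the separate 1-second Z-drop check. `FallDetector` and `FallSample` are hypothetical names. The stillness threshold is left as a parameter because the plan does not give a value.

```go
package main

// FallSample is one fusion-tick observation of a blob.
type FallSample struct {
	TMs      int64   // timestamp, ms
	Z        float64 // blob height, m
	VZ       float64 // Kalman vertical velocity, m/s (negative = downward)
	DeltaRMS float64 // motion level on the blob's links
}

// FallDetector holds thresholds and per-blob state.
type FallDetector struct {
	VelThreshold float64 // -1.5 m/s (-0.8 m/s in at-risk mode)
	LowZ         float64 // 0.5 m
	ConfirmMs    int64   // 10000 ms (5000 ms in at-risk mode)
	Stillness    float64 // deltaRMS stillness threshold

	history    []FallSample // last 3 s, for the peak-velocity check
	candidate  bool
	stillSince int64
}

// Update returns true exactly once per confirmed fall.
func (d *FallDetector) Update(s FallSample) bool {
	d.history = append(d.history, s)
	for len(d.history) > 0 && s.TMs-d.history[0].TMs > 3000 {
		d.history = d.history[1:]
	}
	if !d.candidate {
		if s.Z >= d.LowZ {
			return false
		}
		peak := 0.0 // most negative vertical velocity in the window
		for _, h := range d.history {
			if h.VZ < peak {
				peak = h.VZ
			}
		}
		if peak >= d.VelThreshold {
			return false // low, but no rapid descent in the last 3 s
		}
		d.candidate, d.stillSince = true, -1
	}
	if s.Z >= d.LowZ {
		d.candidate = false // person got back up within the window
		return false
	}
	if s.DeltaRMS >= d.Stillness {
		d.stillSince = -1 // still moving: restart the stillness clock
		return false
	}
	if d.stillSince < 0 {
		d.stillSince = s.TMs
	}
	if s.TMs-d.stillSince >= d.ConfirmMs {
		d.candidate = false
		d.history = d.history[:0]
		return true
	}
	return false
}
```

At-risk mode only changes `VelThreshold` and `ConfirmMs`, so the same detector serves both configurations.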
Alert chain (configurable):
- Dashboard alarm — 3D view highlights the blob in red with pulsing animation, audible alert
- Webhook — POST to configurable URL (e.g., Home Assistant automation)
- Push notification — via Ntfy, Pushover, or Gotify (user configures endpoint)
- Escalation — if no manual dismissal within 5 minutes, fire a secondary webhook (e.g., send SMS via Twilio, notify emergency contact)
False positive management:
- Requiring the combination of rapid descent + sustained stillness + low Z is physiologically specific to falls
- "Lying on the couch" doesn't trigger because there's no rapid descent event
- "Picking something up from the floor" doesn't trigger because the person rises within the 10 s confirmation window
- User can dismiss an alert from the dashboard, which logs the event for tuning
- Sensitivity adjustable: confirmation window, Z thresholds, velocity threshold
False negative cases (accepted limitations):
- Falling onto a mattress directly on the floor (Z_surface ≈ 0) — if the person was already seated or crouching, the descent may be too short to reach the velocity threshold
- Falling in a zone with no nodes above 1.5 m — Z resolution is insufficient to detect the rapid descent
- Very slow falls (elderly person slowly sliding down a wall) — velocity may be below the 1.5 m/s threshold
- Falling into a chair (landing height ~0.5 m) — may not clear the Z < 0.5 m threshold
Mitigation for false negatives:
- Zone type metadata: Each zone can be marked with a type (default: `general`; options: `bedroom`, `bathroom`, `living`, `exercise`, `kitchen`, `office`, `entry`). Bedroom zones automatically suppress fall alerts during typical sleep hours (21:00–07:00) to avoid waking-up-from-bed false positives. Non-bedroom zones do not suppress.
- At-risk mode: A per-zone option reduces the velocity threshold to −0.8 m/s and the confirmation window to 5 s. Intended for zones where an at-risk person spends most of their time.
- Peak velocity detection: The algorithm examines the Kalman filter's estimated Z velocity at each time step for the 3 seconds leading up to the low-Z event. If the peak downward velocity in this window exceeds the threshold (even if instantaneous velocity at detection time is lower), it's treated as a fall trigger.
- Manual report: The dashboard "I fell" button allows users to manually report a missed fall, which is logged and used to tune thresholds.
- Hardware advisory: The system checks if fewer than 2 nodes are placed above 1.5 m in the zone where fall detection is enabled. If so, it shows a persistent warning: "Fall detection in this zone requires at least 2 nodes above 1.5 m for reliable Z-axis resolution."
Zone type storage: add a zone_type TEXT NOT NULL DEFAULT 'general' CHECK (zone_type IN ('general','bedroom','bathroom','living','exercise','kitchen','office','entry')) column to the zones table.
Requirements: Mixed-height node placement is essential for Z-axis resolution. Minimum 2 nodes at >1.5 m height and 2 at <0.5 m for reliable fall detection. The dashboard warns when this requirement is not met in zones where fall detection is enabled.
17. Pre-Deployment Simulator
Before purchasing hardware, users can define their space in the 3D editor, place virtual nodes, and run a physics-based simulation to see expected detection quality.
Space definition:
- Same 3D editor used for real setup — draw room boxes, set dimensions, add doorways
- Place virtual nodes (visually distinct ghost meshes) at candidate positions
- Optionally add wall segments with material properties (drywall, concrete, glass) that affect signal attenuation
Simulation engine:
- Simplified ray-based propagation model: direct path + first-order reflections off walls and floor/ceiling
- Compute expected CSI amplitude and phase for each virtual link
- Apply the same Fresnel zone localization algorithm used in live mode
- Generate synthetic "walkers" — virtual people that move along user-defined paths or random walk patterns
Visualization:
- GDOP overlay shows expected detection quality across the floor
- Simulated blobs track the virtual walkers, showing expected accuracy at each position
- Coverage gaps highlighted in red
- "Add another node here" suggestions based on worst-GDOP positions
Outputs:
- Minimum node count recommendation for the defined space
- Optimal positions for N nodes (greedy GDOP optimization)
- Expected accuracy estimate at each point in the space
- "Shopping list" — how many ESP32-S3 boards to buy
Propagation model (quantified):
The simulator computes expected received signal power for each TX→RX link at each walker position using a two-ray model (direct + single-bounce) in 2D.
Path loss model (log-distance):
PL(d) = PL_0 + 10·n·log10(d/d_0) [dB]
PL_0 = 40 dB at d_0 = 1 m (free-space reference)
n = 2.0 (free space, no walls between TX and RX)
Wall penetration loss (additive, per wall crossed on the direct path):

| Material | Loss (dB) |
|---|---|
| Drywall / wood | 3 |
| Brick / concrete | 10 |
| Glass | 2 |
| Metal | 20 |

Default material when none specified: drywall (3 dB)
First-order reflection (single bounce off a flat wall segment):
Reflection coefficient: R = 0.3 (power; dimensionless)
Reflected path length: d_refl = |TX-P_reflect| + |P_reflect-RX|
where P_reflect is the specular reflection point on the wall segment
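Power of reflected ray (linear units; PL_lin(d) = 10^(PL(d)/10)):
P_refl = P_direct × R × PL_lin(|TX-RX|) / PL_lin(d_refl)
(the longer reflected path has more loss, so P_refl < R × P_direct)
Only the strongest reflected ray is retained (walls with the weakest absorption are considered first)
Combined signal amplitude at walker position W:
amplitude(W) = sqrt(P_direct(W) + P_refl(W)) (incoherent power-sum approximation)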
Simulated CSI phase at W:
phase_k(W) = 2π × k × Δf × (d_direct(W) / c) for subcarrier k
where Δf = 312.5 kHz (HT20 subcarrier spacing), c = 3×10⁸ m/s
(single subcarrier phase model; sufficient for GDOP and presence simulation)
deltaRMS_sim(W) = |amplitude(W) - amplitude(empty_room)| / amplitude(empty_room)
(simulated signal change from "walker present" vs "empty room")
Walker motion model:
- Path-following mode: user draws a polyline in the 3D editor; walker follows at constant speed (default 1.0 m/s)
- Random-walk mode: walker moves with Gaussian velocity updates (σ = 0.5 m/s per axis per step), reflected off room walls
- Step interval: 100 ms (matches live 10 Hz fusion rate)
- When multiple walkers are present: each walker's amplitude contribution is summed (incoherent power addition)
Implementation: Reuses the same Fresnel/GDOP math from coverage painting (Component 11) and the same localization algorithm from the fusion engine (Component 4). The propagation model is the only new code — a simplified 2D ray tracer with the wall-penetration table above.
18. Spatial Automation Builder
Visual automation system where trigger conditions are defined as 3D volumes in the scene, wired to actions.
Trigger volumes:
- In setup mode, user draws 3D boxes (or cylinders) in the scene using TransformControls
- Each volume is named and assigned a condition:
- Enter: blob crosses into the volume
- Leave: blob exits the volume
- Dwell: blob remains inside for ≥ N seconds (configurable)
- Vacant: no blobs inside for ≥ N seconds
- Count: number of blobs inside crosses a threshold (e.g., ≥ 2 people in living room)
- Optional time constraint: "only between 22:00 and 06:00"
Actions:
- Webhook: POST/GET to configurable URL with JSON payload containing event details
- MQTT publish: To user's external broker (e.g., Home Assistant)
- Internal: Trigger re-baseline, change node roles, enable/disable fall detection for a zone
Visual feedback:
- Trigger volumes rendered as translucent colored shapes in the 3D live view
- When a condition is active, the volume pulses or changes color (e.g., green idle → amber triggered)
- Event log sidebar: "14:32:05 — Blob #2 entered 'Living Room Couch' zone, dwell timer started (30s)"
Example automations:
- "Dwell in hallway entrance for 0 s → fire person_home webhook" (arrival detection)
- "Vacant in all zones for 10 min → fire house_empty" (departure detection)
- "Enter bedroom + time 22:00–06:00 → fire goodnight scene"
- "Count ≥ 2 in dining room + dwell 5 min → fire dinner_started"
Point-in-volume test:
- Box volume: inside = (x ≥ v.x AND x < v.x+v.w) AND (y ≥ v.y AND y < v.y+v.d) AND (z ≥ v.z AND z < v.z+v.h) (axis-aligned bounding box; all comparisons in meters)
- Cylinder volume: inside = sqrt((x-v.cx)²+(y-v.cy)²) < v.r AND z ≥ v.z AND z < v.z+v.h
shape_json fields: box = {type:"box",x,y,z,w,d,h}; cylinder = {type:"cylinder",cx,cy,z,r,h}
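Here is a Go sketch of the point-in-volume test. It assumes a Shape struct that mirrors the shape_json fields; both the struct and the function name are hypothetical. The intervals are half-open to match the comparisons above, so a point on a box's far face counts as outside.

```go
package main

import "math"

// Shape mirrors shape_json; Z and H are shared by both shape types.
type Shape struct {
	Type   string  // "box" or "cylinder"
	X, Y   float64 // box: minimum corner
	W, D   float64 // box: extents along x and y
	CX, CY float64 // cylinder: axis position
	R      float64 // cylinder: radius
	Z, H   float64 // base height and height, both shapes
}

// pointInVolume reports whether (x, y, z), in meters, lies inside s.
func pointInVolume(x, y, z float64, s Shape) bool {
	inZ := z >= s.Z && z < s.Z+s.H
	switch s.Type {
	case "box":
		return x >= s.X && x < s.X+s.W && y >= s.Y && y < s.Y+s.D && inZ
	case "cylinder":
		return math.Hypot(x-s.CX, y-s.CY) < s.R && inZ
	default:
		return false // unknown shape types never match
	}
}
```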
Per-trigger state machine (evaluated at 10 Hz):
For each enabled trigger T, for each tracked blob B (filtered by T.person if set):
inside = point_in_volume(B.pos, T.shape)
prev_inside = last tick's inside value for (T.id, B.id)
ENTER condition: fires once on transition prev_inside=false → inside=true
LEAVE condition: fires once on transition prev_inside=true → inside=false
DWELL condition: inside=true for ≥ T.duration_s continuously (timer per (T.id, B.id))
- timer starts when blob enters; resets when blob exits
- fires exactly once per entry; re-fires after blob leaves and re-enters
VACANT condition: no blob inside for ≥ T.duration_s
- timer starts when the last blob exits; fires when timer expires
- cancelled if any blob enters before the timer expires
COUNT condition: fires when blob count inside crosses T.count_threshold
- fires on rising edge only (count was < threshold, now ≥ threshold)
Time constraint check: If T.time_constraint_json is set (e.g. {from:"22:00",to:"06:00"}), the trigger only fires when current_local_time falls within the range. Overnight ranges (from > to) wrap past midnight: the check passes when current_local_time is at or after from, or before to.
Fire rate limiting: Each (T.id, B.id, condition) combination has a last_fired timestamp. Minimum re-fire interval: ENTER/LEAVE = 5 s; DWELL = 60 s (after firing, the blob must also exit and re-enter before DWELL can fire again); VACANT = 60 s. This prevents double-fires caused by position jitter at zone boundaries.
Webhook action payload (POST to actions_json[i].url):
{
"trigger_id": 42,
"trigger_name": "Couch Dwell",
"condition": "dwell",
"fired_at": "2024-03-15T14:32:05Z",
"blob_id": 2,
"person": "Alice", // null if unidentified
"position": {"x": 2.1, "y": 3.4, "z": 0.9},
"zone": "Living Room", // zone whose bounds contain the trigger volume centroid; null if none
"dwell_s": 34 // for dwell condition: elapsed seconds; omitted for other conditions
}
HTTP timeout: 5 s. On timeout or 5xx: log warning, do not retry (fire-and-forget). On 4xx: log error, disable the trigger and show dashboard warning: "Webhook returned [status] — trigger disabled. Fix the URL and re-enable."
MQTT action: Publishes to T.actions_json[i].topic with the same JSON payload as the webhook (as a string). QoS 0. Requires SPAXEL_MQTT_BROKER to be configured.
Evaluation: Trigger conditions are evaluated in the mothership's fusion loop at 10 Hz — point-in-volume tests on already-tracked blob positions. Negligible computational cost.
19. Ambient Confidence Score & Link Weather
Continuous system-wide health monitoring that makes the RF environment legible to non-technical users.
Per-link health metrics and composite score:
Link ID format: "TX_MAC:RX_MAC" using uppercase colon-separated hex. For passive links (router as TX): "AP_BSSID:NODE_MAC". Links are directional — TX→RX1 and TX→RX2 are separate link IDs.
Link ID normalization (canonical form for storage): For symmetrical links in TX/RX or TX_RX mode (where both nodes can be either TX or RX depending on role), the link_weights and link health tables use a canonical non-directional form to avoid duplicating weights for A→B and B→A. Canonical form: min(MAC_a, MAC_b) + ":" + max(MAC_a, MAC_b) (lexicographic sort of the two MACs). For passive links (AP as TX), the AP BSSID is always the first component (AP cannot be RX). The function CanonicalLinkID(mac1, mac2 string) string applies this rule consistently throughout the codebase. In-memory CSI frames use the raw peer_mac:node_mac (directional) form for signal processing; lookup into link_weights always calls CanonicalLinkID.
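A sketch of the canonicalization rule follows. CanonicalLinkID uses the signature named above. It uppercases both MACs before sorting so that mixed-case inputs map to the same key, which is an assumption about input normalization. That signature cannot tell which MAC belongs to the AP, so passive links are handled by a separate, hypothetical passiveLinkID helper.

```go
package main

import "strings"

// CanonicalLinkID returns the non-directional key for a symmetric link:
// both MACs uppercased, sorted lexicographically, and joined with ":".
func CanonicalLinkID(mac1, mac2 string) string {
	a, b := strings.ToUpper(mac1), strings.ToUpper(mac2)
	if b < a {
		a, b = b, a
	}
	return a + ":" + b
}

// passiveLinkID (hypothetical) keeps the AP BSSID first, since the AP is never RX.
func passiveLinkID(apBSSID, nodeMAC string) string {
	return strings.ToUpper(apBSSID) + ":" + strings.ToUpper(nodeMAC)
}
```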
Metric 1 — Packet Delivery Rate (PDR):
- For active TX nodes: PDR = received_count / (configured_rate_hz × window_s) over a 30-second rolling window
- For passive nodes: on first connect, measure the empirical beacon arrival rate over 60 s (called "warmup") and use the measured rate as expected_rate. Typical: ~10 Hz. During warmup, PDR is shown as "measuring..." in the UI.
- Gap detection: if no frames arrive for more than 5× the expected interval, the link is immediately marked DEGRADED (zero PDR), even within the window
- PDR is reset and re-measured after a node reconnects (30-second warmup window)
Metric 2 — SNR (Signal-to-Noise Ratio, 0–1):
- SNR = mean(amplitude[k]) / std(amplitude[k]) over a 10-second window, averaged over selected subcarriers
- Normalized: SNR_norm = min(1.0, SNR / 20.0) (an SNR of 20 maps to 1.0, a good link)
Metric 3 — Phase Stability (0–1):
phase_variance = variance(residual_phase[k]) over a 10-second window, averaged over selected subcarriers
phase_stability = max(0, 1.0 - phase_variance / 0.5) (a variance of 0.5 rad² maps to 0.0)
Metric 4 — Baseline Drift (0–1, where 1 = no drift):
drift = L2_distance(current_baseline_amplitude, calibration_baseline_amplitude) / num_subcarriers
drift_score = max(0, 1.0 - drift / 5.0) (a drift of 5.0 amplitude units maps to 0.0)
Composite link quality score (0–100):
quality = 100 × (0.35 × PDR + 0.30 × SNR_norm + 0.25 × phase_stability + 0.10 × drift_score)
The dashboard shows the composite score as the link's health indicator. Hovering over a link reveals a tooltip with all 4 component bars.
System-wide Detection Quality metric: mean(quality[link]) over all active links — simple unweighted mean. Active links are those with PDR > 0 and at least one frame in the last 30 s.
System-wide confidence:
- Aggregate all link quality scores into a single "Detection Quality" metric: 0–100%
- Displayed as a prominent gauge/ring in the dashboard toolbar
- Thresholds: 80–100% Excellent, 60–80% Good, 40–60% Fair, <40% Poor
3D visualization:
- Links rendered with thickness and color proportional to their health: thick green = strong, thin red = struggling
- Nodes with all links healthy: bright green. Nodes with degraded links: amber border
- Optional "link weather map" overlay: ground-plane heatmap showing detection confidence at each point, derived from link health of links whose Fresnel zones cross that point
Diagnostics and advice:
- When quality drops, the system diagnoses why: "Link kitchen↔hallway degraded — possible obstruction change" or "3 links showing correlated phase drift — environmental change detected"
- Links that have been below threshold >40% of the time over the past week are flagged with specific advice: "These nodes may be too far apart or have too many walls between them. Consider adding a relay node at [highlighted position in 3D]"
- Quality trends graphed over time (24h / 7d / 30d) to identify patterns
Anomaly detection integration (see Component 20): Sudden quality drops across multiple links outside of normal diurnal patterns can indicate significant environmental changes and trigger re-calibration suggestions.
20. Anomaly Detection & Security Mode
Learns normal occupancy patterns over time and alerts on deviations. Privacy-preserving intrusion detection with no cameras.
Pattern learning:
- After 7+ days of operation, the system builds a statistical model of typical occupancy:
- Per-zone, per-hour-of-day: expected occupancy (mean and variance)
- Per-zone, per-day-of-week: weekend vs. weekday patterns
- Typical first-detection time (morning wake-up) and last-detection time (bedtime)
- Common transition patterns (e.g., bedroom → bathroom → kitchen in morning)
- Model stored in SQLite, updated continuously with exponential decay (recent behavior weighted more)
Anomaly scoring:
Each detection event is scored at event time using the learned statistical model. Score is in [0, 1]. Three component scores are combined:
Z-score helper: z_score(observed, mean, std) = (observed - mean) / max(std, 0.5)
(floor std at 0.5 prevents division by zero and reduces sensitivity for small-sample slots)
[0,1] mapping: normalize(z) = min(1.0, max(0.0, (|z| - 1.0) / 3.0))
(0 below 1σ deviation, rises to 1.0 at 4σ deviation)
Time score: How unusual any detection is for this zone-hour-day slot, measured against slot_mean_active (the fraction of historical observations in the slot that had ANY detection).
time_score = normalize(z_score(is_active, slot_mean_active, sqrt(slot_mean_active * (1 - slot_mean_active))))
where is_active = 1 if detected, 0 if not; slot_mean_active is the historical fraction.
Zone count score: How unusual is the blob count vs. historical?
count_score = normalize(z_score(observed_count, slot_mean_count, sqrt(slot_variance_count)))
Zone score: Detection in a zone that is atypically occupied at this time.
(For the primary zone of the detected blob, same formula as time_score but for that specific zone)
Composite: anomaly_score = max(0.4×time_score + 0.4×count_score + 0.2×zone_score, max(time_score, count_score))
(takes the larger of the weighted sum and the max component — ensures individual extreme components are not hidden)
- Alert threshold: anomaly_score ≥ 0.85 → fire alert. Severity color: yellow for 0.6 ≤ score < 0.85, red for score ≥ 0.85. Configurable.
- Security mode: overrides scoring — any detection = score 1.0 (all motion is suspicious)
- "Vacation mode" toggle: suppresses anomaly alerts but doesn't disable monitoring or learning
Model update rule (Welford's online algorithm applied to anomaly_patterns table):
Once per hour, the mothership records an observation for each zone × day_of_week: observed_count = blob_count_in_zone_during_this_hour. The anomaly_patterns row is updated:
// Welford's online update for running mean and variance
// Fields: mean_count, variance (= M2/n, population variance), sample_count
n_new = row.sample_count + 1
delta = observed_count - row.mean_count
mean_new = row.mean_count + delta / n_new
delta2 = observed_count - mean_new // second delta after mean update
M2_old = row.variance × row.sample_count // un-normalize variance
M2_new = M2_old + delta × delta2 // Welford M2 accumulator
variance_new = M2_new / n_new // population variance (not sample)
UPDATE anomaly_patterns SET
mean_count = mean_new,
variance = variance_new,
sample_count = n_new,
updated_at = now_ms
WHERE zone_id = ? AND hour_of_day = ? AND day_of_week = ?
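The same update can be written as a Go function. The sketch assumes a struct that mirrors the anomaly_patterns columns (hypothetical names). Storing population variance keeps each row self-contained, because M2 can be recovered as variance × sample_count before every update.

```go
package main

// slot mirrors the anomaly_patterns columns used by the update.
type slot struct {
	MeanCount   float64
	Variance    float64 // population variance, M2/n
	SampleCount int
}

// welfordUpdate folds one hourly observation into the running mean and
// population variance, following the pseudocode above.
func welfordUpdate(s slot, observed float64) slot {
	n := s.SampleCount + 1
	delta := observed - s.MeanCount
	mean := s.MeanCount + delta/float64(n)
	delta2 := observed - mean                           // second delta after the mean update
	m2 := s.Variance*float64(s.SampleCount) + delta*delta2 // un-normalize, then accumulate
	return slot{MeanCount: mean, Variance: m2 / float64(n), SampleCount: n}
}
```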
For the slot_mean_active used in time_score: stored separately as mean_count (treat observed_count as binary 1/0 — was zone ever occupied during this hour). The variance column encodes variance_count for the count_score formula.
Update period: once per hour (not per fusion tick — avoids massive SQLite write load). The update for a given zone-hour-day slot happens at the end of each calendar hour.
- Cold start: model marked as "not ready" for the first 7 days. No anomaly alerts until the model has at least 50 observations per active slot.
- Outlier protection: the model is only updated when anomaly_score < 0.5 for the event (prevents learning from anomalous events themselves)
- Anomaly score stored in events.detail_json for later retrieval in the explainability overlay
Security mode:
- User-activatable from dashboard or via automation trigger (e.g., "vacant in all zones for 10 min → enable security mode")
- In security mode, ANY detection event fires an alert (no anomaly threshold — all motion is suspicious)
- Alert chain: same as fall detection (dashboard alarm → webhook → push notification → escalation)
- "Away" mode can be automatically activated when correlated with phone geofencing (user configures via HA integration)
Dashboard visualization:
- Timeline view showing expected vs. actual occupancy patterns
- Anomaly events highlighted with severity color (yellow = unusual, red = highly anomalous)
- "Normal pattern" overlay in 3D view: faint blob trails showing typical movement patterns for the current hour
Privacy: No personally identifiable data is stored — only statistical occupancy counts and zone transition frequencies. No recording of who is where, only that someone is/isn't.
21. BLE Beacon Scanning & Device Registry
The ESP32-S3 has a BLE radio that runs concurrently with WiFi on the second core. Each node passively scans for BLE advertisements and reports them to the mothership, enabling person/device identification of tracked blobs.
BLE scanning (firmware):
- Passive BLE scan runs continuously on Core 0 (WiFi CSI runs on Core 1)
- Captures: device address, address type (public/random), RSSI, device name (if advertised), manufacturer data
- Handles rotating random addresses via heuristic matching (see below). Passive scanning cannot resolve private addresses, because that requires the device's IRK, which is only shared during pairing.
- Reports every 5 s as JSON on the WebSocket:
{type: "ble", devices: [{addr, rssi, name, type}, ...]}
BLE Address Rotation Handling:
Modern smartphones rotate their BLE random address every 15–30 minutes (iOS: ~15 min; Android: varies, often longer). Since Spaxel uses passive scanning (no pairing), IRK-based address resolution is not possible. The following heuristic algorithm is used:
Rotation detection heuristics (applied in the mothership on received BLE reports):
- Manufacturer data fingerprint: The first 4 bytes of manufacturer data (after company ID) form a device fingerprint for Apple and Google devices. For Apple Continuity (company ID 0x004C, type 0x0F): extract the 2-byte proximity UUID. This UUID is stable across rotations for the same device in the same pairing context. Match new addresses with this fingerprint.
- Time + signal proximity: When a known address disappears and a new unknown address appears at the same node within 90 seconds with similar RSSI (within 10 dB), a rotation match is scored: Score = 0.5×manufacturer_match + 0.35×rssi_proximity + 0.15×time_gap_factor. If score > 0.7, the new address is tentatively linked to the old device.
- Position continuity: If a new BLE address appears at a node, and the closest blob is already associated with a known registered device, the new address is tentatively linked to that device's label.
- Merge confirmation: After 3 consecutive reports under the new address matching the above criteria, the device is updated to the new address in the registry.
Multi-address registry: A ble_device_aliases table stores historical addresses per device:
CREATE TABLE IF NOT EXISTS ble_device_aliases (
addr TEXT NOT NULL, -- the alias/rotated address
canonical_addr TEXT NOT NULL REFERENCES ble_devices(addr) ON DELETE CASCADE,
first_seen INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
last_seen INTEGER NOT NULL,
PRIMARY KEY (addr)
);
This allows the mothership to recognize any historical address for a registered device, even after rotation.
Graceful fallback when rotation is unresolved: If no rotation match is found within 5 minutes of the known address disappearing, the blob that was associated with that device retains its identity label for an additional 5 minutes (estimated persistence) before reverting to "Unknown". This prevents a 15-second lapse in identity during normal rotation.
Practical recommendation: For the most reliable person identification, use a dedicated BLE tracker tag (e.g., Tile, Samsung SmartTag, or an AirTag-compatible tag) rather than a phone. Tracker tags typically have stable addresses with rotation periods measured in hours or not at all.
User-facing indicator: Devices with the auto_rotate flag set show a rotation icon in the BLE device registry. Hovering shows: "This device uses rotating addresses. Identity may lapse briefly (~60s) during rotation."
Device registry (SQLite):
| Column | Content |
|---|---|
| BLE address | Hardware or resolved address (primary key) |
| Label | User-assigned name: "Alice", "Bob's Watch", "Dog Tracker", "Car Keys Tag" |
| Type | person / pet / object — affects how the blob is rendered and tracked |
| Color | User-chosen color for the 3D figure/marker |
| Icon | Optional icon (for simple mode cards) |
| First seen | Timestamp |
| Last seen | Timestamp |
| Auto-rotate | Whether this device uses rotating addresses (detected automatically) |
Discovery & registration flow:
- Dashboard shows a "People & Devices" panel listing all BLE devices seen by any node in the fleet
- Unregistered devices appear in a "Discovered" list, sorted by frequency of sighting (household devices will be near the top)
- User taps a device → assigns a label, type, and color → device is registered
- Common devices are identified automatically: "iPhone", "Apple Watch", "Fitbit", "Tile" from manufacturer data
- User can also pre-register by BLE address if they know it (e.g., from a tracker tag's settings app)
Blob-to-device matching:
Run once per fusion tick (10 Hz) for each registered BLE device currently visible (seen in the last 10 s by at least one node).
For each registered device D (addr or alias in ble_devices / ble_device_aliases):
1. Collect RSSI reports: {node_i → rssi_i} from the last BLE scan batch (≤5 s old per node)
2. Estimate device position using the BLE centroid formula (Component 22):
pos_ble = RSSI-weighted centroid of reporting nodes
ble_confidence = min(1.0, (K-1)/3.0) where K = reporting node count
3. If ble_confidence < 0.33 (only 1 node reporting): assign to the nearest blob
to that single node's position, IF distance < 3.0 m. Otherwise: no match.
4. If ble_confidence >= 0.33: find the blob nearest to pos_ble:
nearest_blob = argmin_blob { |blob.pos - pos_ble| }
match_distance = |nearest_blob.pos - pos_ble|
   5. Match threshold: accept match if match_distance < max(1.5, ble_confidence_m)
where ble_confidence_m = estimated BLE position uncertainty in meters:
ble_confidence_m = 3.0 / (K × 0.5) (heuristic: improves with more nodes)
Typical: K=2 → accept within 3.0 m; K=4 → accept within 1.5 m
6. Conflict resolution (two devices match the same blob):
- Compute match_score = (1.0 - match_distance/3.0) × ble_confidence for each device
- The device with the higher match_score wins; the other is unmatched this tick
- Unmatched device retains its previous blob assignment for up to 5 s (identity persistence)
7. On successful match: assign device.label, device.color, device.type to blob
If device.type == 'person' AND multiple devices for same person:
Use the device with the highest match_score for that person
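Steps 3 to 6 are sketched below in Go. To keep the sketch independent of how the acceptance radius is derived, each device carries a precomputed AcceptRadius. For single-node sightings, the caller passes that node's position as Pos. Device, Vec3, and matchDevices are hypothetical names. Identity persistence is not modeled, and ties in blob distance are broken arbitrarily by map iteration order.

```go
package main

import "math"

type Vec3 struct{ X, Y, Z float64 }

func dist(a, b Vec3) float64 {
	dx, dy, dz := a.X-b.X, a.Y-b.Y, a.Z-b.Z
	return math.Sqrt(dx*dx + dy*dy + dz*dz)
}

// Device is one visible registered device for this tick.
type Device struct {
	Label        string
	Pos          Vec3    // pos_ble, or the single reporting node's position
	Confidence   float64 // ble_confidence
	AcceptRadius float64 // acceptance radius computed by the caller
}

// matchDevices assigns each device to its nearest blob within AcceptRadius;
// when two devices claim one blob, the higher match_score wins.
func matchDevices(devs []Device, blobs map[int]Vec3) map[string]int {
	type claim struct {
		label string
		score float64
	}
	winners := map[int]claim{}
	for _, d := range devs {
		bestID, bestD := 0, math.Inf(1)
		for id, p := range blobs {
			if dd := dist(d.Pos, p); dd < bestD {
				bestID, bestD = id, dd
			}
		}
		if bestD >= d.AcceptRadius {
			continue // no blob close enough (also covers an empty blob set)
		}
		score := (1.0 - bestD/3.0) * d.Confidence
		if c, ok := winners[bestID]; !ok || score > c.score {
			winners[bestID] = claim{d.Label, score}
		}
	}
	out := map[string]int{}
	for id, c := range winners {
		out[c.label] = id
	}
	return out
}
```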
Identity persistence: When a blob's BLE device was matched in the last tick but is not matched this tick (device went quiet or rotated address), the identity label is retained for up to 5 s (50 ticks). After 5 s without a fresh match, the label is cleared.
- Multiple devices can map to the same person (Alice's phone + Alice's watch both resolve to "Alice")
Privacy considerations:
- BLE scanning is local only — no data leaves the mothership
- Users control which devices are tracked — unregistered devices are ignored for identity matching
- "Visitor" devices (seen briefly, never registered) are not stored beyond the 5 s scan window
22. Self-Improving Localization via BLE Ground Truth
BLE person identification (Component 21) creates continuous, automatic ground truth that drives a feedback loop to improve CSI localization accuracy over time.
The feedback loop:
- BLE RSSI from multiple nodes estimates a device's approximate position (RSSI-weighted centroid — see formula below)
- The CSI localizer independently estimates the blob's position via Fresnel zone fusion
- The discrepancy between BLE-estimated position and CSI-estimated position is the error signal
- A gradient update adjusts per-link Fresnel zone weights: links whose Fresnel zones correctly predicted the BLE-confirmed position get reinforced, misleading links get dampened
BLE position estimation formula:
For a registered device currently seen by K nodes (K ≥ 2), with each node i reporting RSSI rssi_i (dBm):
// Convert RSSI to linear power weight (avoids negative weights from dBm)
// Shift so that the best possible RSSI (−30 dBm) maps to weight 1.0
weight_i = pow(10.0, (rssi_i - (-30.0)) / 10.0) // = 10^((rssi_i+30)/10)
// Example: rssi=-50 → weight=0.01; rssi=-70 → weight=0.0001
// Clamp minimum weight: if weight_i < 1e-6, exclude that node from the sum
pos_ble.x = sum(weight_i × node_i.pos_x) / sum(weight_i)
pos_ble.y = sum(weight_i × node_i.pos_y) / sum(weight_i)
// Z-axis: use average of K nodes' Z positions weighted by weight_i (coarser estimate)
pos_ble.z = sum(weight_i × node_i.pos_z) / sum(weight_i)
// Confidence of BLE estimate:
ble_confidence = min(1.0, (K - 1) / 3.0) // 0 for K=1, 0.33 for K=2, 0.67 for K=3, 1.0 for K≥4
The BLE position estimate is only used as ground truth when ble_confidence ≥ 0.33 (at least 2 reporting nodes) AND |pos_ble - pos_csi| < 2.0 m (outlier rejection — large discrepancies indicate a BLE-to-blob mismatch, not a localization error). When either condition fails, the frame is skipped (no weight update).
Weight update rule:
- Per-link weight vector w[link] (initialized to 1.0, range [0.1, 3.0])
- Each frame where the BLE estimate is valid: for each link active in this fusion tick (deltaRMS > threshold):
pos_error = |pos_csi - pos_ble| // Euclidean distance in meters
// fresnel_contribution = fraction of the blob's Fresnel accumulation weight from this link
fresnel_contribution[link] = (link.deltaRMS × zone_decay) / total_accumulated_weight
// positive update if prediction is correct (small error), negative if not
error_signal = 1.0 - min(1.0, pos_error / 1.0) // 1.0 at 0m error, 0.0 at ≥1m error
w[link] += α × (error_signal - 0.5) × fresnel_contribution[link]
// note: (error_signal - 0.5) is positive when error < 0.5m, negative when error > 0.5m
- α = 0.001 (very slow learning rate to prevent instability)
- Weights clamped to [0.1, 3.0] range
- Stored in SQLite, restored on restart
- Update runs at most once per second (throttled) regardless of fusion rate
What improves:
- Links obscured by unexpected reflections get dampened automatically
- Links with clean Fresnel geometry get amplified
- The system adapts to the specific RF environment (wall materials, furniture layout) without manual tuning
- After 2-4 weeks of BLE-carrying occupants, localization accuracy measurably improves
Dashboard:
- "Accuracy Trend" graph showing median localization error (BLE-vs-CSI discrepancy) over time — should show a downward curve
- Per-link weight visualization: link thickness in 3D view reflects learned weight (thicker = more trusted)
- Reset button to reinitialize all weights to 1.0 (useful after major furniture changes)
23. Presence Prediction & Pre-emptive Automation
Learns per-person temporal patterns (requires BLE person identification) and predicts zone transitions 5–30 minutes in advance.
Pattern model:
For each person × zone × time_slot × day_type, the model stores probability = P(person is in this zone during this time slot on this day type). This is a marginal presence probability, not a transition probability — simpler to compute and sufficient for the 15-minute horizon predictions used.
- Time slot index: slot = (hour × 60 + minute) / 15 (integer division), range 0–95 (96 per day)
- Day type: weekday (Mon–Fri) or weekend (Sat–Sun), determined using the TZ timezone
- Model state: stored in the prediction_models table: probability, sample_count per (person, zone_id, slot, day_type)
Model update rule:
- Observation: every 5 minutes, for each person with a known current zone, record an observation: obs[zone] = 1.0, obs[other_zones] = 0.0 for the current time slot and day_type
- EMA update: p_new = p_old + α × (obs - p_old) where α = 0.03 (time constant 1/α ≈ 33 observations, roughly two weeks at ~2.5 observations/day/slot)
- sample_count += 1 (used for cold-start gating)
- Updates are processed in a background goroutine every 5 minutes (not in the fusion loop)
- Also applied retroactively on restart: the last 24h of zone events from the events table are replayed to recover model state lost during downtime
Cold start:
- A slot is "ready" when sample_count ≥ 3 (at least 3 observations across ≥ 3 different days)
- "Days complete" for the dashboard progress indicator: count distinct calendar dates with observations for this person
- Before 7 days / 3 observations per slot: dashboard shows "Learning Alice's patterns... 4/7 days complete"; no predicted_enter triggers fire
Prediction engine:
- Every 60 s: for each person with a current zone assignment, compute predictions for horizons 5, 15, 30 min
- Look up probability[zone_id][(current_slot + H/15) mod 96][day_type] for each zone
- Normalize probabilities over all zones to sum to 1.0
- Only output predictions for zones with probability > 0.5 (suppress low-confidence outputs)
- "Alice: 87% Kitchen by 7:20, 12% Bathroom, 1% other" — only Kitchen (0.87) and Bathroom (0.12) shown; "other" is the remainder
- Predictions recalculated every 60 s
predicted_enter trigger:
- Fires when P(person in zone Z at T+H) > 0.6 AND the previous computation had P ≤ 0.6 (rising edge only)
- Suppressed: once fired, does not re-fire for the same (person, zone, slot) within 60 minutes
- Configurable: threshold (default 0.6), horizon H in minutes (5, 15, or 30)
Accuracy tracking:
- Every actual zone entry is compared against the prediction for that (person, zone, slot) made 15 minutes earlier
- Hit: actual zone matched the predicted top-1 zone. Miss: it didn't.
- Rolling 30-day accuracy: hits / (hits + misses)
- Displayed in dashboard: "Alice's predictions: 78% accurate at 15-min horizon (last 30 days)"
Exposed as:
- Dashboard widget: "Predicted Next 30 min" panel showing per-person expected zone with confidence bars
- REST API: GET /api/predictions?person=alice&horizon=30m → JSON probability distribution
- Automation triggers (Component 18 extension): New trigger type predicted_enter, which fires N minutes before the predicted zone entry. Example: "5 min before Alice's predicted Kitchen entry → POST to kettle webhook"
- Home Assistant sensors: sensor.spaxel_alice_predicted_zone, sensor.spaxel_alice_prediction_confidence
Cold start: Predictions require 7+ days of data per person. During the learning phase, the dashboard shows "Learning Alice's patterns... 4/7 days complete" with a progress indicator.
24. Adaptive Sensing Rate with Edge Filtering
Dynamically adjusts CSI capture rate per link based on activity, reducing bandwidth by 90%+ during idle periods.
Two-tier sensing:
| State | CSI Rate | Bandwidth | Behavior |
|---|---|---|---|
| Idle | 2 Hz | ~600 B/s per link | On-device amplitude variance check. If variance > threshold → switch to Active |
| Active | 20–50 Hz | ~6–15 KB/s per link | Full CSI streaming to mothership. If no motion for 10 s → switch to Idle |
Mothership-controlled rate changes:
- Mothership sends {type: "config", rate: 50} or {type: "config", rate: 2} on the WebSocket
- Rate decisions based on: per-link deltaRMS, adjacent-zone activity (if motion in kitchen, preemptively ramp hallway links), prediction engine output (ramp before predicted arrivals)
On-device edge filtering (ESP32):
- At idle rate (2 Hz), firmware computes amplitude variance over the last 5 samples (~20 lines of C)
- If variance exceeds a configurable threshold: immediately ramp to full rate and send a {type: "motion_hint"} JSON message to the mothership
- Mothership uses motion hints to ramp adjacent links preemptively
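In the plan, the idle-rate check is firmware C. Its logic is sketched here in Go for consistency with the other examples. The sketch assumes one scalar amplitude per sample, such as the mean over the selected subcarriers, and uses population variance over a 5-sample ring buffer. The type name is hypothetical.

```go
package main

// idleDetector keeps the last 5 amplitude samples in a ring buffer.
type idleDetector struct {
	buf       [5]float64
	next, n   int
	threshold float64
}

// push adds a sample and reports whether the window variance exceeds the
// threshold, i.e. whether to ramp to full rate and send a motion_hint.
func (d *idleDetector) push(amp float64) bool {
	d.buf[d.next] = amp
	d.next = (d.next + 1) % len(d.buf)
	if d.n < len(d.buf) {
		d.n++
	}
	if d.n < len(d.buf) {
		return false // wait until the window is full
	}
	var mean float64
	for _, v := range d.buf {
		mean += v
	}
	mean /= float64(len(d.buf))
	var ss float64
	for _, v := range d.buf {
		ss += (v - mean) * (v - mean)
	}
	return ss/float64(len(d.buf)) > d.threshold
}
```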
Fleet-level coordination:
- When all zones are idle, designate one "sentinel" link per zone at 5 Hz; all others drop to 1 Hz
- When activity detected in one zone, ramp that zone to full rate + adjacent zones to 5 Hz
- Prediction engine can preemptively ramp zones before predicted arrivals
Benefits:
- 8-node fleet idle: ~4.8 KB/s total (vs ~120 KB/s at full rate)
- Battery-powered nodes become viable (deep sleep between 2 Hz samples)
- Mothership CPU load drops proportionally during quiet hours
- Scales to larger fleets without linear bandwidth growth
25. Sleep Quality Monitoring
Analyzes breathing band and motion data in bedroom zones during nighttime hours to produce a daily sleep quality report.
Activation: Automatic when a bedroom zone (user-designated) has been occupied with low motion for >15 minutes during configured nighttime hours (default 21:00–09:00).
Sleep state machine (per-zone, per-person-or-zone-occupant):
INACTIVE
└─▶ (zone first becomes occupied during nighttime window)
└─▶ IN_BED
bed_time = now
onset_latency timer starts
IN_BED
└─▶ (smooth_deltaRMS < 0.03 for 5 consecutive minutes)
└─▶ ASLEEP
onset_latency_min = now - bed_time
restless_event_count = 0; breathing_rate_samples = []
ASLEEP
├─▶ (smooth_deltaRMS > 0.08 for > 30 s but zone is still occupied)
│ └─▶ RESTLESS (transient)
│ restless_event_count += 1
│ duration of restless episode tracked (< 5 min: returns to ASLEEP)
│
├─▶ (smooth_deltaRMS > 0.08 for > 5 min, zone still occupied)
│ └─▶ AWAKE_IN_BED (night waking)
│ short_wake_count += 1
│ Returns to ASLEEP if motion stops within 15 min
│
└─▶ (zone becomes unoccupied or nighttime window ends)
└─▶ FINAL (record committed to sleep_records)
wake_time = now
Metric definitions:
- Time in bed: wake_time - bed_time (minutes)
- Sleep onset latency: onset_latency_min = time from IN_BED to first ASLEEP transition
- Wake time: timestamp of final zone-exit or 09:00 if still in zone
- Restlessness index: min(5.0, restless_event_count / time_in_bed_h), in events per hour, capped at 5.0
- A "restless event" = smooth_deltaRMS > 0.08 for > 30 s while in ASLEEP state
- Breathing rate: For each 30-minute window during ASLEEP state: compute FFT over the last 600 samples (30 s at 20 Hz), find dominant peak in 0.1–0.5 Hz band, convert to bpm. Append to breathing_rate_samples. breathing_rate_avg = mean(breathing_rate_samples).
- Breathing regularity: coefficient_of_variation = std(breathing_rate_samples) / mean(breathing_rate_samples). Low CV (< 0.10) = regular; high CV (> 0.25) = irregular.
- summary_json: Array of 30-minute bucket objects [{"t":"23:00","state":"asleep","bpm":14.2,"restless":false}, ...] for the weekly-trends chart.
Multi-person bedroom edge case: If two blobs are tracked in a bedroom zone simultaneously, the system assigns the sleep record to the BLE-matched person if available, otherwise creates two separate zone-based records (one per occupant slot). Breathing analysis uses the blob with the strongest stationary signal (lowest smooth_deltaRMS).
Dashboard:
- Morning summary card (simple mode): "Sleep 11:23pm – 7:02am (7h 39m). Restlessness: Low. Breathing: Regular."
- Weekly trends (expert mode): Charts of sleep duration, onset time, restlessness index, breathing rate over 7/30 days
- Anomaly flagging: "Breathing rate elevated last night (22 bpm vs. 16 bpm average)" — could indicate illness, stress, or environmental change
Per-person tracking: When BLE identifies who is sleeping, the report is per-person. If BLE is not configured, the report is per-zone (assuming single occupancy).
Privacy: No spatial tracking data is stored for sleep analysis — only aggregate motion/breathing statistics. A bedroom zone can have spatial privacy enabled while still collecting sleep metrics.
Storage: ~200 bytes per night per person. Negligible.
26. Crowd Flow Visualization
Aggregates blob trajectories over time into a directional flow map showing how a space is actually used.
Data accumulation:
Every fusion tick (10 Hz), for each tracked blob with confidence > 0.3:
cell_x = floor(blob.x / SPAXEL_GRID_CELL_M)
cell_y = floor(blob.y / SPAXEL_GRID_CELL_M)
For each active bucket_type in {'hour', 'day', 'week'}:
bucket_ms = floor(now_ms / bucket_duration_ms) × bucket_duration_ms
where bucket_duration_ms = 3_600_000 (hour) | 86_400_000 (day) | 604_800_000 (week)
UPSERT crowd_flow (bucket_ms, bucket_type, cell_x, cell_y):
entry_count += 1
vx_sum += blob.vx (m/s; can be negative)
vy_sum += blob.vy
dwell_ms += 100 (one 10 Hz tick = 100 ms)
entry_count counts tick-frames (not transitions into a cell). To display "average velocity direction" for a cell: vx_avg = vx_sum / entry_count, vy_avg = vy_sum / entry_count. Arrow rendered only when sqrt(vx_avg²+vy_avg²) > 0.05 m/s (suppress stationary dwell cells from arrow layer).
Memory accumulator: an in-memory map[bucketKey]CrowdCell is flushed to SQLite every 60 s (batch UPSERT). bucketKey = (bucket_ms, bucket_type, cell_x, cell_y). Stale in-memory entries (bucket_ms older than the current bucket boundary) are flushed and evicted.
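A minimal Go sketch of the per-tick accumulation and the arrow-suppression rule follows. The type names (`bucketKey`, `crowdCell`, `blob`) are hypothetical, and the 60 s batch flush to SQLite and the eviction of stale buckets are left out.

```go
package main

import "math"

const (
	hourMs = 3_600_000
	dayMs  = 86_400_000
	weekMs = 604_800_000
)

// bucketKey mirrors the (bucket_ms, bucket_type, cell_x, cell_y) key.
type bucketKey struct {
	BucketMs   int64
	BucketType string
	CellX      int
	CellY      int
}

// crowdCell holds the running sums for one key.
type crowdCell struct {
	EntryCount int64
	VxSum      float64
	VySum      float64
	DwellMs    int64
}

// blob is the subset of tracker state the accumulator reads.
type blob struct {
	X, Y, Vx, Vy, Confidence float64
}

// accumulate applies one 10 Hz fusion tick to the in-memory map.
func accumulate(acc map[bucketKey]*crowdCell, blobs []blob, nowMs int64, cellM float64) {
	durations := map[string]int64{"hour": hourMs, "day": dayMs, "week": weekMs}
	for _, b := range blobs {
		if b.Confidence <= 0.3 {
			continue // only blobs with confidence > 0.3 count
		}
		cx := int(math.Floor(b.X / cellM))
		cy := int(math.Floor(b.Y / cellM))
		for typ, d := range durations {
			k := bucketKey{BucketMs: nowMs / d * d, BucketType: typ, CellX: cx, CellY: cy}
			c := acc[k]
			if c == nil {
				c = &crowdCell{}
				acc[k] = c
			}
			c.EntryCount++
			c.VxSum += b.Vx
			c.VySum += b.Vy
			c.DwellMs += 100 // one 10 Hz tick
		}
	}
}

// arrow returns the cell's average velocity and whether to draw an arrow.
func arrow(c *crowdCell) (vx, vy float64, draw bool) {
	if c.EntryCount == 0 {
		return 0, 0, false
	}
	vx = c.VxSum / float64(c.EntryCount)
	vy = c.VySum / float64(c.EntryCount)
	return vx, vy, math.Hypot(vx, vy) > 0.05
}
```

Because `floor(now_ms / bucket_duration_ms)` is taken relative to the Unix epoch, `day` buckets begin at 00:00 UTC and `week` buckets begin on Thursdays (1970-01-01 was a Thursday). If local-time buckets are wanted, the timezone offset would have to be applied to `now_ms` before flooring.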
- Accumulated into configurable time buckets: 1 hour, 1 day, 1 week
- Storage: ~30 KB per time bucket for a 50×50 grid
3D rendering (toggle-able layer):
- Flow arrows: Animated arrows along major movement corridors, drawn as `TubeGeometry` along spline paths fitted to high-traffic cell sequences. Width proportional to traffic volume. Color by average speed (blue = slow, red = fast). Arrows animate in the direction of travel.
- Dwell hotspots: Cells with high dwell time rendered as warm-colored pools on the ground plane (the couch, the desk, the kitchen counter). Intensity proportional to total dwell hours.
- Time filter: Slider or dropdown to show flow for specific periods: "Morning routine (6–9am)", "Evening (6–10pm)", "Last 24 hours", "Last 7 days"
- Per-person filter: When BLE identity is available, show flow for a specific person: "Alice's typical paths" vs "Bob's typical paths"
Use cases:
- Understand furniture layout effectiveness ("everyone walks around the coffee table — move it")
- Identify most/least used areas for node placement optimization
- Commercial applications: retail foot traffic, office utilization
- Visualize the household "desire paths" — patterns emerge over days
Implementation: A 2D histogram accumulator updated per frame from blob positions. Arrow rendering uses Three.js `TubeGeometry` along spline paths. The accumulator runs in the mothership's fusion loop at negligible cost (one map update per blob per bucket type per frame).
27. Activity Timeline (Universal Navigation)
A single scrollable, filterable, searchable timeline that contains every event the system has ever observed. This replaces scattered views and separate log pages with one unified stream that serves as the primary way to navigate both time and space.
Event types (all in one stream):
- Detections: blob appeared, blob disappeared, blob entered/left zone
- Person events: "Alice entered Kitchen", "Bob left the house"
- Zone transitions: portal crossings with direction
- Automation triggers: trigger fired, condition met/unmet
- Alerts: fall detection, anomaly, security mode events
- System events: node online/offline, OTA updates, baseline changes, self-improving weight updates
- Learning milestones: "Prediction model for Alice reached 80% confidence", "Diurnal baseline for hour 14 fully calibrated"
Interactions:
- Tap any event: The 3D view jumps to that exact moment via time-travel. The scene shows the state at that point — blobs where they were, nodes in their state. The timeline becomes a spatial remote control
- Inline actions per event: Mark correct/incorrect (thumbs up/down), "Why?" (open explainability), create automation from this event, share
- Filters: By person, by zone, by event type, by time range. Combinable: "Alice + Kitchen + after midnight"
- Search: Natural language queries: "kitchen occupied after midnight last week" → filters to matching events
- Scroll up = go back in time. Open the dashboard after being away → scroll up to see everything that happened since last visit
Layout:
- Expert mode: timeline as a collapsible sidebar alongside the 3D view. Clicking events controls the 3D view
- Simple mode: timeline IS the main view (as the activity feed), with room cards above it
- Ambient mode: no timeline visible
Implementation: Events stored in SQLite with indexed timestamp, type, zone, person fields. WebSocket pushes new events in real-time. Frontend renders as a virtualized list (only DOM nodes for visible events). Search implemented as SQL queries on the events table.
28. Detection Explainability ("Why Is This Here?")
Every detection, alert, and automation trigger can be inspected to reveal exactly why the system made that decision. This is a first-class interaction, not a hidden debug tool.
Activation: Tap/click a humanoid figure in the 3D view → "Why?" button, or tap "Why?" on any timeline event.
X-ray overlay (3D view):
- All non-contributing visual elements dim (room bounds, other blobs, floor plan go to 20% opacity)
- Links that contributed to this detection glow, with brightness proportional to their deltaRMS contribution
- Fresnel zone ellipsoids appear for active links, showing WHY the system placed the blob at this intersection
- If BLE contributed: a dotted line from the matched BLE device's strongest node to the blob, labeled with RSSI
Detail sidebar:
- Per-link contribution table: link name, deltaRMS value, threshold, Fresnel zone number at blob position, learned weight
- BLE match details: device name, per-node RSSI values, match confidence
- Confidence breakdown: "Spatial confidence: 78% (from 5 contributing links). Identity confidence: 92% (iPhone RSSI −48 at kitchen-north)"
- For alerts: the specific conditions that triggered and their values vs thresholds
For false positives: The explainability view makes the cause obvious. "Link kitchen↔hallway spiked because of HVAC" → user marks as incorrect → system adjusts. Understanding replaces frustration.
For automations: "Trigger 'Couch Dwell' fired because Blob #2 (Alice) has been inside the trigger volume for 34 seconds (threshold: 30s). Action: webhook POST to http://ha.local/api/..."
29. Detection Feedback Loop (Thumbs Up/Down)
Every detection has two small buttons: correct (thumbs up) or incorrect (thumbs down). A third action — "I was here but you missed me" — allows the user to tap a location in the 3D view to mark a missed detection.
Available everywhere:
- On humanoid figures in the 3D view (small overlay buttons on hover/tap)
- On every detection event in the timeline
- On push notifications (inline action buttons)
What happens on feedback:
Thumbs up (correct):
- Contributing links' Fresnel weights get a small positive nudge (+0.001 per link)
- Reinforces the current detection parameters for these links
- Logged to the accuracy tracking table
Thumbs down (incorrect / false positive):
- Contributing links' Fresnel weights get a small negative nudge (−0.002 per link, slightly stronger than positive to prioritize eliminating false positives)
- Detection threshold for contributing links is microscopically raised (+0.5% per feedback)
- Event logged with timestamp and contributing links — if false positives cluster at specific times of day, the diurnal baseline for those links adjusts during the next hourly crossfade
- System responds in the timeline: "Got it. I've slightly raised the detection threshold for the contributing links. If this keeps happening at this time of day, my hourly baseline will adapt within a few days."
"I was here" (missed detection):
- User taps a location in the 3D view → a ground-truth point is recorded at that position and time
- Contributing links' detection thresholds are microscopically lowered (−0.5%)
- The self-improving weight system gets a positive sample: "a person was HERE, so links whose Fresnel zones cover this point should be more trusted"
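The nudge sizes above translate directly into code. A hypothetical sketch (names are illustrative, no clamping or persistence is shown), assuming that for "I was here" the caller passes the links whose Fresnel zones cover the tapped point:

```go
package main

// LinkParams holds the per-link values that feedback adjusts.
type LinkParams struct {
	Weight    float64 // learned Fresnel weight
	Threshold float64 // deltaRMS detection threshold
}

// Feedback is the kind of correction the user supplied.
type Feedback int

const (
	Correct   Feedback = iota // thumbs up
	Incorrect                 // thumbs down (false positive)
	Missed                    // "I was here"
)

// applyFeedback nudges every contributing link; unknown link IDs are skipped.
func applyFeedback(links map[string]*LinkParams, linkIDs []string, fb Feedback) {
	for _, id := range linkIDs {
		l, ok := links[id]
		if !ok {
			continue
		}
		switch fb {
		case Correct:
			l.Weight += 0.001
		case Incorrect:
			l.Weight -= 0.002     // stronger than the positive nudge
			l.Threshold *= 1.005 // raise by 0.5 %
		case Missed:
			l.Threshold *= 0.995 // lower by 0.5 %
		}
	}
}
```

Because the threshold changes are multiplicative, one false-positive mark followed by one missed-detection mark leaves the threshold at 0.999975 of its starting value rather than exactly where it began.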
Accuracy tracking:
- Dashboard shows an "Accuracy" trend card: "You've provided 47 corrections. Detection accuracy has improved 12% since installation."
- Weekly accuracy report in the morning briefing
- The trend graph shows the cumulative effect of user feedback over weeks — creating a visible reward for providing corrections
30. Spatial Context Notifications
Push notifications include a rendered mini floor-plan thumbnail and natural language text, so the user understands the spatial context without opening the app.
Notification format:
- Image: A small (400×300 px) top-down floor plan rendering with the relevant blob/zone highlighted. Generated server-side as PNG by the Go backend. Works on every platform: iOS, Android, desktop, email, Slack, Discord.
  Renderer specification:
  - Library: `github.com/fogleman/gg` (pure Go, no cgo, 2D drawing API). Embedded font: Roboto Regular at 10pt for labels, 8pt for names (embedded in the Go binary via `//go:embed`).
  - Coordinate mapping: floor plan meters → image pixels. Scale = `min(380/room_width_m, 280/room_depth_m)`; origin at (10, 10) px for a 10 px margin.
  - Render pipeline (in order): (1) dark gray background fill (#1a1a1a); (2) if a floor plan image exists, draw it as the background, resized to fit; (3) room zone outlines as white 1 px rectangles; (4) highlighted zone, filled with 40% opacity red (alert) or the zone color; (5) person circles: 8 px radius for BLE-identified (person color), 6 px radius for unknown (#888); (6) person name labels above each circle; (7) small scale bar (bottom-right corner, shows a 5 m reference); (8) optional text overlay (event title, bottom-left).
  - Background layer is cached: the static floor plan (room outlines + uploaded image) is pre-rendered and cached as an in-memory PNG. The cache is invalidated when room bounds or the floor plan image change. Per-notification rendering only redraws layers 4–8 on top of the cached background.
  - Thread safety: each render gets its own `gg.Context`; there is no shared mutable state.
  - Render time target: <50 ms for up to 10 people on a typical floor plan (measured on a Pi 4).
  - Error handling: if any step fails (e.g., font load error, nil geometry), the notification is sent as text-only (no image) with a log warning.
  - Test endpoint: `GET /api/notifications/preview?type=fall&person=Alice` returns a rendered test image for UI development and QA.
  - Output: in-memory `[]byte` PNG (not written to disk), delivered directly in the HTTP notification payload or as a URL reference for push services that require URL-based images.
- Title: Short, natural language: "Motion in Kitchen (2:34am)" or "Fall Detected: Alice"
- Body: One sentence of context: "Someone entered from the hallway. Security mode is active." or "Alice hasn't moved for 15 seconds."
- Actions: Platform-native action buttons where supported: [Open Dashboard] [Dismiss] or for falls: [I'm Fine] [Call Help]
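The coordinate mapping and circle sizes from the renderer specification can be sketched as plain functions, independent of the drawing library. This assumes the floor-plan y axis points the same way as image y (no flip), and the function names are hypothetical; the actual drawing calls on a `gg.Context` are omitted.

```go
package main

import "math"

const (
	imageW   = 400.0
	imageH   = 300.0
	marginPx = 10.0
)

// floorToPixel maps floor-plan meters to image pixels using
// scale = min(380/room_width_m, 280/room_depth_m) and a 10 px margin.
func floorToPixel(xM, yM, roomWidthM, roomDepthM float64) (px, py float64) {
	scale := math.Min((imageW-2*marginPx)/roomWidthM, (imageH-2*marginPx)/roomDepthM)
	return marginPx + xM*scale, marginPx + yM*scale
}

// personRadius is 8 px for BLE-identified people and 6 px for unknown blobs.
func personRadius(identified bool) float64 {
	if identified {
		return 8
	}
	return 6
}
```

A single uniform scale preserves the room's aspect ratio, so one axis usually has spare space: a 10 m × 5 m room gets a scale of 38 px/m, filling all 380 px horizontally but only 190 of the 280 px available vertically.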
Notification types and their language:
| Event | Title | Body |
|---|---|---|
| Zone entry | "Alice entered Kitchen" | "Coming from the hallway. Bob is in the living room." |
| Security motion | "Motion detected (2:34am)" | "Kitchen, from hallway direction. Security mode active." |
| Fall alert | "Fall detected: Alice" | "Hallway. No movement for 15s. [I'm Fine] [Call Help]" |
| Anomaly | "Unusual activity" | "Motion in kitchen at 3am — normally vacant at this hour." |
| System | "Node offline: kitchen-north" | "Coverage in kitchen reduced. 5/6 nodes online." |
| Daily summary | "Daily summary" | "Home occupied 14h. Alice: 9h, Bob: 6h. All systems healthy." |
Smart batching:
- Multiple zone transitions within 30 seconds are batched: "Alice moved through hallway → kitchen → dining room" instead of three separate notifications
- Repeated events are collapsed: "Motion in kitchen (3 times in 10 min)" instead of three alerts
- Quiet hours suppress non-critical notifications (configurable per user)
Delivery channels (configurable per event type):
- Push notification (via Ntfy, Pushover, Gotify)
- Webhook (for Home Assistant, Slack, Discord, email)
- Dashboard only (default for low-priority events)
31. Ambient Dashboard Mode
A dedicated display mode for wall-mounted tablets or always-on screens. Served at /ambient — a separate lightweight route optimized for low-power devices.
Visual design:
- Simplified, stylized top-down floor plan — clean lines, soft rounded corners, no UI chrome
- People appear as softly glowing colored circles (BLE-identified) or neutral dots (unknown), with names in a gentle sans-serif font
- Room labels show subtle occupancy: "Kitchen · Alice" or "Bedroom · Empty"
- Smooth, calm animations: dots drift with interpolated positions, no jitter, no snapping
- No toolbar, no buttons, no panels — just the floor plan, the people, and a small status line
Time-of-day awareness:
- Morning (6–10am): bright, cool palette, cheerful
- Day (10am–6pm): neutral, clean
- Evening (6–10pm): warm amber tones, slightly dimmed
- Night (10pm–6am): very dim, minimal elements, just "All secure" centered. Screen brightness at 10%
Adaptive behavior:
- House empty for 30+ min: screen goes fully dark (OLED-safe), "All secure" in tiny text
- Someone arrives: gentle fade-in, dot appears with name
- Alert event: entire display transitions to alert mode — pulsing red border, large text, action buttons ("Dismiss" / "Call Help"). Returns to ambient after dismissal with a smooth crossfade
Morning briefing integration: When the first person is detected in the morning, the ambient display briefly shows the morning briefing text (sleep summary, overnight events, today's predictions) before fading to the normal ambient view.
Implementation: Separate /ambient route serving a lightweight HTML page. No Three.js — uses Canvas 2D or SVG for minimal resource usage on older tablets. WebSocket receives the same dashboard feed but only uses blob positions, zone counts, and alerts. Typically <30 MB RAM, <5% CPU on a 2018 iPad.
32. Spatial Quick Actions (Context Menus)
Right-click (desktop) or long-press (mobile) anywhere in the 3D view to get context-sensitive actions based on what's under the cursor.
On a person/blob:
- "Who is this?" → opens BLE device assignment if unidentified
- "Why is this here?" → opens explainability overlay (Component 28)
- "Follow" → camera smoothly tracks this person, auto-orbiting to keep them centered
- "Create automation here" → pre-fills a trigger volume at this location with this person's filter
- "Mark incorrect" → thumbs-down feedback (Component 29)
- "Track history" → filters timeline to this person's events
On a node:
- "Diagnostics" → inline CSI amplitude/phase plot for this node's links (2D overlay)
- "Blink LED" → sends identify command via WebSocket
- "Reposition" → enters TransformControls for this node
- "Update firmware" → triggers OTA if update available
- "Show links" → highlights all links involving this node
- "Disable" / "Enable" → takes node out of / returns to active fleet
On empty floor space:
- "What happened here?" → filters timeline to events within 1 m of this point
- "Add trigger zone" → creates a trigger volume centered here
- "Add virtual node" → places a virtual node for coverage planning
- "Coverage quality" → shows GDOP value at this point with contributing link breakdown
On a zone label:
- "Zone history" → occupancy chart (24h / 7d) for this zone
- "Edit zone" → resize/rename/delete
- "Create automation" → pre-fills zone-based trigger
- "Crowd flow" → shows flow data filtered to this zone
On a portal:
- "Crossing log" → recent directional crossings with timestamps and person names
- "Edit portal" → reposition or rename
- "Reverse direction" → swap the zone labels
On a trigger volume:
- "Edit trigger" → open automation config for this trigger
- "Test" → simulate a trigger fire to verify webhook/action
- "View log" → filter timeline to this trigger's events
- "Disable" / "Enable"
Implementation: Three.js Raycaster determines what's under the cursor. A single context menu component renders the appropriate options. Each action dispatches to existing dashboard functions (no new backend endpoints needed — just UI wiring).
33. Interactive Onboarding (Teach by Doing)
The onboarding wizard responds to live sensor data, teaching CSI physics through direct experience rather than documentation.
Sequence (runs after first node connects):
Step 1 — "Walk around" (30 s):
- Dashboard shows a real-time CSI amplitude chart alongside the 3D view
- "I'm listening to your WiFi router's signal through your new node. Walk across the room."
- As the user walks, the waveform visibly distorts. Amplitude spikes are highlighted in real-time
- "See that? Your body just changed the WiFi signal between your router and the node. That's how I detect you."
Step 2 — "Stand still" (10 s):
- "Now stand still for 10 seconds."
- The waveform stabilizes. A green "baseline" line fades in on the chart
- "This is your room's baseline — the signal when nothing is moving. Any change from this means someone is here."
- Baseline is automatically captured during this step (replaces the manual calibration trigger)
Step 3 — "Walk through the detection zone" (15 s):
- The Fresnel zone ellipsoid between the node and the router lights up in the 3D view as a translucent green volume
- "Walk between your node and the router — through the green zone."
- As user crosses it, the Fresnel zone pulses brighter and the amplitude chart shows a strong peak
- "That's my detection zone: I'm most sensitive along this path. The more nodes you add, the more of these zones I have."
Step 4 — "Let me find you" (15 s):
- "Walk somewhere and stop. I'll try to locate you."
- A humanoid figure appears at the estimated position. A dotted circle shows the accuracy radius
- "Found you! I estimate you're about here. My accuracy is ±1 meter with this setup. Adding more nodes tightens this."
Step 5 — "Place your node" (interactive):
- Coverage painting activates on the ground plane
- "Now drag your node to where it actually is in the room. Watch the green coverage change — put it where it helps most."
- After placement: "Nice! Your coverage score is 62%. Want to add another node to improve it?"
Total duration: ~2 minutes. No jargon ("CSI", "Fresnel", "deltaRMS" never appear). User finishes with intuitive understanding of: how detection works, what a baseline is, where coverage is strong, and why more nodes help.
Skip option: "Skip tutorial" link visible throughout for users who know what they're doing.
34. Command Palette
Ctrl+K (Cmd+K on Mac) opens a universal search and command interface. Invisible to casual users, indispensable for power users.
Search:
- "kitchen" → Kitchen zone, kitchen nodes, kitchen automations, recent kitchen events
- "alice" → Alice's current location, today's timeline, sleep report, BLE devices
- "node 3" → Node details, diagnostics, link health
Navigate time:
- "last night 2am" → timeline jumps there, 3D view shows that moment
- "yesterday kitchen" → filters timeline to kitchen events yesterday
- "this morning" → jumps to first detection today
Execute commands:
- "update all nodes" → confirms and triggers fleet OTA
- "re-baseline kitchen" → triggers re-baseline for kitchen links
- "add node" → opens Web Serial onboarding
- "arm security" / "disarm security" → toggles security mode
- "dark mode" / "light mode" → toggles theme
- "export config" → downloads system configuration
- "restart node kitchen-north" → sends reboot command
Get help:
- "help fall detection" → opens contextual help about fall detection settings
- "why false positive" → opens explainability for the most recent incorrect detection
- "troubleshoot kitchen" → starts guided troubleshooting for the kitchen zone
- "how does prediction work" → inline help text
Behavior:
- Fuzzy matching: "flr pln" matches "Floor Plan settings"; "brth" matches "Breathing band sensitivity"
- Recently used commands appear first
- Results show keyboard shortcut hints where applicable
- Escape closes, Enter executes top result
- Works in expert mode only (not in simple or ambient mode)
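The palette is described as frontend-only, so the following Go sketch only illustrates the matching rule. It assumes "fuzzy" means a case-insensitive in-order subsequence match, which is enough to satisfy both examples above, and ranks recently used commands first, then tighter matches. `Command` and its fields are hypothetical.

```go
package main

import (
	"sort"
	"strings"
)

// fuzzyMatch reports whether every rune of query appears in candidate in
// order (case-insensitive). The score is the span of the greedy leftmost
// match; smaller means tighter.
func fuzzyMatch(query, candidate string) (score int, ok bool) {
	q := []rune(strings.ToLower(query))
	c := []rune(strings.ToLower(candidate))
	if len(q) == 0 {
		return 0, true
	}
	qi, first := 0, -1
	for ci, r := range c {
		if r == q[qi] {
			if first < 0 {
				first = ci
			}
			qi++
			if qi == len(q) {
				return ci - first, true
			}
		}
	}
	return 0, false
}

// Command is one palette entry with its last-used time (0 = never).
type Command struct {
	Name         string
	LastUsedUnix int64
}

// rank returns matching command names: most recently used first, then by score.
func rank(query string, cmds []Command) []string {
	type hit struct {
		name   string
		score  int
		recent int64
	}
	var hits []hit
	for _, c := range cmds {
		if s, ok := fuzzyMatch(query, c.Name); ok {
			hits = append(hits, hit{c.Name, s, c.LastUsedUnix})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		if hits[i].recent != hits[j].recent {
			return hits[i].recent > hits[j].recent
		}
		return hits[i].score < hits[j].score
	})
	names := make([]string, len(hits))
	for i, h := range hits {
		names[i] = h.name
	}
	return names
}
```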
Implementation: Frontend-only component. Command registry maps keywords to actions. Search runs against: zone names, person names, node names, setting names, help topics. No backend endpoint needed — all dispatch is client-side.
35. Morning Briefing
When the user first opens the dashboard each day (or at a configured notification time), a brief, warm summary appears.
Content (generated from existing data):
Good morning, Alice. You slept 7h 39m — 12 minutes more than your average.
Breathing was regular.
Bob left at 8:15am. The house has been empty since 8:22am.
Last night: One unusual event at 2:34am — motion in the kitchen for
30 seconds. No BLE match, low-confidence blob. Likely environmental.
System health: Excellent (94%). All 6 nodes online.
Accuracy improved 2% this week thanks to your 8 corrections.
Today's forecast: Based on your Wednesday pattern, you usually return
around 5:45pm. Security mode will auto-activate when you leave.
Display:
- In expert mode: card overlay that appears on first dashboard open of the day, dismissible with a tap or "Got it" button. Slides away after 10 seconds if not interacted with
- In simple mode: the morning card is the first card in the layout, stays visible until dismissed
- In ambient mode: text fades in over the ambient display when first person detected in the morning, stays for 30 seconds
Adaptive length:
- Nothing interesting happened: "All quiet last night. All systems healthy." (one line)
- Something notable: leads with the notable event, then other details
- Something urgent: leads with the alert and actions needed
Delivery channels:
- Dashboard (default)
- Push notification at configured time (e.g., 7am)
- Webhook to Slack/Discord channel
"What happened while I was away" variant: When the user opens the dashboard after being away for >4 hours, a similar summary covers the entire absence period instead of just overnight.
Generation algorithm (Go function GenerateBriefing(date string, person string) string):
The briefing is assembled in priority order. Each section is a conditional block; sections with no data are omitted entirely.
Inputs (all queried for the prior night: 18:00 yesterday → now):
sleep = SELECT * FROM sleep_records WHERE person=? AND date=<yesterday>
events = SELECT * FROM events WHERE timestamp_ms BETWEEN night_start AND now ORDER BY timestamp_ms
anomalies = events WHERE type='anomaly' AND severity IN ('warning','alert','critical')
nodes = SELECT COUNT(*) FROM nodes WHERE status='online' / total
quality = current detection_quality
feedback_this_week = SELECT COUNT(*) FROM feedback WHERE timestamp_ms > now-7d
accuracy_delta = accuracy this week vs last week (from feedback table)
predictions = GET /api/predictions for person, horizon=60m
Priority assembly (render first non-empty block as the lead paragraph):
BLOCK 1 — Critical alerts (if any fall_alert or security_alert in events):
"⚠ [alert description, zone, time]."
BLOCK 2 — Sleep summary (if sleep record exists):
Base: "You slept [duration]h [duration_m]m"
+ deviation: " — [N] minutes [more|less] than your average." (if |delta| > 10 min)
+ restlessness: " Restlessness: [Low|Moderate|High]." (Low < 1/h, Moderate 1–3/h, High > 3/h)
+ breathing: " Breathing: [Regular|Irregular]." (regular if CV < 0.15)
+ anomaly: " Breathing rate elevated ([N] bpm vs [avg] bpm average)." (if bpm > avg×1.25)
BLOCK 3 — Who is home (current state):
"Bob left at [time]. The house has been empty since [time]." (if no one home)
OR "Alice is home. Bob left at [time]."
BLOCK 4 — Overnight anomalies (if any in events and not already in BLOCK 1):
"Last night: [first anomaly description]. [Low|Medium|High]-confidence."
(if multiple: "Last night: [N] unusual events. Most notable: [highest anomaly_score event]")
"Likely environmental." appended if anomaly_score < 0.7
BLOCK 5 — System health (if not excellent):
Skip if quality >= 90 and all nodes online.
"System health: [Excellent|Good|Fair|Poor] ([quality]%). [N]/[total] nodes online."
BLOCK 6 — Prediction hint (if prediction exists and confidence > 0.7):
"Today's forecast: Based on your [weekday] pattern, you usually [first predicted_enter action]."
BLOCK 7 — Learning progress (if feedback_this_week > 0):
"Accuracy improved [delta]% this week thanks to your [N] corrections." (if delta > 0)
OR: "You provided [N] corrections this week." (if delta = 0)
DEGENERATE CASE (all blocks empty = nothing happened):
"All quiet last night. All systems healthy."
"What happened while I was away" variant: identical algorithm but
night_start = SELECT MAX(last_seen_at) FROM sessions WHERE last_seen_at < now - 4h
(= the most recent session activity before the current gap; falls back to 4 h ago if no prior session)
night_end = now
BLOCK 2 (sleep) included only if period covers ≥ 4 h of nighttime hours
Storage: The briefing is generated once per day (at first open or at the configured push time) and stored as a daily record in the `briefings` table in SQLite, so it can be retrieved later. Subsequent dashboard opens on the same day retrieve the stored record instead of regenerating it.
36. Guided Troubleshooting
When the system detects that the user might be struggling or that detection quality has degraded, it proactively offers contextual help — but never when things are working well.
Trigger conditions and responses:
Detection quality drops:
- Condition: Zone-level detection quality below 60% for >24 hours
- Banner in timeline and 3D view: "Detection in the kitchen has been less reliable this week. Want me to help diagnose?"
- Guided flow: Check node connectivity → show link health with explainability → suggest node repositioning using coverage painting → offer re-baseline → "Still not right? Try adding a node here [highlighted optimal position]"
Repeated setting changes:
- Condition: The same settings key (from `/api/settings` PATCH requests) is modified 3 or more times within a 60-minute sliding window. Qualifying settings keys: `delta_rms_threshold`, `breathing_sensitivity`, `tau_s`, `fresnel_decay`, `n_subcarriers`. Keys that do not qualify: display preferences (theme, layout), notification config, MQTT config.
- Tracking: the server increments a per-key edit counter in memory (not SQLite; the counter is ephemeral). The counter resets after 60 minutes of inactivity on that key.
- Trigger: when the counter for any qualifying key reaches 3 within the window, set a `hint_pending` flag. The flag is consumed and cleared by the next dashboard page load or `/api/settings` response, which includes `"repeated_edit_hint": true` in its JSON body.
- Frontend behavior: on receipt of `repeated_edit_hint: true`, show a non-intrusive banner (not a modal): "You've adjusted the detection threshold several times. Would you like me to show you what the system is seeing?" with a [Show me] button and an [×] dismiss button.
- [Show me] action: opens time-travel to the most recent detection event before the first edit in the window, with the explainability overlay pre-activated, so the user can visually tune thresholds against real data.
- Cooldown: after the hint is shown (displayed or dismissed), do not re-trigger the same hint for 24 hours regardless of further edits.
- The cooldown is stored in `localStorage` (not server-side): the server only sets the flag; the client remembers the 24-hour cooldown.
Node offline:
- Condition: Any node offline for >2 hours
- Timeline event with expandable troubleshooting steps: "Node kitchen-north has been offline since 3:15pm." → 1) "Is it powered? Check the USB connection." 2) "Can it reach WiFi? Look for the captive portal AP: spaxel-XXXX." 3) "Try reflashing from the dashboard: [Open Web Serial]." Each step has a one-click action where possible
First-time feature discovery:
- Condition: User opens a feature panel for the first time
- Brief, non-intrusive tooltip (not a modal): "Draw a box around an area, then choose what happens when someone enters or leaves. [Got it]"
- Shown once, never repeated. Dismissed on click anywhere
After false positive feedback:
- Condition: User marks a detection as incorrect
- Inline response in timeline: "Got it. I've slightly raised the detection threshold for the contributing links. If this keeps happening at this time of day, my hourly baseline will adapt within a few days. You can also adjust sensitivity manually → [Open Settings]."
After successful calibration:
- Positive reinforcement: "Re-baseline complete. Detection quality in the kitchen improved from 64% to 89%."
Design principles:
- Reactive, not proactive: Help appears only when something seems wrong or when the user is clearly exploring
- Dismissible in one tap: Never blocks the UI
- Never repeats after dismissal (stored in localStorage)
- Always explains what will happen next: "I'll adjust X, which should improve Y within Z days"
- Never condescending: Assumes the user is intelligent but may not know CSI physics
Home Automation Integration (MQTT)
The mothership acts as an MQTT publisher-only client connecting to the user's existing broker (e.g., Mosquitto bundled with Home Assistant). No MQTT broker runs inside the container. MQTT is optional — all features work without it; it's an integration layer only.
Configuration (via environment or settings API):
- `SPAXEL_MQTT_BROKER`: broker URL, e.g., `mqtt://homeassistant.local:1883` or `mqtts://...`
- `SPAXEL_MQTT_USERNAME` / `SPAXEL_MQTT_PASSWORD`: optional credentials
- `SPAXEL_MQTT_PREFIX`: topic prefix (default: `spaxel`)
- `SPAXEL_MQTT_CLIENT_ID`: client ID (default: `spaxel-<installation_id>`)
Connection management:
- Connects on startup if configured. Reconnects with exponential backoff (1 s, 2 s, 4 s, ... up to a 5 min cap) while the broker is unavailable.
- LWT (Last Will and Testament): `{prefix}/availability` → payload `"offline"` (retained, QoS 1)
- On successful connect: publish `{prefix}/availability` → `"online"` (retained, QoS 1)
- MQTT v3.1.1 for maximum HA compatibility (paho.mqtt.golang library)
Topic hierarchy:
{prefix}/ — e.g., "spaxel/"
availability — "online" | "offline" (LWT; retained)
system/detection_quality — integer 0-100 (published on change)
system/nodes_online — integer (published on change)
zone/{zone_name}/occupancy — integer count (published on change only)
zone/{zone_name}/people — JSON array of names e.g. ["Alice","Bob"] (published on change)
person/{person_name}/present — "home" | "not_home" (published on change)
person/{person_name}/zone — zone name string or "unknown" (published on change)
person/{person_name}/predicted_zone — JSON {"zone":"Kitchen","confidence":0.87} (published every 5 min)
alert/fall — JSON {"person":"Alice","zone":"Hallway","timestamp_ms":N} (event-fired)
alert/anomaly — JSON {"zone":"Kitchen","score":0.92,"message":"..."} (event-fired)
alert/security — JSON {"zone":"Hallway","timestamp_ms":N} (event-fired, security mode only)
node/{mac}/status — "online" | "stale" | "offline" (published on change)
node/{mac}/rssi — integer dBm (published every 30 s)
command/security_mode — subscribes to: "arm" | "disarm" (HA can control security mode)
command/rebaseline — subscribes to: zone name or "all" (HA can trigger re-baseline)
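Most state topics above are "published on change". A minimal sketch of topic construction and change suppression follows. The names are hypothetical, and `send` stands in for whatever wraps the MQTT client's publish call.

```go
package main

import (
	"strings"
	"sync"
)

// topic joins the configured prefix with hierarchy segments,
// e.g. topic("spaxel", "zone", "Kitchen", "occupancy").
func topic(prefix string, segments ...string) string {
	return strings.TrimSuffix(prefix, "/") + "/" + strings.Join(segments, "/")
}

// changePublisher drops publishes whose payload equals the last one sent
// on the same topic.
type changePublisher struct {
	mu   sync.Mutex
	last map[string]string
	send func(topic, payload string)
}

func newChangePublisher(send func(topic, payload string)) *changePublisher {
	return &changePublisher{last: map[string]string{}, send: send}
}

// PublishOnChange sends payload only if it differs from the cached value.
func (p *changePublisher) PublishOnChange(t, payload string) bool {
	p.mu.Lock()
	if prev, ok := p.last[t]; ok && prev == payload {
		p.mu.Unlock()
		return false
	}
	p.last[t] = payload
	p.mu.Unlock()
	p.send(t, payload)
	return true
}

// Reset forgets cached payloads, e.g. after a reconnect, so that the
// current state is re-sent once.
func (p *changePublisher) Reset() {
	p.mu.Lock()
	p.last = map[string]string{}
	p.mu.Unlock()
}
```

Zone and person names go straight into topic levels here, so a name containing `/`, `+` or `#` would need sanitizing first: `/` splits a level, and `+`/`#` are wildcards that MQTT does not allow in published topic names.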
Home Assistant auto-discovery (published once on connect, retained, QoS 1):
HA auto-discovery topic pattern: homeassistant/{component}/spaxel_{entity_id}/config
// Zone occupancy sensor (one per zone)
// Topic: homeassistant/sensor/spaxel_zone_kitchen_occupancy/config
// No device_class: HA's "occupancy" device class exists only for binary_sensor, not sensor
{
"name": "Kitchen Occupancy",
"unique_id": "spaxel_zone_kitchen_occupancy",
"state_topic": "spaxel/zone/Kitchen/occupancy",
"availability_topic": "spaxel/availability",
"state_class": "measurement",
"unit_of_measurement": "people",
"device": {"identifiers":["spaxel"],"name":"Spaxel","manufacturer":"Spaxel","model":"1.0"}
}
// Per-person presence binary sensor (one per registered BLE person)
// Topic: homeassistant/binary_sensor/spaxel_person_alice_presence/config
{
"name": "Alice Present",
"unique_id": "spaxel_person_alice_presence",
"state_topic": "spaxel/person/Alice/present",
"payload_on": "home",
"payload_off": "not_home",
"availability_topic": "spaxel/availability",
"device_class": "presence",
"device": {"identifiers":["spaxel"],"name":"Spaxel"}
}
// Fall detection binary sensor
// Topic: homeassistant/binary_sensor/spaxel_alert_fall/config
{
"name": "Fall Detected",
"unique_id": "spaxel_alert_fall",
"state_topic": "spaxel/alert/fall",
"value_template": "{% if value_json.person is defined %}ON{% else %}OFF{% endif %}",
"availability_topic": "spaxel/availability",
"device_class": "safety",
"device": {"identifiers":["spaxel"],"name":"Spaxel"}
}
// System detection quality sensor
// Topic: homeassistant/sensor/spaxel_system_quality/config
{
"name": "Spaxel Detection Quality",
"unique_id": "spaxel_system_quality",
"state_topic": "spaxel/system/detection_quality",
"unit_of_measurement": "%",
"state_class": "measurement",
"availability_topic": "spaxel/availability",
"device": {"identifiers":["spaxel"],"name":"Spaxel"}
}
Auto-discovery lifecycle:
- Auto-discovery configs are published with `retain=true` on first connect and whenever zones/persons are added or renamed.
- When a zone or person is deleted in the dashboard, the mothership publishes an empty retained payload to the corresponding auto-discovery topic to remove the entity from HA.
- Entity `unique_id` is derived from the installation ID + entity type + name, ensuring stability across restarts.
Publish policy (avoiding floods):
- Zone occupancy and person presence: publish only on state change, not at 10 Hz.
- System metrics: publish every 30 s or on significant change (>5% quality change).
- Alerts: publish immediately on event fire, with no deduplication (each alert is a distinct event).
- MQTT publish queue is bounded at 500 messages; oldest are dropped if the broker is slow.
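The change-only rule and the bounded, drop-oldest queue can be combined in one small type. This is a single-goroutine sketch; a real publisher would need a mutex or a channel. `publisher`, `PublishState`, and `PublishAlert` are hypothetical names.

```go
package main

// message is one pending MQTT publish.
type message struct {
	Topic   string
	Payload string
}

// publisher enqueues state topics only when their payload changes and keeps
// at most limit pending messages, dropping the oldest when full.
type publisher struct {
	last  map[string]string // last payload enqueued per state topic
	queue []message
	limit int
}

func newPublisher(limit int) *publisher {
	return &publisher{last: make(map[string]string), limit: limit}
}

// PublishState enqueues only if payload differs from the last one for this topic.
func (p *publisher) PublishState(topic, payload string) bool {
	if prev, ok := p.last[topic]; ok && prev == payload {
		return false
	}
	p.last[topic] = payload
	p.enqueue(message{topic, payload})
	return true
}

// PublishAlert always enqueues: every alert is a distinct event.
func (p *publisher) PublishAlert(topic, payload string) {
	p.enqueue(message{topic, payload})
}

func (p *publisher) enqueue(m message) {
	if len(p.queue) >= p.limit {
		p.queue = p.queue[1:] // drop the oldest pending message
	}
	p.queue = append(p.queue, m)
}
```

Deduplicating at enqueue time has one consequence. A state message evicted from a full queue is not retried until the value changes again.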
Bidirectional commands (subscriptions):
- The mothership subscribes to `{prefix}/command/security_mode` and `{prefix}/command/rebaseline` after every (re)connect.
- This allows HA automations to arm/disarm security mode or trigger a re-baseline without opening the dashboard.
Data Flow Summary
ESP32 Node Mothership Browser
│ │ │
│── WS /ws/node ────────────▶│ │
│ binary: CSI frames │── Parse + buffer ──▶ Ring buf │
│ json: hello, health, │── Record ──▶ CSI replay store │
│ BLE scan results │── Phase sanitise ──▶ Clean CSI │
│ │── Feature extract ──▶ deltaRMS │
│◀── WS /ws/node ────────────│── Fresnel accumulate ──▶ Grid │
│ json: config, role, │── Peak extract ──▶ Blobs │
│ rate, OTA │── Biomech Kalman ──▶ Tracked │
│ │── BLE match ──▶ Identified │
│ │── Weight update ──▶ Self-improve │
│ │── Flow accumulate ──▶ Crowd map │
│ │── Trigger eval ──▶ Automations │
│ │── Predict ──▶ Pre-emptive acts │
│ │── Anomaly check ──▶ Alerts │
│ │── Sleep analysis ──▶ Reports │
│ │── Event log ──▶ Timeline store │
│ │── Notification render ──▶ PNG │
│ │ │
│ │── WS /ws/dashboard ────────────▶│
│ │ {blobs+identity, nodes, │
│ │ zones, links, triggers, │
│ │ confidence, predictions, │
│ │ sleep, flow} 10 Hz │
│ │ │
│ │◀── HTTP API ────────────────────│
│ │ (UI, param tuning, feedback, │
│ │ BLE registration, commands) │
│ │ │
│ │──▶ External MQTT broker ────────│
│ │ (optional HA auto-discovery) │
All traffic uses a single HTTP port (8080): WebSocket upgrades for node connections and the dashboard, REST for the API. The entire stack sits behind Traefik with no additional ports exposed.
REST API Specification
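All endpoints are served by the single HTTP server on port 8080. WebSocket endpoints use the `Upgrade: websocket` mechanism. REST endpoints return `Content-Type: application/json` unless noted otherwise; the binary downloads (`/api/backup`, `/api/provision`, `/firmware/:filename`) are the exceptions. Errors follow `{"error": "<human message>", "code": "<snake_case_code>"}`.

Authentication: a session cookie is required on all `/api/*` endpoints except `/api/provision` and the endpoints used before a session exists (`/api/auth/setup`, `/api/auth/login`). `/healthz` and `/firmware/:filename` sit outside `/api/*` and require no session.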
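The sketch below implements the shared error envelope with Go's `net/http`. `apiError`, `errorBody`, and `writeError` are hypothetical names; the two JSON field names come from the error format above.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// apiError is the error envelope shared by all REST endpoints.
type apiError struct {
	Error string `json:"error"`
	Code  string `json:"code"`
}

// errorBody renders {"error": msg, "code": code}.
func errorBody(msg, code string) []byte {
	b, _ := json.Marshal(apiError{Error: msg, Code: code}) // cannot fail for two strings
	return b
}

// writeError sends the envelope with the given HTTP status.
func writeError(w http.ResponseWriter, status int, msg, code string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	w.Write(errorBody(msg, code))
}
```

A handler would call, for example, `writeError(w, http.StatusBadRequest, "schema mismatch", "schema_mismatch")`.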
WebSocket Endpoints
| Endpoint | Direction | Description |
|---|---|---|
| `GET /ws/node` | bidirectional | Node connection. Requires `X-Spaxel-Token` header. Binary frames upstream (CSI), JSON frames downstream (config/commands) |
| `GET /ws/dashboard` | server→client | Dashboard live feed at 10 Hz. Requires session cookie. JSON frames: `{blobs, nodes, zones, links, triggers, confidence, predictions, sleep, flow, events}` |
System
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/healthz` | — | `{"status":"ok","uptime_s":N,"nodes_online":N,"db":"ok"}` |
| `GET` | `/api/status` | — | `{"version":"1.0.0","nodes":N,"blobs":N,"uptime_s":N,"detection_quality":87}` |
| `GET` | `/api/settings` | — | All user-configurable settings as a flat JSON object. If a repeated-edit hint is pending, the response includes `"repeated_edit_hint":true`; the flag is consumed on delivery (cleared after one response) |
| `PATCH` | `/api/settings` | Partial settings object | Updated settings object |
| `GET` | `/api/export` | — | Full config dump as JSON (nodes, zones, portals, trigger volumes, BLE registry, floor plan calibration, settings). See schema below. |
| `POST` | `/api/import` | Config JSON (same schema as export) | `{"ok":true,"imported":{"nodes":N,"zones":N,...}}` or `{"error":"...","code":"schema_mismatch"}` |
Export/Import JSON schema:
{
"version": 1, // export format version (not app version)
"exported_at": "2024-03-15T...", // ISO8601
"nodes": [ // all rows from nodes table
{"mac":"AA:BB:CC:DD:EE:FF","name":"Kitchen North","pos_x":1.2,"pos_y":0.5,"pos_z":2.1,
"role":"rx","node_id":"f47ac10b-..."}
// firmware_version, status, last_seen omitted (runtime state, not config)
],
"zones": [{"name":"Kitchen","x":0,"y":0,"z":0,"w":4,"d":3,"h":2.5,"zone_type":"kitchen"}],
"portals": [{"name":"Kitchen Door","zone_a":"Kitchen","zone_b":"Hallway",
"points":[[1.2,3.0],[1.2,0.0]]}],
"triggers": [{"name":"Couch Dwell","shape":{"type":"box","x":1,"y":2,"z":0,"w":1,"d":1,"h":1.5},
"condition":"dwell","condition_params":{"duration_s":30},
"actions":[{"type":"webhook","url":"http://ha.local/api/..."}],"enabled":true}],
"ble_devices": [{"addr":"AA:BB:CC:DD:EE:FF","label":"Alice","type":"person","color":"#4488ff"}],
"floorplan": {"image_url":null,"cal_ax":0,"cal_ay":0,"cal_bx":200,"cal_by":0,
"cal_distance_m":5.0},
"settings": {"fusion_rate_hz":10,"grid_cell_m":0.2,"delta_rms_threshold":0.02,...}
}
Import behavior:
- All existing nodes, zones, portals, triggers, BLE devices, and settings are replaced by the import (full replace, not merge).
- The floor plan image itself is NOT exported or imported via this endpoint (only its calibration metadata). Re-upload the image separately via `POST /api/floorplan/image` if needed.
- `auth` (`install_secret`, PIN) is excluded from export/import; these values are installation-specific.
- Learning data (baselines, anomaly patterns, prediction models, link weights) is excluded; it is derived data, not config.
- On validation failure: return `{"error":"schema mismatch","code":"schema_mismatch"}` without modifying any data.

Backup:

| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/backup` | — | ZIP archive (binary stream, `Content-Type: application/zip`) containing `spaxel.db` (SQLite Online Backup API snapshot), the `floorplan/` directory, and `briefings.json` (last 30 days). No auth bypass: requires a valid session cookie. The snapshot uses the Online Backup API (`sqlite3_backup_*`), so no WAL-mode data is lost even under concurrent writes. The backup is streamed directly to the HTTP response without writing a temp file to disk. Filename hint: `Content-Disposition: attachment; filename="spaxel-backup-<YYYY-MM-DD>.zip"`. Max response time: 5 s (a warning is logged if exceeded). |
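Validating before touching any table is what makes the "without modifying any data" guarantee easy to keep. The sketch below assumes version 1 is the only accepted export format and decodes only a subset of fields; `exportDoc` and `validateImport` are hypothetical. The `//` comments in the schema listing are annotations only: a real export must be plain JSON.

```go
package main

import (
	"encoding/json"
	"errors"
)

// exportDoc is a partial view of the export/import document; only the fields
// needed here are decoded (hypothetical subset).
type exportDoc struct {
	Version int               `json:"version"`
	Nodes   []json.RawMessage `json:"nodes"`
	Zones   []json.RawMessage `json:"zones"`
}

var errSchemaMismatch = errors.New("schema mismatch")

// validateImport checks the document before any existing data is replaced.
func validateImport(body []byte) (*exportDoc, error) {
	var doc exportDoc
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, errSchemaMismatch
	}
	if doc.Version != 1 {
		return nil, errSchemaMismatch
	}
	return &doc, nil
}
```

After validation succeeds, the full replace could run inside a single SQLite transaction. That way, a failure partway through never leaves a partial configuration behind.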
Authentication
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/auth/setup` | — | `{"pin_configured":false}`; used to detect first run |
| `POST` | `/api/auth/setup` | `{"pin":"1234"}` | `{"ok":true}`; sets the PIN on first run only |
| `POST` | `/api/auth/login` | `{"pin":"1234"}` | Sets the `spaxel_session` cookie and returns `{"ok":true}`, or HTTP 401 |
| `POST` | `/api/auth/logout` | — | Clears the cookie; `{"ok":true}` |
| `POST` | `/api/auth/change-pin` | `{"old_pin":"...","new_pin":"..."}` | `{"ok":true}` or HTTP 403 |
Provisioning
| Method | Path | Request | Response |
|---|---|---|---|
| `POST` | `/api/provision` | `{"mac":"AA:BB:CC:DD:EE:FF"}` (optional hint) | Binary NVS blob (WiFi credentials + node token). No auth required; called by Web Serial onboarding |
Nodes
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/nodes` | — | `[{mac, name, role, position, firmware_version, status, rssi, uptime_s, last_seen}]` |
| `GET` | `/api/nodes/:mac` | — | Single node object |
| `PATCH` | `/api/nodes/:mac` | `{name?, position?, role?}` | Updated node object |
| `DELETE` | `/api/nodes/:mac` | — | `{"ok":true}`; removes the node from the registry (does not affect the physical device) |
| `POST` | `/api/nodes/:mac/reboot` | — | `{"ok":true}`; sends a reboot command over WebSocket |
| `POST` | `/api/nodes/:mac/identify` | — | `{"ok":true}`; blinks the node's LED for 5 s |
| `POST` | `/api/nodes/:mac/update` | — | `{"ok":true}`; triggers OTA on a single node |
| `POST` | `/api/nodes/update-all` | — | `{"ok":true,"count":N}`; rolling OTA across the fleet |
| `POST` | `/api/nodes/:mac/rebaseline` | — | `{"ok":true}` |
| `POST` | `/api/nodes/rebaseline-all` | — | `{"ok":true,"count":N}` |
| `POST` | `/api/nodes/:mac/disable` | — | Sets role to `idle` |
| `POST` | `/api/nodes/:mac/enable` | — | Restores the prior role |
Firmware
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/firmware` | — | `[{filename, version, sha256, size_bytes, uploaded_at}]` |
| `POST` | `/api/firmware` | Multipart form: `file=<binary>`, `version=<string>` | `{"ok":true,"sha256":"..."}` |
| `DELETE` | `/api/firmware/:filename` | — | `{"ok":true}` |
| `GET` | `/firmware/:filename` | — | Raw binary served to the ESP32 during OTA. No auth required; the URL contains the SHA-256 digest for integrity checking |
Zones, Portals, Trigger Volumes
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/zones` | — | `[{id, name, bounds:{x,y,z,w,d,h}, occupancy, people:[]}]` |
| `POST` | `/api/zones` | `{name, bounds}` | Created zone object |
| `PATCH` | `/api/zones/:id` | Partial zone | Updated zone |
| `DELETE` | `/api/zones/:id` | — | `{"ok":true}` |
| `GET` | `/api/zones/:id/history` | `?period=24h` | `[{timestamp, count, people:[]}]` in hourly buckets |
| `GET` | `/api/portals` | — | `[{id, name, zone_a, zone_b, plane:{points:[...]}}]` |
| `POST` | `/api/portals` | Portal geometry | Created portal |
| `PATCH` | `/api/portals/:id` | Partial portal | Updated portal |
| `DELETE` | `/api/portals/:id` | — | `{"ok":true}` |
| `GET` | `/api/portals/:id/crossings` | `?limit=50&before=<cursor>` | `[{timestamp, direction, person, blob_id}]` |
| `GET` | `/api/triggers` | — | `[{id, name, shape, condition, actions, enabled, last_fired}]` |
| `POST` | `/api/triggers` | Trigger object | Created trigger |
| `PATCH` | `/api/triggers/:id` | Partial trigger | Updated trigger |
| `DELETE` | `/api/triggers/:id` | — | `{"ok":true}` |
| `POST` | `/api/triggers/:id/test` | — | Fires the trigger's actions once with a synthetic event |
| `POST` | `/api/triggers/:id/enable` | — | `{"ok":true}` |
| `POST` | `/api/triggers/:id/disable` | — | `{"ok":true}` |
BLE Device Registry
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/ble/devices` | `?registered=true` or `?discovered=true` | `[{addr, label, type, color, last_rssi, last_seen, auto_rotate}]` |
| `POST` | `/api/ble/devices` | `{addr, label, type, color, icon?}` | Created device |
| `PATCH` | `/api/ble/devices/:addr` | Partial device | Updated device |
| `DELETE` | `/api/ble/devices/:addr` | — | `{"ok":true}` |
Floor Plan
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/floorplan` | — | `{image_url?, calibration:{point_a, point_b, real_distance_m}, room_bounds}` |
| `POST` | `/api/floorplan/image` | Multipart: `file=<PNG/JPG>` | `{"ok":true,"image_url":"/floorplan/image.png"}` |
| `PATCH` | `/api/floorplan/calibration` | `{point_a:{x,y}, point_b:{x,y}, real_distance_m}` | Updated calibration |
| `GET` | `/floorplan/image.png` | — | Raw image file |
Events & Timeline
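| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/events` | `?limit=50&before=<cursor>&type=<type>&zone=<name>&person=<name>&after=<iso8601>&q=<text>` | `{"events":[...],"cursor":"<next>","total":N}` |
| `GET` | `/api/events/archive` | Same parameters as `/api/events` | Same shape; queries `events_archive` (events older than the retention period) |
| `GET` | `/api/events/:id` | — | Single event with full detail |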
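The API above does not specify the `cursor` format. One common choice, sketched here as an assumption, is an opaque token that encodes the `(timestamp_ms, id)` of the last row returned. Including the `id` keeps keyset pagination stable when several events share a timestamp. SQLite has supported the row-value comparison used in the query since 3.15. All names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// encodeCursor packs the sort key of the last event on a page into an opaque token.
func encodeCursor(timestampMs, id int64) string {
	return base64.RawURLEncoding.EncodeToString([]byte(fmt.Sprintf("%d:%d", timestampMs, id)))
}

// decodeCursor reverses encodeCursor.
func decodeCursor(c string) (timestampMs, id int64, err error) {
	raw, err := base64.RawURLEncoding.DecodeString(c)
	if err != nil {
		return 0, 0, err
	}
	if _, err := fmt.Sscanf(string(raw), "%d:%d", &timestampMs, &id); err != nil {
		return 0, 0, errors.New("bad cursor")
	}
	return timestampMs, id, nil
}

// nextPageSQL selects events strictly older than the cursor, newest first;
// the id tiebreak keeps pages stable when timestamps collide.
const nextPageSQL = `SELECT id, timestamp_ms, type, zone, person, detail_json, severity
FROM events WHERE (timestamp_ms, id) < (?, ?)
ORDER BY timestamp_ms DESC, id DESC LIMIT ?`
```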
Security Mode
| Method | Path | Request | Response |
|---|---|---|---|
| `POST` | `/api/security/arm` | — | `{"ok":true,"security_mode":true}`; enables security mode, in which any detection raises an alert |
| `POST` | `/api/security/disarm` | — | `{"ok":true,"security_mode":false}` |
| `GET` | `/api/security` | — | `{"security_mode":bool,"armed_at":iso8601_or_null}` |
Security mode state is stored in the `settings` table under key `"security_mode"` (JSON boolean), and the arm time under `"security_mode_armed_at"` (ISO 8601 string). On disarm, `security_mode` is set to `false` and `security_mode_armed_at` is cleared.
Arming or disarming through the MQTT `command/security_mode` subscription calls the same internal arm/disarm function as the REST endpoints.
Localization & Predictions
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/blobs` | — | Current blob list (snapshot of live state) |
| `GET` | `/api/predictions` | `?person=<name>&horizon=30m` | `[{zone, probability, horizon_min}]` |
| `GET` | `/api/occupancy` | — | `{zones:{<name>:{count, people:[]}}}` |
Sleep & Analytics
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/sleep` | `?person=<name>&limit=30` | `[{date, duration_m, onset_latency_m, restlessness_index, breathing_rate_avg, breathing_regularity}]` |
| `GET` | `/api/sleep/summary` | `?person=<name>` | Today's / last night's summary |
| `GET` | `/api/flow` | `?period=24h&person=<name>` | `{cells:[{x,y,count,vx,vy,dwell_s}]}` |
| `GET` | `/api/localization/weights` | — | `[{link_id, weight}]` |
| `POST` | `/api/localization/weights/reset` | — | `{"ok":true}` |
Feedback
| Method | Path | Request | Response |
|---|---|---|---|
| `POST` | `/api/feedback` | `{type:"correct"\|"incorrect"\|"missed", blob_id?, position?:{x,y,z}, timestamp}` | `{"ok":true}` |
Calibration / Baseline
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/baseline` | — | `[{link_id, snapshot_time, confidence}]` |
| `POST` | `/api/baseline/capture` | `{links?:[link_id,...]}` | `{"ok":true,"links_captured":N}`; starts a 60 s quiet-room capture |
CSI Replay
| Method | Path | Request | Response |
|---|---|---|---|
| `POST` | `/api/replay/start` | `{from_iso8601, to_iso8601}` | `{"session_id":"..."}` |
| `POST` | `/api/replay/seek` | `{session_id, timestamp_iso8601}` | `{"ok":true}` |
| `POST` | `/api/replay/play` | `{session_id, speed:1\|2\|5}` | `{"ok":true}` |
| `POST` | `/api/replay/pause` | `{session_id}` | `{"ok":true}` |
| `POST` | `/api/replay/stop` | `{session_id}` | `{"ok":true}` |
| `PATCH` | `/api/replay/params` | `{session_id, delta_rms_threshold?, tau_s?, fresnel_decay?, n_subcarriers?, breathing_sensitivity?}` | Re-runs the pipeline; `{"ok":true}` |
| `POST` | `/api/replay/apply-params` | `{session_id}` | Copies tuned params to the live pipeline |
Notifications & Integrations
| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/api/notifications/channels` | — | `[{type, enabled, config}]` |
| `PATCH` | `/api/notifications/channels/:type` | Config object | Updated channel |
| `POST` | `/api/notifications/test` | `{channel_type}` | Sends a test notification; `{"ok":true}` |
SQLite Schema
All tables reside in a single `spaxel.db` file. The schema version is tracked in the `schema_migrations` table, and migrations are applied in order on startup. All timestamps are Unix milliseconds (`INTEGER`). STRICT mode is enforced where the SQLite version is ≥ 3.37. Column defaults call `unixepoch()`, which requires SQLite 3.38.0 or later. Foreign keys are enabled (`PRAGMA foreign_keys = ON`).
-- Schema version tracking
CREATE TABLE IF NOT EXISTS schema_migrations (
version INTEGER PRIMARY KEY,
applied_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- System settings (key-value with typed values)
CREATE TABLE IF NOT EXISTS settings (
key TEXT PRIMARY KEY,
value_json TEXT NOT NULL, -- JSON-encoded value (string, number, bool, array)
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Installation secrets and auth
CREATE TABLE IF NOT EXISTS auth (
id INTEGER PRIMARY KEY CHECK (id = 1), -- singleton row
install_secret BLOB NOT NULL, -- 32 bytes, random on first run
pin_bcrypt TEXT, -- bcrypt hash of dashboard PIN; NULL = not set
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Dashboard sessions
-- Sessions are server-side records bound to the `spaxel_session` HTTP cookie.
-- Cookie value = session_id (32-byte random hex, 64 chars). The server validates
-- by looking up session_id here; if not found or expired, HTTP 401 is returned.
CREATE TABLE IF NOT EXISTS sessions (
session_id TEXT PRIMARY KEY, -- 64-char hex (crypto/rand 32 bytes)
created_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
expires_at INTEGER NOT NULL, -- Unix ms; = created_at + 7 days (7×86400×1000)
last_seen_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000) -- updated on every authenticated request
);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);
-- Expired sessions are purged in the background once per hour:
-- DELETE FROM sessions WHERE expires_at < unixepoch() * 1000
-- Session sliding window: last_seen_at is updated on every request.
-- If last_seen_at > expires_at - 1 day: extend expires_at by 7 more days (rolling session).
-- Cookie attributes: HttpOnly=true, SameSite=Strict (if TLS), Path=/, Max-Age=604800 (7 days)
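The session rules in the comments above translate into two small helpers. This sketch uses `crypto/rand` for the 64-character ID and applies the sliding-window rule literally. `newSessionID` and `touch` are hypothetical names, and the caller is assumed to have already rejected expired sessions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"time"
)

const sessionTTL = 7 * 24 * time.Hour

// newSessionID returns 32 random bytes as a 64-character hex string.
func newSessionID() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// touch returns the new expires_at (Unix ms) for a request seen at nowMs:
// once the request falls inside the last day before expiry, the session is
// extended by another 7 days.
func touch(nowMs, expiresMs int64) int64 {
	day := (24 * time.Hour).Milliseconds()
	if nowMs > expiresMs-day {
		return expiresMs + sessionTTL.Milliseconds()
	}
	return expiresMs
}
```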
-- Node registry
CREATE TABLE IF NOT EXISTS nodes (
mac TEXT PRIMARY KEY, -- "AA:BB:CC:DD:EE:FF"
node_id TEXT UNIQUE, -- UUID4 assigned at provisioning
name TEXT NOT NULL DEFAULT '',
pos_x REAL NOT NULL DEFAULT 0, -- meters in floor plan coordinates
pos_y REAL NOT NULL DEFAULT 0,
pos_z REAL NOT NULL DEFAULT 1,
role TEXT NOT NULL DEFAULT 'tx_rx' CHECK (role IN ('tx','rx','tx_rx','passive','idle')),
firmware_version TEXT,
chip TEXT,
flash_mb INTEGER,
capabilities TEXT, -- JSON array of strings
status TEXT NOT NULL DEFAULT 'offline' CHECK (status IN ('online','stale','offline')),
last_seen_ms INTEGER,
uptime_ms INTEGER,
wifi_rssi_dbm INTEGER,
free_heap_bytes INTEGER,
temperature_c REAL,
ip TEXT,
created_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Per-link Fresnel zone weights (self-improving localization)
CREATE TABLE IF NOT EXISTS link_weights (
link_id TEXT PRIMARY KEY, -- canonical form: min(MAC_a,MAC_b)+":"+max(MAC_a,MAC_b) for symmetric links; "AP_BSSID:NODE_MAC" for passive. Use CanonicalLinkID() to construct.
weight REAL NOT NULL DEFAULT 1.0,
sample_count INTEGER NOT NULL DEFAULT 0,
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
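The schema comment refers to a `CanonicalLinkID()` helper. The sketch below shows one possible shape for it, not the project's actual code. The signature, the separate `PassiveLinkID` helper, and upper-casing the MACs are all assumptions.

```go
package main

import "strings"

// CanonicalLinkID builds the link_weights key for a symmetric node-to-node link:
// the lexicographically smaller MAC comes first, so A→B and B→A share one row.
func CanonicalLinkID(macA, macB string) string {
	a, b := strings.ToUpper(macA), strings.ToUpper(macB)
	if a > b {
		a, b = b, a
	}
	return a + ":" + b
}

// PassiveLinkID builds the key for a passive link from an AP BSSID to a node.
func PassiveLinkID(apBSSID, nodeMAC string) string {
	return strings.ToUpper(apBSSID) + ":" + strings.ToUpper(nodeMAC)
}
```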
-- Baseline snapshots (per-link, per calibration event)
CREATE TABLE IF NOT EXISTS baselines (
id INTEGER PRIMARY KEY AUTOINCREMENT,
link_id TEXT NOT NULL,
captured_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
n_sub INTEGER NOT NULL,
amplitude BLOB NOT NULL, -- REAL[n_sub] little-endian float32 array
phase BLOB NOT NULL, -- REAL[n_sub] little-endian float32 array
confidence REAL NOT NULL DEFAULT 0 -- 0.0–1.0; builds up as samples accumulate
);
CREATE INDEX IF NOT EXISTS idx_baselines_link ON baselines(link_id, captured_at DESC);
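The `amplitude` and `phase` columns each hold `n_sub` little-endian float32 values of 4 bytes each. The sketch below is an encode/decode pair built on `encoding/binary`; the function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"math"
)

// encodeFloat32s packs a per-subcarrier array into 4 little-endian bytes per value.
func encodeFloat32s(v []float32) []byte {
	b := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(b[4*i:], math.Float32bits(f))
	}
	return b
}

// decodeFloat32s reverses encodeFloat32s and checks the length against n_sub.
func decodeFloat32s(b []byte, nSub int) ([]float32, error) {
	if len(b) != 4*nSub {
		return nil, errors.New("blob length does not match n_sub")
	}
	v := make([]float32, nSub)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}
```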
-- Diurnal baselines (24 hourly slots per link)
CREATE TABLE IF NOT EXISTS diurnal_baselines (
link_id TEXT NOT NULL,
hour_of_day INTEGER NOT NULL CHECK (hour_of_day BETWEEN 0 AND 23),
n_sub INTEGER NOT NULL,
amplitude BLOB NOT NULL, -- REAL[n_sub] float32
phase BLOB NOT NULL, -- REAL[n_sub] float32
sample_count INTEGER NOT NULL DEFAULT 0,
confidence REAL NOT NULL DEFAULT 0,
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
PRIMARY KEY (link_id, hour_of_day)
);
-- BLE device registry
CREATE TABLE IF NOT EXISTS ble_devices (
addr TEXT PRIMARY KEY, -- "AA:BB:CC:DD:EE:FF"
label TEXT NOT NULL DEFAULT '',
type TEXT NOT NULL DEFAULT 'person' CHECK (type IN ('person','pet','object')),
color TEXT NOT NULL DEFAULT '#888888', -- CSS hex color
icon TEXT,
auto_rotate INTEGER NOT NULL DEFAULT 0, -- boolean: uses rotating addresses
first_seen INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
last_seen INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
last_rssi INTEGER,
created_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Floor plan definition
CREATE TABLE IF NOT EXISTS floorplan (
id INTEGER PRIMARY KEY CHECK (id = 1), -- singleton row
image_path TEXT, -- relative to /data/ ; NULL = no image
cal_ax REAL, cal_ay REAL, -- calibration point A (image pixel coords)
cal_bx REAL, cal_by REAL, -- calibration point B
cal_distance_m REAL, -- real-world distance between A and B
room_bounds_json TEXT, -- JSON: [{name, x, y, z, w, d, h}]
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Zones
CREATE TABLE IF NOT EXISTS zones (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
x REAL, y REAL, z REAL, -- origin corner (meters)
w REAL, d REAL, h REAL, -- width, depth, height
zone_type TEXT NOT NULL DEFAULT 'general'
CHECK (zone_type IN ('general','bedroom','bathroom','living','exercise','kitchen','office','entry')),
last_known_occupancy INTEGER NOT NULL DEFAULT 0, -- persisted every 60 s and on shutdown; used for restart reconciliation
occupancy_updated_at INTEGER, -- Unix ms of last occupancy persistence; NULL = never persisted
created_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Portals (doorway crossing detectors)
CREATE TABLE IF NOT EXISTS portals (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
zone_a_id INTEGER REFERENCES zones(id) ON DELETE SET NULL,
zone_b_id INTEGER REFERENCES zones(id) ON DELETE SET NULL,
points_json TEXT NOT NULL, -- JSON: two floor points [[x1,y1],[x2,y2]]
created_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Portal crossing log
CREATE TABLE IF NOT EXISTS portal_crossings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
portal_id INTEGER NOT NULL REFERENCES portals(id) ON DELETE CASCADE,
timestamp_ms INTEGER NOT NULL,
direction TEXT NOT NULL CHECK (direction IN ('a_to_b','b_to_a')),
blob_id INTEGER,
person TEXT -- resolved BLE label at time of crossing; NULL if unidentified
);
CREATE INDEX IF NOT EXISTS idx_crossings_portal ON portal_crossings(portal_id, timestamp_ms DESC);
CREATE INDEX IF NOT EXISTS idx_crossings_time ON portal_crossings(timestamp_ms DESC);
-- Trigger volumes (spatial automation)
CREATE TABLE IF NOT EXISTS triggers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
shape_json TEXT NOT NULL, -- JSON: {type:"box"|"cylinder", x,y,z, w,d,h | r}
condition TEXT NOT NULL CHECK (condition IN ('enter','leave','dwell','vacant','count')),
condition_params_json TEXT, -- JSON: {duration_s?, count_threshold?, person?}
time_constraint_json TEXT, -- JSON: {from:"22:00", to:"06:00"} or null
actions_json TEXT NOT NULL, -- JSON: [{type:"webhook"|"mqtt"|"internal", ...}]
enabled INTEGER NOT NULL DEFAULT 1,
last_fired INTEGER,
created_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Events (unified timeline)
CREATE TABLE IF NOT EXISTS events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp_ms INTEGER NOT NULL,
type TEXT NOT NULL, -- 'detection','zone_entry','zone_exit','portal_crossing',
-- 'trigger_fired','fall_alert','anomaly','security_alert',
-- 'node_online','node_offline','ota_update','baseline_changed',
-- 'system','learning_milestone'
zone TEXT,
person TEXT,
blob_id INTEGER,
detail_json TEXT, -- event-specific payload
severity TEXT NOT NULL DEFAULT 'info' CHECK (severity IN ('info','warning','alert','critical'))
);
CREATE INDEX IF NOT EXISTS idx_events_time ON events(timestamp_ms DESC);
CREATE INDEX IF NOT EXISTS idx_events_zone ON events(zone, timestamp_ms DESC);
CREATE INDEX IF NOT EXISTS idx_events_person ON events(person, timestamp_ms DESC);
CREATE INDEX IF NOT EXISTS idx_events_type ON events(type, timestamp_ms DESC);
-- Events archive (same schema as events; holds events older than 90 days)
-- Auto-archive runs nightly (02:00 local time) via a background goroutine:
-- INSERT INTO events_archive SELECT * FROM events WHERE timestamp_ms < (now_ms - 90d_ms)
-- DELETE FROM events WHERE timestamp_ms < (now_ms - 90d_ms)
-- Retention period configurable via settings key "events_archive_days" (default 90).
-- The /api/events endpoint queries ONLY the events table (not archive) for performance.
-- A separate endpoint GET /api/events/archive (same params) queries events_archive.
-- events_archive does NOT have an FTS5 index (archive search is slower but acceptable).
CREATE TABLE IF NOT EXISTS events_archive (
id INTEGER PRIMARY KEY, -- preserved from original events.id
timestamp_ms INTEGER NOT NULL,
type TEXT NOT NULL,
zone TEXT,
person TEXT,
blob_id INTEGER,
detail_json TEXT,
severity TEXT NOT NULL DEFAULT 'info'
);
CREATE INDEX IF NOT EXISTS idx_events_archive_time ON events_archive(timestamp_ms DESC);
-- FTS5 index for natural-language search across event detail
CREATE VIRTUAL TABLE IF NOT EXISTS events_fts USING fts5(
type, zone, person, detail_json,
content='events', content_rowid='id'
);
-- Triggers to keep events_fts in sync with the events table
-- (required for content FTS5 tables per SQLite documentation)
CREATE TRIGGER IF NOT EXISTS events_fts_insert AFTER INSERT ON events BEGIN
INSERT INTO events_fts(rowid, type, zone, person, detail_json)
VALUES (new.id, new.type, new.zone, new.person, new.detail_json);
END;
CREATE TRIGGER IF NOT EXISTS events_fts_delete AFTER DELETE ON events BEGIN
INSERT INTO events_fts(events_fts, rowid, type, zone, person, detail_json)
VALUES ('delete', old.id, old.type, old.zone, old.person, old.detail_json);
END;
CREATE TRIGGER IF NOT EXISTS events_fts_update AFTER UPDATE ON events BEGIN
INSERT INTO events_fts(events_fts, rowid, type, zone, person, detail_json)
VALUES ('delete', old.id, old.type, old.zone, old.person, old.detail_json);
INSERT INTO events_fts(rowid, type, zone, person, detail_json)
VALUES (new.id, new.type, new.zone, new.person, new.detail_json);
END;
-- On startup, if events_fts is empty but events has rows (e.g., after a schema re-creation),
-- rebuild with: INSERT INTO events_fts(events_fts) VALUES ('rebuild');
-- This is checked in Phase 3/7 of startup by comparing COUNT(*) on both tables.
-- Detection feedback
CREATE TABLE IF NOT EXISTS feedback (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp_ms INTEGER NOT NULL,
type TEXT NOT NULL CHECK (type IN ('correct','incorrect','missed')),
blob_id INTEGER,
position_json TEXT, -- JSON {x,y,z} for 'missed' type
links_json TEXT, -- JSON [link_id, ...] of contributing links at feedback time
event_id INTEGER REFERENCES events(id) ON DELETE SET NULL
);
CREATE INDEX IF NOT EXISTS idx_feedback_time ON feedback(timestamp_ms DESC);
-- Sleep records
CREATE TABLE IF NOT EXISTS sleep_records (
id INTEGER PRIMARY KEY AUTOINCREMENT,
person TEXT, -- NULL = zone-based (no BLE identity)
zone_id INTEGER REFERENCES zones(id) ON DELETE SET NULL,
date TEXT NOT NULL, -- "YYYY-MM-DD" (night start date)
bed_time_ms INTEGER,
wake_time_ms INTEGER,
duration_min INTEGER,
onset_latency_min INTEGER,
restlessness REAL, -- 0.0–5.0
breathing_rate_avg REAL, -- breaths/min
breathing_regularity REAL, -- coefficient of variation
summary_json TEXT -- 30-min bucket breakdown as JSON array
);
CREATE INDEX IF NOT EXISTS idx_sleep_person ON sleep_records(person, date DESC);
-- Presence prediction models (per-person, per-zone, per-time-slot, per-day-type)
CREATE TABLE IF NOT EXISTS prediction_models (
person TEXT NOT NULL,
zone_id INTEGER NOT NULL REFERENCES zones(id) ON DELETE CASCADE,
time_slot INTEGER NOT NULL, -- 0..95 (15-min buckets, 96 per day)
day_type TEXT NOT NULL CHECK (day_type IN ('weekday','weekend')),
probability REAL NOT NULL DEFAULT 0,
sample_count INTEGER NOT NULL DEFAULT 0,
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
PRIMARY KEY (person, zone_id, time_slot, day_type)
);
-- Anomaly detection pattern model
CREATE TABLE IF NOT EXISTS anomaly_patterns (
zone_id INTEGER NOT NULL REFERENCES zones(id) ON DELETE CASCADE,
hour_of_day INTEGER NOT NULL CHECK (hour_of_day BETWEEN 0 AND 23),
day_of_week INTEGER NOT NULL CHECK (day_of_week BETWEEN 0 AND 6),
mean_count REAL NOT NULL DEFAULT 0,
variance REAL NOT NULL DEFAULT 0,
sample_count INTEGER NOT NULL DEFAULT 0,
updated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
PRIMARY KEY (zone_id, hour_of_day, day_of_week)
);
-- Crowd flow accumulator (per time bucket per grid cell)
CREATE TABLE IF NOT EXISTS crowd_flow (
bucket_ms INTEGER NOT NULL, -- rounded to bucket boundary (1 h, 1 d, or 1 week)
bucket_type TEXT NOT NULL CHECK (bucket_type IN ('hour','day','week')),
cell_x INTEGER NOT NULL, -- grid cell x index
cell_y INTEGER NOT NULL, -- grid cell y index
entry_count INTEGER NOT NULL DEFAULT 0,
vx_sum REAL NOT NULL DEFAULT 0, -- sum of velocity x components for average
vy_sum REAL NOT NULL DEFAULT 0,
dwell_ms INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (bucket_ms, bucket_type, cell_x, cell_y)
);
-- OTA firmware metadata
CREATE TABLE IF NOT EXISTS firmware (
filename TEXT PRIMARY KEY,
version TEXT NOT NULL,
sha256 TEXT NOT NULL,
size_bytes INTEGER NOT NULL,
uploaded_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000),
is_latest INTEGER NOT NULL DEFAULT 0 -- boolean; only one row has 1
);
-- Morning briefing records (one per day)
CREATE TABLE IF NOT EXISTS briefings (
date TEXT PRIMARY KEY, -- "YYYY-MM-DD"
content TEXT NOT NULL, -- rendered text
generated_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
-- Notification channel config
CREATE TABLE IF NOT EXISTS notification_channels (
type TEXT PRIMARY KEY, -- 'ntfy','pushover','gotify','webhook','mqtt'
enabled INTEGER NOT NULL DEFAULT 0,
config_json TEXT NOT NULL DEFAULT '{}' -- channel-specific config (URLs, tokens); see below
);
/*
config_json schemas per channel type:
ntfy:
{"url":"https://ntfy.sh/my-topic", "token":"tk_..."} -- token optional (for private topics)
HTTP call: POST <url>
Headers: Authorization: Bearer <token> (if set), Title: <title>, Priority: urgent|high|default,
X-Attach: <image_url> (floor plan thumbnail; ntfy expects a URL here, not inline base64)
Body: <text>
pushover:
{"app_token":"aXXXXXX...","user_key":"uXXXXXX..."}
HTTP call: POST https://api.pushover.net/1/messages.json
Body (form-encoded): token=<app_token>&user=<user_key>&title=<title>&message=<text>
&attachment_base64=<base64_png>&attachment_type=image/png (floor plan thumbnail, max 2.5 MB)
&priority=1 (high) or 2 (emergency — requires retry+expire)
gotify:
{"url":"https://gotify.example.com","token":"Aq7mXXXX"}
HTTP call: POST <url>/message?token=<token>
Body (JSON): {"title":"<title>","message":"<text>","priority":7}
Note: Gotify does not support image attachments natively; thumbnail is omitted.
webhook:
{"url":"https://example.com/hook","method":"POST","headers":{"X-Secret":"abc"}}
HTTP call: POST/GET <url> with optional headers
Body (JSON): same payload as trigger webhook (see Spatial Automation Builder)
Plus: "event_type":"fall_alert"|"anomaly"|"zone_entry"|...
mqtt:
(uses the global MQTT connection from SPAXEL_MQTT_BROKER; no separate config)
No config_json fields required; this channel is automatically enabled when MQTT is configured.
*/
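The sketch below builds the ntfy request described above with `net/http`, without sending it. `ntfyConfig` and `newNtfyRequest` are hypothetical. Because ntfy's `X-Attach` header takes a URL, the thumbnail would need to be served at an address the receiving devices can reach.

```go
package main

import (
	"net/http"
	"strings"
)

// ntfyConfig mirrors the ntfy config_json fields.
type ntfyConfig struct {
	URL   string `json:"url"`
	Token string `json:"token"` // optional; needed for private topics
}

// newNtfyRequest builds the publish request; attachURL may be empty.
func newNtfyRequest(cfg ntfyConfig, title, priority, text, attachURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, cfg.URL, strings.NewReader(text))
	if err != nil {
		return nil, err
	}
	if cfg.Token != "" {
		req.Header.Set("Authorization", "Bearer "+cfg.Token)
	}
	req.Header.Set("Title", title)
	req.Header.Set("Priority", priority) // urgent | high | default
	if attachURL != "" {
		req.Header.Set("X-Attach", attachURL)
	}
	return req, nil
}
```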
-- CSI replay session state (ephemeral; cleared on restart)
CREATE TABLE IF NOT EXISTS replay_sessions (
session_id TEXT PRIMARY KEY,
from_ms INTEGER NOT NULL,
to_ms INTEGER NOT NULL,
current_ms INTEGER NOT NULL,
speed INTEGER NOT NULL DEFAULT 1,
state TEXT NOT NULL DEFAULT 'paused' CHECK (state IN ('playing','paused','stopped')),
params_json TEXT, -- tuned pipeline params; NULL = use live params
created_at INTEGER NOT NULL DEFAULT (unixepoch() * 1000)
);
CSI Replay Store (append-only file, not SQLite):
The CSI replay buffer is stored as an append-only binary file at /data/csi_replay.bin because SQLite's row-per-frame model would be too slow for the write rate (~30 Hz × 20 links = 600 frames/s).
File header (64 bytes):
magic: 8 bytes — 0x535041584C525000 ("SPAXLRP\0")
version: 4 bytes — uint32, currently 1
write_pos: 8 bytes — current write position (byte offset past last complete frame)
oldest_pos: 8 bytes — byte offset of oldest retained frame (for ring-buffer eviction)
reserved: 36 bytes — zeroed, reserved for future use
Per-frame record (variable length):
recv_time_ms: 8 bytes — mothership receive time (Unix ms, int64)
frame_len: 2 bytes — length of the CSI WebSocket binary frame that follows (uint16)
frame_data: N bytes — raw CSI binary frame (same format as the WebSocket binary frame)
On startup, write_pos is read from the header to resume appending. If the file header has a mismatched magic or truncated write, the file is truncated to the last complete frame (detected by scanning backward from write_pos). Eviction: when the file grows beyond the configured size limit (default 360 MB for 48 h retention), oldest_pos advances past the oldest frames in a block-eviction loop.
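A minimal Go sketch of encoding and parsing the 64-byte header and appending one per-frame record, following the layout above. The spec does not state a byte order, so little-endian is assumed here; the names ReplayHeader, ParseHeader and AppendFrame are hypothetical, not the project's replay package API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

const headerSize = 64

var replayMagic = [8]byte{'S', 'P', 'A', 'X', 'L', 'R', 'P', 0}

var errBadMagic = errors.New("replay: header magic mismatch")

// ReplayHeader mirrors the fixed header; the 36 reserved bytes stay zeroed.
type ReplayHeader struct {
	Version   uint32
	WritePos  uint64
	OldestPos uint64
}

// MarshalBinary encodes the header (little-endian is an assumption).
func (h ReplayHeader) MarshalBinary() []byte {
	buf := make([]byte, headerSize) // bytes 28..63 remain zero (reserved)
	copy(buf[0:8], replayMagic[:])
	binary.LittleEndian.PutUint32(buf[8:12], h.Version)
	binary.LittleEndian.PutUint64(buf[12:20], h.WritePos)
	binary.LittleEndian.PutUint64(buf[20:28], h.OldestPos)
	return buf
}

// ParseHeader validates the magic and decodes the positional fields.
func ParseHeader(buf []byte) (ReplayHeader, error) {
	if len(buf) < headerSize {
		return ReplayHeader{}, fmt.Errorf("replay: short header (%d bytes)", len(buf))
	}
	if !bytes.Equal(buf[0:8], replayMagic[:]) {
		return ReplayHeader{}, errBadMagic
	}
	return ReplayHeader{
		Version:   binary.LittleEndian.Uint32(buf[8:12]),
		WritePos:  binary.LittleEndian.Uint64(buf[12:20]),
		OldestPos: binary.LittleEndian.Uint64(buf[20:28]),
	}, nil
}

// AppendFrame encodes one record: recv_time_ms (8), frame_len (2), frame_data (N).
func AppendFrame(dst []byte, recvTimeMs int64, frame []byte) ([]byte, error) {
	if len(frame) > 0xFFFF {
		return dst, fmt.Errorf("replay: frame too large for uint16 length (%d bytes)", len(frame))
	}
	dst = binary.LittleEndian.AppendUint64(dst, uint64(recvTimeMs))
	dst = binary.LittleEndian.AppendUint16(dst, uint16(len(frame)))
	return append(dst, frame...), nil
}

func main() {
	h := ReplayHeader{Version: 1, WritePos: headerSize, OldestPos: headerSize}
	got, err := ParseHeader(h.MarshalBinary())
	fmt.Println(got, err)
}
```

The magic is written as the literal bytes "SPAXLRP\0", which equals 0x535041584C525000 when read as a big-endian uint64, so the file is recognizable in a hex dump regardless of the byte order chosen for the numeric fields.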
Resource Limits & Performance Budgets
Minimum Host Requirements
| Fleet Size | Nodes | Links | Min RAM | Min CPU | Disk (48h CSI + DB) |
|---|---|---|---|---|---|
| Minimal | 2 | 2 | 256 MB | 1 core (any) | 50 MB |
| Small | 4 | 6 | 512 MB | 1 core (Pi 4 class) | 150 MB |
| Medium | 8 | 28 | 512 MB | 2 cores | 420 MB |
| Large | 16 | 120 | 1 GB | 4 cores | 1.5 GB |
Tested minimum: Raspberry Pi 4 (4 GB RAM) running a 6-node fleet at 20 Hz while using <10% CPU.
Docker Compose Resource Limits
Add to the compose service definition for production deployments:
deploy:
resources:
limits:
memory: 512m # Increase to 1g for 16+ node fleets
cpus: "2.0" # Adjust to leave headroom for the host OS
reservations:
memory: 128m
cpus: "0.5"
Memory breakdown (8-node fleet):
- Ring buffers: 28 links × 256 samples × 152 bytes/frame = ~1.1 MB
- SQLite page cache: ~20 MB (default)
- Go runtime + binary: ~30 MB
- Crowd flow accumulator: ~5 MB
- Dashboard WebSocket state (per client): ~1 MB
- Total: ~60 MB baseline, peaks to ~150 MB during full-rate operation
Pipeline Timing Budgets
The fusion loop runs at 10 Hz (100 ms budget per iteration). Per-stage targets:
| Stage | Target | Hard Limit |
|---|---|---|
| Phase sanitization (per link) | <1 ms | 3 ms |
| Feature extraction (per link) | <0.5 ms | 2 ms |
| Fresnel grid accumulation (all links) | <5 ms | 15 ms |
| Peak extraction | <2 ms | 5 ms |
| UKF update (per blob) | <1 ms | 3 ms |
| BLE matching | <1 ms | 3 ms |
| Trigger evaluation | <1 ms | 3 ms |
| Dashboard WebSocket publish | <2 ms | 5 ms |
| Total per fusion iteration | <15 ms | 40 ms |
If any stage exceeds its hard limit, a warning is logged. If the total iteration time exceeds 80 ms, leaving less than 20 ms of headroom in the 100 ms iteration budget, the system enters load-shedding mode.
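One way to apply the per-stage hard limits is to wrap each stage in a timer. This Go sketch uses a hypothetical timeStage helper and stageBudget type; it only logs overruns and reports them, leaving the load-shedding decision to the caller.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// stageBudget is a hypothetical per-stage record; HardLimit comes from the budget table.
type stageBudget struct {
	Name      string
	HardLimit time.Duration
}

// timeStage runs fn, logs a warning when it overruns its hard limit,
// and returns the elapsed time plus whether the limit was exceeded.
func timeStage(b stageBudget, fn func()) (time.Duration, bool) {
	start := time.Now()
	fn()
	elapsed := time.Since(start)
	over := elapsed > b.HardLimit
	if over {
		log.Printf("warning: stage %q took %v (hard limit %v)", b.Name, elapsed, b.HardLimit)
	}
	return elapsed, over
}

func main() {
	var total time.Duration
	stages := []stageBudget{
		{"peak extraction", 5 * time.Millisecond},
		{"trigger evaluation", 3 * time.Millisecond},
	}
	for _, s := range stages {
		d, _ := timeStage(s, func() { /* stage work goes here */ })
		total += d
	}
	fmt.Println("iteration over 80 ms:", total > 80*time.Millisecond)
}
```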
Load Shedding Policy
When the pipeline is running consistently >80 ms per 100 ms iteration (measured as a 5-iteration rolling average):
- Level 1 (>80ms): Suspend crowd flow accumulation (saves ~3 ms/iter). Log warning.
- Level 2 (>90ms): Suspend CSI replay buffer writes (saves ~2 ms/iter). Alert in dashboard.
- Level 3 (>95ms): Drop CSI frames that arrive when the processing channel is >50% full. Reduce all node rates to 10 Hz via config push. Alert in dashboard: "System under load — CSI rate reduced."
- Recovery: When load drops below 60ms for 10 consecutive iterations, restore normal operation in reverse order.
Load-shedding status is visible in the /api/health response and the dashboard status bar.
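The policy can be sketched as a small state machine. This Go version assumes escalation is driven by the 5-iteration rolling average and that recovery steps down one level each time 10 consecutive iterations come in under 60 ms (one reading of "restore in reverse order"). LoadShedder is a hypothetical type; it only returns the level, leaving the actual suspensions and rate pushes to the caller.

```go
package main

import (
	"fmt"
	"time"
)

// LoadShedder tracks a 5-iteration rolling average of fusion iteration time
// and maps it to a shedding level (0 = normal, 1..3 per the policy).
type LoadShedder struct {
	window [5]time.Duration
	n      int // samples recorded so far, saturating at len(window)
	idx    int // next slot to overwrite
	level  int
	calm   int // consecutive iterations under the recovery threshold
}

func (s *LoadShedder) avg() time.Duration {
	var sum time.Duration
	for i := 0; i < s.n; i++ {
		sum += s.window[i]
	}
	return sum / time.Duration(s.n)
}

// targetLevel maps a rolling average to the level the policy calls for.
func targetLevel(avg time.Duration) int {
	switch {
	case avg > 95*time.Millisecond:
		return 3
	case avg > 90*time.Millisecond:
		return 2
	case avg > 80*time.Millisecond:
		return 1
	}
	return 0
}

// Observe records one iteration time and returns the current shedding level.
func (s *LoadShedder) Observe(d time.Duration) int {
	s.window[s.idx] = d
	s.idx = (s.idx + 1) % len(s.window)
	if s.n < len(s.window) {
		s.n++
	}
	// Escalate as soon as a full window's average crosses a threshold.
	if s.n == len(s.window) {
		if t := targetLevel(s.avg()); t > s.level {
			s.level = t
			s.calm = 0
		}
	}
	// Recover one level per 10 consecutive iterations under 60 ms.
	if d < 60*time.Millisecond {
		s.calm++
		if s.calm >= 10 && s.level > 0 {
			s.level--
			s.calm = 0
		}
	} else {
		s.calm = 0
	}
	return s.level
}

func main() {
	var s LoadShedder
	for _, ms := range []int{85, 92, 97, 99, 100} {
		fmt.Println(ms, "ms -> level", s.Observe(time.Duration(ms)*time.Millisecond))
	}
}
```

Stepping down one level per calm streak gives the reverse-order restore for free: node rates come back first, then replay writes, then crowd flow.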
Bounded Resource Invariants
These bounds hold regardless of fleet size or runtime:
- Per-link ring buffer: capped at 256 samples (~152 bytes × 256 = 38 KB per link)
- Dashboard WebSocket send queue per client: capped at 50 frames (~500 KB); oldest dropped if client is slow
- Concurrent OTA updates: maximum 3 nodes simultaneously (controlled by rolling update logic)
- SQLite WAL file: checkpoint triggered when WAL exceeds 1000 pages (~4 MB); forced checkpoint on shutdown
- Events table: auto-archive (move to the events_archive table) events older than 90 days; configurable via settings
- CSI replay store: default 360 MB / 48 h; configurable via the SPAXEL_REPLAY_MAX_MB env var
- Maximum concurrent dashboard clients: 10 (configurable; beyond this, new connections are queued)
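Both the per-link sample buffer and the per-client send queue follow a drop-oldest rule. A minimal generic Go ring buffer (the name Ring is hypothetical) shows how a fixed cap keeps memory constant: once full, each push overwrites the oldest entry.

```go
package main

import "fmt"

// Ring is a fixed-capacity buffer that overwrites its oldest element when full.
type Ring[T any] struct {
	buf   []T
	start int // index of the oldest element
	size  int
}

func NewRing[T any](capacity int) *Ring[T] {
	if capacity <= 0 {
		panic("ring capacity must be positive")
	}
	return &Ring[T]{buf: make([]T, capacity)}
}

// Push appends v, evicting the oldest element if the ring is full.
// It reports whether an element was evicted.
func (r *Ring[T]) Push(v T) (evicted bool) {
	if r.size < len(r.buf) {
		r.buf[(r.start+r.size)%len(r.buf)] = v
		r.size++
		return false
	}
	r.buf[r.start] = v // overwrite the oldest slot; it becomes the newest
	r.start = (r.start + 1) % len(r.buf)
	return true
}

// Items returns the contents from oldest to newest.
func (r *Ring[T]) Items() []T {
	out := make([]T, r.size)
	for i := range out {
		out[i] = r.buf[(r.start+i)%len(r.buf)]
	}
	return out
}

func (r *Ring[T]) Len() int { return r.size }

func main() {
	samples := NewRing[float64](256) // per-link CSI history, capped at 256 samples
	for i := 0; i < 300; i++ {
		samples.Push(float64(i))
	}
	items := samples.Items()
	fmt.Println(samples.Len(), items[0], items[len(items)-1])
}
```

The same type with a capacity of 50 would model a dashboard send queue; a real per-client queue would also need a mutex or a single owning goroutine, which this sketch omits.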
Disk Full Handling
When the /data filesystem has less than 100 MB free:
- Stop CSI replay buffer writes (highest volume I/O)
- Emit a system alert event and dashboard warning
- If disk drops below 20 MB free: also pause crowd flow accumulation writes and prediction model updates
- Detection and localization continue normally regardless of disk state
- Dashboard shows a "Disk space low" banner with current usage breakdown
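The two thresholds map naturally to a pure policy function plus a free-space probe. This sketch is Linux/macOS only because it uses syscall.Statfs; DiskPolicy, policyFor and freeBytes are hypothetical names, and a production probe would target /data rather than /.

```go
package main

import (
	"fmt"
	"syscall"
)

const (
	lowDiskBytes      = 100 << 20 // below this: stop CSI replay writes
	criticalDiskBytes = 20 << 20  // below this: also pause crowd flow and prediction writes
)

// DiskPolicy says which writers to pause; detection and localization never pause.
type DiskPolicy struct {
	Status              string // "ok" or "degraded"
	PauseReplay         bool
	PauseFlowAndPredict bool
}

func policyFor(freeBytes uint64) DiskPolicy {
	p := DiskPolicy{Status: "ok"}
	if freeBytes < lowDiskBytes {
		p.Status = "degraded"
		p.PauseReplay = true
	}
	if freeBytes < criticalDiskBytes {
		p.PauseFlowAndPredict = true
	}
	return p
}

// freeBytes reports the space available to unprivileged writers on path's filesystem.
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return uint64(st.Bavail) * uint64(st.Bsize), nil
}

func main() {
	free, err := freeBytes("/")
	if err != nil {
		fmt.Println("statfs:", err)
		return
	}
	fmt.Printf("%d MB free -> %+v\n", free>>20, policyFor(free))
}
```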
Upgrade Path
Versioning Policy
Spaxel follows semantic versioning (MAJOR.MINOR.PATCH):
- PATCH: Bug fixes only. No schema changes. Safe to apply without any migration steps.
- MINOR: New features. Schema changes are additive only (new nullable columns, new tables). Migration runs automatically on startup. All existing data is preserved. Nodes running firmware from the same MAJOR version continue to work.
- MAJOR: May include breaking schema changes or protocol changes. A migration guide is published with each major release. Firmware must be updated to the same MAJOR version within 30 days (mothership logs a warning for nodes on a previous major version's firmware).
Firmware–Mothership Compatibility
| Mothership Version | Compatible Firmware Versions |
|---|---|
| 1.x | 1.x (any minor) |
| 2.x | 2.x, 1.x (read-only degraded mode) |
Compatibility check: on hello, the mothership compares firmware major version to its own. If incompatible (future major version firmware connecting to older mothership), the node is accepted but its role is set to IDLE and a warning is shown in the dashboard. If a previous major version's firmware connects, it continues to work with a deprecation warning.
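The hello-time check reduces to comparing MAJOR components. A Go sketch, assuming versions arrive as MAJOR.MINOR.PATCH strings; checkFirmware and the Compat values are hypothetical, and the 1.x read-only degraded mode from the table is not modeled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Compat is the mothership's decision for a node's firmware on hello.
type Compat int

const (
	CompatOK         Compat = iota // same major version
	CompatDeprecated               // older major: keep working, log a deprecation warning
	CompatIdle                     // newer major: accept, force role IDLE, warn in dashboard
)

// majorOf extracts MAJOR from "MAJOR.MINOR.PATCH", tolerating a leading "v".
func majorOf(version string) (int, error) {
	major, _, _ := strings.Cut(strings.TrimPrefix(version, "v"), ".")
	return strconv.Atoi(major)
}

func checkFirmware(mothership, firmware string) (Compat, error) {
	mm, err := majorOf(mothership)
	if err != nil {
		return 0, fmt.Errorf("mothership version %q: %w", mothership, err)
	}
	fm, err := majorOf(firmware)
	if err != nil {
		return 0, fmt.Errorf("firmware version %q: %w", firmware, err)
	}
	switch {
	case fm == mm:
		return CompatOK, nil
	case fm < mm:
		return CompatDeprecated, nil
	default:
		return CompatIdle, nil
	}
}

func main() {
	c, err := checkFirmware("2.0.0", "1.9.3")
	fmt.Println(c == CompatDeprecated, err)
}
```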
Mothership Upgrade Procedure
# 1. Pre-upgrade backup (automatic on startup, but also do manually):
docker exec spaxel wget -qO- http://localhost:8080/api/backup > spaxel-backup-$(date +%Y%m%d).zip
# 2. Pull new image
docker compose pull
# 3. Restart (migrations run automatically on startup)
docker compose up -d
# 4. Verify upgrade
docker compose logs spaxel --tail=50
# Look for: "Schema migration applied: version X → Y" and "All systems ready"
Automatic pre-migration backup: On startup, if the schema version in SQLite differs from the compiled-in version, the mothership automatically creates a backup at /data/backups/pre-upgrade-v<old>-to-v<new>-<timestamp>.sqlite before running any migrations. This backup is a full SQLite copy (using the SQLite Online Backup API). Backups older than 90 days are automatically pruned.
Rollback Procedure
# Stop the new version
docker compose stop
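# Restore the most recent pre-upgrade database backup (a bare glob fails if several exist)
DATA=/var/lib/docker/volumes/spaxel-data/_data
cp "$(ls -t "$DATA"/backups/pre-upgrade-*.sqlite | head -n 1)" "$DATA"/spaxel.db
# Delete WAL/SHM files left by the newer version so SQLite cannot replay them into the restored DB
rm -f "$DATA"/spaxel.db-wal "$DATA"/spaxel.db-shm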
# Restart with the previous image tag
# (edit docker-compose.yml to pin: image: ghcr.io/spaxel/spaxel:1.2.3)
docker compose up -d
Note: The CSI replay store (csi_replay.bin) format is append-only and forward-compatible across all versions. It does not need to be restored during rollback.
Schema Migration Framework
Each migration is a numbered Go function registered in a migrations slice:
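// Migration is one numbered schema step. Up runs inside a SQLite transaction
// (signature assumed: func(*sql.Tx) error).
type Migration struct {
	Version int
	Up      func(tx *sql.Tx) error
}

// Migrations applied in order. Each migration is idempotent.
// The schema_migrations table tracks which have been applied.
var migrations = []Migration{
	{Version: 1, Up: migration_001_initial_schema},
	{Version: 2, Up: migration_002_add_diurnal_baselines},
	// ...
}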
Each Up function runs in a SQLite transaction. If it fails, the transaction is rolled back, the pre-migration backup is preserved, and the mothership exits with a clear error message. A failed migration never leaves the database in a partially-migrated state.
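The runner around the migrations slice might look like the following Go sketch. It assumes the Up signature func(*sql.Tx) error and a schema_migrations table with a single integer version column; pending and migrate are hypothetical names. Running migrate needs a registered SQLite driver (for example modernc.org/sqlite), so the example main only exercises pending.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"sort"
)

type Migration struct {
	Version int
	Up      func(tx *sql.Tx) error
}

// pending returns migrations newer than the applied version, in ascending order,
// without mutating the caller's slice.
func pending(ms []Migration, applied int) []Migration {
	var out []Migration
	for _, m := range ms {
		if m.Version > applied {
			out = append(out, m)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Version < out[j].Version })
	return out
}

// migrate applies each pending migration in its own transaction and records it in
// schema_migrations. On any error that transaction is rolled back and the error is
// returned so the caller can exit with a clear message.
func migrate(db *sql.DB, ms []Migration) error {
	var applied int
	if err := db.QueryRow(`SELECT COALESCE(MAX(version), 0) FROM schema_migrations`).Scan(&applied); err != nil {
		return fmt.Errorf("read schema version: %w", err)
	}
	for _, m := range pending(ms, applied) {
		tx, err := db.Begin()
		if err != nil {
			return err
		}
		if err := m.Up(tx); err != nil {
			_ = tx.Rollback()
			return fmt.Errorf("migration %d failed: %w", m.Version, err)
		}
		if _, err := tx.Exec(`INSERT INTO schema_migrations (version) VALUES (?)`, m.Version); err != nil {
			_ = tx.Rollback()
			return fmt.Errorf("record migration %d: %w", m.Version, err)
		}
		if err := tx.Commit(); err != nil {
			return fmt.Errorf("commit migration %d: %w", m.Version, err)
		}
		log.Printf("Schema migration applied: version %d → %d", applied, m.Version)
		applied = m.Version
	}
	return nil
}

func main() {
	ms := []Migration{{Version: 2}, {Version: 1}, {Version: 3}}
	for _, m := range pending(ms, 1) {
		fmt.Println("pending:", m.Version)
	}
}
```

Recording the version inside the same transaction as the schema change is what keeps a failed step from leaving a half-applied migration marked as done.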
Acceptance Scenarios
These define done independently of the feature list. Each is independently verifiable. Pass criteria and fail criteria are listed explicitly under each scenario.
AS-1: First-time setup in under 5 minutes
User has a home WiFi network, a server running Docker, and one ESP32-S3.
Steps: User runs docker compose up -d, opens http://server:8080, sets a PIN, connects ESP32 via USB, clicks "Add Node." Dashboard flashes firmware, provisions WiFi credentials automatically.
Pass: ESP32 connects within 30 seconds, the 3D view shows a node icon, and passive radar CSI begins streaming (amplitude bars visible). No manual IP configuration entered.
Fail: User must enter a mothership IP address, or the process takes >5 minutes.
AS-2: Person detected while walking
User has 2+ nodes online and walks through the space.
Steps: User walks across the room at normal pace.
Pass: Within 3 seconds a blob appears in the 3D view, tracks the user's approximate path, and disappears within 5 seconds of the user standing still beyond the baseline threshold. Dashboard shows smooth_deltaRMS > 0.05 on at least one link.
Fail: No blob appears, or a blob persists >30 seconds after the user leaves.
AS-3: Fall alert fires correctly
User (or a test object) drops rapidly to floor height.
Steps: Drop a ~5 kg bag from standing height to the floor in a zone with 2+ nodes above 1.5 m.
Pass: Fall alert fires within 15 seconds: dashboard shows red pulsing blob, timeline shows fall_alert event, and the configured webhook receives a POST within 5 seconds of alert.
Fail: No alert fires within 60 seconds, or alert fires for bag-on-couch drop test (false positive).
AS-4: BLE identity resolves to person name
User has registered their phone (BLE address) as "Alice" in the device registry.
Steps: Alice walks into detection range with her phone in her pocket.
Pass: The blob in the 3D view shows the "Alice" label within 10 seconds of the blob appearing. Dashboard shows "Alice entered Kitchen" in the timeline.
Fail: Blob remains labeled "Unknown" despite Alice's phone being within 2 m of a node.
AS-5: OTA update succeeds without physical access
User has uploaded a new firmware binary to the dashboard.
Steps: User clicks "Update All" in the Fleet Status panel.
Pass: All nodes update in rolling fashion (30 s gap). Each node reconnects with the new firmware version. Dashboard shows a green "VERIFIED" badge on all nodes within 10 minutes.
Fail: Any node gets stuck in FAILED or ROLLBACK state, or >50% of nodes go offline simultaneously.
AS-6: Replay shows what happened at 2am
User wants to investigate an anomaly from last night.
Steps: User opens Activity Timeline, taps an anomaly event at 2:34am, clicks "Why?"
Pass: 3D view scrubs to that exact moment, shows blobs where they were at 2:34am, and explains the contributing links. User can scrub ±10 minutes and re-run the pipeline with different thresholds.
Fail: Replay is unavailable (CSI buffer expired), or the 3D view does not update when tapping the timeline event.
Installation & Onboarding Test Plan (Simulated ESP32 Devices)
Goal: validate the entire new-user journey — fresh install → first-run setup → device onboarding → operational system — with zero physical hardware, deterministically, on every release in CI. This complements AS-1 (real ESP32) with an automated hardware-free equivalent. The emulated device is the ESP32-S3 node; the fixture is the spaxel-sim CSI/node simulator (built from source via go build -o spaxel-sim ./cmd/sim — the binary is a build artifact and is never committed).
The simulator as a test fixture
spaxel-sim emulates one or more ESP32 nodes connecting to a running mothership and behaving like provisioned firmware:
| Flag | Purpose |
|---|---|
| --mothership ws://host:8080/... | mothership WebSocket endpoint the virtual nodes connect to |
| --token <t> | provisioning token presented at connect (auto-generated if empty) |
| --nodes N | number of virtual ESP32 nodes |
| --walkers N | synthetic people moving through the space (drives CSI + presence) |
| --ble | emit simulated BLE advertisements (device-identity onboarding) |
| --rate Hz / --space WxDxH / --duration s / --seed | CSI rate, room geometry, run length, reproducibility |
Emulates: node→mothership WebSocket registration with a token, CSI frame emission, BLE advertisements, walker-driven signal perturbation, multi-node TX scheduling. Does NOT emulate the physical USB-flash step or the dashboard's BLE Wi-Fi-credential handshake; drive those through the REST/BLE-provisioning API with a stub instead of the radio.
Scenarios (IO-n, Pass/Fail explicit, headless in CI, deterministic via --seed, fresh ephemeral volume)
IO-1: Fresh install / first boot
Setup: mothership container started with an empty data volume.
Steps: GET /; complete first-run PIN setup (POST /api/auth/setup); poll /api/health.
Pass: first-run setup page served (200) while no PIN exists; after setup, migrations run (log "Schema migration applied … All systems ready"), PIN persists, /api/health green, first-run detection now reports pin_configured: true; the server reaches ready with no node attached.
Fail: setup page missing/loops, migrations don't run, or health never green within 30 s.
IO-2: Idempotent restart & upgrade-in-place
Setup: a configured install (PIN, >=1 onboarded node, zones).
Steps: stop + restart on the same volume; separately restart on a newer image tag.
Pass: no re-setup prompt; PIN/nodes/zones intact; on the newer image the log shows "Schema migration applied: version X -> Y" exactly once, prior data readable, a pre-upgrade DB backup exists.
Fail: re-setup demanded, data lost, migration runs twice, or no backup written.
IO-3: Single simulated node onboards end-to-end
Setup: fresh install past IO-1.
Steps: spaxel-sim --mothership ws://localhost:8080/... --token $TOKEN --nodes 1 --ble --seed 1; in the onboarding view accept the node and assign a label + 3D position.
Pass: node connects with the token, transitions discovered->online, appears in /api/nodes with online=true within 10 s, and label/position persist (REST + MQTT discovery config published).
Fail: node never online, valid token rejected, or label/position don't persist.
IO-4: Multi-node fleet bring-up
Steps: spaxel-sim --nodes 6 --walkers 0 --ble --seed 1 --duration 120.
Pass: all 6 reach online; mothership assigns non-overlapping TX slots (no collision warnings in logs); /api/nodes shows 6 online; the fleet/coverage view computes a GDOP/coverage estimate; telemetry flows for every node.
Fail: any node stuck offline, TX-slot collisions logged, or fleet view errors.
IO-5: Device-identity (BLE) onboarding
Steps: with --ble, register a simulated BLE address as a named person; run a walker carrying that identity.
Pass: the BLE advertisement is ingested, the registry resolves it to the name, and a person-entered-zone event + the corresponding MQTT person topic are produced (per the implementation's actual topic scheme).
Fail: BLE adv ignored or identity never resolves.
IO-6: Full new-user E2E (happy path) — HARD GATE
Steps: fresh install -> PIN -> onboard a 6-node fleet (IO-4) -> define 2 zones + 1 portal -> spaxel-sim --nodes 6 --walkers 1 --seed 1 --duration 90.
Pass: within the run the walker produces a tracked blob, zone-presence and portal-crossing events fire, the timeline records them, and MQTT/HA auto-discovery entities for nodes + zones + persons are published — end-to-end from empty volume to live events, no hardware, no manual IP entry.
Fail: any stage blocks, or no presence/zone events within the run.
IO-7..IO-11: Failure & edge onboarding (Pass = graceful, observable handling; Fail = crash/hang/silent drop)
- IO-7 Provisioning timeout: a node that connects then goes silent is marked stale/offline within the heartbeat window and surfaced in fleet status; no mothership crash.
- IO-8 Bad/expired token: --token bogus is rejected with a clear error; node never enters the fleet; no zombie row.
- IO-9 Duplicate MAC: two virtual nodes sharing a MAC -> second rejected or deterministically de-duplicated; no duplicate nodes rows.
- IO-10 Drop mid-onboard: killing spaxel-sim during onboarding leaves the node re-onboardable; no half-provisioned lock.
- IO-11 Firmware-version skew: a node reporting an old firmware version is flagged for OTA; onboarding completes and OTA can be initiated without losing the node (ties to AS-5).
Automation & resource budget
- All IO-* scenarios run in CI via the acceptance harness (built from source; never a committed binary) against a container started from the release image, using --seed for determinism and --duration caps to bound runtime.
- The simulator must stay within the per-process budgets in Resource Limits & Performance Budgets; the CI default (6 nodes + 1 walker, 90 s) must complete in < 2 min on a 4-core runner.
- Release gate: IO-1, IO-3, IO-4, IO-6 are hard-gate — a release is blocked if the hardware-free install + onboarding journey fails.
Anti-Patterns
Things NOT to do and why. These are design constraints, not suggestions.
| Anti-Pattern | Why It's Wrong | What to Do Instead |
|---|---|---|
| Adding a broker inside the container | Doubles operational complexity, adds ports, requires user config, provides no detection benefit. Users already have Mosquitto via Home Assistant. | Use SPAXEL_MQTT_BROKER to connect to the user's broker as a client. |
| Using HTTP polling from nodes to mothership | Polling creates state-detection lag, wastes bandwidth, and eliminates the authoritative "connected = online" invariant. | Nodes maintain a single persistent WebSocket. Connection state IS liveness. |
| Persisting UKF state to SQLite between runs | UKF state is only valid within a continuous tracking session. Stale state from a previous session poisons the next session's estimates. | UKF state is in-memory only. On restart, blobs are reconstructed from scratch. |
| Growing the events table unboundedly | At 10 Hz with motion, the events table would hit millions of rows quickly. | Archive events older than 90 days to events_archive automatically. |
| Calling CanonicalLinkID in the hot path without caching | String sort + concatenation per Fresnel grid cell × fusion tick = ~600 allocations/tick at full rate. | Canonical link IDs are computed once at link creation and stored. |
| Using Docker bridge networking for the mothership | mDNS uses multicast to 224.0.0.251 (link-local), which bridge networking blocks. Nodes cannot discover the mothership. | Use network_mode: host. If host mode is forbidden, set SPAXEL_MDNS_ENABLED=false and provision nodes with manual IP. |
| Updating the learning models on every fusion tick | Prediction models, anomaly patterns, and baseline snapshots written every 100 ms would saturate SQLite WAL and burn through SD card writes on Pi. | Batch writes: baselines every 60 s, anomaly patterns every hour, prediction models every 5 minutes. |
| Triggering re-baseline on every node position update | Node position updates happen interactively during drag operations (~30 fps). A re-baseline is a 60-second process. | Re-baseline only on explicit user confirmation after position is finalized. |
| Sending full snapshot on every WebSocket frame | 10 Hz × full snapshot ≈ 100 KB/s per dashboard client. Kills mobile connections. | Send snapshot once on connect; subsequent frames are incremental diffs. |
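The CanonicalLinkID caching rule can be sketched in Go as computing the ID once in the link constructor. The signature of CanonicalLinkID, the lower-casing and the "|" separator are assumptions for illustration, and the Link type is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// CanonicalLinkID orders the two endpoint IDs so A→B and B→A share one key.
func CanonicalLinkID(a, b string) string {
	a, b = strings.ToLower(a), strings.ToLower(b)
	if a > b {
		a, b = b, a
	}
	return a + "|" + b
}

// Link caches its canonical ID at construction so hot loops never rebuild it.
type Link struct {
	TX, RX string
	ID     string
}

func NewLink(tx, rx string) *Link {
	return &Link{TX: tx, RX: rx, ID: CanonicalLinkID(tx, rx)}
}

func main() {
	links := []*Link{NewLink("AA:01", "AA:02"), NewLink("AA:03", "AA:01")}
	weights := make(map[string]float64, len(links))
	// Fusion-tick style loop: reads the stored ID, with no sorting or concatenation.
	for tick := 0; tick < 3; tick++ {
		for _, l := range links {
			weights[l.ID] += 1.0
		}
	}
	fmt.Println(weights)
}
```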
Failure Modes & Resilience
Taxonomy of failure types with recovery strategy per type. Each failure mode has a specified test.
| Failure Type | Mode | Symptoms | Recovery Strategy | Test |
|---|---|---|---|---|
| Node network loss | Transient | Node WebSocket disconnects; OFFLINE in dashboard | Firmware exponential backoff reconnect (1→2→4→8→16→30s). Mothership marks OFFLINE immediately on disconnect; ONLINE on next hello. Self-healing fleet re-optimizes roles. | Integration: disconnect node mid-sim, assert fleet continues producing blobs |
| Mothership unreachable (WiFi ok) | Transient | Firmware enters MOTHERSHIP_UNAVAILABLE; dashboard shows STALE | Node retries discovery every 30s indefinitely. CSI discarded locally. BLE results queued (max 60). Never triggers captive portal. | Firmware unit test: simulate 30s blackout, assert reconnect on mothership restore |
| WiFi credential failure | Persistent | 10 consecutive WiFi failures → captive portal | ESP32 starts spaxel-XXXX AP at 192.168.4.1 for re-provisioning. User enters new credentials. | Firmware test: NVS with bad SSID → captive portal within timeout |
| SQLite corruption | Rare | PRAGMA integrity_check fails on startup | Move corrupted DB aside to spaxel.db.corrupt.<timestamp>, start fresh. Baseline/learning data lost; nodes reconnect automatically. | Unit test: corrupt DB file header, verify startup creates fresh DB |
| CSI replay file truncation | Rare | Last write incomplete (ungraceful shutdown) | On open: scan backward from write_pos to find last complete frame; truncate. Replay resumes from last clean frame. | Unit test: write partial frame, open file, verify truncation to previous frame |
| Disk full | Gradual | /data free < 100 MB | Halt CSI replay writes (largest writer). If <20 MB: also halt crowd flow + prediction updates. Detection continues. Dashboard shows warning. | Integration: fill volume to near-full, assert /healthz returns "disk":"degraded" |
| Pipeline overload | Gradual | Fusion iteration consistently >80ms | Load shedding: level 1 (suspend crowd flow), level 2 (suspend replay writes), level 3 (reduce all nodes to 10 Hz). Auto-recover when load drops below 60ms for 10 iterations. | Benchmark: force 20-node sim, measure iteration time; assert load shedding fires |
| OTA firmware corruption | Rare | SHA-256 mismatch after download | Node aborts OTA, sends ota_status: failed, error: sha256_mismatch. Does NOT reboot. Retains current firmware. | Integration: serve corrupted binary, assert node does not reboot and stays on old version |
| Dashboard WebSocket overload | Transient | Client receives 10 Hz × full scene > budget | Send queue capped at 50 frames; oldest dropped if client is slow. Dashboard detects reconnect gap and requests fresh snapshot. | Integration: slow dashboard client, assert no server crash; assert snapshot-on-reconnect |
| BLE identity lapse on address rotation | Expected | Person label disappears for 60-90s | Identity retained for 5s after last match; rotation heuristics re-link within 90s. Blob continues to track; only label is lost temporarily. | Unit test: simulate rotation, assert re-linking within 3 scan cycles |
| mDNS blocked by router | Environment | Node cannot discover mothership | Fallback to ms_ip NVS key. If NVS is empty: captive portal shows IP entry field. SPAXEL_MDNS_ENABLED=false disables advertisement when not needed. | Firmware test: disable mDNS response, assert fallback to ms_ip |
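The node reconnect schedule in the first row (1→2→4→8→16→30 s) is plain capped doubling. The firmware itself is C on ESP-IDF; this Go version, under the hypothetical name reconnectDelay, only illustrates the schedule.

```go
package main

import (
	"fmt"
	"time"
)

// reconnectDelay returns the wait before reconnect attempt n (0-based):
// 1, 2, 4, 8, 16 s, then 30 s for every later attempt.
func reconnectDelay(attempt int) time.Duration {
	const maxDelay = 30 * time.Second
	if attempt < 0 {
		attempt = 0
	}
	if attempt >= 5 { // the next doubling (32 s) would exceed the cap
		return maxDelay
	}
	return time.Second << attempt
}

func main() {
	for i := 0; i < 7; i++ {
		fmt.Print(reconnectDelay(i), " ")
	}
	fmt.Println()
}
```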
Go Module Layout
spaxel/
cmd/
mothership/ — main.go: startup sequencing, subsystem wiring
sim/ — main.go: CSI simulator CLI (spaxel-sim)
internal/
ingestion/ — WebSocket server, binary frame parsing, node lifecycle
pipeline/
phase/ — Phase sanitization (unwrap, OLS, residual)
nbvi/ — NBVI subcarrier selection (Welford online algorithm)
feature/ — deltaRMS, phase variance, breathing band IIR filter
baseline/ — EMA baseline, diurnal slots, snapshot persistence
localizer/
fresnel/ — Zone number cache, grid accumulation
ukf/ — Biomechanical UKF (gonum/mat)
gdop/ — Fisher information matrix, GDOP computation
fusion/ — Full localization loop (10 Hz)
fleet/ — Node registry, role assignment, stagger scheduler
ble/ — BLE centroid, rotation heuristics, identity matching
portal/ — Crossing detection state machine, zone occupancy
replay/ — csi_replay.bin reader/writer, replay pipeline
anomaly/ — Pattern model (Welford), anomaly scoring
predict/ — Presence prediction model, predicted_enter trigger
sleep/ — Sleep state machine, breathing FFT, daily records
flow/ — Crowd flow accumulator, dwell heatmap
notify/ — Notification renderer (fogleman/gg), delivery channels
mqtt/ — MQTT client, HA auto-discovery
auth/ — HMAC token derivation, bcrypt PIN, session management
oui/ — OUI lookup table (go:generate from IEEE list)
db/ — SQLite open/migrate, schema migrations
config/ — Environment variable parsing and defaults
dashboard/ — Static assets: HTML, JS (Three.js), CSS
firmware/ — ESP-IDF project (C source, CMakeLists, partitions.csv)
test/
integration/ — Simulator-based integration tests
Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| ESP32-S3 CSI API changes in future ESP-IDF versions | Medium | High | Pin to ESP-IDF 5.2.x in CI; test against 5.3.x in a canary branch before adopting |
| modernc.org/sqlite performance ceiling at large fleets | Low | Medium | Profiled at 8-node fleet (<5% of 100ms budget). Switch to mattn/go-sqlite3 if >16-node fleets need <1ms query times |
| BLE address rotation breaks identity tracking | High | Low | Rotation heuristics documented and implemented (Component 21). Identity lapses are 60s max. Recommend tracker tags for reliable identity |
| mDNS blocked on enterprise/managed home routers | Medium | Medium | ms_ip NVS fallback and captive portal IP entry provide recovery path without mDNS |
| network_mode: host unavailable in some environments | Medium | Medium | SPAXEL_MDNS_ENABLED=false disables mDNS; nodes use ms_ip NVS key. Documented in compose file. |
| CSI callback rate exceeds WebSocket send capacity | Low | High | FreeRTOS ws_send_queue depth 32 with silent drop. Load shedding at mothership reduces rates to 10 Hz under pressure. |
| csi_replay.bin corruption on ungraceful host power loss | Medium | Low | File header magic + truncation recovery on open. Live CSI continues without replay. Replay data loss ≤ 1 unflushed write. |
| Webhook endpoint unreachability cascades | Low | Low | 5s HTTP timeout, fire-and-forget, 4xx auto-disables trigger. Alert shown in dashboard. |
| Three.js SkinnedMesh performance on low-end mobile GPUs | Medium | Medium | LOD: disable Fresnel zones + shorten trails at >8 blobs. Fallback Simple Mode is CSS-only. |
Phase Plan
Phase 1 — Foundation
Goal: Bare-minimum loop from ESP32 to browser. Zero-config with passive radar and mDNS from day one.
- ESP32 firmware skeleton — WiFi connect, mDNS mothership discovery, CSI capture in promiscuous mode, single WebSocket connection to mothership (/ws/node) carrying binary CSI frames upstream and JSON config downstream
- Passive radar support — Firmware accepts passive_bssid config to filter CSI from existing WiFi AP. Auto-detected during guided first run
- BLE scanning — Passive BLE advertisement scanning on Core 0, concurrent with WiFi. Report device list as JSON on the WebSocket every 5 s
- Mothership WebSocket ingestion — Go service with /ws/node endpoint that accepts bidirectional node connections, parses binary/JSON frames, mDNS service advertisement (_spaxel._tcp.local)
- Dashboard skeleton — Static HTML/JS + Three.js served by mothership. 3D scene with ground grid, OrbitControls (pan/zoom/rotate), /ws/dashboard WebSocket connection. Render raw amplitude bar chart as a 2D overlay for a single link
- Docker packaging — Single Dockerfile, docker-compose.yml with single port mapping (8080 HTTP/WS). Traefik labels included
Exit criteria: Flash firmware via Web Serial → plug in → node auto-discovers mothership → passive radar CSI streaming → amplitude bars visible in browser. Under 5 minutes, zero manual network config.
Entry criteria for Phase 2: All Phase 1 unit tests pass (go test ./...). Binary frame parse round-trip verified. Docker image builds cleanly for linux/amd64 and linux/arm64.
Phase 2 — Signal Processing & Detection
Goal: Detect presence on a single link.
- Phase sanitization — Implement in Go: unwrap, linear regression, STO/CFO removal
- Baseline system — EMA baseline with motion-gated updates, SQLite persistence
- Motion detection — deltaRMS computation, threshold-based presence flag per link
- Dashboard presence indicator — Simple per-link "motion detected" / "clear" display with amplitude time series plot
- CSI recording buffer — Append incoming CSI frames to disk-backed circular buffer (48 h default). Foundation for time-travel replay
- Adaptive sensing rate — Mothership-controlled rate changes (idle 2 Hz ↔ active 50 Hz) per link. On-device amplitude variance check for local burst-to-active. Motion hints from ESP32 to preemptively ramp adjacent links
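The baseline and motion-detection items can be sketched together: deltaRMS against a per-link EMA baseline, with the baseline updated only on quiet frames. This Go sketch assumes amplitudes are already RSSI-normalized and limited to the selected subcarriers; LinkBaseline and its Alpha value are hypothetical, and the 0.05 threshold is taken from the AS-2 pass criterion (the glossary puts an empty room near 0.02 and walking near 0.10).

```go
package main

import (
	"fmt"
	"math"
)

// LinkBaseline is a hypothetical per-link baseline over the selected subcarriers.
type LinkBaseline struct {
	Mean      []float64 // EMA of RSSI-normalized amplitudes
	Alpha     float64   // EMA weight for a quiet frame (assumed tuning value)
	Threshold float64   // deltaRMS above this counts as motion
}

// deltaRMS is the root-mean-square deviation of amp from base (equal lengths).
func deltaRMS(amp, base []float64) float64 {
	var sum float64
	for i := range amp {
		d := amp[i] - base[i]
		sum += d * d
	}
	return math.Sqrt(sum / float64(len(amp)))
}

// Update scores one frame and folds it into the baseline only when it is quiet,
// so a person in the room is not gradually absorbed into the "empty" baseline.
func (b *LinkBaseline) Update(amp []float64) (delta float64, motion bool) {
	if len(amp) == 0 {
		return 0, false
	}
	if len(b.Mean) != len(amp) { // first frame, or subcarrier set changed: re-seed
		b.Mean = append([]float64(nil), amp...)
		return 0, false
	}
	delta = deltaRMS(amp, b.Mean)
	motion = delta > b.Threshold
	if !motion {
		for i, a := range amp {
			b.Mean[i] += b.Alpha * (a - b.Mean[i])
		}
	}
	return delta, motion
}

func main() {
	b := &LinkBaseline{Alpha: 0.05, Threshold: 0.05}
	b.Update([]float64{1, 1, 1, 1})
	fmt.Println(b.Update([]float64{1.02, 0.98, 1.02, 0.98})) // quiet
	fmt.Println(b.Update([]float64{1.1, 0.9, 1.1, 0.9}))     // motion
}
```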
Exit criteria: Dashboard reliably shows motion detected / clear for a single link with one person walking through. Idle links automatically drop to 2 Hz.
Entry criteria for Phase 3: Phase sanitization, deltaRMS, and baseline unit tests all pass. CSI simulator produces frames that the mothership correctly parses without malformed-frame warnings.
Phase 3 — Multi-Node & Localization
Goal: Spatial positioning with 4+ nodes. Humanoid blob rendering from the start.
- Bidirectional node protocol — Registration (hello), health reporting, BLE scan relay, role/config/rate push, OTA commands — all over the existing WebSocket connection
- Fleet manager — Node registry in SQLite, role assignment engine (including passive radar virtual node), stagger scheduling, self-healing role reassignment on node loss
- Multi-link fusion — Fresnel zone weighted localization on a 3D grid
- Biomechanical blob tracking — Peak extraction, ID assignment, UKF with human motion constraints (max velocity, acceleration, turning radius, gravity-consistent Z, collision avoidance, persistence through brief association gaps)
- 3D spatial visualization — Room bounds, floor plan texture, humanoid figures (standing/walking/seated/lying postures via SkinnedMesh + AnimationMixer), vertical pillar anchors, footprint trails, node meshes, link lines, view presets
- Node placement UI — TransformControls for dragging nodes in 3D, space dimension editor
- Live coverage painting — GDOP overlay on ground plane, updates in real-time during node drag. Virtual node support for planning
Exit criteria: 4+ nodes produce a 3D view with humanoid figures tracking a walking person at ±1 m accuracy. Figures animate between postures. User can orbit, pan, and zoom. Coverage overlay shows detection quality.
Entry criteria for Phase 4: Fresnel zone, UKF, and GDOP unit tests all pass. spaxel-sim --nodes 4 --walkers 1 --duration 30s produces blob count > 0 for >80% of the run.
Phase 4 — Onboarding & OTA
Goal: Non-technical users can add and update nodes. Interactive guided wizard that teaches by doing.
- Interactive onboarding wizard (Component 33) — Flash firmware via Web Serial → node auto-discovers mothership via mDNS → wizard responds to live sensor data: "Walk around" (see CSI waveform react), "Stand still" (capture baseline), "Walk through the detection zone" (see Fresnel zone light up), "Let me find you" (blob appears), "Place your node" (coverage painting guides optimal position). 2-minute hands-on tutorial, no jargon
- Provisioning payload — Mothership generates config blob (WiFi creds + node ID, no IP needed), firmware writes to NVS
- OTA system — HTTP firmware serving, WebSocket-triggered updates, rolling update logic with 30 s gaps, automatic rollback
- Captive portal recovery — AP fallback mode on WiFi failure, config page for re-provisioning (WiFi creds + optional manual mothership IP)
- Guided troubleshooting foundation (Component 36) — First-time feature discovery tooltips. Node-offline troubleshooting steps in timeline. Post-calibration positive reinforcement messages
Exit criteria: A new ESP32-S3 can go from unboxed to streaming CSI in under 5 minutes with the user understanding HOW detection works. Firmware can be updated OTA without physical access.
Entry criteria for Phase 5: Web Serial provisioning round-trip integration test passes. OTA rollback integration test passes (push invalid firmware → node reverts). Phase 1–4 unit tests green.
Phase 5 — Reliability & Intelligence
Goal: Production-quality detection for daily home use.
- Diurnal adaptive baseline — 24-slot hourly baseline vectors, 7-day learning period, automatic crossfade. Baseline confidence indicator per link in dashboard
- Stationary person detection — Breathing band extraction (0.1–0.5 Hz), long-dwell logic
- Ambient confidence score — Per-link health metrics (SNR, phase stability, packet rate, drift), composite system-wide "Detection Quality" gauge. Link thickness/color in 3D view reflects health
- Self-healing fleet — Automatic role re-optimization on node loss/recovery, before/after coverage comparison, graceful degradation warnings
- Link weather diagnostics — Root-cause suggestions for degraded links, weekly reliability trends, node repositioning advice with highlighted positions in 3D
Exit criteria: System runs unattended for 7+ days with <5% false positive rate, surviving node reboots, WiFi blips, and diurnal environmental changes.
Phase 6 — Identity & Spatial Automation
Goal: Named presence, actionable automations, and safety features. Natural language notifications from day one.
- BLE device registry — "People & Devices" dashboard panel. Discovered BLE devices listed with auto-detected type (iPhone, Watch, Tile, etc.). User assigns labels ("Alice", "Dog Tracker", "Car Keys"), type (person/pet/object), and color. Multiple devices can map to one person
- BLE-to-blob identity matching — Multi-node RSSI triangulation matched to nearest CSI blob. Humanoid figures gain per-person color and name label. Dashboard shows "Alice is in Kitchen" instead of "Blob #2"
- Room transition portals — Doorway planes in 3D editor, directional crossing detection, per-zone occupancy counters with person names. Zone labels in 3D view: "Kitchen: Alice, Bob"
- Spatial automation builder — 3D trigger volumes with conditions (enter/leave/dwell/vacant/count + optional person filter: "when Alice enters..."). Webhook and MQTT actions. Visual feedback when triggers fire
- Fall detection — Z-axis rapid descent + sustained stillness. Alert chain: dashboard alarm → webhook → push notification → escalation. Person-identified alerts when BLE available: "Fall detected: Alice in Hallway"
- Spatial context notifications (Component 30) — Push notifications with rendered mini floor-plan thumbnails (PNG, server-side 2D renderer) and natural language text. Smart batching (collapse rapid-fire events). Quiet hours. Configurable delivery channels (Ntfy/Pushover/webhook)
- Home automation integration — Optional MQTT client for HA auto-discovery (per-person presence sensors, zone occupancy, fall alerts). Webhook support for non-MQTT setups
Exit criteria: BLE-identified blobs show correct person names. Notifications include floor-plan thumbnails with person names. Room transition counts match manual observation within ±1. Fall detection fires on simulated falls with <10% false positive rate.
Phase 7 — Learning & Analytics
Goal: The system gets smarter over time. User feedback drives improvement.
- Detection feedback loop (Component 29) — Thumbs up/down on every detection (3D view, timeline, notifications). "I was here" missed-detection marking. Feedback adjusts Fresnel weights and detection thresholds. Accuracy trend tracking: "You've provided 47 corrections. Accuracy improved 12%"
- Self-improving localization — BLE proximity as continuous ground truth drives per-link Fresnel weight refinement. Accuracy trend graph in dashboard. Weights persist in SQLite
- Presence prediction — Per-person, per-zone, per-time-slot transition probabilities learned over 7+ days. Dashboard predictions widget. REST API. New `predicted_enter` automation trigger type. HA prediction sensors
- Sleep quality monitoring — Breathing analysis + motion scoring in bedroom zones. Morning summary card, weekly trends, anomaly flagging. Per-person when BLE available
- Crowd flow visualization — Trajectory accumulation into directional flow map. Animated arrows for corridors, dwell hotspot pools. Time and person filters. Toggle-able 3D layer
- Anomaly detection & security mode — 7-day pattern learning, anomaly scoring, security mode with full alert chain, "Away" auto-activation
Exit criteria: Accuracy trend graph shows measurable improvement over 4 weeks. User feedback visibly improves detection within 48 hours. Presence predictions achieve >75% accuracy at 15-minute horizon.
Phase 8 — Analysis & Developer Tools
Goal: Deep debugging, system tuning, and detection explainability.
- Activity timeline (Component 27) — Universal event stream: detections, transitions, alerts, system events, learning milestones. Tap any event → 3D view jumps to that moment. Inline feedback buttons. Search and filter. Timeline sidebar in expert mode, activity feed in simple mode
- Detection explainability (Component 28) — "Why is this here?" on any blob/alert: X-ray overlay dims non-contributing elements, glows contributing links with Fresnel zone intersection, shows BLE match details and confidence breakdown
- Time-travel debugging — Pause live view, scrub timeline, replay 3D scene from recorded CSI. Parameter tuning overlay with live re-processing. "Apply to Live" button. Integrated with activity timeline for navigation
- Pre-deployment simulator — Virtual space + virtual nodes + synthetic walkers. GDOP overlay, accuracy estimates, minimum node recommendation, shopping list output
- CSI simulator — Go CLI tool (`cmd/sim/main.go`) that opens WebSocket connections as virtual nodes and sends synthetic CSI binary frames (with optional simulated BLE) for development/testing without hardware.
Command-line interface:
```shell
spaxel-sim \
  --mothership ws://localhost:8080/ws/node \
  --token <node_token> \
  --nodes 4 \
  --walkers 1 \
  --rate 20 \
  --duration 60s \
  --ble \
  --seed 42 \
  --space "6x5x2.5"
```
- `--mothership`: WebSocket URL of the mothership's node endpoint
- `--token`: HMAC from install_secret + mac
- `--nodes`: number of virtual nodes to simulate
- `--walkers`: number of walking persons to simulate
- `--rate`: CSI Hz per node
- `--duration`: run for N seconds (0 = forever)
- `--ble`: also send simulated BLE advertisements
- `--seed`: random seed for reproducible runs
- `--space`: room dimensions in meters (WxDxH)
Synthetic CSI frame generation:
- Each virtual node has a fixed position in the simulated space (placed at corners, evenly distributed)
- Each walker follows a random walk: Gaussian velocity updates (σ = 0.3 m/s per axis per 50 ms step), reflected at room walls
- For each TX→RX link pair at each tick, compute `amplitude` and `phase` using the same propagation model as the pre-deployment simulator (Component 17: path-loss + wall penetration + first-order reflection)
- Inject Gaussian noise: `amplitude_noisy[k] = amplitude × (1 + N(0, 0.05))`, `phase_noisy[k] = phase + N(0, 0.1)`
- Serialize into the 24-byte binary frame format with `n_sub = 64`, populating all fields: `rssi = clamp(-30 - path_loss_dB, -90, -30)`, `noise_floor = -95`
- `timestamp_us` increments at the configured rate starting from 1000 (simulates ~1 ms boot time)
Simulated BLE: When --ble is set, every 5 s one virtual node sends a {type:"ble", devices:[{addr:"AA:BB:CC:DD:EE:FF", rssi: -60 + N(0,5), name:"SimPerson"}]} JSON frame. The BLE address matches the walker's simulated phone. No address rotation in simulation mode.
Verification: The simulator exits non-zero if it receives a {type:"reject"} downstream message (authentication or rate limiting). It prints per-second frame counts and the mothership's blob count (from a parallel GET /api/blobs poll) to stdout for integration test assertions.
Integration test usage:
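```shell
# Start mothership
docker run -d -p 8080:8080 --name spaxel-test ghcr.io/spaxel/spaxel:latest
# Wait until the HTTP server answers (startup either completes within 30 s or the process exits)
timeout 30 sh -c 'until curl -sf http://localhost:8080/healthz >/dev/null; do sleep 1; done'
# Run simulator for 30 s
spaxel-sim --mothership ws://localhost:8080/ws/node --nodes 4 --walkers 1 --duration 30s
# Assert blob count > 0 (jq -e exits non-zero when the result is false)
curl -s http://localhost:8080/api/blobs | jq -e 'length > 0'
```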
- Fresnel zone debug overlay — Toggle wireframe ellipsoids between active links in the 3D scene
Exit criteria: Tapping "Why?" on any detection shows a clear visual explanation of contributing links. Time-travel replay successfully replays 24 h of data. Simulator produces realistic synthetic data.
Phase 9 — UX Polish & Accessibility
Goal: Accessible to every household member. Power user efficiency. Always-on ambient display.
- Simple mode (progressive disclosure) — Card-based mobile-first UI with room occupancy cards, activity feed (from timeline), alert banner, sleep summary, morning briefing card. No 3D scene. Toggle between simple/expert mode. Optional PIN for expert mode
- Ambient dashboard mode (Component 31) — `/ambient` route for wall-mounted tablets. Simplified top-down floor plan with colored dots and names. Time-of-day palette. Auto-dim when empty. Alert mode breaks the calm. Morning briefing on first detection. Lightweight Canvas 2D renderer
- Spatial quick actions (Component 32) — Right-click / long-press context menus on every 3D element. Actions on blobs, nodes, empty space, zones, portals, trigger volumes. "Follow" camera mode on people
- Command palette (Component 34) — Ctrl+K / Cmd+K universal search and command interface. Search zones/people/nodes/events. Navigate time. Execute commands. Get help. Fuzzy matching. Expert mode only
- Morning briefing (Component 35) — Daily summary card on first dashboard open: sleep report, overnight events, system health, today's predictions. Also deliverable as push notification or webhook
- Guided troubleshooting (Component 36) — Proactive contextual help when detection quality drops, settings are repeatedly changed, or nodes go offline. Post-feedback explanations. First-time feature tooltips. Never blocks, never repeats, never condescends
- Mobile-responsive expert mode — Touch orbit/pan/zoom, hamburger menu for panels
- Fleet status page — Full table view with all node metrics, bulk actions, camera fly-to on click
Exit criteria: Non-technical household member can use simple mode to check occupancy without training. Ambient mode runs unattended on a wall-mounted tablet for 7+ days. Command palette reaches any feature in ≤3 keystrokes. Morning briefing accurately summarizes overnight activity.
Startup Sequencing & Graceful Shutdown
Startup Phases
The mothership starts in strict sequential phases. Each phase logs its completion at INFO level. If any phase fails, the process exits non-zero with a clear error message.
Phase 1/7 — Data directory: verify /data is writable; acquire flock on /data/.lock to prevent duplicate instances
Phase 2/7 — SQLite: open database with PRAGMA journal_mode=WAL; PRAGMA synchronous=NORMAL; PRAGMA foreign_keys=ON
On corrupt DB detected: move aside to /data/spaxel.db.corrupt.<timestamp>, start fresh, log warning
Run PRAGMA integrity_check on every start; on failure, move aside and start fresh
Phase 3/7 — Schema migration: apply pending migrations in order; rollback on failure
Phase 4/7 — Config & secrets: load/generate SPAXEL_INSTALL_SECRET; validate all env vars against schema
Phase 5/7 — Subsystems: start ingestion server, signal pipeline, fleet manager, fusion engine — in that order
Each subsystem reports ready or fatal within 5 s
Phase 6/7 — HTTP server: bind to :8080; register all routes. mDNS advertisement starts only after bind succeeds.
mDNS library: github.com/hashicorp/mdns (pure Go, no cgo, no OS mDNS daemon dependency).
Service registration:
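```go
svc, err := mdns.NewMDNSService(
	mdnsName,       // instance: SPAXEL_MDNS_NAME, default "spaxel"
	"_spaxel._tcp", // service
	"local.",       // domain
	"",             // hostName: "" = system hostname (os.Hostname)
	8080,           // port
	nil,            // ips: nil = addresses from resolving the hostname (net.LookupIP), not an
	                //      interface scan; pass LAN IPs explicitly if that resolves to loopback
	[]string{"version=1", "ws=/ws/node", "api=/api"}, // TXT records
)
if err != nil {
	return fmt.Errorf("mdns service: %w", err)
}
server, err := mdns.NewServer(&mdns.Config{Zone: svc})
if err != nil {
	return fmt.Errorf("mdns server: %w", err)
}
```
Here `mdnsName` stands for the configured SPAXEL_MDNS_NAME value. The TXT records allow future nodes to auto-discover the WS path and API prefix.
mdns.NewServer starts the responder goroutine. On shutdown: server.Shutdown().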
Phase 7/7 — Health: GET /healthz returns HTTP 200 with {"status":"ok", ...} (full response body described under Health & Observability). Announce readiness to stdout
Startup timeout: if phases 1–6 don't complete within 30 s, the process exits with a clear error. This prevents a zombie container that's bound but not functional.
Graceful Shutdown
SIGTERM triggers an ordered shutdown with a 30-second hard deadline:
1. Stop accepting new node WebSocket connections (return HTTP 503 on upgrade attempts)
2. Send {type: "shutdown", reconnect_in: 30} to all connected dashboard WebSocket clients
3. Stop the fusion loop — no new blobs published
4. Drain the signal processing pipeline — process all frames already in the channel buffer
5. Flush all in-memory baselines to SQLite (atomic transaction)
6. Flush the CSI recording write buffer to disk
7. Close all node WebSocket connections (nodes will auto-reconnect after restart)
8. Write a "system_stopped" event to the SQLite events table
9. Run PRAGMA wal_checkpoint(FULL) to collapse WAL into main DB file
10. Close SQLite; release flock; exit 0
If the 30-second total deadline is exceeded at any step, the process force-exits (exit 1). The Compose file's stop_grace_period: 35s gives the process the full 30 s plus a 5 s margin before Docker sends SIGKILL.
SQLite Durability
- WAL mode: crash-safe writes; readers don't block writers
- Per-baseline-snapshot writes use SQLite transactions (BEGIN → INSERT/REPLACE → COMMIT)
- Baseline snapshots are persisted every 60 s in addition to on shutdown (a crash loses at most the last 60 s of learning)
- CSI recording buffer: append-only file with a write cursor. On restart, the cursor is recovered from the file header. An incomplete final write is truncated on open
- Atomic file writes (temp + rename) used for any non-SQLite persistent files (floor plan images, firmware metadata)
Health & Observability
- `GET /healthz` — returns `{"status":"ok","uptime_s":N,"nodes_online":N,"db":"ok"}` or `{"status":"degraded","reason":"..."}`. HTTP 200 on healthy, 503 on degraded. Used by Docker `HEALTHCHECK` and optional Traefik health routing
- All subsystems use Go's `errgroup` for goroutine lifecycle. Panics in subsystem goroutines are recovered, logged, and the subsystem is marked DEGRADED in the health response
- Process logs include version string, data directory, and listening port on startup for support diagnostics
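The following is a minimal, standard-library-only sketch of the `/healthz` contract and the panic-to-DEGRADED behavior. `Health`, `Guard`, and `probe` are illustrative names. A database failure would be reported by calling `MarkDegraded("db", ...)`. In the real process each `Guard` call would run inside an `errgroup` goroutine (`golang.org/x/sync/errgroup`), which is not imported here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
	"time"
)

// Health is the shared state behind GET /healthz.
type Health struct {
	mu          sync.RWMutex
	start       time.Time
	nodesOnline int
	degraded    map[string]string // subsystem name -> reason
}

func NewHealth() *Health {
	return &Health{start: time.Now(), degraded: make(map[string]string)}
}

func (h *Health) SetNodesOnline(n int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.nodesOnline = n
}

func (h *Health) MarkDegraded(subsystem, reason string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.degraded[subsystem] = reason
}

// Guard runs a subsystem body and turns a panic into a DEGRADED mark. After a recovered
// panic it returns nil, so an errgroup.WithContext sibling context is not cancelled;
// the failure is reported through /healthz instead.
func (h *Health) Guard(subsystem string, fn func() error) error {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("subsystem %s panicked: %v", subsystem, r)
			h.MarkDegraded(subsystem, fmt.Sprintf("panic: %v", r))
		}
	}()
	return fn()
}

func (h *Health) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	h.mu.RLock()
	code, body := http.StatusOK, map[string]any{
		"status": "ok", "uptime_s": int(time.Since(h.start).Seconds()),
		"nodes_online": h.nodesOnline, "db": "ok",
	}
	if len(h.degraded) > 0 {
		reasons := make([]string, 0, len(h.degraded))
		for name, why := range h.degraded {
			reasons = append(reasons, name+": "+why)
		}
		sort.Strings(reasons) // deterministic output despite map iteration order
		code = http.StatusServiceUnavailable
		body = map[string]any{"status": "degraded", "reason": strings.Join(reasons, "; ")}
	}
	h.mu.RUnlock()
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(body)
}

// probe calls the handler in-process (useful in tests) and returns status and decoded body.
func probe(h http.Handler) (int, map[string]any) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	var body map[string]any
	_ = json.Unmarshal(rec.Body.Bytes(), &body)
	return rec.Code, body
}
```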
Deployment
Environment Variables
All environment variables are optional; an unset variable uses the default shown.
| Variable | Default | Description |
|---|---|---|
| `SPAXEL_BIND_ADDR` | `0.0.0.0:8080` | Listen address. Set to `127.0.0.1:8080` to restrict to localhost (e.g., when behind a local reverse proxy) |
| `SPAXEL_INSTALL_SECRET` | (auto-generated) | 64-char hex installation secret. Auto-generated on first run and stored in SQLite. Override for scripted deployments |
| `SPAXEL_DATA_DIR` | `/data` | Path to the persistent data directory (SQLite, floor plans, CSI replay buffer, firmware uploads) |
| `SPAXEL_FIRMWARE_DIR` | `/firmware` | Path to the firmware binaries directory for OTA |
| `SPAXEL_MQTT_BROKER` | (disabled) | MQTT broker URL: `mqtt://host:1883` or `mqtts://host:8883`. If unset, MQTT integration is disabled |
| `SPAXEL_MQTT_USERNAME` | (none) | MQTT broker username |
| `SPAXEL_MQTT_PASSWORD` | (none) | MQTT broker password |
| `SPAXEL_MQTT_PREFIX` | `spaxel` | MQTT topic prefix |
| `SPAXEL_MQTT_CLIENT_ID` | `spaxel-<install_id>` | MQTT client ID |
| `TZ` | `UTC` | Timezone for diurnal baselines, morning briefings, quiet hours, auto-update scheduling. Use IANA tz names (e.g., `America/New_York`, `Europe/London`) |
| `SPAXEL_REPLAY_MAX_MB` | `360` | Maximum size of the CSI replay buffer in MB (48h at 8 nodes / 20 Hz) |
| `SPAXEL_REPLAY_RETAIN_H` | `48` | CSI replay retention in hours. Eviction is size-based (`SPAXEL_REPLAY_MAX_MB`); this value is advisory |
| `SPAXEL_MAX_DASHBOARD_CLIENTS` | `10` | Maximum concurrent dashboard WebSocket clients |
| `SPAXEL_NODE_STALE_S` | `15` | Seconds since last health report before a connected node is marked STALE |
| `SPAXEL_LOG_LEVEL` | `info` | Log level: `debug`, `info`, `warn`, `error` |
| `SPAXEL_SKIP_MIGRATIONS` | `false` | Set to `true` to skip automatic schema migrations (advanced; for manual migration management) |
| `SPAXEL_FUSION_RATE_HZ` | `10` | Fusion loop rate in Hz. Reduce for lower CPU use; increase for smoother tracking (max 20) |
| `SPAXEL_GRID_CELL_M` | `0.2` | Fresnel zone accumulation grid cell size in meters |
| `SPAXEL_MDNS_NAME` | `spaxel` | mDNS service instance name advertised to nodes. Must match the firmware `ms_mdns` NVS key |
| `SPAXEL_NTP_SERVER` | `pool.ntp.org` | NTP server hostname embedded in the provisioning payload. Nodes use it for clock synchronization for TX stagger slots. Set to a local NTP server (e.g., router IP) for networks without internet access |
| `SPAXEL_MDNS_ENABLED` | `true` | Set to `false` to disable mDNS advertisement (e.g., when using Docker bridge networking instead of `network_mode: host`). Nodes must then use the cached `ms_ip` NVS key or captive portal IP entry for mothership discovery |
Dockerfile
Multi-stage build. SQLite is accessed via the pure-Go modernc.org/sqlite driver (no CGO, no gcc needed in the final image). This keeps the image small and enables linux/amd64 + linux/arm64 builds without cross-compilation complexity.
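```dockerfile
# Stage 1: Build the Go binary on the build host's native platform and cross-compile
FROM --platform=$BUILDPLATFORM golang:1.23-bookworm AS builder
ARG TARGETOS
ARG TARGETARCH
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 because modernc.org/sqlite is pure Go, so cross-compiling needs no C toolchain
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build \
      -ldflags="-s -w -X main.version=$(cat VERSION)" \
      -o spaxel ./cmd/mothership
# Pre-create the data directory so a fresh named volume inherits non-root ownership
RUN mkdir -p /out/data

# Stage 2: Minimal runtime image
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/spaxel /spaxel
# The dashboard is also embedded in the binary via //go:embed (see Key design decisions);
# this on-disk copy is redundant unless files are served from /dashboard
COPY --from=builder /app/dashboard /dashboard
# Include a bundled firmware binary (users can override with a volume mount)
COPY --from=builder /app/firmware/dist/*.bin /firmware/
# 65532 is the UID/GID of the distroless "nonroot" user
COPY --from=builder --chown=65532:65532 /out/data /data
EXPOSE 8080
VOLUME ["/data"]
ENTRYPOINT ["/spaxel"]
```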
Multi-arch build (CI):
```shell
docker buildx build --platform linux/amd64,linux/arm64 \
  -t ghcr.io/spaxel/spaxel:$(cat VERSION) \
  -t ghcr.io/spaxel/spaxel:latest \
  --push .
```
Key design decisions:
- `distroless/static-debian12:nonroot` — no shell, no package manager (so no `wget` or `curl` inside the container), runs as non-root (UID 65532). Minimal attack surface.
- `modernc.org/sqlite` — pure Go SQLite; avoids CGO complexities for multi-arch cross-compilation. Performance is ~20% slower than cgo/mattn but fully adequate for this workload.
- `/dashboard` — the entire dashboard (HTML, JS, Three.js, CSS) is embedded in the binary via `//go:embed dashboard/*`. No volume mount needed for the UI. Updating the UI requires a new Docker image.
- `/firmware` — a bundled default copied from the build stage. A bind mount at `/firmware` (as in the Compose file below) hides the bundled files entirely, so mounting your own firmware directory overrides them; an empty named volume is instead pre-populated with the bundled files on first use.
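Note on SQLite driver: modernc.org/sqlite registers itself with database/sql under the driver name "sqlite" (mattn/go-sqlite3 registers "sqlite3"), so all sql.Open() calls use "sqlite". Switching to mattn/go-sqlite3 for CGO performance would require CGO_ENABLED=1, a C toolchain for each target architecture, and either a libc-based runtime image or a fully static link (distroless/static contains no libc), plus changing the driver name in sql.Open() calls.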
Docker Compose
Quickstart (single command, no Traefik):
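```shell
docker run -d --name spaxel \
  -p 8080:8080 \
  -v spaxel-data:/data \
  -v "$(pwd)/firmware:/firmware" \
  -e TZ=America/New_York \
  ghcr.io/spaxel/spaxel:latest
# Then open http://<server-ip>:8080 (PIN setup page appears on first run)
# Note: bridge networking (-p) blocks mDNS, so nodes cannot auto-discover the mothership.
# Use --network host instead of -p, or set -e SPAXEL_MDNS_ENABLED=false and enter the
# mothership IP in the node's captive portal.
```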
Production docker-compose.yml:
```yaml
services:
  spaxel:
    image: ghcr.io/spaxel/spaxel:latest  # pin to a specific version in production
    # IMPORTANT: network_mode: host is REQUIRED for mDNS to work.
    # mDNS uses multicast address 224.0.0.251 (link-local), which Docker bridge networking blocks.
    # With host networking, the container shares the host's network interfaces and mDNS multicasts
    # reach the LAN where ESP32 nodes can receive them.
    # Side effect: 'ports' mapping is ignored in host mode; port 8080 is exposed directly.
    network_mode: host
    # ports:                  # Not used with network_mode: host
    #   - "8080:8080"
    #
    # Alternative (if host mode is not desired): disable mDNS and require nodes to use
    # the cached ms_ip NVS key (manual IP entry during captive portal provisioning).
    # Set SPAXEL_MDNS_ENABLED=false to skip the mDNS advertisement entirely.
    volumes:
      - spaxel-data:/data     # SQLite, baselines, floor plans, CSI recording buffer
      - ./firmware:/firmware  # Firmware binaries for OTA (pre-populate before first run)
    environment:
      TZ: America/New_York    # Required for correct diurnal baseline hours and briefing times
      SPAXEL_MQTT_BROKER: mqtt://homeassistant.local:1883  # Optional; remove line if no MQTT
      # SPAXEL_MQTT_USERNAME: mosquitto
      # SPAXEL_MQTT_PASSWORD: secret
      # SPAXEL_REPLAY_MAX_MB: "720"  # 96h replay for larger installs
      # SPAXEL_LOG_LEVEL: debug      # Uncomment for troubleshooting
    restart: unless-stopped
    stop_grace_period: 35s    # Allows full 30s graceful shutdown
    ulimits:
      nofile:
        soft: 4096            # One fd per node connection + SQLite handles
        hard: 8192
    healthcheck:
      # NOTE: the distroless runtime image contains no wget, curl, or shell, so this probe
      # fails as written. Add a static HTTP client binary to the image (or give the spaxel
      # binary a probe mode) before relying on this healthcheck.
      test: ["CMD", "wget", "-q", "-O-", "http://localhost:8080/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 512m        # Increase to 1g for 16+ node fleets
          cpus: "2.0"
        reservations:
          memory: 128m
          cpus: "0.5"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.spaxel.rule=Host(`spaxel.example.com`)"
      - "traefik.http.routers.spaxel.entrypoints=websecure"
      - "traefik.http.routers.spaxel.tls.certresolver=letsencrypt"
      - "traefik.http.services.spaxel.loadbalancer.server.port=8080"
      # Long-lived node WebSocket connections need a long read timeout. That is an entrypoint
      # setting in Traefik's static configuration, not a router or middleware label, e.g. on
      # the Traefik container:
      #   --entryPoints.websecure.transport.respondingTimeouts.readTimeout=3600s

volumes:
  spaxel-data:
    driver: local
```
First-run steps:
- Create the firmware directory and copy the initial firmware binary: `mkdir -p ./firmware && cp path/to/spaxel-1.0.0.bin ./firmware/`
- Start: `docker compose up -d`
- Open `http://<server-ip>:8080` — the PIN setup page appears (no auth required for this first-run step only)
- Set your dashboard PIN → redirected to the onboarding wizard
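- Connect an ESP32-S3 via USB, click "Add Node" — Web Serial provisioning begins. Web Serial is only available in Chromium-based desktop browsers and only in a secure context, so open the dashboard over HTTPS (e.g., through Traefik) or via `http://localhost` (for example on the server itself or through an SSH port forward)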
Data backup:
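```shell
# Manual backup before upgrade or for offsite storage. Run this on the Docker host:
# the distroless image has no wget or curl, so it cannot be run via docker exec.
curl -fsS http://localhost:8080/api/backup -o spaxel-backup-$(date +%Y%m%d).zip
# Or directly from the volume. Stop the mothership first (e.g. docker compose stop spaxel)
# so the shutdown sequence checkpoints the WAL into spaxel.db. With Compose, the volume is
# named <project>_spaxel-data (check `docker volume ls`); with the quickstart it is spaxel-data.
docker run --rm -v spaxel-data:/data alpine \
  tar czf - -C /data spaxel.db floorplan > spaxel-db-backup-$(date +%Y%m%d).tar.gz
```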
Traefik WebSocket notes:
- Traefik supports WebSocket natively; no special middleware is needed. It detects the `Upgrade: websocket` header and proxies the connection transparently
- Long-lived node WebSocket connections (which may be quiet during idle periods) need a long read timeout so they are not killed. This is configured on the Traefik entrypoint in its static configuration (`entryPoints.websecure.transport.respondingTimeouts.readTimeout`), not through a router or middleware label
- ESP32 nodes connect to `ws://spaxel.example.com/ws/node` (or `wss://` with TLS); Traefik routes to the container
Firmware Build System
ESP-IDF version: 5.2.x (stable). Do not use 5.0 or 5.1 — the CSI callback API changed at 5.2. Pin the version in CI: idf_version: "5.2.3".
Project structure:
```
firmware/
  main/
    main.c              — app_main(), startup sequencing, task creation
    wifi.c / wifi.h     — WiFi station connect, mDNS, captive portal AP
    csi.c / csi.h       — promiscuous mode, CSI callback, binary frame serialization
    ws.c / ws.h         — WebSocket client (esp_websocket_client), JSON/binary framing
    ble.c / ble.h       — BLE passive scan (esp_bt), advertisement parsing, rotation heuristic
    ota.c / ota.h       — OTA download, SHA-256 verification, esp_ota_ops
    nvs.c / nvs.h       — NVS read/write helpers, schema migration, provisioning
    serial_prov.c       — 10-second serial provisioning window, UART JSON handler
    sntp.c / sntp.h     — SNTP init, sync wait, resync timer
    led.c / led.h       — LED control (identify blink, OTA progress, status)
    CMakeLists.txt
  CMakeLists.txt
  partitions.csv        — factory(4MB) + ota_0(4MB) + ota_1(4MB) + nvs(24KB) + otadata(8KB)
  sdkconfig.defaults    — project-specific sdkconfig overrides (committed to repo)
```
Required sdkconfig.defaults settings:
```
# Note: sdkconfig files do not support trailing comments; keep each comment on its own line.
# WiFi
# Enable PSRAM for ring buffer headroom (IDF 5.x option name).
# N16R8 modules use octal PSRAM; drop SPIRAM_MODE_OCT for quad-PSRAM boards.
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
# Required for CSI capture
CONFIG_ESP_WIFI_PROMISCUOUS_FILTER=y
# Enable CSI API
CONFIG_ESP_WIFI_CSI_ENABLED=y
# Increase RX buffers for high CSI rate
CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM=16
CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=32
# BLE (Bluetooth)
CONFIG_BT_ENABLED=y
CONFIG_BT_BLE_ENABLED=y
CONFIG_BT_BLE_42_FEATURES_SUPPORTED=y
# WiFi+BLE coexistence (mandatory for dual-radio)
CONFIG_ESP_COEX_SW_COEXIST_ENABLE=y
CONFIG_ESP_COEX_POWER_MANAGEMENT=y
# OTA: dual-partition rollback
CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y
# Allow HTTP OTA URLs (not HTTPS-only)
CONFIG_OTA_ALLOW_HTTP=y
# NVS encryption: disabled by default (home use; users can enable manually)
CONFIG_NVS_ENCRYPTION=n
# Flash & partition
CONFIG_ESPTOOLPY_FLASHSIZE_16MB=y
CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="partitions.csv"
# App version (set per release)
CONFIG_APP_PROJECT_VER="1.0.0"
CONFIG_APP_PROJECT_VER_FROM_CONFIG=y
# Stack sizes (CSI callback runs in a high-priority task)
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192
CONFIG_PTHREAD_TASK_STACK_SIZE_DEFAULT=4096
# Logging (INFO in release builds, DEBUG when debug NVS key = 1)
CONFIG_LOG_DEFAULT_LEVEL_INFO=y
CONFIG_LOG_MAXIMUM_LEVEL_DEBUG=y
```
Task architecture (FreeRTOS):
| Task | Core | Priority | Stack | Responsibility |
|---|---|---|---|---|
| `app_main` | 1 | 1 | 8 KB | Startup sequencing, WiFi/WS lifecycle |
| `ws_task` | 1 | 5 | 8 KB | WebSocket send/receive loop |
| `csi_task` | 1 | 10 | 4 KB | CSI callback → binary frame serialization → WS queue |
| `ble_scan_task` | 0 | 3 | 4 KB | BLE passive scan, advertisement parsing, RSSI aggregation |
| `health_task` | 0 | 2 | 2 KB | Periodic health JSON assembly and queuing (every 10 s) |
CSI callback fires at up to 50 Hz; it serializes the frame into a binary buffer and posts to the ws_send_queue (depth 32) without blocking. The ws_task drains the queue. If the queue is full, the frame is silently dropped (hardware-rate CSI is best-effort).
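The drop-on-full policy is easy to get wrong, because a blocking send would stall the CSI callback. Below it is modeled in Go, the language used for the other examples in this plan. On the ESP32 the equivalent is FreeRTOS `xQueueSend` with a timeout of 0 ticks, which returns `errQUEUE_FULL` instead of waiting. `FrameQueue` is a hypothetical name used only for illustration.

```go
package main

import "sync/atomic"

// FrameQueue models ws_send_queue: producers never block, and when the queue is full the
// newest frame is dropped and counted.
type FrameQueue struct {
	ch      chan []byte
	dropped atomic.Uint64
}

func NewFrameQueue(depth int) *FrameQueue {
	return &FrameQueue{ch: make(chan []byte, depth)}
}

// TryPost enqueues a serialized frame without blocking and reports whether it was accepted.
func (q *FrameQueue) TryPost(frame []byte) bool {
	select {
	case q.ch <- frame:
		return true
	default: // queue full: best-effort CSI, drop silently
		q.dropped.Add(1)
		return false
	}
}

// Drain is the channel the consumer (ws_task) reads from.
func (q *FrameQueue) Drain() <-chan []byte { return q.ch }

// Dropped reports how many frames were discarded; useful in the health report.
func (q *FrameQueue) Dropped() uint64 { return q.dropped.Load() }
```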
Build & flash commands:
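```shell
# One-time setup
. $IDF_PATH/export.sh
# Build
idf.py -C firmware build
# Flash (manufacturing / initial install to factory partition)
idf.py -C firmware -p /dev/ttyUSB0 flash
# Or via esptool directly (used by esptool-js in the dashboard). A blank chip also needs the
# bootloader (0x0 on ESP32-S3) and the partition table (0x8000); the app offset must match
# the factory partition in partitions.csv:
esptool.py --chip esp32s3 --port /dev/ttyUSB0 --baud 921600 write_flash \
  0x0 firmware/build/bootloader/bootloader.bin \
  0x8000 firmware/build/partition_table/partition-table.bin \
  0x10000 firmware/build/spaxel.bin
# Generate release binary (same as OTA artifact):
cp firmware/build/spaxel.bin spaxel-$(cat firmware/VERSION).bin
```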
CI/CD: GitHub Actions workflow builds spaxel.bin and attaches it to a GitHub Release. The mothership Dockerfile's firmware COPY step (`firmware/dist/*.bin` → `/firmware/`) bundles the latest firmware in the container image (users can override it with their own /firmware/ volume mount).
Node Hardware
- Recommended: ESP32-S3-DevKitC-1 (N16R8 variant — 16 MB flash for OTA dual-partition, 8 MB PSRAM)
- Minimum: Any ESP32-S3 board with external antenna connector
- Antenna: External 2.4 GHz antenna recommended for consistent CSI (onboard PCB antenna works but with higher variance)
- Power: USB-C (5V) — standard phone charger. Consider PoE splitters for ceiling-mounted nodes
- Enclosure: 3D-printed or off-the-shelf project box. Mount with adhesive or screws
Recommended Deployment
- Quickstart (passive radar): 2 ESP32 nodes + existing WiFi router. Nodes in RX-only mode. Presence detection in the area between nodes and router
- Minimum viable: 4 nodes in a single room, corners at mixed heights (2 high, 2 low). Can mix passive radar + dedicated TX
- Good coverage: 6–8 nodes across an apartment, perimeter placement, angular diversity
- Node density: ~1 per 50–70 m² for presence, ~1 per 15–25 m² for localization
- Placement rules: Non-collinear, avoid all-same-height, keep LoS between at least some pairs
Testing Strategy
Go Unit Tests
Each algorithmic module has a companion _test.go file. Tests are table-driven and use only the standard library (testing package). No external test framework required.
Modules with mandatory unit tests:
| Package | Test file | What to test |
|---|---|---|
| `pipeline/phase` | `phase_test.go` | Phase sanitization: given known I/Q pairs, verify unwrapping produces expected residual. Test NaN/Inf handling. Test near-zero denominator in OLS regression. |
| `pipeline/nbvi` | `nbvi_test.go` | Welford update: verify online variance matches batch variance to 1e-9. Test NBVI threshold fallback (< 8 subcarriers passing). |
| `pipeline/feature` | `feature_test.go` | deltaRMS: given known baseline and amplitude, verify result. EMA baseline update: verify motion-gating (no update when deltaRMS > threshold). |
| `localizer/fresnel` | `fresnel_test.go` | Zone number computation: for known TX/RX/cell geometry, verify ceil(ΔL/(λ/2)). Zone decay: verify zone_decay(n) = 1/n^2 for decay_rate=2. |
| `localizer/ukf` | `ukf_test.go` | Constant-velocity prediction: verify predict-only step matches analytical solution. Measurement update: verify state converges toward known position. Biomechanical clamp: verify XY speed is clamped to 2.0 m/s. |
| `localizer/gdop` | `gdop_test.go` | Fisher matrix: given 2 orthogonal links, verify GDOP = sqrt(2). Collinear links: verify GDOP = Infinity. |
| `portal` | `portal_test.go` | Crossing detection: verify sign-change + velocity threshold logic. Velocity-too-low: verify no tentative crossing registered. Count floor: verify count cannot go below 0. |
| `ble` | `ble_test.go` | BLE centroid: given known node positions and RSSI values, verify pos_ble within 0.01 m of analytical centroid. Address rotation scoring: verify score > 0.7 for matching mfr data + same RSSI node. |
| `anomaly` | `anomaly_test.go` | Welford update: after N identical observations, verify mean = observation and variance = 0. z_score + normalize: verify correct [0,1] mapping at 1σ, 2σ, 4σ. |
| `replay` | `replay_test.go` | File header read/write round-trip. Seek to known timestamp: verify returned frame has recv_time_ms ≥ target. Corruption recovery: truncated final frame → truncated cleanly. |
| `auth` | `auth_test.go` | HMAC token derivation: same inputs produce same token. Session creation/expiry. bcrypt round-trip for PIN. |
Test data strategy: All numerical tests use deterministic synthetic data (no random seeds in test paths). The Fresnel zone and UKF tests use hard-coded 2D geometries with analytically known answers.
Integration Tests (using CSI simulator)
Located in test/integration/. Each test:
- Starts a mothership in a Docker container (or in-process for unit-level integration)
- Runs `spaxel-sim` with specific walker configurations
- Polls `GET /api/blobs` and `/api/events` to assert outcomes
Mandatory integration test scenarios:
| Scenario | Simulator config | Assertion |
|---|---|---|
| Single link, single walker | 2 nodes, 1 walker, 60 s | blob count > 0 for > 80% of time |
| Multi-node localization | 4 nodes, 1 walker, 60 s | blob position within 1.5 m of walker position |
| Idle-to-active rate change | 4 nodes, 0 walkers → 1 walker after 10 s | node rate increases after walker appears |
| Node disconnect + reconnect | 4 nodes, disconnect one mid-test | system continues producing blobs; node returns to fleet |
| Portal crossing | 2 nodes, walker crosses portal | portal_crossings table has 1 row |
| OTA rollback | Push invalid firmware | node reconnects with original version |
| Auth rejection | Connect without token | connection closed with HTTP 401 |
Firmware Tests (host-based unit tests)
ESP-IDF supports host-based testing via idf.py test --target linux. The following firmware modules have host tests:
- `nvs` — NVS schema migration: simulate schema_ver=0→1 upgrade
- `csi` — Binary frame serialization: verify frame header fields and little-endian encoding
- `serial_prov` — Provisioning JSON parser: verify valid JSON parsed correctly; invalid JSON returns `{"ok":false}`
Property-Based / Fuzz Tests
The following are high-value fuzz targets — any malformed input here has an outsized impact:
| Target | What to fuzz | Tool |
|---|---|---|
| `ingestion.ParseBinaryFrame` | Random byte slices 0–300 bytes | `go test -fuzz=FuzzParseBinaryFrame ./internal/ingestion/` |
| `ingestion.ParseJSONFrame` | Random UTF-8 strings up to 4096 bytes | `go test -fuzz=FuzzParseJSONFrame ./internal/ingestion/` |
| `pipeline/phase.Sanitize` | Edge I/Q pairs: all-zero, max int8, alternating sign | Table-driven property test: output is always finite (no NaN/Inf) |
| `replay.SeekToTimestamp` | Target timestamps before/after file bounds, at wrap points | Fuzz with arbitrary int64 timestamps |
| `auth.VerifyToken` | Tokens with wrong length, invalid hex, correct length but wrong bytes | Property: VerifyToken never panics |
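Fuzz targets are in `*_fuzz_test.go` files. Without `-fuzz`, `go test ./...` runs each fuzz target only against its seed corpus, which is a fast, bounded check. Open-ended fuzzing happens only with `go test -fuzz=<Target>`, so CI bounds it with `-fuzztime`. A 60-second fuzz run (`-fuzztime=60s`) is added as an optional CI step.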
Quality Gates / Definition of Done
We do not ship a version if any of the following fail:
- `go test ./...` — all unit tests pass
- `go vet ./...` — no vet warnings
- `golangci-lint run` — no lint errors (at least: `errcheck`, `staticcheck`, `gosimple`)
- `docker buildx build --platform linux/amd64,linux/arm64 .` — multi-arch build succeeds
- Integration test suite: `spaxel-sim --nodes 4 --walkers 1 --duration 30s` with blob count > 0
- Integration test: OTA rollback test (invalid firmware → node reverts)
- Integration test: auth rejection test (node without token → HTTP 401)
- `axe-core` accessibility CI gate passes on dashboard HTML
- Pipeline timing: fusion loop median < 15 ms over a 60-second run (measured by the `timing_budget_test.go` benchmark)
Advisory (tracked but not blocking):
- Fuzz 60-second runs for binary frame and JSON parsers (run on release branches, not every commit)
- `go tool pprof` heap snapshot during 8-node sim run: baseline heap < 80 MB
Open Questions
These are unresolved design questions. Each is tagged with the earliest phase where a decision is needed.
- 5 GHz support (Phase 1+ — monitor): ESP32-S3 is 2.4 GHz only. Future ESP32-C6 or C5 may add 5 GHz with different CSI characteristics. Design the pipeline to be frequency-agnostic where possible (parameterize λ = c/f).
- Node self-positioning (Phase 3 — defer): MDS-MAP from pairwise ToF could eliminate manual position entry. Feasibility with ESP32 ToF resolution (~7.5 m) is questionable — defer to a future phase. Until then, manual positioning via the 3D editor is the only path.
- IEEE 802.11bf (monitor — no action until ESP32 support ships): The sensing standard (approved May 2025) provides standardized sensing frames that could replace promiscuous mode CSI capture entirely. Monitor ESP-IDF release notes for support. If added, it will be a firmware-layer change only.
- Multi-installation coordination (out of scope — see Non-Goals): Could multiple Spaxel instances in adjacent apartments share boundary link data to improve wall-adjacent detection? Deferred — privacy and network topology implications need thought. Not a blocker for any current phase.