Previously the ready signal was sent exactly once at window open, so the
browser had to have the serial port open at that exact millisecond after
reboot. Now the firmware broadcasts SPAXEL READY every second for the
full provisioning window (2 min fresh / 15 s reprov), giving the host
ample time to open the port and catch the handshake.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The no-docker path created the cmd but never called cmd.Start(), so
waitForMothership always timed out. Add Start(), stdout/stderr wiring,
SPAXEL_BIND_ADDR, and SPAXEL_MDNS_ENABLED=false for CI headless operation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
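A sketch of the corrected no-docker launch, assuming hypothetical helper names (newMothershipCmd, startMothership) and an illustrative bind address format; the two environment variable names are the ones listed above. The key point is that exec.Command only builds the command and nothing runs until Start() is called:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// newMothershipCmd builds the command with output wiring and CI
// environment. The binary path and bind address are assumptions.
func newMothershipCmd(bin, bindAddr string) *exec.Cmd {
	cmd := exec.Command(bin)
	cmd.Stdout = os.Stdout // surface mothership logs in the test output
	cmd.Stderr = os.Stderr
	cmd.Env = append(os.Environ(),
		"SPAXEL_BIND_ADDR="+bindAddr,
		"SPAXEL_MDNS_ENABLED=false", // headless CI has no mDNS
	)
	return cmd
}

// startMothership launches the process without waiting for it to exit.
func startMothership(bin, bindAddr string) (*exec.Cmd, error) {
	cmd := newMothershipCmd(bin, bindAddr)
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start %s: %w", bin, err)
	}
	return cmd, nil
}

func main() {
	// A missing binary makes Start fail immediately instead of letting
	// a readiness poll time out later.
	_, err := startMothership("./does-not-exist-mothership", "127.0.0.1:8080")
	fmt.Println("start failed:", err != nil)
}
```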
The 0.1.352 Docker image contained firmware compiled with CONFIG_ESPTOOLPY_FLASHSIZE=16MB
despite sdkconfig.defaults being updated to 4MB in d837598. Kaniko served a cached
firmware layer, bypassing the sdkconfig.defaults change.
Result: ESP32-S3 (4MB flash) flashed via Web Serial crashed on every boot:
spi_flash: Detected size(4096k) smaller than binary image header(16384k). Probe failed.
Fix:
- Add FIRMWARE_CACHE_BUST ARG before COPY in firmware stage (guarantees cache miss)
- Add RUN rm -f sdkconfig sdkconfig.old so idf.py set-target regenerates from
  sdkconfig.defaults (CONFIG_ESPTOOLPY_FLASHSIZE_4MB=y) on every build
Bumps version to 0.1.354 to trigger a fresh CI build.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
serveEmbeddedFile asserted the embedded file implemented
`interface { Len() int64; io.ReadSeeker }`, but *embed.openFile has no
Len() method, so http.ServeContent panicked (caught by chi Recoverer ->
500) on every embedded page: /fleet, /ambient, /live, /setup, /simple, /.
http.ServeContent only needs an io.ReadSeeker, which embed files satisfy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Explicit VERSION bump so resolve-version skips its auto-bump git push
(which was racing with the e2e workflow), letting docker-build run with
the GOOS/GOARCH fix and push 0.1.352.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The multi-arch change (2cd4410) derived GOOS/GOARCH from TARGETPLATFORM
with wrong cut field indices (-f2/-f3), yielding the invalid pair
amd64/amd64 -> `go: unsupported GOOS/GOARCH pair amd64/amd64`, failing
every CI image build since May 24. CI builds amd64 only (ESP-IDF firmware
is x86_64-only), so pin linux/amd64 explicitly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
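For reference, a TARGETPLATFORM value has the shape os/arch with an optional variant, so the OS is the first field and the architecture the second. The Dockerfile does this in shell; the Go sketch below only illustrates the field order, and splitPlatform is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// splitPlatform (hypothetical) splits "os/arch[/variant]" into its
// first two fields and rejects values that lack either one.
func splitPlatform(platform string) (string, string, error) {
	goos, rest, found := strings.Cut(platform, "/")
	goarch, _, _ := strings.Cut(rest, "/") // drop an optional variant
	if !found || goos == "" || goarch == "" {
		return "", "", fmt.Errorf("invalid platform %q", platform)
	}
	return goos, goarch, nil
}

func main() {
	goos, goarch, err := splitPlatform("linux/amd64")
	fmt.Println(goos, goarch, err)
}
```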
Empty commit to fire the github webhook -> spaxel-build, so the failing
build step can be observed before podGC deletes the pod.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins a deterministic tag for the image that includes the simulator binary
(added in 3ca6e8f), avoiding ambiguity with any in-flight 0.1.347 build
that predates the Dockerfile change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds cmd/sim alongside the mothership and copies /spaxel-sim into the
final image so the same image can drive a synthetic-node CSI load against
a deployed mothership. Default ENTRYPOINT still runs the mothership.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix timezone mismatch between local time formatting and SQLite's UTC-based
date() function. AggregateDaily, GetWeeklyTrend, and GetAllWeeklyTrends now
use .UTC() before formatting dates to match SQLite's date(timestamp, 'unixepoch').
Closes: bf-26eg
Tests: TestHealthStore_DailyAggregation, TestHealthStore_GetAllWeeklyTrends now pass
- Fix SQL syntax errors caused by //nolint:errcheck comments inside raw string literals in anomaly.go
- Fix TestFlowAccumulator_DwellWhilePaused by establishing last waypoint before pausing
- Fix predictor_test.go compilation errors:
  - Remove unused "os" import
  - Replace AddTransitionSample with RecordTransition using ZoneTransition struct
  - Fix map literal with missing key name
All tests now pass.
Closes: spaxel-test-fixes
- Convert pretty-printed JSON to proper line-delimited JSONL
- Fix bf-awtza bead: set closed_at for closed status
- Enables br sync --import-only to rebuild database
Implement IO-1 (Fresh install / first boot) and IO-2 (Idempotent restart)
acceptance tests for the hardware-free install and onboarding journey.
IO-1 validates:
- Fresh install starts with empty data volume
- First-run setup is accessible before PIN configuration
- PIN setup completes successfully
- Migrations run (detected in logs)
- PIN persists after setup
- Health check returns green
- No nodes are attached on fresh install
IO-2 validates:
- Configured install (PIN, node, zone) persists across restart
- Same data directory is reused after restart
- No re-setup prompt appears after restart
- Node label and position persist correctly
- Zone configuration persists correctly
- Mothership remains healthy after restart
These tests complete the IO-1..IO-11 acceptance test suite as specified
in docs/plan/plan.md, enabling hardware-free CI validation of the
installation and onboarding journey.
Closes: bf-2hi0h
Implements acceptance tests for failure scenarios and edge cases during
node onboarding per the plan specification:
- IO-7: Provisioning timeout - node that goes silent is marked offline
  within heartbeat window (60s) and surfaced in /api/fleet; no crash
- IO-8: Bad/expired token - invalid token rejected with clear error;
  node never enters fleet; no zombie row
- IO-9: Duplicate MAC - second connection with same MAC handled
  (disconnects first or rejects second); no duplicate rows
- IO-10: Drop mid-onboard - killing simulator during onboarding leaves
  node re-onboardable; no half-provisioned lock
- IO-11: Firmware-version skew - old firmware nodes onboard successfully
  and OTA can be initiated
Tests use the acceptance harness with spaxel-sim and verify proper
handling of each scenario without mothership crashes or data corruption.
Closes: bf-1922s
Implements IO-6: Full new-user E2E (happy path) — HARD GATE.
The test verifies the complete onboarding journey from fresh install
to live events:
1. Fresh install + PIN setup
2. 6-node fleet onboarding via spaxel-sim
3. Define 2 zones + 1 portal
4. Run walker simulation
5. Verify blob detection, zone-presence events, portal-crossing
events, timeline entries, and MQTT/HA integration status
Added helper methods to TestHarness:
- CreateZone, CreatePortal, GetPortalCrossings
- GetTimeline, GetMQTTStatus
Closes: bf-1rifr
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implement acceptance tests for single-node and multi-node fleet onboarding:
- IO-3 (TestIO3_SingleNodeOnboarding): Validates end-to-end onboarding of a
  single simulated ESP32 node via spaxel-sim. Verifies node transitions from
  discovered to online, appears in /api/nodes within 10s, and that label/position
  assignments persist via REST API (PUT /api/nodes/{mac}/position and PATCH
  /api/nodes/{mac}/label).
- IO-4 (TestIO4_MultiNodeFleetBringup): Validates multi-node fleet bring-up with
  6 nodes. Verifies all nodes reach online status, no TX-slot collision warnings
  in logs, /api/nodes shows all 6 online, and fleet telemetry data is available
  via /api/fleet endpoint.
Also fixes a context leak in TestIO5_DeviceIdentityBLEOnboarding by ensuring
the simulator context cancel function is called with defer.
Closes: bf-4jcjg
Quality gate #7: Verify nodes without valid tokens are rejected with HTTP 401.
- Created as7_auth_reject_test.go following AS1-AS6 pattern
- Tests WebSocket connection without X-Spaxel-Token header
- Verifies HTTP 401 Unauthorized response
- Validates simulator exits non-zero with invalid token
- Confirms no zombie nodes in fleet after rejection
- Registered AS7_AuthRejectIntegration in test runner
Closes: bf-2d9fj
Add migration_018 that creates all 8 prediction subsystem tables in the main
database, consolidating the separate prediction.db and prediction_accuracy.db
files. Add NewModelStoreWithDB and NewAccuracyTrackerWithDB constructors that
accept an existing *sql.DB connection, and update main.go to use the main
database connection instead of separate files.
- Added migration_018_add_prediction_tables with all prediction tables
- Added NewModelStoreWithDB() to accept shared DB connection
- Added NewAccuracyTrackerWithDB() to accept shared DB connection
- Updated Close() to only close DB when we own it (path != "")
- Updated main.go prediction init to use mainDB
The legacy NewModelStore() and NewAccuracyTracker() constructors remain for
backward compatibility (tests continue using their own migrations).
Closes: bf-38wcp
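A sketch of the ownership rule in Close(), assuming a simplified store type; io.Closer stands in for *sql.DB so the example needs no database driver, and the type and field names are hypothetical. An empty path marks a connection that was passed in and must stay open for its owner:

```go
package main

import (
	"fmt"
	"io"
)

// modelStore is a simplified stand-in. db would be a *sql.DB in the
// real code; path is empty when the connection is shared.
type modelStore struct {
	db   io.Closer
	path string
}

// Close closes the connection only when this store opened it itself.
func (s *modelStore) Close() error {
	if s.path == "" {
		return nil // shared connection: the caller owns its lifetime
	}
	return s.db.Close()
}

// countingCloser records how many times Close was called.
type countingCloser struct{ n int }

func (c *countingCloser) Close() error { c.n++; return nil }

func main() {
	shared := &countingCloser{}
	_ = (&modelStore{db: shared}).Close()

	owned := &countingCloser{}
	_ = (&modelStore{db: owned, path: "prediction.db"}).Close()

	fmt.Println(shared.n, owned.n)
}
```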
- Use chi.Walk instead of len/range on chi.Routes interface
  - chi.Routes returns an interface type, not a slice, which cannot
    be used with len() or range directly
  - chi.Walk is the proper API for iterating registered routes
The test verifies that both Handler and FleetHandler can be registered
on the same router without chi panicking on duplicate routes.
Closes: bf-3o15x
The simulator API was registered at /simulator but the startup log
claimed /api/simulator. Fixed the route registration to match the
log and align with the REST API pattern (all endpoints under /api/).
The dashboard page route (/simulator serving simulator.html) remains
unchanged - only the API endpoint path was fixed.
Closes: bf-1f55j
- Add flag.Parse() call in TestMain to initialize testing flags before
  running tests, fixing panic when test functions call testing.Short()
- Add proper PASS/FAIL reporting for each test in the sequence
- Apply go fmt formatting to io_install_upgrade_test.go
The IO-7..IO-11 failure and edge onboarding tests were already
implemented in the codebase. This fix ensures they can run properly
by initializing the testing framework before calling test functions.
Closes: bf-1922s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements IO-5 acceptance test which verifies:
- A person can be created via POST /api/people
- A simulated BLE device (from spaxel-sim) is discovered
- The BLE device can be assigned to a person via PUT /api/ble/devices/{mac}
- The device registration is persisted correctly with person_id, person_name, and person_color
Also fixes a bug in mothership/cmd/mothership/main.go where
SetBriefingProvider was called before dashboardHub was initialized,
causing a nil pointer dereference on startup. The call is now
made after the hub is created.
Closes: bf-3cagn (IO-5: BLE device-identity onboarding test)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements plan §Disk Full Handling:
- Created internal/diskspace package with monitor goroutine (60s poll interval)
- At <100 MB free: stop CSI replay buffer writes, emit system alert event
- At <20 MB free: also pause crowd flow accumulation and prediction updates
- Detection and localization continue regardless of disk state
- Added /api/diskspace/stats endpoint for dashboard integration
Changes:
- internal/diskspace/monitor.go: Core monitor with state machine (normal/warning/critical)
- internal/diskspace/monitor_test.go: Unit tests for pause/resume behavior
- internal/recorder/manager.go: Added PauseWrites/ResumeWrites/IsPaused methods
- internal/recorder/manager_test.go: Tests for paused frame dropping
- internal/analytics/flow.go: Added PauseWrites/ResumeWrites/IsPaused methods
- internal/analytics/flow_test.go: Tests for paused trajectory/dwell accumulation
- internal/prediction/predictor.go: Added PauseUpdates/ResumeUpdates/IsPaused methods
- internal/prediction/predictor_test.go: Tests for paused prediction updates
- cmd/mothership/main.go: Integrated monitor initialization and API endpoint
All writes are no-ops while paused, with automatic resume when space recovers.
Closes: bf-4jb0a
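A sketch of the threshold classification behind the normal/warning/critical state machine. The names (State, classify) and the choice of 1 MB = 1<<20 bytes are assumptions; the 100 MB and 20 MB thresholds come from the list above:

```go
package main

import "fmt"

// State mirrors the three monitor states named in the message.
type State int

const (
	Normal State = iota
	Warning
	Critical
)

func (s State) String() string {
	return [...]string{"normal", "warning", "critical"}[s]
}

// mb is an assumption: the message does not say whether MB means
// 10^6 or 2^20 bytes.
const mb = 1 << 20

// classify (hypothetical) maps free bytes to a monitor state.
func classify(freeBytes uint64) State {
	switch {
	case freeBytes < 20*mb:
		return Critical // also pause flow accumulation and predictions
	case freeBytes < 100*mb:
		return Warning // stop CSI replay buffer writes
	default:
		return Normal
	}
}

func main() {
	for _, free := range []uint64{500 * mb, 50 * mb, 5 * mb} {
		fmt.Println(free/mb, "MB free ->", classify(free))
	}
}
```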
Add support for 'predicted_enter' trigger condition that fires when a
prediction indicates a person is likely to enter a zone within a configured
time window (default 30 minutes). Uses rising-edge detection with 60-minute
cooldown per person-zone combination.
Changes:
- Add migration_017 to expand triggers table CHECK constraint to include
  'predicted_enter' (SQLite table recreation required)
- Update volume store init() for new databases with expanded constraint
- Add predicted_enter to API validation in volume_triggers.go
- Implement evaluatePredictedEnter() in volume store with rising-edge
  detection and cooldown tracking
- Add PredictionProvider interface and SetPredictionProvider() methods
  to both volume.Store and automation.Engine
- Wire predicted_enter evaluation into 10 Hz fusion tick pipeline
Closes: bf-20sp3
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds three test cases for quality gate #7:
1. TestAS7_AuthRejectMissingToken - verifies nodes without tokens are rejected
2. TestAS7_AuthRejectInvalidToken - verifies nodes with invalid tokens are rejected
3. TestAS7_AuthAcceptValidToken - verifies nodes with valid tokens are accepted
Each test verifies:
- Simulator exits non-zero on rejection
- Mothership logs the rejection
- No nodes connect when auth fails
Closes: bf-2d9fj
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Change runtime from debian:12-slim to gcr.io/distroless/static-debian12:nonroot
- Remove wget health check (distroless has no shell)
- Embed dashboard via go:embed (dashboard files now part of binary)
- Add build tag support for conditional embedding (production vs development)
- Dashboard serving code supports both embedded and filesystem-based serving
The dashboard is now embedded in the Go binary using go:embed with the
'embed' build tag. Production Docker builds use -tags=embed to enable
dashboard embedding, while development builds fall back to filesystem
serving. This aligns with the plan's security requirements for non-root
distroless runtime while maintaining developer ergonomics.
Closes: bf-1chgr
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add ARG TARGETPLATFORM/TARGETARCH for cross-platform builds
- Cross-compile Go binary using GOOS/GOARCH from TARGETPLATFORM
- ESP32 firmware build is amd64-only (ESP-IDF is x86_64)
  - Creates placeholder on arm64 builds
  - Removes placeholder in final stage, adds README
- Supports docker buildx --platform linux/amd64,linux/arm64
Closes: bf-2bxpx
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add IO-1 (Fresh install / first boot) and IO-2 (Idempotent restart & upgrade-in-place)
integration tests as hard-gate tests for releases. These tests validate the entire
new-user journey with zero physical hardware.
IO-1 validates:
- Dashboard serves on fresh install (200 OK)
- First-run PIN setup flow
- Migrations run and complete
- PIN persists (verified by a restart check)
- Health endpoint returns green
- No nodes attached on fresh install
IO-2 validates:
- PIN configuration persists across restart
- Node registry persists across restart
- No re-setup prompt after restart
- Prior data is readable after restart
- Pre-upgrade DB backup exists
The tests use the existing TestHarness infrastructure and follow the plan's
Installation & Onboarding Test Plan scenarios.
Closes: bf-1r6ww
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add io_install_upgrade_test.go with IO-1 and IO-2 test scenarios
  - IO-1: Fresh install / first boot
    - Verifies mothership starts with empty volume
    - Checks first-run setup page is served
    - Validates PIN setup and persistence
    - Confirms migrations run and health is green
    - Ensures no nodes attached on fresh install
  - IO-2: Idempotent restart & upgrade-in-place
    - Verifies restart preserves PIN, nodes, and zones
    - Checks no re-setup prompt after restart
    - Validates data persists across restarts
    - Confirms backup directory exists for upgrades
- Update integration_test.go TestMain to include IO tests
Closes: bf-dhlyk
The prediction subsystem previously created 8 tables at runtime without
version tracking in separate SQLite databases (prediction.db,
prediction_accuracy.db). This created schema drift issues where changes
were unversioned and difficult to track.
Changes:
- Add prediction_schema_version table to prediction.db (model.go)
- Add prediction_accuracy_schema_version table to prediction_accuracy.db (accuracy.go)
- Convert migrate() functions to use versioned migrations (version 1)
- All 8 tables now created through versioned migration system:
- zone_transitions_history, transition_probabilities, dwell_times, person_zone_entry
- recorded_predictions, accuracy_stats, zone_occupancy_patterns, zone_occupancy_history
Closes: bf-38wcp
- Remove duplicate node-specific routes (role, label, locate, delete) from
  FleetHandler.RegisterRoutes to avoid chi panic on duplicate registration
- Keep only unique FleetHandler routes: /api/fleet/health, /api/fleet/history,
  /api/fleet/optimise, /api/fleet/simulate
- Add startup smoke test TestRouteRegistrationNoPanic to verify both Handler
  and FleetHandler can be registered on same router without panic
main.go registers both fleet.NewHandler and fleet.NewFleetHandler on the
same router, which previously caused chi to panic due to duplicate routes:
POST /api/nodes/{mac}/role
PATCH /api/nodes/{mac}/label
POST /api/nodes/{mac}/locate
DELETE /api/nodes/{mac}
The Handler has comprehensive node/room/mode endpoints while FleetHandler
focuses on health/optimization/simulation, so duplicates are removed from
FleetHandler.
Closes: bf-3o15x
Detailed IO-1..IO-11 scenarios validating the full new-user journey (fresh install ->
first-run PIN setup -> device onboarding -> operational) entirely via the spaxel-sim
ESP32 simulator, hardware-free and deterministic in CI. IO-1/3/4/6 are release hard-gates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delete compiled Go binaries (sim, spaxel-sim, cmd/sim/spaxel-sim, mothership/{sim,spaxel-sim},
*.test, acceptance.test) and the tracked dashboard/node_modules/ (6689 files) that were
polluting the repo. Add .gitignore rules so they stay out. Dashboard deps regenerate via npm ci.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>