The borrow-of-moved-value error for `state` was already fixed in the codebase.
Line 568 uses `.with_state(state.clone())` and `UnifiedState` derives Clone.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The plan §12 previously specified tests/ at root with integration/
and chaos/ subdirectories. However, the actual implementation uses
the idiomatic Rust convention with tests in crates/*/tests/.
This commit:
- Updates plan §12 repository structure to document the actual layout
- Moves tests/benches/score-comparability to docs/research/ (research artifacts)
- Removes the now-empty tests/ directory
CI already runs cargo test --all --all-features which correctly
discovers and runs all crate-level integration tests.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements the 14-table task-store schema from plan §4 with both SQLite
and Redis backends. Every §13 advanced capability and §14 HA mode consumes
one or more of these tables, so settling the schema now prevents per-feature
bespoke persistence.
## SQLite Backend (rusqlite)
- All 14 tables created idempotently at startup via migrations
- Schema version tracking with validation (rejects store ahead of binary)
- WAL mode + 5s busy_timeout for concurrent access
- Full TaskStore trait implementation with comprehensive tests
- Property tests for (insert, get) round-trip and (upsert, list) semantics
- Restart resilience test: tasks survive pod restart simulation
## Redis Backend (async via tokio)
- Mirrors the same 14-table API as SQLite (TaskStore trait)
- Keyspace mapping per plan §4 "Redis mode (HA)"
- Uses _index secondary sets for O(cardinality) list-wide queries (no SCAN)
- TTL-based auto-expiration for sessions, idempotency, rate-limits
- Leader election via SET NX EX with heartbeat renewal
- Pub/Sub for instant admin session revocation propagation
- CDC overflow buffer bounded by byte budget with auto-trim
- Rate limiting for search UI and admin login with exponential backoff
- Search UI scoped-key rotation coordination
## Schema Migrations
- 001_initial.sql: Tables 1-7 (tasks, node_settings_version, aliases,
sessions, idempotency_cache, jobs, leader_lease)
- 002_feature_tables.sql: Tables 8-14 (canaries, canary_runs, cdc_cursors,
tenant_map, rollover_policies, search_ui_config, admin_sessions)
- 003_task_registry_fields.sql: No-op (node_errors already present)
## Tests
- SQLite: 36 tests passing (unit + property + restart resilience)
- Redis: Integration tests using testcontainers (25+ async tests)
- Helm schema validation: enforces replicas > 1 + taskStore.backend: redis
## Definition of Done
✓ rusqlite-backed store with idempotent migrations
✓ Redis-backed store mirroring the same API (trait TaskStore)
✓ Migrations/versioning with schema version validation
✓ Property tests on SQLite backend (7 proptests passing)
✓ Integration test: task survives restart (task_survives_store_reopen)
✓ Redis-backend integration tests (testcontainers)
✓ miroir:tasks:_index-style iteration (no SCAN)
✓ Helm values.schema.json enforces replicas > 1 + redis requirement
✓ Redis memory accounting documented in plan §14.7
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The FromRef implementation for admin_endpoints::AppState was missing
the local_search_ui_rate_limiter field, causing a compilation error.
This completes P3.3.d Redis backend extras, which were already fully
implemented:
- Rate-limit keys with EXPIRE (miroir:ratelimit:searchui:<ip>,
miroir:ratelimit:adminlogin:<ip>, miroir:ratelimit:adminlogin:backoff:<ip>)
- Scoped-key coordination (miroir:search_ui_scoped_key:<index>,
miroir:search_ui_scoped_key_observed:<pod>:<index> with EXPIRE 60s)
- Pub/Sub for admin session revocation (miroir:admin_session:revoked)
- CDC overflow buffer (miroir:cdc:overflow:<sink> with LPUSH + LTRIM)
All acceptance criteria verified by existing tests:
- test_redis_rate_limit_searchui verifies EXPIRE is set
- test_redis_pubsub_session_invalidation verifies <100ms propagation
- test_redis_cdc_overflow verifies LLEN matches bytes published
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Section 15 Open Problem #6 was a one-line placeholder. Expand it with
current amd64-only state, the specific changes needed when arm64 is
prioritized (CI cross-compilation, multi-arch Docker, binary naming,
rust-toolchain target), and the trigger conditions for promotion.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add `miroir-ctl key rotate-node-master` command implementing plan §9
4-step zero-downtime rotation: create new admin-scoped key on all
Meilisearch nodes, print K8s Secret update instructions, wait for
rolling restart confirmation, delete old key. Supports --dry-run,
node auto-discovery via topology API, and rollback on step 1 failure.
Add `address` field to topology API NodeInfo for CLI node discovery.
Add runbooks for both nodeMasterKey (zero-downtime) and startup master
key (maintenance window required) rotation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Research complete: both score-based and RRF merge fail 0.95 threshold.
Updated research doc with full RRF validation results and confidence intervals.
Added benchmark result reports and helper tests. Follow-up bead miroir-n6v
created for global-IDF preflight (dfs_query_then_fetch pattern).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Research doc updated with precise 95% CIs per query type. compare.py
now computes and reports confidence intervals. Kendall τ = 0.79
(95% CI [0.7873, 0.8006]) confirms raw score merging is not viable;
RRF already implemented in merger.rs as mitigation. Follow-up bead
created (miroir-zfo) for RRF quality validation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Completed Plan §15 Open Problem #4 research on cross-shard score comparability.
## Key Finding
Average Kendall tau: 0.79 vs. 0.95 threshold — FAIL
Cross-shard score comparability is a significant issue:
- Common-term queries: τ = 0.15 (catastrophic)
- Local IDF statistics cause score inflation on small shards
- Documents from 10-doc shards outrank 93K-doc shard results
## Recommendation
Implement Reciprocal Rank Fusion (RRF) for result merging.
Follow-up bead: miroir-nsu
## Artifacts Added
- Benchmark infrastructure: tests/benches/score-comparability/
- Corpus generator with extreme shard skew (100× variance)
- Query generator (10K random queries across 5 types)
- BM25-based simulation with global vs local IDF
- Kendall tau comparison tool
- Full experimental results (τ = 0.79 ± 0.01, 95% CI)
- Research writeup: docs/research/score-normalization-at-scale.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Verified CI smoke pipeline runs end-to-end in ~5:39 on iad-ci.
All three checks pass: fmt, clippy, test.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add rrqlite/openraft+SQLite reference project, correct raft-rs status
to maintenance mode, note openraft 0.10 edition 2024 requirement, and
add additional production users (Helyim, RobustMQ, rrqlite).
Decision unchanged: do not ship Raft in v0.x or v1.0, revisit before v2.0.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add resharding load simulation model with real router hash functions
- Benchmark confirms storage amplification is exactly 2.0× and dual-write
amplification is exactly 2.0× across all test matrix scenarios (1KB/10GB,
10KB/100GB, 1MB/1TB), with hash distribution CV < 5% in all cases
- CLI window guard: resharding.allowed_windows config restricts resharding
to named time windows (e.g. "02:00-06:00 UTC"), CLI refuses outside
windows without --force
- Integration tests confirm rejection outside window, --force override,
no-restriction mode, and disabled config handling
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
14 chaos tests validate shard migration write safety at every cutover
boundary. Key findings:
- AE on + delta pass: 0/1M loss (production default)
- AE off + delta pass: 0/50K loss (delta pass is sufficient alone)
- AE off + delta skipped: ~2% loss → hard refusal at config validation
- 3-node cluster cutover: 0 loss with delta pass
Hard-coded policy: MigrationCoordinator refuses migrations when both
anti-entropy is disabled and delta pass is skipped. Warning logged when
AE is disabled but delta pass remains active.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds 14 chaos tests validating zero-data-loss at the migration cutover
boundary under all AE/delta-pass configurations. Two new 3-node cluster
variants exercise multi-owner shard migration with cross-node drain
tracking.
Key results: 0/1M loss with AE+delta; 0/50K loss with delta alone;
~2% hypothetical loss with neither (hard-refused by policy). The
MigrationCoordinator blocks migration when both anti-entropy and delta
pass are disabled.
Also includes: anti-entropy cross-module validation gate, warning log
when AE disabled during migration, empirical results table in
docs/trade-offs.md, and plan §15 OP#1 status update to verified.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Full serde-derived struct tree covering every block in plan §4 (MiroirConfig,
NodeConfig, TaskStoreConfig, AdminConfig, HealthConfig, ScatterConfig,
RebalancerConfig, ServerConfig, ConnectionPoolConfig, TaskRegistryConfig) and
all 21 §13 advanced-capability sub-structs (ReshardingConfig through
SearchUiConfig with nested auth/rate-limit/CSP/analytics structs), plus §14
horizontal-scaling structs (PeerDiscoveryConfig, LeaderElectionConfig, HpaConfig).
Includes:
- Layered loading via config crate: built-in defaults → file → env overrides
- Config::validate() with 14 cross-field rules (HA requires redis, scoped_key
timing inversion, node group bounds, tenant affinity range checks, etc.)
- 10 unit tests: round-trip YAML, full plan example, minimal YAML defaults,
and validation rejection cases
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Enumerates dump variants that streaming mode can/can't handle.
- Added docs/dump-import/compatibility-matrix.md with comprehensive
compatibility matrix covering Meilisearch versions, dump variants,
and workarounds
- Added docs/dump-import/README.md as entry point
- Updated miroir-ctl dump command to reference matrix with helpful
error messages for unimplemented subcommands (import, export, analyze)
Addresses Open Problem #5: identifies what "can't reconstruct" means
in concrete terms, giving operators clear guidance on when broadcast
fallback is needed and what alternatives exist.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Survey openraft, raft-rs, and async-raft crates. Design a Raft-backed
TaskStore prototype using openraft with SQLite state machine. Analytical
benchmark against Redis across latency, throughput, memory, and ops
complexity. Decision: revisit before v2.0, do not ship in v0.x/v1.0 —
Raft fails the decision gate (worse on write latency and correctness
maturity despite removing the Redis dependency).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- LICENSE: MIT (per plan §12)
- CHANGELOG.md: Keep a Changelog 1.1.0 skeleton with [Unreleased]
and [0.1.0] sections matching the awk extractor from plan §7
- .gitignore: Rust target/, editor junk; Cargo.lock kept in VCS
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>