jedarden/miroir

Author	SHA1	Message	Date
jedarden	02ad8fce9b	P11.7: Add quick-start example artifacts (Docker Compose + config) Adds the on-disk examples referenced by plan §11 "Quick start (local, Docker Compose)": - examples/docker-compose-dev.yml: 3 Meilisearch nodes + 1 Miroir orchestrator - examples/dev-config.yaml: Matching Miroir config (16 shards, RF=1) - examples/README.md: Comprehensive docs for running, troubleshooting, teardown - k8s/argo-workflows/miroir-ci-docker-compose-smoke.yaml: CI smoke tests The README.md quick start section already references these examples. Acceptance: ✅ docker-compose-dev.yml boots via docker compose up ✅ dev-config.yaml mounted into Miroir container ✅ examples/README.md documents usage and teardown ✅ CI smoke job exercises compose stack (health + index + search tests) ✅ README.md quick start points to examples/docker-compose-dev.yml Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Bead-Id: bf-3lad	2026-05-20 06:50:43 -04:00
jedarden	f20c1bae4d	bf-1p4v: Verify compile error already fixed The borrow-of-moved-value error for `state` was already fixed in the codebase. Line 568 uses `.with_state(state.clone())` and `UnifiedState` derives Clone. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 06:49:04 -04:00
jedarden	360378bde2	P11.8: Amend plan §12 to reflect Rust-idiomatic test layout The plan §12 previously specified tests/ at root with integration/ and chaos/ subdirectories. However, the actual implementation uses the idiomatic Rust convention with tests in crates/*/tests/. This commit: - Updates plan §12 repository structure to document the actual layout - Moves tests/benches/score-comparability to docs/research/ (research artifacts) - Removes the now-empty tests/ directory CI already runs cargo test --all --all-features which correctly discovers and runs all crate-level integration tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 06:49:04 -04:00
jedarden	9786a4217b	bf-35t4: Commit current main state before merge	2026-05-19 22:52:18 -04:00
jedarden	e5902bb47f	P3: Complete Phase 3 — Task Registry + Persistence (SQLite + Redis) Implements the 14-table task-store schema from plan §4 with both SQLite and Redis backends. Every §13 advanced capability and §14 HA mode consumes one or more of these tables, so settling the schema now prevents per-feature bespoke persistence. ## SQLite Backend (rusqlite) - All 14 tables created idempotently at startup via migrations - Schema version tracking with validation (rejects store ahead of binary) - WAL mode + 5s busy_timeout for concurrent access - Full TaskStore trait implementation with comprehensive tests - Property tests for (insert, get) round-trip and (upsert, list) semantics - Restart resilience test: tasks survive pod restart simulation ## Redis Backend (async via tokio) - Mirrors the same 14-table API as SQLite (TaskStore trait) - Keyspace mapping per plan §4 "Redis mode (HA)" - Uses _index secondary sets for O(cardinality) list-wide queries (no SCAN) - TTL-based auto-expiration for sessions, idempotency, rate-limits - Leader election via SET NX EX with heartbeat renewal - Pub/Sub for instant admin session revocation propagation - CDC overflow buffer bounded by byte budget with auto-trim - Rate limiting for search UI and admin login with exponential backoff - Search UI scoped-key rotation coordination ## Schema Migrations - 001_initial.sql: Tables 1-7 (tasks, node_settings_version, aliases, sessions, idempotency_cache, jobs, leader_lease) - 002_feature_tables.sql: Tables 8-14 (canaries, canary_runs, cdc_cursors, tenant_map, rollover_policies, search_ui_config, admin_sessions) - 003_task_registry_fields.sql: No-op (node_errors already present) ## Tests - SQLite: 36 tests passing (unit + property + restart resilience) - Redis: Integration tests using testcontainers (25+ async tests) - Helm schema validation: enforces replicas > 1 + taskStore.backend: redis ## Definition of Done ✓ rusqlite-backed store with idempotent migrations ✓ Redis-backed store mirroring the same API (trait TaskStore) ✓ Migrations/versioning with schema version validation ✓ Property tests on SQLite backend (7 proptests passing) ✓ Integration test: task survives restart (task_survives_store_reopen) ✓ Redis-backend integration tests (testcontainers) ✓ miroir:tasks:_index-style iteration (no SCAN) ✓ Helm values.schema.json enforces replicas > 1 + redis requirement ✓ Redis memory accounting documented in plan §14.7 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-02 16:52:25 -04:00
jedarden	04f1d47909	P3.3.d: Fix compilation - add missing local_search_ui_rate_limiter field The FromRef implementation for admin_endpoints::AppState was missing the local_search_ui_rate_limiter field, causing a compilation error. This completes P3.3.d Redis backend extras, which were already fully implemented: - Rate-limit keys with EXPIRE (miroir:ratelimit:searchui:<ip>, miroir:ratelimit:adminlogin:<ip>, miroir:ratelimit:adminlogin:backoff:<ip>) - Scoped-key coordination (miroir:search_ui_scoped_key:<index>, miroir:search_ui_scoped_key_observed:<pod>:<index> with EXPIRE 60s) - Pub/Sub for admin session revocation (miroir:admin_session:revoked) - CDC overflow buffer (miroir:cdc:overflow:<sink> with LPUSH + LTRIM) All acceptance criteria verified by existing tests: - test_redis_rate_limit_searchui verifies EXPIRE is set - test_redis_pubsub_session_invalidation verifies <100ms propagation - test_redis_cdc_overflow verifies LLEN matches bytes published Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 11:18:02 -04:00
jedarden	7f03fe6ce8	P12.OP6: expand arm64 deferral note with implementation roadmap Section 15 Open Problem #6 was a one-line placeholder. Expand it with current amd64-only state, the specific changes needed when arm64 is prioritized (CI cross-compilation, multi-arch Docker, binary naming, rust-toolchain target), and the trigger conditions for promotion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-21 07:06:11 -04:00
jedarden	26fe2970fc	P10.2: nodeMasterKey zero-downtime rotation flow Add `miroir-ctl key rotate-node-master` command implementing plan §9 4-step zero-downtime rotation: create new admin-scoped key on all Meilisearch nodes, print K8s Secret update instructions, wait for rolling restart confirmation, delete old key. Supports --dry-run, node auto-discovery via topology API, and rollback on step 1 failure. Add `address` field to topology API NodeInfo for CLI node discovery. Add runbooks for both nodeMasterKey (zero-downtime) and startup master key (maintenance window required) rotation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 15:49:40 -04:00
jedarden	c7be4ccbec	P12.OP4.1: Validate dfs_query_then_fetch benchmark (τ=0.9817) and document latency Re-ran the 10K-query score-comparability benchmark with fresh results: - DFS (global IDF preflight): avg τ = 0.9817, min τ = 0.9523, 0 queries below 0.95 → PASS - Score merge (local IDF): avg τ = 0.7938, 62.9% queries below 0.95 → FAIL - RRF merge: avg τ = 0.1361, 100% queries below 0.95 → CATASTROPHIC Added Criterion latency benchmarks to the research doc: - Global IDF aggregation: 285ns (3 shards) → 3.31µs (50 shards) - Query term extraction: 69ns (1 word) → 726ns (9 words) - IDF computation: ~113ps per term (trivial) - Coordinator-side overhead is sub-microsecond; dominant cost is network round-trip Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:31:13 -04:00
jedarden	096b43ccab	P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability Implements the Elasticsearch dfs_query_then_fetch pattern as a pre-query phase in Miroir to resolve cross-shard score comparability issues caused by differing local IDF values across shards with skewed document distributions. Core changes: - scatter.rs: New PreflightRequest/PreflightResponse types, GlobalIdf aggregation, execute_preflight and dfs_query_then_fetch_search functions - Proxy client: preflight_node implementation for term-frequency gathering - Search routes: Integration of DFS preflight before main search phase - Integration test: dfs_skewed_corpus.rs with 10 tests covering aggregation and serialization - Benchmark: dfs_preflight_bench.rs measuring preflight overhead Validation results (1,443 queries, 10-shard skewed corpus): - Average Kendall tau: 0.9815 (95% CI: [0.9809, 0.9821]) - Min tau: 0.9523 (zero queries below 0.95 threshold) - Per-type: common-term +0.84, single-term +0.11, filtered +0.11 The preflight phase adds one network round-trip before the search phase, with requests parallelized across shards. Estimated overhead: +1-2 RTTs. Resolves bead miroir-yio: Global-IDF preflight implementation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 03:43:10 -04:00
jedarden	b201f0ff58	P12.OP4: Finalize score normalization validation — RRF τ=0.14, score τ=0.79 Research complete: both score-based and RRF merge fail 0.95 threshold. Updated research doc with full RRF validation results and confidence intervals. Added benchmark result reports and helper tests. Follow-up bead miroir-n6v created for global-IDF preflight (dfs_query_then_fetch pattern). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 02:40:54 -04:00
jedarden	9ce1b36206	P12.OP4: Add confidence intervals to score comparability benchmark Research doc updated with precise 95% CIs per query type. compare.py now computes and reports confidence intervals. Kendall τ = 0.79 (95% CI [0.7873, 0.8006]) confirms raw score merging is not viable; RRF already implemented in merger.rs as mitigation. Follow-up bead created (miroir-zfo) for RRF quality validation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 00:07:42 -04:00
jedarden	72f9a197b5	P12.OP4: Score normalization at scale — research & benchmark infrastructure Completed Plan §15 Open Problem #4 research on cross-shard score comparability. ## Key Finding Average Kendall tau: 0.79 vs. 0.95 threshold — FAIL Cross-shard score comparability is a significant issue: - Common-term queries: τ = 0.15 (catastrophic) - Local IDF statistics cause score inflation on small shards - Documents from 10-doc shards outrank 93K-doc shard results ## Recommendation Implement Reciprocal Rank Fusion (RRF) for result merging. Follow-up bead: miroir-nsu ## Artifacts Added - Benchmark infrastructure: tests/benches/score-comparability/ - Corpus generator with extreme shard skew (100× variance) - Query generator (10K random queries across 5 types) - BM25-based simulation with global vs local IDF - Kendall tau comparison tool - Full experimental results (τ = 0.79 ± 0.01, 95% CI) - Research writeup: docs/research/score-normalization-at-scale.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 23:58:08 -04:00
jedarden	c30d867d27	P0.7: Update plan with chaos-test results, sync beads Verified CI smoke pipeline runs end-to-end in ~5:39 on iad-ci. All three checks pass: fmt, clippy, test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 23:03:21 -04:00
jedarden	111a128278	P12.OP2: Update Raft vs Redis research with web survey findings Add rrqlite/openraft+SQLite reference project, correct raft-rs status to maintenance mode, note openraft 0.10 edition 2024 requirement, and add additional production users (Helyim, RobustMQ, rrqlite). Decision unchanged: do not ship Raft in v0.x or v1.0, revisit before v2.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:03:29 -04:00
jedarden	e47c1c2f73	P12.OP3: Validate 2× transient load caveat and add CLI schedule window guard - Add resharding load simulation model with real router hash functions - Benchmark confirms storage amplification is exactly 2.0× and dual-write amplification is exactly 2.0× across all test matrix scenarios (1KB/10GB, 10KB/100GB, 1MB/1TB), with hash distribution CV < 5% in all cases - CLI window guard: resharding.allowed_windows config restricts resharding to named time windows (e.g. "02:00-06:00 UTC"), CLI refuses outside windows without --force - Integration tests confirm rejection outside window, --force override, no-restriction mode, and disabled config handling Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:00:57 -04:00
jedarden	fec5aa5e74	P12.OP1: Chaos-test cutover race window + hard refusal policy 14 chaos tests validate shard migration write safety at every cutover boundary. Key findings: - AE on + delta pass: 0/1M loss (production default) - AE off + delta pass: 0/50K loss (delta pass is sufficient alone) - AE off + delta skipped: ~2% loss → hard refusal at config validation - 3-node cluster cutover: 0 loss with delta pass Hard-coded policy: MigrationCoordinator refuses migrations when both anti-entropy is disabled and delta pass is skipped. Warning logged when AE is disabled but delta pass remains active. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:00:21 -04:00
jedarden	81155beb0d	P12.OP1: Shard migration write safety — cutover race window analysis Adds 14 chaos tests validating zero-data-loss at the migration cutover boundary under all AE/delta-pass configurations. Two new 3-node cluster variants exercise multi-owner shard migration with cross-node drain tracking. Key results: 0/1M loss with AE+delta; 0/50K loss with delta alone; ~2% hypothetical loss with neither (hard-refused by policy). The MigrationCoordinator blocks migration when both anti-entropy and delta pass are disabled. Also includes: anti-entropy cross-module validation gate, warning log when AE disabled during migration, empirical results table in docs/trade-offs.md, and plan §15 OP#1 status update to verified. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:52:34 -04:00
jedarden	232092ffbb	P0.5: Implement Config struct mirroring plan §4/§13 YAML schema Full serde-derived struct tree covering every block in plan §4 (MiroirConfig, NodeConfig, TaskStoreConfig, AdminConfig, HealthConfig, ScatterConfig, RebalancerConfig, ServerConfig, ConnectionPoolConfig, TaskRegistryConfig) and all 21 §13 advanced-capability sub-structs (ReshardingConfig through SearchUiConfig with nested auth/rate-limit/CSP/analytics structs), plus §14 horizontal-scaling structs (PeerDiscoveryConfig, LeaderElectionConfig, HpaConfig). Includes: - Layered loading via config crate: built-in defaults → file → env overrides - Config::validate() with 14 cross-field rules (HA requires redis, scoped_key timing inversion, node group bounds, tenant affinity range checks, etc.) - 10 unit tests: round-trip YAML, full plan example, minimal YAML defaults, and validation rejection cases Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:46:12 -04:00
jedarden	188fd5404c	P12.OP5: Add dump import compatibility matrix Enumerates dump variants that streaming mode can/can't handle. - Added docs/dump-import/compatibility-matrix.md with comprehensive compatibility matrix covering Meilisearch versions, dump variants, and workarounds - Added docs/dump-import/README.md as entry point - Updated miroir-ctl dump command to reference matrix with helpful error messages for unimplemented subcommands (import, export, analyze) Addresses Open Problem #5: identifies what "can't reconstruct" means in concrete terms, giving operators clear guidance on when broadcast fallback is needed and what alternatives exist. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:06:46 -04:00
jedarden	fe274a5c0e	P12.OP2: Add Raft vs Redis task store HA research doc Survey openraft, raft-rs, and async-raft crates. Design a Raft-backed TaskStore prototype using openraft with SQLite state machine. Analytical benchmark against Redis across latency, throughput, memory, and ops complexity. Decision: revisit before v2.0, do not ship in v0.x/v1.0 — Raft fails the decision gate (worse on write latency and correctness maturity despite removing the Redis dependency). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:00:53 -04:00
jedarden	409f952f59	Add repo hygiene: LICENSE, CHANGELOG, .gitignore - LICENSE: MIT (per plan §12) - CHANGELOG.md: Keep a Changelog 1.1.0 skeleton with [Unreleased] and [0.1.0] sections matching the awk extractor from plan §7 - .gitignore: Rust target/, editor junk; Cargo.lock kept in VCS Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 20:47:36 -04:00

22 commits
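Commit e5902bb47f says the SQLite store tracks a schema version and "rejects store ahead of binary." The sketch below shows that decision. It assumes three things: versions are plain integers that match the migration file prefixes (001 through 003), a fresh store starts at version 0, and each pending migration runs in order. `SchemaCheck`, `check_schema` and `pending_migrations` are hypothetical names, not Miroir's API.

```rust
use std::cmp::Ordering;

/// Outcome of comparing the stored schema version with the binary's.
/// (Hypothetical type for illustration; not taken from the Miroir source.)
#[derive(Debug, PartialEq, Eq)]
pub enum SchemaCheck {
    /// Store and binary agree; nothing to apply.
    UpToDate,
    /// Store is older: apply migrations `from + 1 ..= to`, oldest first.
    NeedsMigration { from: u32, to: u32 },
    /// Store was written by a newer binary: refuse to start rather than
    /// risk misreading tables this binary does not know about.
    StoreAhead { store: u32, binary: u32 },
}

pub fn check_schema(store: u32, binary: u32) -> SchemaCheck {
    match store.cmp(&binary) {
        Ordering::Equal => SchemaCheck::UpToDate,
        Ordering::Less => SchemaCheck::NeedsMigration { from: store, to: binary },
        Ordering::Greater => SchemaCheck::StoreAhead { store, binary },
    }
}

/// Migration numbers still to apply (empty unless the store is behind).
pub fn pending_migrations(store: u32, binary: u32) -> Vec<u32> {
    match check_schema(store, binary) {
        SchemaCheck::NeedsMigration { from, to } => (from + 1..=to).collect(),
        _ => Vec::new(),
    }
}
```

Refusing only in the `StoreAhead` case matches the commit's stated rule. An older store is simply migrated forward, which is safe to repeat because the migrations are described as idempotent.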
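Commits e5902bb47f and 04f1d47909 describe a CDC overflow buffer "bounded by byte budget with auto-trim," backed by `LPUSH` + `LTRIM` on `miroir:cdc:overflow:<sink>`. The in-memory sketch below illustrates only the bounding rule: new events go on the front, and the oldest events are dropped from the back until the payload total fits the budget. `OverflowBuffer` is a hypothetical stand-in, not the Redis implementation. A Redis version would have to turn the byte budget into an `LTRIM` index range, because `LTRIM` keeps list positions, not bytes.

```rust
use std::collections::VecDeque;

/// Byte-budgeted overflow buffer (hypothetical in-memory stand-in for the
/// Redis list described in the commits).
pub struct OverflowBuffer {
    budget_bytes: usize,
    used_bytes: usize,
    events: VecDeque<Vec<u8>>, // front = newest, back = oldest
}

impl OverflowBuffer {
    pub fn new(budget_bytes: usize) -> Self {
        Self { budget_bytes, used_bytes: 0, events: VecDeque::new() }
    }

    /// Push the newest event (like LPUSH), then evict the oldest events
    /// (the role LTRIM plays) until the total payload fits the budget.
    /// Returns how many events were evicted. An event larger than the
    /// whole budget ends up evicting itself as well.
    pub fn push(&mut self, event: Vec<u8>) -> usize {
        self.used_bytes += event.len();
        self.events.push_front(event);
        let mut evicted = 0;
        while self.used_bytes > self.budget_bytes {
            match self.events.pop_back() {
                Some(oldest) => {
                    self.used_bytes -= oldest.len();
                    evicted += 1;
                }
                None => break,
            }
        }
        evicted
    }

    pub fn len(&self) -> usize {
        self.events.len()
    }

    pub fn is_empty(&self) -> bool {
        self.events.is_empty()
    }

    pub fn used_bytes(&self) -> usize {
        self.used_bytes
    }
}
```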
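Commit e5902bb47f rate-limits admin login "with exponential backoff," and 04f1d47909 names a per-IP backoff key, `miroir:ratelimit:adminlogin:backoff:<ip>`, that carries an expiry. Neither commit gives the base delay, growth factor or cap. The sketch below therefore assumes the delay doubles from a base up to a ceiling. The returned duration is one plausible value for that key's TTL. `login_backoff` is a hypothetical helper.

```rust
use std::time::Duration;

/// Delay before the next admin-login attempt is allowed after `failures`
/// consecutive failures: zero for none, then base, 2*base, 4*base, ...
/// capped at `cap`. Doubling and the cap are assumptions, not documented values.
pub fn login_backoff(failures: u32, base: Duration, cap: Duration) -> Duration {
    if failures == 0 {
        return Duration::ZERO;
    }
    // 2^(failures - 1). checked_shl returns None once the shift reaches 32,
    // in which case we saturate instead of overflowing.
    let factor = 1u32.checked_shl(failures - 1).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(cap)
}
```

The cap keeps a long streak of failures from locking out a legitimate operator indefinitely. It also keeps the backoff key's TTL bounded.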
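Commit 096b43ccab adds a preflight phase in the style of Elasticsearch's dfs_query_then_fetch. Shards first report term statistics, the coordinator aggregates them into a global IDF, and every shard then scores with the same numbers. Below is a minimal sketch of the aggregation step. It assumes each shard returns its document count and a per-term document frequency, and it assumes a BM25-style IDF (the Lucene "+1" variant), since 72f9a197b5 mentions a BM25-based simulation. `ShardStats`, `aggregate` and `bm25_idf` are illustrative names. The commit's own `PreflightResponse` and `GlobalIdf` types may be shaped differently.

```rust
use std::collections::HashMap;

/// Hypothetical per-shard preflight reply: how many documents the shard
/// holds and, per query term, how many of those documents contain it.
pub struct ShardStats {
    pub doc_count: u64,
    pub doc_freq: HashMap<String, u64>,
}

/// Sum document counts and per-term document frequencies across shards,
/// giving the global statistics every shard should score with.
pub fn aggregate(shards: &[ShardStats]) -> (u64, HashMap<String, u64>) {
    let mut total_docs = 0u64;
    let mut df: HashMap<String, u64> = HashMap::new();
    for shard in shards {
        total_docs += shard.doc_count;
        for (term, &count) in &shard.doc_freq {
            *df.entry(term.clone()).or_insert(0) += count;
        }
    }
    (total_docs, df)
}

/// BM25-style IDF, Lucene variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
/// The "+1" inside the log keeps it positive even when df == N.
pub fn bm25_idf(total_docs: u64, doc_freq: u64) -> f64 {
    let n = total_docs as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}
```

Here is an example with made-up counts in the spirit of the 10-doc versus 93K-doc skew from 72f9a197b5. A term matching 1 of 10 documents on a small shard gets a local IDF of about 1.99. If the large shard has it in 40,000 of 93,000 documents, the aggregated statistics give roughly 0.84 everywhere. That inflation of small-shard scores is what the preflight removes.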
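The score-comparability research judges each merge strategy by Kendall's τ between its ranking and a reference ranking, with 0.95 as the pass threshold. It starts in 72f9a197b5 and is refined in 9ce1b36206, b201f0ff58 and c7be4ccbec. The log's comparison tool is `compare.py`. The Rust sketch below only illustrates the statistic. It computes tau-a, where tied pairs count as neither concordant nor discordant. That choice is an assumption: the tool may use tau-b, which differs only when ties are present.

```rust
use std::cmp::Ordering;

/// Kendall's tau-a: (concordant - discordant) / total pairs.
/// `reference[i]` and `candidate[i]` are the scores item `i` receives under
/// each ranking. Returns NaN for fewer than two items (tau is undefined).
pub fn kendall_tau(reference: &[f64], candidate: &[f64]) -> f64 {
    assert_eq!(reference.len(), candidate.len(), "both rankings must cover the same items");
    let n = reference.len();
    if n < 2 {
        return f64::NAN;
    }
    let (mut concordant, mut discordant) = (0i64, 0i64);
    for i in 0..n {
        for j in (i + 1)..n {
            match (reference[i].partial_cmp(&reference[j]), candidate[i].partial_cmp(&candidate[j])) {
                // Both rankings order the pair strictly: same direction or not?
                (Some(a), Some(b)) if a != Ordering::Equal && b != Ordering::Equal => {
                    if a == b {
                        concordant += 1;
                    } else {
                        discordant += 1;
                    }
                }
                // A tie (or NaN) in either ranking counts as neither.
                _ => {}
            }
        }
    }
    let pairs = (n * (n - 1) / 2) as f64;
    (concordant - discordant) as f64 / pairs
}

/// Share of per-query tau values below the pass threshold (NaN if empty).
pub fn fraction_below(taus: &[f64], threshold: f64) -> f64 {
    let below = taus.iter().filter(|&&t| t < threshold).count();
    below as f64 / taus.len() as f64
}
```

τ = 1 means the two rankings agree on every pair, and τ = -1 means they are exact reverses. This is why the reported RRF average of 0.1361 is described as catastrophic next to the 0.95 bar.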
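Commit 9ce1b36206 notes that Reciprocal Rank Fusion (RRF) is implemented in `merger.rs` as a mitigation, and b201f0ff58 measures it at τ ≈ 0.14. The sketch below is the textbook RRF rule, not `merger.rs` itself. Each document scores the sum of 1/(k + rank) over the lists it appears in, with 1-based ranks. It uses the conventional k = 60 from the original RRF paper; the log does not say which k Miroir uses.

```rust
use std::collections::HashMap;

/// Textbook Reciprocal Rank Fusion over per-shard result lists (ids in rank
/// order). Returns (id, fused score), highest first, ties broken by id.
pub fn rrf_merge(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged
}
```

The first test case shows one plausible reading of the low τ. When shards hold disjoint documents, each document appears in exactly one list. RRF then reduces to interleaving shards by rank position and ignores how strong each shard's hits actually are.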
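Commit e47c1c2f73 reports a hash-distribution coefficient of variation (CV) below 5% in every resharding scenario. CV is the standard deviation of per-shard document counts divided by their mean. The sketch below computes it with the population standard deviation, which is an assumption. Keys are assigned by FNV-1a followed by the SplitMix64 finalizer; that hash is a stand-in chosen for illustration, not Miroir's router hash.

```rust
/// FNV-1a (64-bit) followed by the SplitMix64 finalizer, so the low bits
/// used by `% shards` are well mixed. A stand-in, NOT Miroir's router hash.
fn stand_in_hash(key: &str) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    for &b in key.as_bytes() {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a prime
    }
    h = (h ^ (h >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
    h = (h ^ (h >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
    h ^ (h >> 31)
}

/// Count how many keys land on each of `shards` shards.
pub fn shard_counts<'a>(keys: impl IntoIterator<Item = &'a str>, shards: usize) -> Vec<u64> {
    let mut counts = vec![0u64; shards];
    for key in keys {
        counts[(stand_in_hash(key) % shards as u64) as usize] += 1;
    }
    counts
}

/// Coefficient of variation: population standard deviation over the mean.
pub fn coefficient_of_variation(counts: &[u64]) -> f64 {
    let n = counts.len() as f64;
    let mean = counts.iter().sum::<u64>() as f64 / n;
    let variance = counts.iter().map(|&c| (c as f64 - mean).powi(2)).sum::<f64>() / n;
    variance.sqrt() / mean
}
```

A CV of 0 means every shard holds exactly the same number of documents. A CV of 0.05 means a typical shard deviates from the mean by about 5%.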
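Commit e47c1c2f73 also adds `resharding.allowed_windows`, which restricts resharding to named windows such as "02:00-06:00 UTC" and lets `--force` override the check. The sketch below makes four assumptions:

- windows use exactly that "HH:MM-HH:MM UTC" format;
- each window is half-open (start inclusive, end exclusive);
- windows may wrap past midnight;
- an empty list means "no restriction."

`Window`, `parse_window`, `resharding_allowed` and `utc_minute_now` are hypothetical names, not the CLI's actual code.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// A daily window in UTC, as minutes since midnight. `end < start` means
/// the window wraps past midnight (e.g. 22:00-02:00).
#[derive(Debug, Clone, Copy)]
pub struct Window {
    start: u32,
    end: u32,
}

fn parse_hhmm(s: &str) -> Option<u32> {
    let (h, m) = s.split_once(':')?;
    let (h, m) = (h.parse::<u32>().ok()?, m.parse::<u32>().ok()?);
    (h < 24 && m < 60).then(|| h * 60 + m)
}

/// Parse "HH:MM-HH:MM UTC" (format assumed from the commit's example).
pub fn parse_window(s: &str) -> Option<Window> {
    let range = s.trim().strip_suffix("UTC")?.trim_end();
    let (a, b) = range.split_once('-')?;
    Some(Window { start: parse_hhmm(a.trim())?, end: parse_hhmm(b.trim())? })
}

impl Window {
    /// Half-open check start <= t < end, honouring midnight wrap-around.
    pub fn contains(&self, minute_of_day: u32) -> bool {
        if self.start <= self.end {
            self.start <= minute_of_day && minute_of_day < self.end
        } else {
            minute_of_day >= self.start || minute_of_day < self.end
        }
    }
}

/// `force` bypasses the guard; an empty window list means no restriction.
pub fn resharding_allowed(windows: &[Window], minute_of_day: u32, force: bool) -> bool {
    force || windows.is_empty() || windows.iter().any(|w| w.contains(minute_of_day))
}

/// Current UTC minute of day from the system clock.
pub fn utc_minute_now() -> u32 {
    let secs = SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0);
    ((secs / 60) % 1440) as u32
}
```

Working in minutes since midnight avoids a date-time dependency. Unix time counts no leap seconds, so `secs / 60 % 1440` yields the UTC minute of the day directly.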