jedarden/miroir

Author	SHA1	Message	Date
jedarden	f7043d4518	docs: add troubleshooting cross-links to production and examples guides Add cross-links from the production deployment guide and Docker Compose examples README to the main troubleshooting guide and diagnostic playbook. This completes the cross-linking requirement for P11.5. Changes: - docs/onboarding/production.md: Add cross-link to troubleshooting guide - examples/README.md: Add cross-link to troubleshooting guide Closes: miroir-uyx.5 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 02:55:06 -04:00
jedarden	200a638c05	feat(bench): add performance benchmarks and regression gate (P9.5) Implement plan §8 performance benchmarks with criterion: - Fixed merger_bench.rs to compile with updated MergeInput (vector_mode, vector_config) - Fixed clippy warnings in ilm.rs (numberOfDocuments -> number_of_documents) - Fixed clippy warnings in multi_search.rs (indexUid -> index_uid) - Added docs/benchmarks.md with comprehensive benchmark documentation - Added scripts/bench-ci.sh for CI benchmark runner - Added scripts/bench-compare.sh for regression gate (>20% slowdown detection) Benchmarks verified: - router_bench: Rendezvous ~384 µs for 10K docs (target: <1 ms) ✅ - merger_bench: Merger ~1.07 ms for 1000 hits/3 shards (target: <1 ms) ⚠️ - integration_bench: E2E latency and ingest throughput (require docker-compose) Closes: miroir-89x.5 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 00:44:33 -04:00
jedarden	91c99bb414	docs(migrations): add re-index and live cutover migration guides (P11.3) Adds two new migration path documents for users migrating from single-node Meilisearch to Miroir: - from-meilisearch-reindex.md: For large corpora (> 10 GB), re-index from source data. Covers database, queue, and S3-based indexing with performance tips and troubleshooting. - from-meilisearch-live-cutover.md: Zero-downtime migration via dual-write. Includes degraded mode handling (X-Miroir-Degraded header), rollback procedures, and metrics to watch during cutover. Both docs include SDK examples (Python, TypeScript, Go), verification steps, and troubleshooting sections. Acceptance: - All 3 migration docs complete (dump-reload existed) - Dump-reload covers streaming + broadcast fallback modes - Live cutover names X-Miroir-Degraded header and metrics Closes: miroir-uyx.3 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 14:39:58 -04:00
jedarden	6d55bf993b	docs(helm): add Helm chart publication CI documentation Documents how Helm chart publication works in the miroir CI/CD pipeline, including the three Argo Workflow tasks (helm-package, helm-publish-ghpages, helm-publish-oci) and usage instructions for both GitHub Pages and OCI registry. Closes: miroir-uyx.6	2026-05-24 14:23:32 -04:00
jedarden	c5238b1bcd	docs(troubleshooting): add common issues guide and diagnostic playbook (P11.5) Implements P11.5 acceptance criteria: - Created docs/troubleshooting.md with 10 common issues - Created docs/troubleshooting/diagnostics.md with systematic diagnostic playbook - Documented 3 required plan §11 issues (primary key required, degraded search results, stuck tasks) - Added 7 additional issues from Phase 9 chaos testing and operations - Cross-linked from README, migration runbook, and dump import guide Documented issues: 1. "primary key required" - Miroir vs Meilisearch difference 2. Search returns fewer results - degraded node handling 3. Task polling stuck - per-node task status recovery 4. Node drain blocked - RF constraints 5. Migration stuck after coordinator crash - recovery procedures 6. High memory usage on Redis - cleanup procedures 7. Index creation fails - topology inconsistency 8. Alias flip conflicts - single vs multi alias types 9. Search timeout during migration - throttling options 10. CDC cursor out of sync - recovery and re-index Diagnostic playbook covers: - Cluster health checks (pods, nodes, resources) - Topology verification and node agreement - Metrics analysis (degraded shards, task queue, latency) - Log analysis for error patterns - Task status inspection - Anti-entropy status - External dependency checks - Self-diagnostics and canary tests Closes: miroir-uyx.5	2026-05-24 14:02:13 -04:00
jedarden	adab169bed	docs(miroir-ctl): add subcommand runbooks and help text (P11.4, miroir-uyx.4) - Created docs/ctl/*.md runbooks for all 16 miroir-ctl subcommands - Each runbook includes: purpose, preconditions, examples, gotchas, see also - Added runbook location to --help output - All runbooks under 50 lines for easy reading Closes: miroir-uyx.4	2026-05-24 11:47:36 -04:00
jedarden	cfc4eb3300	feat(logging): add structured JSON logging tests and docs (plan §10, P7.5) Add tests to verify structured JSON logging configuration compiles correctly and all required fields (timestamp, level, message, pod_id, request_id) are present. Also add documentation explaining the implementation. The JSON logging infrastructure was already in place in main.rs and middleware.rs. This change adds: - Tests to verify the JSON layer configuration - Documentation of the log format and PII audit - Verification that no API keys, document content, or user queries are logged Acceptance criteria met: - jq parses every log line (JSON layer configured) - request_id appears in logs (span field with with_current_span(true)) - No PII in logs (audit verified) - Log volume < 1 entry per client request at INFO level Closes: miroir-afh.5	2026-05-24 10:00:21 -04:00
jedarden	1f686c646b	Merge remote-tracking branch 'origin/master' # Conflicts: # .beads/issues.jsonl # .beads/traces/bf-5xqk/metadata.json # .beads/traces/bf-5xqk/stdout.txt # .beads/traces/miroir-9dj/metadata.json # .beads/traces/miroir-9dj/stdout.txt # .beads/traces/miroir-cdo/metadata.json # .beads/traces/miroir-cdo/stdout.txt # .beads/traces/miroir-mkk/metadata.json # .beads/traces/miroir-mkk/stdout.txt # .beads/traces/miroir-r3j/metadata.json # .beads/traces/miroir-r3j/stdout.txt # .beads/traces/miroir-uhj/metadata.json # .beads/traces/miroir-uhj/stdout.txt # .beads/traces/miroir-zc2.6/metadata.json # .beads/traces/miroir-zc2.6/stdout.txt # .needle-predispatch-sha # Cargo.lock # charts/miroir/Chart.yaml # charts/miroir/templates/NOTES.txt # charts/miroir/templates/_helpers.tpl # charts/miroir/templates/redis-deployment.yaml # charts/miroir/templates/serviceaccount.yaml # charts/miroir/tests/README.md # charts/miroir/values.schema.json # charts/miroir/values.yaml # crates/miroir-core/Cargo.toml # crates/miroir-core/src/config.rs # crates/miroir-core/src/hedging.rs # crates/miroir-core/src/lib.rs # crates/miroir-core/src/merger.rs # crates/miroir-core/src/query_planner.rs # crates/miroir-core/src/raft_proto/mod.rs # crates/miroir-core/src/replica_selection.rs # crates/miroir-core/src/router.rs # crates/miroir-core/src/scatter.rs # crates/miroir-core/src/task_store/mod.rs # crates/miroir-core/src/task_store/redis.rs # crates/miroir-core/src/task_store/sqlite.rs # crates/miroir-core/src/topology.rs # crates/miroir-ctl/src/credentials.rs # crates/miroir-proxy/Cargo.toml # crates/miroir-proxy/src/auth.rs # crates/miroir-proxy/src/client.rs # crates/miroir-proxy/src/lib.rs # crates/miroir-proxy/src/main.rs # crates/miroir-proxy/src/middleware.rs # crates/miroir-proxy/src/routes/admin.rs # crates/miroir-proxy/src/routes/documents.rs # crates/miroir-proxy/src/routes/indexes.rs # crates/miroir-proxy/src/routes/search.rs # crates/miroir-proxy/src/routes/settings.rs # crates/miroir-proxy/src/routes/tasks.rs # docs/research/score-normalization-at-scale.md # notes/miroir-cdo.md # notes/miroir-r3j-final-verification.md # notes/miroir-r3j-verification.md # notes/miroir-r3j.1.md # notes/miroir-r3j.md # notes/miroir-zc2.1.md # notes/miroir-zc2.3.md # notes/miroir-zc2.4.md # notes/miroir-zc2.5.md	2026-05-24 05:21:32 -04:00
jedarden	dee4367a24	P6.11: Add single-pod oversized mode support (§14.10 vertical scaling escape valve) Add test fixture and documentation for single-pod mode with oversized resources (4 vCPU / 8 GB) for dev clusters, very small deployments, or constrained environments. - Add charts/miroir/tests/valid-single-pod-oversized.yaml test fixture - Add docs/horizontal-scaling/single-pod.md with configuration example, memory multiplier behavior table, and fault tolerance trade-offs - Update charts/miroir/tests/README.md to document the new test case - Update charts/miroir/tests/run-tests.sh to include the test in validation Acceptance criteria: - ✅ Fixture boots a single 4-vCPU/8-GB pod successfully - ✅ values.schema.json accepts the oversized-single-pod combination - ✅ Memory-multiplier behavior documented with operator override option - ✅ single-pod.md includes fault tolerance trade-off explanation - ✅ README.md "When to use" section calls out single-pod mode Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 07:29:39 -04:00
jedarden	4ec0444b64	miroir-zc2.3: Validate 2× transient load caveat for online resharding (P12.OP3) - Fixed duplicate ReshardingConfig: added allowed_windows to advanced.rs - Ran benchmark confirming storage/dual-write amplification at exactly 2.0× - Verified CLI window guard integration tests (4/4 passing) - Updated benchmark doc with latest run date (2026-05-20) Key findings: - Storage amplification is exactly 2× across all scenarios - Peak write amplification varies from 12× to 502× depending on throttle - Operators should set throttle to keep peak writes ≤ 3× normal Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Bead-Id: miroir-r3j.2	2026-05-20 07:24:22 -04:00
jedarden	ff5ab041b9	miroir-zc2.5: Fix dump import compatibility matrix enhancement bead refs The matrix incorrectly referenced miroir-zc2.6/7/8 as dump import enhancement beads, but zc2.6 is actually arm64 support and zc2.7/8 don't exist. Replaced with a descriptive "Future Enhancements" table that maintains traceability without false bead dependencies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 07:16:06 -04:00
jedarden	02ad8fce9b	P11.7: Add quick-start example artifacts (Docker Compose + config) Adds the on-disk examples referenced by plan §11 "Quick start (local, Docker Compose)": - examples/docker-compose-dev.yml: 3 Meilisearch nodes + 1 Miroir orchestrator - examples/dev-config.yaml: Matching Miroir config (16 shards, RF=1) - examples/README.md: Comprehensive docs for running, troubleshooting, teardown - k8s/argo-workflows/miroir-ci-docker-compose-smoke.yaml: CI smoke tests The README.md quick start section already references these examples. Acceptance: ✅ docker-compose-dev.yml boots via docker compose up ✅ dev-config.yaml mounted into Miroir container ✅ examples/README.md documents usage and teardown ✅ CI smoke job exercises compose stack (health + index + search tests) ✅ README.md quick start points to examples/docker-compose-dev.yml Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Bead-Id: bf-3lad	2026-05-20 06:50:43 -04:00
jedarden	f20c1bae4d	bf-1p4v: Verify compile error already fixed The borrow-of-moved-value error for `state` was already fixed in the codebase. Line 568 uses `.with_state(state.clone())` and `UnifiedState` derives Clone. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 06:49:04 -04:00
jedarden	360378bde2	P11.8: Amend plan §12 to reflect Rust-idiomatic test layout The plan §12 previously specified tests/ at root with integration/ and chaos/ subdirectories. However, the actual implementation uses the idiomatic Rust convention with tests in crates/*/tests/. This commit: - Updates plan §12 repository structure to document the actual layout - Moves tests/benches/score-comparability to docs/research/ (research artifacts) - Removes the now-empty tests/ directory CI already runs cargo test --all --all-features which correctly discovers and runs all crate-level integration tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 06:49:04 -04:00
jedarden	9786a4217b	bf-35t4: Commit current main state before merge	2026-05-19 22:52:18 -04:00
jedarden	1da32f8d57	Phase 3 (miroir-r3j): Task Registry + Persistence — Verification complete Verified and documented the existing task store implementation: - All 14 tables from plan §4 implemented in SQLite and Redis backends - TaskStore trait enables runtime backend switching via task_store.backend - Schema version tracking with migration detection - Comprehensive test suite: property tests + integration tests with testcontainers - Helm values.schema.json enforces replicas > 1 → redis requirement - Redis memory accounting validated against representative load (20 kQPS) Added documentation: - docs/notes/phase3-task-store-verification.md — DoD checklist and Redis memory analysis - notes/miroir-r3j-phase3-summary.md — Completion summary and retrospective Definition of Done — ALL MET ✅ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 05:40:08 -04:00
jedarden	3556f64742	Phase 3 (miroir-r3j): Task Registry + Persistence — Complete This phase implements a comprehensive task store with dual backend support (SQLite for single-pod, Redis for multi-pod deployments), covering all 14 tables from plan §4. ## What Was Already Implemented The task store module was already complete with: - Complete 14-table schema (tasks, aliases, sessions, jobs, etc.) - SQLite backend with idempotent schema initialization - Redis backend with hash+index pattern for O(n) list queries - Unified TaskStore trait with runtime backend selection - Comprehensive property tests and integration tests - Helm schema validation enforcing Redis for replicas > 1 ## What Was Added - Redis memory accounting documentation (docs/redis-memory-accounting.md) - Complete keyspace inventory with size estimates - Representative load calculation (~2.8 MB baseline) - Scaling characteristics and production recommendations - Fixed job_dequeue() to properly fetch the updated job after transaction - Previously returned a stale Job object from before the UPDATE - Now fetches the job after the status change for accuracy ## Definition of Done — All Complete ✅ - [x] rusqlite-backed store initializing every table idempotently - [x] Redis-backed store mirroring the same API (TaskStore trait) - [x] Schema versioning with schema_version row - [x] Property tests on SQLite backend - [x] Integration test for pod restart simulation - [x] Redis-backend integration tests with testcontainers - [x] miroir:tasks:_index pattern for list endpoints (no SCAN) - [x] Helm schema enforces taskStore.backend:redis when replicas > 1 - [x] Redis memory accounting validated against representative load All future features (§13 advanced capabilities, §14 HA modes) can consume this persistence layer without modification. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 02:29:38 -04:00
jedarden	6c32dd8efc	Phase 0 (miroir-qon): Rust 1.88 upgrade + test infrastructure - Bump Rust toolchain from 1.87 to 1.88 - Add testcontainers and arbitrary dependencies for property testing - Update router with rendezvous hashing improvements - Fix credential handling in miroir-ctl - Update reshard and migration modules - Add Helm chart scaffolding - Add Redis memory accounting documentation All Phase 0 DoD checks pass: - cargo build --all succeeds - cargo test --all succeeds (103 tests) - cargo clippy --all-targets --all-features -- -D warnings passes - cargo fmt --all -- --check passes - Config round-trip YAML test passes Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 02:05:44 -04:00
jedarden	e0d6735ec0	Phase 0 (miroir-qon): Foundation — verification complete Phase 0 (Foundation) was already established in the repository. All required components are in place: - Cargo workspace with three crates (miroir-core, miroir-proxy, miroir-ctl) - rust-toolchain.toml pinning Rust 1.87 - All key dependencies wired (axum, tokio, reqwest, serde, config, clap, uuid) - Config struct with full YAML schema from plan §4 - Style configuration (rustfmt.toml, clippy.toml, .editorconfig) - Project files (CHANGELOG.md, LICENSE, .gitignore) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-08 19:20:18 -04:00
jedarden	8f91d6998f	P12.OP1: Shard migration write safety - chaos testing Extended chaos test coverage from 14 to 19 tests and created comprehensive documentation for safe shard migrations. New Chaos Tests: - cutover_chaos_network_partition_new_node: Network partition during cutover - cutover_chaos_drain_timeout_boundary: Drain timeout boundary conditions - cutover_chaos_concurrent_migrations: Multiple simultaneous migrations - cutover_chaos_partial_shard_failure: Varying failure rates per shard - cutover_chaos_coordinator_crash_recovery: Coordinator crash and restart Documentation: - docs/chaos_testing_report.md: Test coverage, findings, recommendations - docs/migration_runbook.md: Operational procedures, rollback, troubleshooting - notes/bf-4d9a.md: Task summary and completion report Key Findings: - Delta pass provides 0-loss cutover (validated across 19 tests) - AE on + delta on: 0.000% loss (recommended) - AE off + delta on: 0.000% loss (safe but no defense-in-depth) - AE off + delta skipped: ~2% loss (blocked by coordinator) All success criteria met: ✅ Cutover boundary chaos tests pass with anti-entropy enabled ✅ Data loss windows without anti-entropy documented and bounded ✅ Release notes include clear guidance on anti-entropy during migrations Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-08 15:29:48 -04:00
jedarden	4a3c05473e	OP#3: Document S-change (resharding) vs N-change (node scaling) trade-offs Add comprehensive documentation comparing the two scaling dimensions: - Core distinction: N-change is lightweight (rendezvous hash), S-change is heavy (dual-hash dual-write) - Node scaling moves only ~1/N of documents; resharding affects 100% with 2× transient amplification - Decision matrix for operators to choose the right approach - Capacity planning guidance with S = max_nodes_per_group_ever × 8 formula - References to existing benchmarks and CLI schedule guidance This completes the remaining work for OP#3 by documenting the trade-offs so operators understand when to use resharding vs adding nodes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Bead-Id: bf-jap1	2026-05-08 15:25:53 -04:00
jedarden	e89f02a174	OP#6: Add ARM64 (aarch64-unknown-linux-musl) target support - Add aarch64-unknown-linux-musl target to rust-toolchain.toml for cross-compilation - Document ARM64 build instructions, prerequisites, and architecture-specific considerations - No architecture-specific code paths exist; all dependencies are architecture-agnostic Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-08 15:25:12 -04:00
jedarden	e5902bb47f	P3: Complete Phase 3 — Task Registry + Persistence (SQLite + Redis) Implements the 14-table task-store schema from plan §4 with both SQLite and Redis backends. Every §13 advanced capability and §14 HA mode consumes one or more of these tables, so settling the schema now prevents per-feature bespoke persistence. ## SQLite Backend (rusqlite) - All 14 tables created idempotently at startup via migrations - Schema version tracking with validation (rejects store ahead of binary) - WAL mode + 5s busy_timeout for concurrent access - Full TaskStore trait implementation with comprehensive tests - Property tests for (insert, get) round-trip and (upsert, list) semantics - Restart resilience test: tasks survive pod restart simulation ## Redis Backend (async via tokio) - Mirrors the same 14-table API as SQLite (TaskStore trait) - Keyspace mapping per plan §4 "Redis mode (HA)" - Uses _index secondary sets for O(cardinality) list-wide queries (no SCAN) - TTL-based auto-expiration for sessions, idempotency, rate-limits - Leader election via SET NX EX with heartbeat renewal - Pub/Sub for instant admin session revocation propagation - CDC overflow buffer bounded by byte budget with auto-trim - Rate limiting for search UI and admin login with exponential backoff - Search UI scoped-key rotation coordination ## Schema Migrations - 001_initial.sql: Tables 1-7 (tasks, node_settings_version, aliases, sessions, idempotency_cache, jobs, leader_lease) - 002_feature_tables.sql: Tables 8-14 (canaries, canary_runs, cdc_cursors, tenant_map, rollover_policies, search_ui_config, admin_sessions) - 003_task_registry_fields.sql: No-op (node_errors already present) ## Tests - SQLite: 36 tests passing (unit + property + restart resilience) - Redis: Integration tests using testcontainers (25+ async tests) - Helm schema validation: enforces replicas > 1 + taskStore.backend: redis ## Definition of Done ✓ rusqlite-backed store with idempotent migrations ✓ Redis-backed store mirroring the same API (trait TaskStore) ✓ Migrations/versioning with schema version validation ✓ Property tests on SQLite backend (7 proptests passing) ✓ Integration test: task survives restart (task_survives_store_reopen) ✓ Redis-backend integration tests (testcontainers) ✓ miroir:tasks:_index-style iteration (no SCAN) ✓ Helm values.schema.json enforces replicas > 1 + redis requirement ✓ Redis memory accounting documented in plan §14.7 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-02 16:52:25 -04:00
jedarden	04f1d47909	P3.3.d: Fix compilation - add missing local_search_ui_rate_limiter field The FromRef implementation for admin_endpoints::AppState was missing the local_search_ui_rate_limiter field, causing a compilation error. This completes P3.3.d Redis backend extras, which were already fully implemented: - Rate-limit keys with EXPIRE (miroir:ratelimit:searchui:<ip>, miroir:ratelimit:adminlogin:<ip>, miroir:ratelimit:adminlogin:backoff:<ip>) - Scoped-key coordination (miroir:search_ui_scoped_key:<index>, miroir:search_ui_scoped_key_observed:<pod>:<index> with EXPIRE 60s) - Pub/Sub for admin session revocation (miroir:admin_session:revoked) - CDC overflow buffer (miroir:cdc:overflow:<sink> with LPUSH + LTRIM) All acceptance criteria verified by existing tests: - test_redis_rate_limit_searchui verifies EXPIRE is set - test_redis_pubsub_session_invalidation verifies <100ms propagation - test_redis_cdc_overflow verifies LLEN matches bytes published Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 11:18:02 -04:00
jedarden	7f03fe6ce8	P12.OP6: expand arm64 deferral note with implementation roadmap Section 15 Open Problem #6 was a one-line placeholder. Expand it with current amd64-only state, the specific changes needed when arm64 is prioritized (CI cross-compilation, multi-arch Docker, binary naming, rust-toolchain target), and the trigger conditions for promotion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-21 07:06:11 -04:00
jedarden	26fe2970fc	P10.2: nodeMasterKey zero-downtime rotation flow Add `miroir-ctl key rotate-node-master` command implementing plan §9 4-step zero-downtime rotation: create new admin-scoped key on all Meilisearch nodes, print K8s Secret update instructions, wait for rolling restart confirmation, delete old key. Supports --dry-run, node auto-discovery via topology API, and rollback on step 1 failure. Add `address` field to topology API NodeInfo for CLI node discovery. Add runbooks for both nodeMasterKey (zero-downtime) and startup master key (maintenance window required) rotation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 15:49:40 -04:00
jedarden	c7be4ccbec	P12.OP4.1: Validate dfs_query_then_fetch benchmark (τ=0.9817) and document latency Re-ran the 10K-query score-comparability benchmark with fresh results: - DFS (global IDF preflight): avg τ = 0.9817, min τ = 0.9523, 0 queries below 0.95 → PASS - Score merge (local IDF): avg τ = 0.7938, 62.9% queries below 0.95 → FAIL - RRF merge: avg τ = 0.1361, 100% queries below 0.95 → CATASTROPHIC Added Criterion latency benchmarks to the research doc: - Global IDF aggregation: 285ns (3 shards) → 3.31µs (50 shards) - Query term extraction: 69ns (1 word) → 726ns (9 words) - IDF computation: ~113ps per term (trivial) - Coordinator-side overhead is sub-microsecond; dominant cost is network round-trip Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:31:13 -04:00
jedarden	096b43ccab	P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability Implements the Elasticsearch dfs_query_then_fetch pattern as a pre-query phase in Miroir to resolve cross-shard score comparability issues caused by differing local IDF values across shards with skewed document distributions. Core changes: - scatter.rs: New PreflightRequest/PreflightResponse types, GlobalIdf aggregation, execute_preflight and dfs_query_then_fetch_search functions - Proxy client: preflight_node implementation for term-frequency gathering - Search routes: Integration of DFS preflight before main search phase - Integration test: dfs_skewed_corpus.rs with 10 tests covering aggregation and serialization - Benchmark: dfs_preflight_bench.rs measuring preflight overhead Validation results (1,443 queries, 10-shard skewed corpus): - Average Kendall tau: 0.9815 (95% CI: [0.9809, 0.9821]) - Min tau: 0.9523 (zero queries below 0.95 threshold) - Per-type: common-term +0.84, single-term +0.11, filtered +0.11 The preflight phase adds one network round-trip before the search phase, with requests parallelized across shards. Estimated overhead: +1-2 RTTs. Resolves bead miroir-yio: Global-IDF preflight implementation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 03:43:10 -04:00
jedarden	b201f0ff58	P12.OP4: Finalize score normalization validation — RRF τ=0.14, score τ=0.79 Research complete: both score-based and RRF merge fail 0.95 threshold. Updated research doc with full RRF validation results and confidence intervals. Added benchmark result reports and helper tests. Follow-up bead miroir-n6v created for global-IDF preflight (dfs_query_then_fetch pattern). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 02:40:54 -04:00
jedarden	9ce1b36206	P12.OP4: Add confidence intervals to score comparability benchmark Research doc updated with precise 95% CIs per query type. compare.py now computes and reports confidence intervals. Kendall τ = 0.79 (95% CI [0.7873, 0.8006]) confirms raw score merging is not viable; RRF already implemented in merger.rs as mitigation. Follow-up bead created (miroir-zfo) for RRF quality validation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 00:07:42 -04:00
jedarden	72f9a197b5	P12.OP4: Score normalization at scale — research & benchmark infrastructure Completed Plan §15 Open Problem #4 research on cross-shard score comparability. ## Key Finding Average Kendall tau: 0.79 vs. 0.95 threshold — FAIL Cross-shard score comparability is a significant issue: - Common-term queries: τ = 0.15 (catastrophic) - Local IDF statistics cause score inflation on small shards - Documents from 10-doc shards outrank 93K-doc shard results ## Recommendation Implement Reciprocal Rank Fusion (RRF) for result merging. Follow-up bead: miroir-nsu ## Artifacts Added - Benchmark infrastructure: tests/benches/score-comparability/ - Corpus generator with extreme shard skew (100× variance) - Query generator (10K random queries across 5 types) - BM25-based simulation with global vs local IDF - Kendall tau comparison tool - Full experimental results (τ = 0.79 ± 0.01, 95% CI) - Research writeup: docs/research/score-normalization-at-scale.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 23:58:08 -04:00
jedarden	c30d867d27	P0.7: Update plan with chaos-test results, sync beads Verified CI smoke pipeline runs end-to-end in ~5:39 on iad-ci. All three checks pass: fmt, clippy, test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 23:03:21 -04:00
jedarden	ffc0ae3beb	P12.OP2: Finalize Raft research — correct openraft version, update benchmarks, suppress warnings Correct openraft version from 0.9.22 to 0.9.20 (latest stable per GitHub releases). Update benchmark measurements from fresh re-run (50K ops). Suppress dead_code warnings in benchmark module (functions only called from #[test]). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:37:20 -04:00
jedarden	7a6dea77cf	P12.OP2: Re-verify Raft state machine benchmark with fresh run Benchmark numbers stable: state machine apply ~1.0x direct HashMap overhead, both sub-microsecond. Confirms prior measurements. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:25:34 -04:00
jedarden	2c628a6f87	P12.OP2: Re-run Raft state machine benchmark, update measured values Fresh benchmark confirms state machine apply adds ~1.0-1.1x overhead vs direct HashMap — both sub-microsecond. Real Raft cost remains network + fsync (2-5ms vs Redis 0.3-0.8ms). Decision unchanged: revisit before v2.0, do not ship in v0.x or v1.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:14:11 -04:00
jedarden	111a128278	P12.OP2: Update Raft vs Redis research with web survey findings Add rrqlite/openraft+SQLite reference project, correct raft-rs status to maintenance mode, note openraft 0.10 edition 2024 requirement, and add additional production users (Helyim, RobustMQ, rrqlite). Decision unchanged: do not ship Raft in v0.x or v1.0, revisit before v2.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:03:29 -04:00
jedarden	e47c1c2f73	P12.OP3: Validate 2× transient load caveat and add CLI schedule window guard - Add resharding load simulation model with real router hash functions - Benchmark confirms storage amplification is exactly 2.0× and dual-write amplification is exactly 2.0× across all test matrix scenarios (1KB/10GB, 10KB/100GB, 1MB/1TB), with hash distribution CV < 5% in all cases - CLI window guard: resharding.allowed_windows config restricts resharding to named time windows (e.g. "02:00-06:00 UTC"), CLI refuses outside windows without --force - Integration tests confirm rejection outside window, --force override, no-restriction mode, and disabled config handling Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:00:57 -04:00
jedarden	fec5aa5e74	P12.OP1: Chaos-test cutover race window + hard refusal policy 14 chaos tests validate shard migration write safety at every cutover boundary. Key findings: - AE on + delta pass: 0/1M loss (production default) - AE off + delta pass: 0/50K loss (delta pass is sufficient alone) - AE off + delta skipped: ~2% loss → hard refusal at config validation - 3-node cluster cutover: 0 loss with delta pass Hard-coded policy: MigrationCoordinator refuses migrations when both anti-entropy is disabled and delta pass is skipped. Warning logged when AE is disabled but delta pass remains active. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:00:21 -04:00
jedarden	81155beb0d	P12.OP1: Shard migration write safety — cutover race window analysis Adds 14 chaos tests validating zero-data-loss at the migration cutover boundary under all AE/delta-pass configurations. Two new 3-node cluster variants exercise multi-owner shard migration with cross-node drain tracking. Key results: 0/1M loss with AE+delta; 0/50K loss with delta alone; ~2% hypothetical loss with neither (hard-refused by policy). The MigrationCoordinator blocks migration when both anti-entropy and delta pass are disabled. Also includes: anti-entropy cross-module validation gate, warning log when AE disabled during migration, empirical results table in docs/trade-offs.md, and plan §15 OP#1 status update to verified. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:52:34 -04:00
jedarden	232092ffbb	P0.5: Implement Config struct mirroring plan §4/§13 YAML schema Full serde-derived struct tree covering every block in plan §4 (MiroirConfig, NodeConfig, TaskStoreConfig, AdminConfig, HealthConfig, ScatterConfig, RebalancerConfig, ServerConfig, ConnectionPoolConfig, TaskRegistryConfig) and all 21 §13 advanced-capability sub-structs (ReshardingConfig through SearchUiConfig with nested auth/rate-limit/CSP/analytics structs), plus §14 horizontal-scaling structs (PeerDiscoveryConfig, LeaderElectionConfig, HpaConfig). Includes: - Layered loading via config crate: built-in defaults → file → env overrides - Config::validate() with 14 cross-field rules (HA requires redis, scoped_key timing inversion, node group bounds, tenant affinity range checks, etc.) - 10 unit tests: round-trip YAML, full plan example, minimal YAML defaults, and validation rejection cases Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:46:12 -04:00
jedarden	188fd5404c	P12.OP5: Add dump import compatibility matrix Enumerates dump variants that streaming mode can/can't handle. - Added docs/dump-import/compatibility-matrix.md with comprehensive compatibility matrix covering Meilisearch versions, dump variants, and workarounds - Added docs/dump-import/README.md as entry point - Updated miroir-ctl dump command to reference matrix with helpful error messages for unimplemented subcommands (import, export, analyze) Addresses Open Problem #5: identifies what "can't reconstruct" means in concrete terms, giving operators clear guidance on when broadcast fallback is needed and what alternatives exist. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:06:46 -04:00
jedarden	fe274a5c0e	P12.OP2: Add Raft vs Redis task store HA research doc Survey openraft, raft-rs, and async-raft crates. Design a Raft-backed TaskStore prototype using openraft with SQLite state machine. Analytical benchmark against Redis across latency, throughput, memory, and ops complexity. Decision: revisit before v2.0, do not ship in v0.x/v1.0 — Raft fails the decision gate (worse on write latency and correctness maturity despite removing the Redis dependency). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:00:53 -04:00
jedarden	409f952f59	Add repo hygiene: LICENSE, CHANGELOG, .gitignore - LICENSE: MIT (per plan §12) - CHANGELOG.md: Keep a Changelog 1.1.0 skeleton with [Unreleased] and [0.1.0] sections matching the awk extractor from plan §7 - .gitignore: Rust target/, editor junk; Cargo.lock kept in VCS Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 20:47:36 -04:00
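
Several of the P12.OP4 commits above judge merge strategies by Kendall τ against a 0.95 threshold. As a reading aid, here is a minimal sketch of how such a rank-agreement score can be computed. It is an illustration only: the function name `kendall_tau`, the tau-a variant (no tie correction), and the sample scores are assumptions, not the repository's `compare.py` logic.

```rust
/// Kendall tau-a between two equally long score lists.
/// Hypothetical helper for illustration; ties count as neither
/// concordant nor discordant and no tie correction is applied.
fn kendall_tau(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.len() < 2 {
        return None;
    }
    let n = a.len();
    let mut concordant: i64 = 0;
    let mut discordant: i64 = 0;
    for i in 0..n {
        for j in (i + 1)..n {
            // Same sign of the two differences means the pair is ordered
            // the same way in both rankings.
            let s = (a[i] - a[j]) * (b[i] - b[j]);
            if s > 0.0 {
                concordant += 1;
            } else if s < 0.0 {
                discordant += 1;
            }
        }
    }
    let pairs = (n * (n - 1) / 2) as f64;
    Some((concordant - discordant) as f64 / pairs)
}

fn main() {
    // Made-up scores: a reference ranking vs. a merged ranking
    // in which one adjacent pair is swapped.
    let reference = [9.1, 7.4, 6.0, 3.2, 1.5];
    let merged = [8.8, 7.9, 2.9, 5.1, 1.0];
    let tau = kendall_tau(&reference, &merged).unwrap();
    println!("tau = {tau:.2}, passes 0.95 gate: {}", tau >= 0.95);
}
```

The quadratic pair loop is fine for per-query result lists of a few hundred hits; a real benchmark over 10K queries might prefer an O(n log n) variant.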

43 commits
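
Commit 4a3c05473e states that node scaling under rendezvous hashing moves only about 1/N of documents. The sketch below illustrates that property under stated assumptions: `weight`, `owner`, and `moved_fraction` are hypothetical names, and the standard library's `DefaultHasher` stands in for whatever hash function Miroir's router actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Score of a (node, key) pair. DefaultHasher is an assumed stand-in
/// for the router's real hash function.
fn weight(node: &str, key: u64) -> u64 {
    let mut h = DefaultHasher::new();
    node.hash(&mut h);
    key.hash(&mut h);
    h.finish()
}

/// Rendezvous (highest-random-weight) hashing: the key belongs to
/// the node with the highest weight for that key.
fn owner<'a>(nodes: &[&'a str], key: u64) -> Option<&'a str> {
    nodes.iter().copied().max_by_key(|n| weight(n, key))
}

/// Fraction of keys in 0..keys whose owner differs between two node sets.
fn moved_fraction(before: &[&str], after: &[&str], keys: u64) -> f64 {
    let moved = (0..keys)
        .filter(|&k| owner(before, k) != owner(after, k))
        .count();
    moved as f64 / keys as f64
}

fn main() {
    let before = ["node-a", "node-b", "node-c"];
    let after = ["node-a", "node-b", "node-c", "node-d"];
    let f = moved_fraction(&before, &after, 10_000);
    println!("moved fraction: {f:.3} (expected near 1/4)");
}
```

Because existing nodes keep the same weight for every key, a key can only change owner when the new node outscores all of them, so every moved key lands on the added node and nothing is shuffled among the old ones.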