jedarden d171dfb26a P12.OP4.1: Complete global-IDF preflight (dfs_query_then_fetch pattern) Implementation complete with validation passing all acceptance criteria: - Preflight phase: execute_preflight() gathers term frequencies from all shards - Global IDF aggregation: GlobalIdf::from_preflight_responses() computes corpus-wide statistics - DFS search: dfs_query_then_fetch_search() orchestrates the full pattern - Score merge: ScoreMergeStrategy merges by globally-comparable scores Benchmark validation (10K queries, 100K docs, 10 shards with skewed distribution): - Average Kendall tau: 0.9817 (PASS ≥ 0.95 threshold) - Min tau: 0.9523 (above threshold) - Queries with τ < 0.95: 0 (0%) - All query types pass (common, single, filtered, rare, multi-term) Latency overhead: +1-2 round trips (parallelized across shards), sub-microsecond coordinator-side aggregation per Criterion benchmarks. Closes miroir-n6v Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>		2026-04-19 07:56:22 -04:00
.beads	P12.OP4.1: Complete global-IDF preflight (dfs_query_then_fetch pattern)	2026-04-19 07:56:22 -04:00
.cargo	P1.5: Implement scatter module with covering-set construction + dispatch trait	2026-04-19 00:20:29 -04:00
benches	P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability	2026-04-19 03:43:10 -04:00
charts/miroir	P3.5: Add values.schema.json constraint for replicas>1 requires Redis	2026-04-18 23:44:15 -04:00
crates	P2.4: Fix build and test for index lifecycle endpoints	2026-04-19 07:49:46 -04:00
docs	P12.OP4.1: Validate dfs_query_then_fetch benchmark (τ=0.9817) and document latency	2026-04-19 05:31:13 -04:00
tests/benches/score-comparability	P2.2: Implement write path with primary key validation, shard injection, and two-rule quorum	2026-04-19 06:48:30 -04:00
.editorconfig	Add repo hygiene: LICENSE, CHANGELOG, .gitignore	2026-04-18 20:47:36 -04:00
.gitignore	P12.OP4: Finalize score normalization validation — RRF τ=0.14, score τ=0.79	2026-04-19 02:40:54 -04:00
.needle-predispatch-sha	P12.OP4.1: Complete global-IDF preflight (dfs_query_then_fetch pattern)	2026-04-19 07:56:22 -04:00
Cargo.lock	P2.5: Implement task ID reconciliation and /tasks endpoints	2026-04-19 07:46:49 -04:00
Cargo.toml	P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability	2026-04-19 03:43:10 -04:00
CHANGELOG.md	Add repo hygiene: LICENSE, CHANGELOG, .gitignore	2026-04-18 20:47:36 -04:00
clippy.toml	Add repo hygiene: LICENSE, CHANGELOG, .gitignore	2026-04-18 20:47:36 -04:00
LICENSE	Add repo hygiene: LICENSE, CHANGELOG, .gitignore	2026-04-18 20:47:36 -04:00
miroir.yaml	P2.1: Implement axum server skeleton with health/version/ready/topology/shards/metrics endpoints	2026-04-19 06:12:05 -04:00
README.md	Add repo hygiene: LICENSE, CHANGELOG, .gitignore	2026-04-18 20:47:36 -04:00
rust-toolchain.toml	Add repo hygiene: LICENSE, CHANGELOG, .gitignore	2026-04-18 20:47:36 -04:00
rustfmt.toml	Add repo hygiene: LICENSE, CHANGELOG, .gitignore	2026-04-18 20:47:36 -04:00

README.md

Miroir

Multi-node Index Replication Orchestrator, Integrated Rebalancing

Miroir is a RAID-like orchestration layer for Meilisearch. It stripes a large index across a fleet of small-RAM Meilisearch nodes with a configurable replication factor, fans out search queries across all shards, and rebalances shard assignments when nodes are added or removed — all using the Meilisearch Community Edition.

The Problem

Meilisearch keeps its entire index in memory-mapped LMDB files. An index larger than a single server's available RAM cannot run on that server. The Enterprise Edition offers native sharding, but it is gated behind a commercial license. Miroir provides sharding on the Community Edition, without that license.

How It Works

Client
  │
  ▼
Miroir Orchestrator
  ├── Write path: hash(doc_id) → assign to shard → write to R replicas
  ├── Read path:  scatter query to all shards → gather → merge ranked results
  └── Rebalance: on node add/remove → recompute assignments → migrate minimum shards

Meilisearch Nodes (N instances, each holding a subset of shards)
  node-0   node-1   node-2   ...   node-N
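The gather-and-merge step of the read path can be sketched as follows. This is a minimal illustration, not Miroir's actual code: the `Hit` type and the `merge_top_k` function are hypothetical names, and the sketch assumes each shard returns scores that are already comparable across shards.

```rust
/// One search hit as returned by a shard (hypothetical shape for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    doc_id: String,
    score: f64,
}

/// Merge per-shard result lists into one global top-k list.
/// Assumes scores are comparable across shards; ties are broken on
/// `doc_id` so the output order is deterministic.
fn merge_top_k(per_shard: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    // Gather: flatten every shard's hits into one list.
    let mut all: Vec<Hit> = per_shard.into_iter().flatten().collect();
    // Merge: highest score first, then ascending doc_id.
    all.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then_with(|| a.doc_id.cmp(&b.doc_id))
    });
    all.truncate(k);
    all
}
```

Calling `merge_top_k(vec![shard_a_hits, shard_b_hits], 10)` would return at most ten hits in descending score order. Sorting the concatenated list keeps the sketch short; a k-way merge over already-sorted shard lists would avoid the full sort.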

Replication Factor

Analogous to software RAID — configurable per deployment:

RF	Redundancy	Node failures tolerated	Usable capacity
1	None (stripe only)	0	100% of fleet
2	One replica	1 per shard group	50% of fleet
3	Two replicas	2 per shard group	33% of fleet
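The arithmetic behind the table can be sketched as follows, assuming equally sized nodes: usable capacity is the raw fleet capacity divided by RF, and each shard group tolerates RF - 1 failures. The function names are hypothetical and chosen only for this illustration.

```rust
/// Usable capacity of a fleet when every shard is stored `rf` times.
/// Returns None for rf == 0, which is not a valid replication factor.
fn usable_capacity_gib(nodes: u32, gib_per_node: f64, rf: u32) -> Option<f64> {
    if rf == 0 {
        return None;
    }
    Some(f64::from(nodes) * gib_per_node / f64::from(rf))
}

/// Replica losses a shard group can absorb while still holding one copy.
fn failures_tolerated(rf: u32) -> u32 {
    rf.saturating_sub(1)
}
```

For example, six nodes with 8 GiB each hold 48 GiB of raw capacity, so RF 2 leaves 24 GiB usable and RF 3 leaves 16 GiB.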

Key Components

Orchestrator — proxy that handles shard routing, scatter-gather, result merging, and topology management
Shard router — consistent hash function (Rendezvous/HRW) mapping document IDs to node assignments; minimal reshuffling on topology change
Rebalancer — on node add/remove, recomputes assignments and migrates only the shards that changed owners; surviving replicas serve reads during rebuild
Result merger — normalizes and merges ranked result sets from multiple shards into a single coherent response
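The Rendezvous (HRW) idea behind the shard router can be sketched as follows. This is an illustration under stated assumptions rather than Miroir's implementation: `hrw_weight` and `pick_replicas` are hypothetical names, the key may be a document ID or a shard ID, and `DefaultHasher` stands in for whatever hash function the project actually pins.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Weight of a (key, node) pair. DefaultHasher keeps the sketch
/// dependency-free, but its output is not guaranteed stable across Rust
/// releases, so a real router would pin an explicit hash function.
fn hrw_weight(key: &str, node: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    node.hash(&mut hasher);
    hasher.finish()
}

/// Return the `rf` nodes with the highest weight for `key`.
fn pick_replicas<'a>(key: &str, nodes: &[&'a str], rf: usize) -> Vec<&'a str> {
    let mut ranked: Vec<(u64, &'a str)> = nodes
        .iter()
        .map(|&node| (hrw_weight(key, node), node))
        .collect();
    // Highest weight first; tie-break on node name for determinism.
    ranked.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(b.1)));
    ranked.into_iter().take(rf).map(|(_, node)| node).collect()
}
```

Because each node's weight depends only on the key and that node's own name, removing a node that was not among the winners leaves the assignment unchanged. That property is what keeps reshuffling minimal on topology change.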

Status

Design phase. See docs/ for architecture detail.