jedarden ebc300355c P2.3: Implement scatter-gather search with group fallback
Implement the search read path with scatter-gather + merge + group selection:

1. Group-unavailability fallback: When a shard has no available replica
   in the primary group, the Fallback policy tries other replica groups
   before failing. This provides full results (not degraded) when an
   alternate group is healthy.

2. X-Miroir-Degraded header: Now includes actual shard IDs in the format
   "X-Miroir-Degraded: shards=3,7,11" instead of just "partial".

3. Acceptance tests for P2.3:
   - Unique-keyword search deduplicates correctly (RRF)
   - Facet counts sum across shards
   - Paging with no dupes/gaps
   - Node down with RF=2 still covers all shards
   - Group down falls back to other group (not degraded)
   - Degraded header includes actual shard IDs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 06:40:04 -04:00
.beads Fix clippy warnings, improve test robustness, and clean up proxy code 2026-04-19 04:53:45 -04:00
.cargo P1.5: Implement scatter module with covering-set construction + dispatch trait 2026-04-19 00:20:29 -04:00
benches P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability 2026-04-19 03:43:10 -04:00
charts/miroir P3.5: Add values.schema.json constraint for replicas>1 requires Redis 2026-04-18 23:44:15 -04:00
crates P2.3: Implement scatter-gather search with group fallback 2026-04-19 06:40:04 -04:00
docs P12.OP4.1: Validate dfs_query_then_fetch benchmark (τ=0.9817) and document latency 2026-04-19 05:31:13 -04:00
tests/benches/score-comparability P12.OP4: Validate RRF merge quality — τ=0.14 confirms DFS preflight is required 2026-04-19 05:43:42 -04:00
.editorconfig Add repo hygiene: LICENSE, CHANGELOG, .gitignore 2026-04-18 20:47:36 -04:00
.gitignore P12.OP4: Finalize score normalization validation — RRF τ=0.14, score τ=0.79 2026-04-19 02:40:54 -04:00
.needle-predispatch-sha Fix clippy warnings, improve test robustness, and clean up proxy code 2026-04-19 04:53:45 -04:00
Cargo.lock Integrate MeilisearchError into proxy (IntoResponse, auth middleware) + telemetry 2026-04-19 05:21:09 -04:00
Cargo.toml P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability 2026-04-19 03:43:10 -04:00
CHANGELOG.md Add repo hygiene: LICENSE, CHANGELOG, .gitignore 2026-04-18 20:47:36 -04:00
clippy.toml Add repo hygiene: LICENSE, CHANGELOG, .gitignore 2026-04-18 20:47:36 -04:00
LICENSE Add repo hygiene: LICENSE, CHANGELOG, .gitignore 2026-04-18 20:47:36 -04:00
miroir.yaml P2.1: Implement axum server skeleton with health/version/ready/topology/shards/metrics endpoints 2026-04-19 06:12:05 -04:00
README.md Add repo hygiene: LICENSE, CHANGELOG, .gitignore 2026-04-18 20:47:36 -04:00
rust-toolchain.toml Add repo hygiene: LICENSE, CHANGELOG, .gitignore 2026-04-18 20:47:36 -04:00
rustfmt.toml Add repo hygiene: LICENSE, CHANGELOG, .gitignore 2026-04-18 20:47:36 -04:00

Miroir

Multi-node Index Replication Orchestrator, Integrated Rebalancing

Miroir is a RAID-like orchestration layer for Meilisearch. It stripes a large index across a fleet of small-RAM Meilisearch nodes with a configurable replication factor, fans out search queries across all shards, and rebalances shard assignments when nodes are added or removed — all using the Meilisearch Community Edition.

The Problem

Meilisearch loads its entire index into memory-mapped LMDB files. A large index that exceeds a single server's available RAM cannot run on that server. The Enterprise Edition's native sharding is gated behind a commercial license. Miroir shards the index across multiple nodes without requiring that license.

How It Works

Client
  │
  ▼
Miroir Orchestrator
  ├── Write path: hash(doc_id) → assign to shard → write to R replicas
  ├── Read path:  scatter query to all shards → gather → merge ranked results
  └── Rebalance: on node add/remove → recompute assignments → migrate minimum shards

Meilisearch Nodes (N instances, each holding a subset of shards)
  node-0   node-1   node-2   ...   node-(N-1)
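
The "gather → merge ranked results" step of the read path can be sketched with reciprocal rank fusion (RRF), which the P2.3 commit notes mention for deduplication. This is a minimal illustration, not the project's merger: the function name `rrf_merge`, the smoothing constant `k`, and the use of plain document-ID lists as shard responses are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Merge per-shard ranked lists of document IDs with reciprocal rank fusion.
///
/// Hypothetical helper for illustration. Each inner Vec is one shard's
/// result list, best hit first. `k` is the RRF smoothing constant
/// (60.0 is a commonly used value). A document returned by several
/// shards appears once in the output, with its contributions summed.
pub fn rrf_merge(shard_results: &[Vec<String>], k: f64, limit: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranked in shard_results {
        for (rank, doc_id) in ranked.iter().enumerate() {
            // `rank` is 0-based here; RRF is defined on 1-based ranks.
            *scores.entry(doc_id.as_str()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }

    let mut merged: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();

    // Highest fused score first; break ties by ID so the order is deterministic.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged.truncate(limit);
    merged
}
```

RRF only looks at rank positions, so it sidesteps the problem of raw scores not being comparable across shards, at the cost of discarding score magnitude.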

Replication Factor

Analogous to software RAID — configurable per deployment:

RF   Redundancy           Node failures tolerated   Capacity
1    None (stripe only)   0                         100% of fleet
2    One replica          1 per shard group         50% of fleet
3    Two replicas         2 per shard group         33% of fleet
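
The arithmetic behind the table is simple: each shard is stored RF times, so usable capacity is 1/RF of the fleet and each shard group survives RF - 1 node failures. The sketch below restates that; the function names and the per-node size parameter are hypothetical and not part of Miroir's API.

```rust
/// Fraction of raw fleet capacity available for unique data at a given
/// replication factor. Returns None for rf = 0, which is not a valid setting.
pub fn usable_fraction(rf: u32) -> Option<f64> {
    if rf == 0 {
        None
    } else {
        Some(1.0 / rf as f64)
    }
}

/// Node failures each shard group can tolerate: one fewer than the copy count.
pub fn tolerated_failures(rf: u32) -> Option<u32> {
    rf.checked_sub(1)
}

/// Usable capacity in GiB for `nodes` identical nodes of `per_node_gib` each.
/// Assumes shards are spread evenly, which a real fleet only approximates.
pub fn usable_capacity_gib(nodes: u32, per_node_gib: f64, rf: u32) -> Option<f64> {
    usable_fraction(rf).map(|fraction| nodes as f64 * per_node_gib * fraction)
}
```

For example, six nodes with 8 GiB each at RF=2 give 24 GiB of usable index capacity under these assumptions.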

Key Components

  • Orchestrator — proxy that handles shard routing, scatter-gather, result merging, and topology management
  • Shard router — consistent hash function (Rendezvous/HRW) mapping document IDs to node assignments; minimal reshuffling on topology change
  • Rebalancer — on node add/remove, recomputes assignments and migrates only the shards that changed owners; surviving replicas serve reads during rebuild
  • Result merger — normalizes and merges ranked result sets from multiple shards into a single coherent response
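
As an illustration of the shard router, here is a minimal Rendezvous (HRW) sketch: every node gets a weight for a given shard, and the RF highest-weighted nodes hold that shard's replicas. The function names, the two-step doc → shard → nodes split, and the use of `DefaultHasher` are assumptions for the example. `DefaultHasher` in particular is not guaranteed to be stable across Rust releases, so a real router would need a hash with a fixed specification.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a document ID to a shard number (hypothetical helper).
pub fn shard_for_doc(doc_id: &str, shard_count: u32) -> u32 {
    assert!(shard_count > 0, "shard_count must be non-zero");
    let mut hasher = DefaultHasher::new();
    doc_id.hash(&mut hasher);
    (hasher.finish() % shard_count as u64) as u32
}

/// HRW weight of one (node, shard) pair.
fn weight(node: &str, shard: u32) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    shard.hash(&mut hasher);
    hasher.finish()
}

/// Pick the `rf` nodes with the highest weight for `shard`.
/// Returns fewer than `rf` nodes if the fleet is smaller than `rf`.
pub fn replicas_for_shard<'a>(nodes: &[&'a str], shard: u32, rf: usize) -> Vec<&'a str> {
    let mut scored: Vec<(u64, &'a str)> = nodes
        .iter()
        .map(|&node| (weight(node, shard), node))
        .collect();
    // Highest weight first; break ties by node name for determinism.
    scored.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(b.1)));
    scored.into_iter().take(rf).map(|(_, node)| node).collect()
}
```

Because each node's weight for a shard is independent of the other nodes, removing a node only affects the shards it held: every other shard keeps the same top-RF set. That is the "minimal reshuffling" property the router relies on.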

Status

Design phase. See docs/ for architecture detail.