miroir/tests/benches/score-comparability
jedarden b23e70656e P2.2: Implement write path with primary key validation, shard injection, and two-rule quorum
Implements POST/PUT /indexes/{uid}/documents and DELETE /indexes/{uid}/documents:

- Primary key extraction on hot path with 400 miroir_primary_key_required if missing
- _miroir_shard injection into every document before forwarding to nodes
- Rejection of _miroir_shard in client-submitted docs (400 miroir_reserved_field)
- Two-rule quorum: per-group floor(RF/2)+1 ACKs, success if ≥1 group meets quorum
- X-Miroir-Degraded header when any group misses quorum
- 503 miroir_no_quorum only when NO group meets quorum
- Per-batch grouping by target shard for efficient HTTP fan-out
- DELETE by IDs routes each ID independently to its shard
- DELETE by filter broadcasts to all nodes

Acceptance tests pass:
- Primary key validation before any writes
- Reserved field rejection
- Shard distribution uniformity (17-26 shards/node with 64 shards/3 nodes)
- Quorum calculation: floor(RF/2)+1
- Meilisearch-compatible error shape

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 06:48:30 -04:00
corpus P12.OP4: Score normalization at scale — research & benchmark infrastructure 2026-04-18 23:58:08 -04:00
queries P12.OP4: Score normalization at scale — research & benchmark infrastructure 2026-04-18 23:58:08 -04:00
results P2.2: Implement write path with primary key validation, shard injection, and two-rule quorum 2026-04-19 06:48:30 -04:00
README.md P12.OP4: Score normalization at scale — research & benchmark infrastructure 2026-04-18 23:58:08 -04:00
simulate.py Phase 1 Core Routing: validate and fix compilation 2026-04-19 03:22:33 -04:00

Score Comparability Benchmark

Tests whether _rankingScore values from different shards are comparable when documents are distributed unevenly across shards.

Problem Statement

Meilisearch's ranking pipeline computes scores using local statistics (term frequency, document frequency). When shards hold very different document distributions, the same query can produce scores on different shards that are not directly comparable, so a merge that sorts hits by raw score can rank documents incorrectly.

Experiment Design

  1. Ground truth: Single Meilisearch index with all documents
  2. Distributed setup: Same documents sharded across N nodes with intentional skew
  3. Measurement: Kendall rank correlation (τ) between the merged distributed ranking and the ground-truth ranking for each query
  4. Pass criterion: mean τ ≥ 0.95 across 10,000 random queries
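
The measurement step can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the function names `merge_by_score` and `kendall_tau` are hypothetical, the merge is assumed to be a plain sort on `_rankingScore`, and the correlation is the simple tau-a variant restricted to documents that appear in both rankings. The README does not say which tau variant or result depth the experiments use.

```python
from itertools import combinations


def merge_by_score(shard_hits, limit):
    """Merge per-shard hits into one ranking, highest score first.

    shard_hits: one list per shard of (doc_id, ranking_score) tuples.
    Ties are broken by doc_id so the output is deterministic.
    """
    all_hits = [hit for hits in shard_hits for hit in hits]
    all_hits.sort(key=lambda h: (-h[1], h[0]))
    return [doc_id for doc_id, _ in all_hits[:limit]]


def kendall_tau(ranking_a, ranking_b):
    """Kendall tau-a over the documents present in both rankings."""
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    common = [doc for doc in ranking_a if doc in pos_b]
    if len(common) < 2:
        return 1.0  # nothing to disagree about
    concordant = discordant = 0
    for x, y in combinations(common, 2):
        # x precedes y in ranking_a by construction
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Toy data: two shards, four documents, ground truth from a single index.
ground_truth = ["a", "b", "c", "d"]
shard_hits = [
    [("a", 0.9), ("c", 0.5)],
    [("b", 0.7), ("d", 0.6)],  # "d" is scored above "c" across shards
]
merged = merge_by_score(shard_hits, limit=4)
tau = kendall_tau(merged, ground_truth)
```

In this toy case the merged order swaps one pair ("c" and "d") relative to the ground truth, so τ falls below 1. Averaging `tau` over the full query set and comparing the mean against 0.95 gives the pass/fail decision described above.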

Corpus Structure

  • 100,000 documents total
  • 10 shards (shard 0 = normal, shard 1 = 100× normal, shard 9 = 0.01× normal)
  • Documents have: id, title, content (synthetic text), category (for filtering)
  • 50 unique terms distributed across documents with varying frequencies
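
To make the skew concrete, the sketch below derives per-shard document counts from relative weights. It rests on assumptions the README does not state: that "100× normal" refers to document count, and that shards 2 through 8 have the normal weight. The helper name `skewed_shard_sizes` is hypothetical.

```python
def skewed_shard_sizes(total_docs, weights):
    """Split total_docs across shards in proportion to weights.

    Uses largest-remainder rounding so the sizes sum to total_docs exactly.
    """
    total_weight = sum(weights)
    exact = [total_docs * w / total_weight for w in weights]
    sizes = [int(x) for x in exact]  # floor of each exact share
    leftover = total_docs - sum(sizes)
    # Hand the remaining documents to the shards with the largest fractions.
    by_remainder = sorted(
        range(len(weights)), key=lambda i: exact[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


# Assumed weights: shard 0 normal, shard 1 at 100x, shards 2-8 normal
# (an assumption, not given in the README), shard 9 at 0.01x.
WEIGHTS = [1.0, 100.0] + [1.0] * 7 + [0.01]
sizes = skewed_shard_sizes(100_000, WEIGHTS)
```

Under these assumed weights, shard 1 ends up holding the large majority of the corpus while shard 9 holds only a handful of documents, which is the kind of imbalance the experiment is meant to stress.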

Directory Layout

  • corpus/: Test document sets (JSONL)
  • queries/: Generated query sets for experiments
  • results/: Experimental results and analysis

Running Experiments

See the individual experiment scripts in the subdirectories of results/.