miroir/docs/research/score-comparability
jedarden 360378bde2 P11.8: Amend plan §12 to reflect Rust-idiomatic test layout
The plan §12 previously specified tests/ at root with integration/
and chaos/ subdirectories. However, the actual implementation uses
the idiomatic Rust convention with tests in crates/*/tests/.

This commit:
- Updates plan §12 repository structure to document the actual layout
- Moves tests/benches/score-comparability to docs/research/ (research artifacts)
- Removes the now-empty tests/ directory

CI already runs cargo test --all --all-features which correctly
discovers and runs all crate-level integration tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 06:49:04 -04:00
corpus
queries
results
README.md
simulate.py

Score Comparability Benchmark

Tests whether _rankingScore values from different shards are comparable when documents are distributed unevenly across shards.

Problem Statement

Meilisearch's ranking pipeline computes scores using local statistics (term frequency, document frequency). When shards hold very different document distributions, the same query can produce scores that are not directly comparable across shards, so merging results by score can yield an incorrect global ranking.
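
To make the concern concrete, here is a toy sketch of how a statistic derived from local document frequency can diverge between shards. It uses a generic smoothed IDF formula purely for illustration; it is not Meilisearch's actual scoring code, and the shard counts are made-up values.

```python
import math

def idf(doc_count: int, docs_with_term: int) -> float:
    """Smoothed inverse document frequency from one index's own statistics.

    Generic textbook-style formula, used here only as a stand-in for
    "a score component that depends on local document frequency".
    """
    return math.log((doc_count + 1) / (docs_with_term + 1)) + 1.0

# Hypothetical statistics for the same term on two shards:
# the term is rare on shard A and common on shard B.
idf_a = idf(doc_count=1000, docs_with_term=10)
idf_b = idf(doc_count=1000, docs_with_term=500)

# What a single index holding both shards' documents would compute.
idf_global = idf(doc_count=2000, docs_with_term=510)

print(f"shard A: {idf_a:.3f}  shard B: {idf_b:.3f}  global: {idf_global:.3f}")
```

Under these assumed numbers the same term is weighted much more heavily on shard A than on shard B, and neither matches the global value. That is the kind of mismatch the benchmark is meant to detect.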

Experiment Design

  1. Ground truth: Single Meilisearch index with all documents
  2. Distributed setup: Same documents sharded across N nodes with intentional skew
  3. Measurement: Kendall's tau (τ) between the merged distributed ranking and the ground-truth ranking
  4. Pass criterion: mean τ ≥ 0.95 across 10,000 random queries
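
The measurement step can be sketched as below. This is a minimal illustration, not the code from simulate.py: the helper names (merge_by_score, kendall_tau, passes) and the hit shape are assumptions, and the tau variant is the simple tau-a computed over document ids present in both rankings.

```python
from itertools import combinations
from statistics import mean

def merge_by_score(shard_hits, limit):
    """Merge per-shard hit lists into one ranking by descending _rankingScore."""
    all_hits = [hit for hits in shard_hits for hit in hits]
    # The id is a deterministic tie-breaker when scores are equal.
    all_hits.sort(key=lambda h: (-h["_rankingScore"], h["id"]))
    return [h["id"] for h in all_hits[:limit]]

def kendall_tau(ranking_a, ranking_b):
    """Kendall tau-a over the ids that appear in both rankings."""
    pos_b = {doc_id: i for i, doc_id in enumerate(ranking_b)}
    common = [doc_id for doc_id in ranking_a if doc_id in pos_b]
    if len(common) < 2:
        raise ValueError("need at least two shared documents")
    concordant = discordant = 0
    # Each pair (x, y) has x ranked before y in ranking_a by construction.
    for x, y in combinations(common, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def passes(taus, threshold=0.95):
    """Pass criterion: mean tau across all queries meets the threshold."""
    return mean(taus) >= threshold

# Made-up example: two shards, four documents.
ground_truth = ["a", "b", "c", "d"]
shard_hits = [
    [{"id": "a", "_rankingScore": 0.9}, {"id": "c", "_rankingScore": 0.6}],
    [{"id": "b", "_rankingScore": 0.5}, {"id": "d", "_rankingScore": 0.4}],
]
merged = merge_by_score(shard_hits, limit=4)
tau = kendall_tau(merged, ground_truth)
print(merged, round(tau, 3))
```

In this made-up example, document c overtakes b in the merged list because its shard-local score is inflated. One discordant pair out of six gives τ ≈ 0.667, well below the 0.95 threshold.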

Corpus Structure

  • 100,000 documents total
  • 10 shards (shard 0 = normal, shard 1 = 100× normal, shard 9 = 0.01× normal)
  • Documents have: id, title, content (synthetic text), category (for filtering)
  • 50 unique terms distributed across documents with varying frequencies
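
One possible reading of the skew, sketched below, treats the multipliers as relative shard sizes. This is an assumption: the list above only specifies shards 0, 1, and 9, so the baseline weight of 1.0 for shards 2 through 8 and the shard_sizes helper are hypothetical.

```python
def shard_sizes(total_docs, weights):
    """Split total_docs across shards in proportion to weights.

    Uses largest-remainder rounding so the sizes sum exactly to total_docs.
    """
    total_weight = sum(weights)
    exact = [total_docs * w / total_weight for w in weights]
    sizes = [int(x) for x in exact]
    # Give leftover documents to the shards with the largest fractional parts.
    leftover = total_docs - sum(sizes)
    by_remainder = sorted(
        range(len(weights)), key=lambda i: exact[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes

# Shard 0 = normal, shard 1 = 100x normal, shard 9 = 0.01x normal.
# ASSUMPTION: shards 2-8 use the baseline weight; the source does not say.
WEIGHTS = [1.0, 100.0] + [1.0] * 7 + [0.01]
sizes = shard_sizes(100_000, WEIGHTS)

for shard, size in enumerate(sizes):
    print(f"shard {shard}: {size} documents")
```

Under this reading, shard 1 holds the large majority of the corpus and shard 9 only a handful of documents, which makes the shard-local statistics as different as possible.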

Directory Layout

  • corpus/: Test document sets (JSONL)
  • queries/: Generated query sets for experiments
  • results/: Experimental results and analysis

Running Experiments

See the individual experiment scripts in the results/ subdirectories.