miroir/docs
jedarden 72f9a197b5 P12.OP4: Score normalization at scale — research & benchmark infrastructure
Completed Plan §15 Open Problem #4 research on cross-shard score comparability.

## Key Finding
Average Kendall tau: 0.79, below the 0.95 pass threshold (FAIL)
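
For reference, Kendall tau compares two rankings of the same documents by counting pairs that keep or flip their relative order. The sketch below is a minimal tau-a (no tie handling) and is not the comparison tool from this commit; the function name `kendall_tau` and the sample document IDs are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two rankings (lists of doc IDs, best first).

    Assumes no ties. Only documents present in both rankings are compared.
    """
    pos_b = {doc: i for i, doc in enumerate(rank_b)}
    common = [doc for doc in rank_a if doc in pos_b]

    concordant = 0
    discordant = 0
    # combinations() preserves input order, so x precedes y in rank_a.
    for x, y in combinations(common, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1

    pairs = concordant + discordant
    # With fewer than two shared documents there is nothing to disagree on.
    return (concordant - discordant) / pairs if pairs else 1.0


# Hypothetical rankings: one adjacent swap out of six pairs.
global_idf_ranking = ["d1", "d2", "d3", "d4"]
local_idf_ranking = ["d2", "d1", "d3", "d4"]
tau = kendall_tau(global_idf_ranking, local_idf_ranking)
```

A value of 1.0 means the two rankings agree on every pair and -1.0 means one is the reverse of the other.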

Cross-shard score comparability is a significant issue:
- Common-term queries: τ = 0.15 (catastrophic)
- Local (per-shard) IDF statistics inflate scores on small shards
- Documents from 10-doc shards outrank results from 93K-doc shards
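
The inflation can be illustrated with the IDF term alone. The sketch below assumes the Lucene-style BM25 IDF formula and made-up document frequencies; only the 10-doc and 93K-doc shard sizes come from the findings above, and the benchmark's actual formula and parameters may differ.

```python
import math


def bm25_idf(num_docs, doc_freq):
    """Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical common term: present in half of the large shard,
# but by chance in only one document of the small shard.
idf_large_local = bm25_idf(num_docs=93_000, doc_freq=46_500)
idf_small_local = bm25_idf(num_docs=10, doc_freq=1)

# Global statistics pool both shards, so every document is scored
# against the same IDF regardless of which shard holds it.
idf_global = bm25_idf(num_docs=93_000 + 10, doc_freq=46_500 + 1)

inflation = idf_small_local / idf_global
```

Under these assumed numbers the small shard's local IDF is several times the global value, so a match there can outscore an equally relevant match on the large shard.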

## Recommendation
Implement Reciprocal Rank Fusion (RRF) for result merging.
Follow-up bead: miroir-nsu
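
A minimal sketch of RRF merging, for orientation only: the actual design is deferred to the follow-up bead. The function name `rrf_merge`, the constant `k=60` (a commonly used default), and the tie-break by document ID are assumptions.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Merge ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain
    it, with rank starting at 1. Raw shard scores are never compared.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical per-shard result lists (best first).
shard_a = ["a1", "a2"]
shard_b = ["b1", "b2"]
merged = rrf_merge([shard_a, shard_b])
```

Because shards hold disjoint documents, each document appears in exactly one list, so this reduces to interleaving results by their per-shard rank. That removes the dependence on local IDF, but it also treats every shard's top hit as equally good.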

## Artifacts Added
- Benchmark infrastructure: tests/benches/score-comparability/
  - Corpus generator with extreme shard skew (100× variance)
  - Query generator (10K random queries across 5 types)
  - BM25-based simulation with global vs local IDF
  - Kendall tau comparison tool
  - Full experimental results (τ = 0.79 ± 0.01, 95% CI)
- Research writeup: docs/research/score-normalization-at-scale.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 23:58:08 -04:00

| Name | Last commit | Date |
|---|---|---|
| benchmarks | P12.OP3: Validate 2× transient load caveat and add CLI schedule window guard | 2026-04-18 22:00:57 -04:00 |
| dump-import | P12.OP5: Add dump import compatibility matrix | 2026-04-18 21:06:46 -04:00 |
| notes | Add repo hygiene: LICENSE, CHANGELOG, .gitignore | 2026-04-18 20:47:36 -04:00 |
| plan | P0.7: Update plan with chaos-test results, sync beads | 2026-04-18 23:03:21 -04:00 |
| research | P12.OP4: Score normalization at scale — research & benchmark infrastructure | 2026-04-18 23:58:08 -04:00 |
| trade-offs.md | P12.OP1: Chaos-test cutover race window + hard refusal policy | 2026-04-18 22:00:21 -04:00 |