miroir

jedarden 72f9a197b5 P12.OP4: Score normalization at scale — research & benchmark infrastructure

Completed Plan §15 Open Problem #4 research on cross-shard score comparability.

## Key Finding

Average Kendall tau: 0.79 vs. 0.95 threshold — FAIL

Cross-shard score comparability is a significant issue:
- Common-term queries: τ = 0.15 (catastrophic)
- Local IDF statistics cause score inflation on small shards
- Documents from 10-doc shards outrank 93K-doc shard results

## Recommendation

Implement Reciprocal Rank Fusion (RRF) for result merging.
Follow-up bead: miroir-nsu

## Artifacts Added

- Benchmark infrastructure: tests/benches/score-comparability/
- Corpus generator with extreme shard skew (100× variance)
- Query generator (10K random queries across 5 types)
- BM25-based simulation with global vs local IDF
- Kendall tau comparison tool
- Full experimental results (τ = 0.79 ± 0.01, 95% CI)
- Research writeup: docs/research/score-normalization-at-scale.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-18 23:58:08 -04:00
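
For readers unfamiliar with the recommended technique, here is a minimal sketch of Reciprocal Rank Fusion. It is an illustration only, not code from this repository: the function name `rrf_merge` and the constant `k=60` (a commonly used default) are assumptions, and the commit does not say how miroir will implement the merge. RRF uses only each document's rank within its own shard's result list, so it avoids comparing raw scores inflated by local IDF statistics.

```python
from collections import defaultdict


def rrf_merge(shard_rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    shard_rankings: list of lists, each ordered best-first.
    k: smoothing constant (assumed value; 60 is a common default).
    """
    fused = defaultdict(float)
    for ranking in shard_rankings:
        # Ranks are 1-based; raw shard scores are never consulted.
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical shards: a large shard with three hits, a tiny shard with one.
merged = rrf_merge([["a1", "a2", "a3"], ["b1"]])
print(merged)
```

When every document lives on exactly one shard, this reduces to interleaving results by rank, so a tiny shard's top hit ties with the large shard's top hit but cannot outrank it on score alone.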
benchmarks	P12.OP3: Validate 2× transient load caveat and add CLI schedule window guard	2026-04-18 22:00:57 -04:00
dump-import	P12.OP5: Add dump import compatibility matrix	2026-04-18 21:06:46 -04:00
notes	Add repo hygiene: LICENSE, CHANGELOG, .gitignore	2026-04-18 20:47:36 -04:00
plan	P0.7: Update plan with chaos-test results, sync beads	2026-04-18 23:03:21 -04:00
research	P12.OP4: Score normalization at scale — research & benchmark infrastructure	2026-04-18 23:58:08 -04:00
trade-offs.md	P12.OP1: Chaos-test cutover race window + hard refusal policy	2026-04-18 22:00:21 -04:00