miroir/tests
jedarden affb59fff6 P12.OP4: Validate RRF merge quality — τ=0.14 confirms DFS preflight is required
Benchmarked RRF merge (k=60) against ground truth with 10K queries on a
skewed 10-shard corpus (93% on shard 1). Result: Kendall τ = 0.1369
(95% CI [0.1339, 0.1399]), far below the 0.95 threshold. 9,998 of 10,000
queries fell below τ=0.95, confirming that RRF alone is insufficient for
cross-shard ranking quality on skewed distributions.
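
To illustrate the measurement described above, here is a minimal sketch of
Reciprocal Rank Fusion and a tie-free Kendall τ (tau-a). It is not the
benchmark code from this repository: the function names `rrf_merge` and
`kendall_tau`, the toy document ids, and the tie-break by document id are
all assumptions made for illustration.

```python
from itertools import combinations


def rrf_merge(shard_rankings, k=60):
    """Fuse per-shard ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document gets sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Hypothetical helper, not the repo's API.
    """
    scores = {}
    for ranking in shard_rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Descending fused score; ties broken by doc id (an assumption).
    return sorted(scores, key=lambda d: (-scores[d], d))


def kendall_tau(ranking_a, ranking_b):
    """Kendall tau-a between two rankings of the same items (no ties)."""
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    concordant = discordant = 0
    # combinations() yields (x, y) with x ahead of y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / n_pairs if n_pairs else 1.0


# Toy skewed corpus: the four best documents all live on shard "a".
ground_truth = ["a1", "a2", "a3", "a4", "b1", "c1"]
shards = [["a1", "a2", "a3", "a4"], ["b1"], ["c1"]]

merged = rrf_merge(shards)
tau = kendall_tau(merged, ground_truth)
```

In this toy case every shard's top hit receives the same fused score,
1/(60+1), so `b1` and `c1` are promoted ahead of `a2`..`a4` and τ falls to
0.2. Because each document lives on exactly one shard, RRF only sees
per-shard rank positions, which is one plausible reading of why a skewed
distribution hurts it.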

DFS preflight (already implemented) achieves τ = 0.9818, passing the
threshold. Add full 10K-query DFS comparison report and fix paths in
experiment.json.
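
The commit message does not spell out what the DFS preflight does. The
sketch below assumes it is a distributed-frequency-statistics pass in the
style of Elasticsearch's `dfs_query_then_fetch`: per-shard document counts
and term document frequencies are summed before scoring, so every shard
uses the same IDF. The names `local_idf` and `global_idf`, the BM25-style
IDF formula, and the toy statistics are illustrative assumptions, not the
implementation in this repository.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style IDF (the non-negative Lucene variant)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def local_idf(shard, term):
    """IDF computed from a single shard's own statistics."""
    doc_count, doc_freqs = shard
    return bm25_idf(doc_count, doc_freqs.get(term, 0))


def global_idf(shards, term):
    """IDF computed from statistics summed across all shards.

    Assumed shape of the preflight step: collect (doc_count, doc_freqs)
    from each shard, aggregate, then score with the shared value.
    """
    total_docs = sum(doc_count for doc_count, _ in shards)
    total_df = sum(doc_freqs.get(term, 0) for _, doc_freqs in shards)
    return bm25_idf(total_docs, total_df)


# Toy skewed corpus: (doc_count, {term: doc_freq}) per shard.
shards = [
    (9300, {"rust": 930}),  # large shard, term is fairly common
    (700, {"rust": 7}),     # small shard, term looks rare locally
]

idf_big = local_idf(shards[0], "rust")
idf_small = local_idf(shards[1], "rust")
idf_shared = global_idf(shards, "rust")
```

With local statistics the small shard assigns the term a much higher IDF
than the large shard, so raw scores from the two shards are not
comparable. With the shared value, identical documents score identically
regardless of which shard holds them.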

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 05:43:42 -04:00
benches/score-comparability P12.OP4: Validate RRF merge quality — τ=0.14 confirms DFS preflight is required 2026-04-19 05:43:42 -04:00