P12.OP4: Validate RRF merge quality — τ=0.14 confirms DFS preflight is required

RRF merge (k=60) was benchmarked against ground truth with 10K queries
on a skewed 10-shard corpus (93% on shard 1). Result: Kendall τ = 0.1369
(95% CI [0.1339, 0.1399]), far below the 0.95 threshold. 9,998 of 10,000
queries fell below τ = 0.95, confirming that RRF alone is insufficient
for cross-shard ranking quality with skewed distributions.
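
A minimal Python sketch of the kind of measurement described above, not
the benchmark's actual code: `rrf_merge` and `kendall_tau` are
hypothetical helper names, the tie-break by document ID is an assumption,
and the tau variant (tau-a over documents present in both rankings) may
differ from the one the benchmark reports. It shows why rank-only fusion
struggles on a skewed corpus: the top hit of every shard gets the same
fused score, no matter how relevant it is globally.

```python
from itertools import combinations


def rrf_merge(shard_rankings, k=60, limit=100):
    """Fuse per-shard ranked lists of doc IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in shard_rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each appearance contributes 1 / (k + rank); ranks start at 1.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID (an assumption).
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:limit]


def kendall_tau(ranking_a, ranking_b):
    """Kendall tau-a over the documents both rankings share (no ties)."""
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    common = [doc for doc in ranking_a if doc in pos_b]
    if len(common) < 2:
        return 0.0
    concordant = discordant = 0
    for x, y in combinations(common, 2):
        # x precedes y in ranking_a; check whether ranking_b agrees.
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Toy skewed case: the four best documents all live on one shard.
truth = ["a", "b", "c", "d", "e"]
merged = rrf_merge([["a", "b", "c", "d"], ["e"]])
tau = kendall_tau(merged, truth)
```

In this toy case the lone document on the small shard ties with the
globally best document and is promoted above three better ones, which
pulls τ well below 1.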

DFS preflight (already implemented) achieves τ = 0.9818, passing the
threshold. Add full 10K-query DFS comparison report and fix paths in
experiment.json.
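
For context, a hedged sketch of what a DFS preflight typically does,
assuming it means gathering per-shard term statistics and scoring with
the summed totals (as in a "DFS query then fetch" search mode). The
function name, the shard numbers, and the BM25-style IDF formula are all
illustrative assumptions, not taken from this repository.

```python
import math


def idf_from_stats(shard_stats, term):
    """BM25-style IDF computed from the summed statistics of the given shards."""
    total_docs = sum(s["doc_count"] for s in shard_stats)
    doc_freq = sum(s["doc_freq"].get(term, 0) for s in shard_stats)
    return math.log(1.0 + (total_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical two-shard split with heavy skew (numbers are made up).
big = {"doc_count": 9300, "doc_freq": {"rust": 930}}
small = {"doc_count": 700, "doc_freq": {"rust": 7}}

local_big = idf_from_stats([big], "rust")      # IDF as the big shard sees it
local_small = idf_from_stats([small], "rust")  # IDF as the small shard sees it
global_idf = idf_from_stats([big, small], "rust")  # shared IDF after preflight
```

With shard-local statistics the same term is weighted very differently
on each shard, so raw scores are not comparable; with the summed
statistics every shard scores against the same IDF.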

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jedarden 2026-04-19 05:43:42 -04:00
parent c7be4ccbec
commit affb59fff6
2 changed files with 140077 additions and 2 deletions

File diff suppressed because it is too large


@@ -1,6 +1,6 @@
 {
-"corpus_dir": "corpus",
-"query_file": "queries/queries.jsonl",
+"corpus_dir": "tests/benches/score-comparability/corpus",
+"query_file": "tests/benches/score-comparability/queries/queries.jsonl",
 "shard_count": 10,
 "limit": 100,
 "total_queries": 10000,