P12.OP4: Validate RRF merge quality — τ=0.14 confirms DFS preflight is required
RRF merge (k=60) benchmarked against ground truth with 10K queries on skewed 10-shard corpus (93% on shard 1).

Result: Kendall τ = 0.1369 (95% CI [0.1339, 0.1399]), far below the 0.95 threshold. 9,998 of 10,000 queries fell below τ=0.95, confirming RRF alone is insufficient for cross-shard ranking quality with skewed distributions.

DFS preflight (already implemented) achieves τ = 0.9818, passing the threshold.

Add full 10K-query DFS comparison report and fix paths in experiment.json.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
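For readers unfamiliar with the two quantities above, here is a minimal sketch of Reciprocal Rank Fusion with k=60 and of a Kendall τ comparison against a ground-truth ranking. The function names `rrf_merge` and `kendall_tau`, the tie-breaking by document ID, and the choice of tau-a over the shared documents are assumptions made for illustration; the commit does not say how the benchmark computes τ or breaks ties.

```python
from itertools import combinations


def rrf_merge(shard_rankings, k=60, limit=100):
    """Fuse per-shard ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in.
    Hypothetical helper; not the benchmark's actual implementation.
    """
    scores = {}
    for ranking in shard_rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by doc ID (an assumption).
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:limit]


def kendall_tau(reference, candidate):
    """Kendall tau-a over the documents present in both rankings."""
    pos = {doc: i for i, doc in enumerate(candidate)}
    common = [doc for doc in reference if doc in pos]
    n = len(common)
    if n < 2:
        raise ValueError("need at least two shared documents")
    concordant = discordant = 0
    # Each pair (a, b) is ordered a-before-b in the reference ranking.
    for a, b in combinations(common, 2):
        if pos[a] < pos[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy skewed corpus: three relevant docs on shard 1, one weaker doc on shard 2.
ground_truth = ["a1", "a2", "a3", "b1"]
merged = rrf_merge([["a1", "a2", "a3"], ["b1"]])
tau = kendall_tau(ground_truth, merged)
```

In this toy case each document lives on exactly one shard, so RRF sees only per-shard ranks: `b1` ties with `a1` at 1/61 and is promoted above `a2` and `a3`, which drops τ well below 1. This is the same kind of distortion the skewed-corpus benchmark measures at scale.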
parent c7be4ccbec
commit affb59fff6
2 changed files with 140077 additions and 2 deletions
140075
tests/benches/score-comparability/results/comparison-report-dfs-correct.json
Normal file
File diff suppressed because it is too large
@@ -1,6 +1,6 @@
 {
-  "corpus_dir": "corpus",
-  "query_file": "queries/queries.jsonl",
+  "corpus_dir": "tests/benches/score-comparability/corpus",
+  "query_file": "tests/benches/score-comparability/queries/queries.jsonl",
   "shard_count": 10,
   "limit": 100,
   "total_queries": 10000,