miroir

jedarden 9ce1b36206 P12.OP4: Add confidence intervals to score comparability benchmark

Research doc updated with precise 95% CIs per query type. compare.py
now computes and reports confidence intervals. Kendall τ = 0.79
(95% CI [0.7873, 0.8006]) confirms raw score merging is not viable;
RRF already implemented in merger.rs as mitigation. Follow-up bead
created (miroir-zfo) for RRF quality validation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-19 00:07:42 -04:00
benches/score-comparability	P12.OP4: Add confidence intervals to score comparability benchmark	2026-04-19 00:07:42 -04:00
