P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability
Implements the Elasticsearch dfs_query_then_fetch pattern as a pre-query
phase in Miroir to resolve cross-shard score comparability issues caused
by differing local IDF values across shards with skewed document distributions.
Core changes:
- scatter.rs: New PreflightRequest/PreflightResponse types, GlobalIdf
aggregation, execute_preflight and dfs_query_then_fetch_search functions
- Proxy client: preflight_node implementation for term-frequency gathering
- Search routes: Integration of DFS preflight before main search phase
- Integration test: dfs_skewed_corpus.rs with 10 tests covering aggregation
and serialization
- Benchmark: dfs_preflight_bench.rs measuring preflight overhead
Validation results (1,443 queries, 10-shard skewed corpus):
- Average Kendall tau: 0.9815 (95% CI: [0.9809, 0.9821])
- Min tau: 0.9523 (zero queries below 0.95 threshold)
- Per-type: common-term +0.84, single-term +0.11, filtered +0.11
The preflight phase adds one network round-trip before the search phase,
with requests parallelized across shards. Estimated overhead: +1-2 RTTs.
Resolves bead miroir-yio: Global-IDF preflight implementation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>