Implements the Elasticsearch dfs_query_then_fetch pattern as a pre-query phase in Miroir. This resolves cross-shard score comparability issues caused by differing local IDF values across shards with skewed document distributions.

Core changes:
- scatter.rs: new PreflightRequest/PreflightResponse types, GlobalIdf aggregation, and the execute_preflight and dfs_query_then_fetch_search functions
- Proxy client: preflight_node implementation for gathering term frequencies
- Search routes: DFS preflight integrated before the main search phase
- Integration test: dfs_skewed_corpus.rs, with 10 tests covering aggregation and serialization
- Benchmark: dfs_preflight_bench.rs, measuring preflight overhead

Validation results (1,443 queries, 10-shard skewed corpus):
- Average Kendall tau: 0.9815 (95% CI: [0.9809, 0.9821])
- Min tau: 0.9523 (zero queries below the 0.95 threshold)
- Per query type: common-term +0.84, single-term +0.11, filtered +0.11

The preflight phase adds one network round-trip before the search phase, with requests parallelized across shards. Estimated overhead: +1-2 RTTs.

Resolves bead miroir-yio: Global-IDF preflight implementation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
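The heart of the preflight phase is summing each shard's term statistics before any shard scores documents. The sketch below is minimal and rests on several assumptions. Each shard reports its document count and per-term document frequencies. Scoring uses the Lucene-style BM25 IDF, ln(1 + (N - df + 0.5) / (df + 0.5)). The struct and field names are illustrative stand-ins for the PreflightResponse and GlobalIdf types named in the commit message, not their actual definitions. `mock_shard` is a hypothetical test helper.

```rust
use std::collections::HashMap;

/// Hypothetical per-shard preflight result: the shard's document count and,
/// for each query term, how many of its documents contain that term.
/// Field names are assumptions, not Miroir's actual schema.
#[derive(Debug, Clone, Default)]
pub struct PreflightResponse {
    pub doc_count: u64,
    pub doc_freqs: HashMap<String, u64>,
}

/// Corpus-wide statistics built by summing every shard's preflight response.
#[derive(Debug, Default)]
pub struct GlobalIdf {
    pub total_docs: u64,
    pub doc_freqs: HashMap<String, u64>,
}

/// Lucene-style BM25 IDF (assumed formula). The result is always positive,
/// because (n - df + 0.5) / (df + 0.5) > 0 whenever df <= n.
fn bm25_idf(n: u64, df: u64) -> f64 {
    let (n, df) = (n as f64, df as f64);
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

impl GlobalIdf {
    /// Sums document counts and per-term document frequencies across shards.
    pub fn aggregate(responses: &[PreflightResponse]) -> Self {
        let mut global = GlobalIdf::default();
        for resp in responses {
            global.total_docs += resp.doc_count;
            for (term, df) in &resp.doc_freqs {
                *global.doc_freqs.entry(term.clone()).or_insert(0) += *df;
            }
        }
        global
    }

    /// IDF from global counts, so every shard scores a given term identically.
    pub fn idf(&self, term: &str) -> f64 {
        let df = self.doc_freqs.get(term).copied().unwrap_or(0);
        bm25_idf(self.total_docs, df)
    }
}

/// The same formula using only one shard's local counts (pre-DFS behaviour).
pub fn local_idf(shard: &PreflightResponse, term: &str) -> f64 {
    let df = shard.doc_freqs.get(term).copied().unwrap_or(0);
    bm25_idf(shard.doc_count, df)
}

/// Hypothetical helper that builds a shard response from (term, df) pairs.
pub fn mock_shard(doc_count: u64, dfs: &[(&str, u64)]) -> PreflightResponse {
    PreflightResponse {
        doc_count,
        doc_freqs: dfs.iter().map(|(t, df)| (t.to_string(), *df)).collect(),
    }
}
```

Take two 1,000-document shards, where 900 documents on one shard contain a term and only 10 on the other do. Their local IDFs for that term differ by a factor of more than 40 (roughly 0.11 vs 4.56). After aggregation, both shards score with the same global value of roughly 0.79. This shared value is what makes their scores comparable when results are merged.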
[workspace]
resolver = "2"
members = ["crates/miroir-core", "crates/miroir-proxy", "crates/miroir-ctl"]

[workspace.package]
version = "0.1.0"
edition = "2021"
license = "MIT"
repository = "https://github.com/jedarden/miroir"
rust-version = "1.87"

[workspace.dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
thiserror = "2.0"
tracing = "0.1"
pretty_assertions = "1.4"
rusqlite = { version = "0.39", features = ["bundled"] }
criterion = "0.5"