miroir/Cargo.toml
jedarden 096b43ccab P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability
Implements the Elasticsearch dfs_query_then_fetch pattern as a pre-query
phase in Miroir to resolve cross-shard score comparability issues caused
by differing local IDF values across shards with skewed document distributions.

Core changes:
- scatter.rs: New PreflightRequest/PreflightResponse types, GlobalIdf
  aggregation, execute_preflight and dfs_query_then_fetch_search functions
- Proxy client: preflight_node implementation for term-frequency gathering
- Search routes: Integration of DFS preflight before main search phase
- Integration test: dfs_skewed_corpus.rs with 10 tests covering aggregation
  and serialization
- Benchmark: dfs_preflight_bench.rs measuring preflight overhead

Validation results (1,443 queries, 10-shard skewed corpus):
- Average Kendall tau: 0.9815 (95% CI: [0.9809, 0.9821])
- Min tau: 0.9523 (zero queries below 0.95 threshold)
- Per-type: common-term +0.84, single-term +0.11, filtered +0.11

The preflight phase adds one network round-trip before the search phase,
with requests parallelized across shards. Estimated overhead: +1-2 RTTs.

Resolves bead miroir-yio: Global-IDF preflight implementation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 03:43:10 -04:00

20 lines
479 B
TOML

[workspace]
resolver = "2"
members = ["crates/miroir-core", "crates/miroir-proxy", "crates/miroir-ctl"]
[workspace.package]
version = "0.1.0"
edition = "2021"
license = "MIT"
repository = "https://github.com/jedarden/miroir"
rust-version = "1.87"
[workspace.dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
thiserror = "2.0"
tracing = "0.1"
pretty_assertions = "1.4"
rusqlite = { version = "0.39", features = ["bundled"] }
criterion = "0.5"