jedarden/miroir

Author	SHA1	Message	Date
jedarden	1f686c646b	Merge remote-tracking branch 'origin/master' # Conflicts: # .beads/issues.jsonl # .beads/traces/bf-5xqk/metadata.json # .beads/traces/bf-5xqk/stdout.txt # .beads/traces/miroir-9dj/metadata.json # .beads/traces/miroir-9dj/stdout.txt # .beads/traces/miroir-cdo/metadata.json # .beads/traces/miroir-cdo/stdout.txt # .beads/traces/miroir-mkk/metadata.json # .beads/traces/miroir-mkk/stdout.txt # .beads/traces/miroir-r3j/metadata.json # .beads/traces/miroir-r3j/stdout.txt # .beads/traces/miroir-uhj/metadata.json # .beads/traces/miroir-uhj/stdout.txt # .beads/traces/miroir-zc2.6/metadata.json # .beads/traces/miroir-zc2.6/stdout.txt # .needle-predispatch-sha # Cargo.lock # charts/miroir/Chart.yaml # charts/miroir/templates/NOTES.txt # charts/miroir/templates/_helpers.tpl # charts/miroir/templates/redis-deployment.yaml # charts/miroir/templates/serviceaccount.yaml # charts/miroir/tests/README.md # charts/miroir/values.schema.json # charts/miroir/values.yaml # crates/miroir-core/Cargo.toml # crates/miroir-core/src/config.rs # crates/miroir-core/src/hedging.rs # crates/miroir-core/src/lib.rs # crates/miroir-core/src/merger.rs # crates/miroir-core/src/query_planner.rs # crates/miroir-core/src/raft_proto/mod.rs # crates/miroir-core/src/replica_selection.rs # crates/miroir-core/src/router.rs # crates/miroir-core/src/scatter.rs # crates/miroir-core/src/task_store/mod.rs # crates/miroir-core/src/task_store/redis.rs # crates/miroir-core/src/task_store/sqlite.rs # crates/miroir-core/src/topology.rs # crates/miroir-ctl/src/credentials.rs # crates/miroir-proxy/Cargo.toml # crates/miroir-proxy/src/auth.rs # crates/miroir-proxy/src/client.rs # crates/miroir-proxy/src/lib.rs # crates/miroir-proxy/src/main.rs # crates/miroir-proxy/src/middleware.rs # crates/miroir-proxy/src/routes/admin.rs # crates/miroir-proxy/src/routes/documents.rs # crates/miroir-proxy/src/routes/indexes.rs # crates/miroir-proxy/src/routes/search.rs # crates/miroir-proxy/src/routes/settings.rs # crates/miroir-proxy/src/routes/tasks.rs # docs/research/score-normalization-at-scale.md # notes/miroir-cdo.md # notes/miroir-r3j-final-verification.md # notes/miroir-r3j-verification.md # notes/miroir-r3j.1.md # notes/miroir-r3j.md # notes/miroir-zc2.1.md # notes/miroir-zc2.3.md # notes/miroir-zc2.4.md # notes/miroir-zc2.5.md	2026-05-24 05:21:32 -04:00
jedarden	360378bde2	P11.8: Amend plan §12 to reflect Rust-idiomatic test layout The plan §12 previously specified tests/ at root with integration/ and chaos/ subdirectories. However, the actual implementation uses the idiomatic Rust convention with tests in crates/*/tests/. This commit: - Updates plan §12 repository structure to document the actual layout - Moves tests/benches/score-comparability to docs/research/ (research artifacts) - Removes the now-empty tests/ directory CI already runs cargo test --all --all-features which correctly discovers and runs all crate-level integration tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 06:49:04 -04:00
jedarden	e0d6735ec0	Phase 0 (miroir-qon): Foundation — verification complete Phase 0 (Foundation) was already established in the repository. All required components are in place: - Cargo workspace with three crates (miroir-core, miroir-proxy, miroir-ctl) - rust-toolchain.toml pinning Rust 1.87 - All key dependencies wired (axum, tokio, reqwest, serde, config, clap, uuid) - Config struct with full YAML schema from plan §4 - Style configuration (rustfmt.toml, clippy.toml, .editorconfig) - Project files (CHANGELOG.md, LICENSE, .gitignore) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-08 19:20:18 -04:00
jedarden	c7be4ccbec	P12.OP4.1: Validate dfs_query_then_fetch benchmark (τ=0.9817) and document latency Re-ran the 10K-query score-comparability benchmark with fresh results: - DFS (global IDF preflight): avg τ = 0.9817, min τ = 0.9523, 0 queries below 0.95 → PASS - Score merge (local IDF): avg τ = 0.7938, 62.9% queries below 0.95 → FAIL - RRF merge: avg τ = 0.1361, 100% queries below 0.95 → CATASTROPHIC Added Criterion latency benchmarks to the research doc: - Global IDF aggregation: 285ns (3 shards) → 3.31µs (50 shards) - Query term extraction: 69ns (1 word) → 726ns (9 words) - IDF computation: ~113ps per term (trivial) - Coordinator-side overhead is sub-microsecond; dominant cost is network round-trip Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:31:13 -04:00
jedarden	096b43ccab	P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability Implements the Elasticsearch dfs_query_then_fetch pattern as a pre-query phase in Miroir to resolve cross-shard score comparability issues caused by differing local IDF values across shards with skewed document distributions. Core changes: - scatter.rs: New PreflightRequest/PreflightResponse types, GlobalIdf aggregation, execute_preflight and dfs_query_then_fetch_search functions - Proxy client: preflight_node implementation for term-frequency gathering - Search routes: Integration of DFS preflight before main search phase - Integration test: dfs_skewed_corpus.rs with 10 tests covering aggregation and serialization - Benchmark: dfs_preflight_bench.rs measuring preflight overhead Validation results (1,443 queries, 10-shard skewed corpus): - Average Kendall tau: 0.9815 (95% CI: [0.9809, 0.9821]) - Min tau: 0.9523 (zero queries below 0.95 threshold) - Per-type: common-term +0.84, single-term +0.11, filtered +0.11 The preflight phase adds one network round-trip before the search phase, with requests parallelized across shards. Estimated overhead: +1-2 RTTs. Resolves bead miroir-yio: Global-IDF preflight implementation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 03:43:10 -04:00
jedarden	b201f0ff58	P12.OP4: Finalize score normalization validation — RRF τ=0.14, score τ=0.79 Research complete: both score-based and RRF merge fail 0.95 threshold. Updated research doc with full RRF validation results and confidence intervals. Added benchmark result reports and helper tests. Follow-up bead miroir-n6v created for global-IDF preflight (dfs_query_then_fetch pattern). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 02:40:54 -04:00
jedarden	9ce1b36206	P12.OP4: Add confidence intervals to score comparability benchmark Research doc updated with precise 95% CIs per query type. compare.py now computes and reports confidence intervals. Kendall τ = 0.79 (95% CI [0.7873, 0.8006]) confirms raw score merging is not viable; RRF already implemented in merger.rs as mitigation. Follow-up bead created (miroir-zfo) for RRF quality validation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 00:07:42 -04:00
jedarden	72f9a197b5	P12.OP4: Score normalization at scale — research & benchmark infrastructure Completed Plan §15 Open Problem #4 research on cross-shard score comparability. ## Key Finding Average Kendall tau: 0.79 vs. 0.95 threshold — FAIL Cross-shard score comparability is a significant issue: - Common-term queries: τ = 0.15 (catastrophic) - Local IDF statistics cause score inflation on small shards - Documents from 10-doc shards outrank 93K-doc shard results ## Recommendation Implement Reciprocal Rank Fusion (RRF) for result merging. Follow-up bead: miroir-nsu ## Artifacts Added - Benchmark infrastructure: tests/benches/score-comparability/ - Corpus generator with extreme shard skew (100× variance) - Query generator (10K random queries across 5 types) - BM25-based simulation with global vs local IDF - Kendall tau comparison tool - Full experimental results (τ = 0.79 ± 0.01, 95% CI) - Research writeup: docs/research/score-normalization-at-scale.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 23:58:08 -04:00
jedarden	ffc0ae3beb	P12.OP2: Finalize Raft research — correct openraft version, update benchmarks, suppress warnings Correct openraft version from 0.9.22 to 0.9.20 (latest stable per GitHub releases). Update benchmark measurements from fresh re-run (50K ops). Suppress dead_code warnings in benchmark module (functions only called from #[test]). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:37:20 -04:00
jedarden	7a6dea77cf	P12.OP2: Re-verify Raft state machine benchmark with fresh run Benchmark numbers stable: state machine apply ~1.0x direct HashMap overhead, both sub-microsecond. Confirms prior measurements. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:25:34 -04:00
jedarden	2c628a6f87	P12.OP2: Re-run Raft state machine benchmark, update measured values Fresh benchmark confirms state machine apply adds ~1.0-1.1x overhead vs direct HashMap — both sub-microsecond. Real Raft cost remains network + fsync (2-5ms vs Redis 0.3-0.8ms). Decision unchanged: revisit before v2.0, do not ship in v0.x or v1.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:14:11 -04:00
jedarden	111a128278	P12.OP2: Update Raft vs Redis research with web survey findings Add rrqlite/openraft+SQLite reference project, correct raft-rs status to maintenance mode, note openraft 0.10 edition 2024 requirement, and add additional production users (Helyim, RobustMQ, rrqlite). Decision unchanged: do not ship Raft in v0.x or v1.0, revisit before v2.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 22:03:29 -04:00
jedarden	81155beb0d	P12.OP1: Shard migration write safety — cutover race window analysis Adds 14 chaos tests validating zero-data-loss at the migration cutover boundary under all AE/delta-pass configurations. Two new 3-node cluster variants exercise multi-owner shard migration with cross-node drain tracking. Key results: 0/1M loss with AE+delta; 0/50K loss with delta alone; ~2% hypothetical loss with neither (hard-refused by policy). The MigrationCoordinator blocks migration when both anti-entropy and delta pass are disabled. Also includes: anti-entropy cross-module validation gate, warning log when AE disabled during migration, empirical results table in docs/trade-offs.md, and plan §15 OP#1 status update to verified. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:52:34 -04:00
jedarden	232092ffbb	P0.5: Implement Config struct mirroring plan §4/§13 YAML schema Full serde-derived struct tree covering every block in plan §4 (MiroirConfig, NodeConfig, TaskStoreConfig, AdminConfig, HealthConfig, ScatterConfig, RebalancerConfig, ServerConfig, ConnectionPoolConfig, TaskRegistryConfig) and all 21 §13 advanced-capability sub-structs (ReshardingConfig through SearchUiConfig with nested auth/rate-limit/CSP/analytics structs), plus §14 horizontal-scaling structs (PeerDiscoveryConfig, LeaderElectionConfig, HpaConfig). Includes: - Layered loading via config crate: built-in defaults → file → env overrides - Config::validate() with 14 cross-field rules (HA requires redis, scoped_key timing inversion, node group bounds, tenant affinity range checks, etc.) - 10 unit tests: round-trip YAML, full plan example, minimal YAML defaults, and validation rejection cases Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:46:12 -04:00
jedarden	fe274a5c0e	P12.OP2: Add Raft vs Redis task store HA research doc Survey openraft, raft-rs, and async-raft crates. Design a Raft-backed TaskStore prototype using openraft with SQLite state machine. Analytical benchmark against Redis across latency, throughput, memory, and ops complexity. Decision: revisit before v2.0, do not ship in v0.x/v1.0 — Raft fails the decision gate (worse on write latency and correctness maturity despite removing the Redis dependency). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 21:00:53 -04:00
jedarden	409f952f59	Add repo hygiene: LICENSE, CHANGELOG, .gitignore - LICENSE: MIT (per plan §12) - CHANGELOG.md: Keep a Changelog 1.1.0 skeleton with [Unreleased] and [0.1.0] sections matching the awk extractor from plan §7 - .gitignore: Rust target/, editor junk; Cargo.lock kept in VCS Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 20:47:36 -04:00

16 commits
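
Commits 72f9a197b5, 9ce1b36206 and c7be4ccbec judge merge quality by Kendall τ. Each compares a merged ranking against a global-IDF reference ranking, using a 0.95 threshold. The benchmark's own tool is compare.py, and the Rust sketch below is not that tool; it only illustrates the statistic. It assumes both rankings arrive as score vectors over the same items and uses the tau-a variant, where tied pairs count as neither concordant nor discordant. The names `kendall_tau` and `summarize` are hypothetical.

```rust
/// Kendall's tau-a between two score vectors over the same items
/// (hypothetical helper; compare.py may use a different tau variant).
pub fn kendall_tau(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "score vectors must align item-by-item");
    let n = a.len();
    if n < 2 {
        return 1.0;
    }
    let mut concordant = 0i64;
    let mut discordant = 0i64;
    for i in 0..n {
        for j in (i + 1)..n {
            // Same sign: both rankings order the pair the same way.
            let s = (a[i] - a[j]) * (b[i] - b[j]);
            if s > 0.0 {
                concordant += 1;
            } else if s < 0.0 {
                discordant += 1;
            }
            // s == 0.0 means a tie in at least one ranking; it counts as neither.
        }
    }
    let pairs = (n * (n - 1) / 2) as f64;
    (concordant - discordant) as f64 / pairs
}

/// Average tau over all queries, and the fraction of queries whose tau falls
/// below `threshold` (0.95 in the benchmark). Returns NaNs for an empty slice.
pub fn summarize(taus: &[f64], threshold: f64) -> (f64, f64) {
    let n = taus.len() as f64;
    let avg = taus.iter().sum::<f64>() / n;
    let below = taus.iter().filter(|&&t| t < threshold).count() as f64 / n;
    (avg, below)
}
```

For example, `kendall_tau(&[3.0, 2.0, 1.0], &[3.0, 1.0, 2.0])` has two concordant pairs and one discordant pair, giving 1/3. The pairwise loop is O(n²). That is acceptable for per-query top-k lists but not for very long rankings.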
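
Commit 096b43ccab describes dfs_query_then_fetch as a preflight round-trip. It gathers term statistics from every shard and aggregates them into a GlobalIdf before the search phase runs. The sketch below covers only the aggregation step. It assumes each shard reports its document count and per-term document frequencies, and it uses the Lucene-style BM25 IDF formula. `ShardStats`, `bm25_idf` and `global_idf` are hypothetical names, not miroir's PreflightRequest/PreflightResponse types.

```rust
use std::collections::HashMap;

/// Hypothetical per-shard preflight statistics (field names are assumptions).
pub struct ShardStats {
    pub doc_count: u64,
    pub doc_freq: HashMap<String, u64>,
}

/// Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5)).
pub fn bm25_idf(total_docs: u64, docs_with_term: u64) -> f64 {
    let n = total_docs as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// Sum document counts and per-term document frequencies across all shards,
/// then compute a single IDF per query term that every shard scores with.
pub fn global_idf(shards: &[ShardStats], terms: &[&str]) -> HashMap<String, f64> {
    let total_docs: u64 = shards.iter().map(|s| s.doc_count).sum();
    terms
        .iter()
        .map(|&t| {
            // A shard that never saw the term contributes a frequency of 0.
            let df: u64 = shards
                .iter()
                .map(|s| s.doc_freq.get(t).copied().unwrap_or(0))
                .sum();
            (t.to_string(), bm25_idf(total_docs, df))
        })
        .collect()
}
```

Consider a 10-document shard where 1 document contains a term. Its local IDF is about 1.99. If a second, 90-document shard holds 45 more matches, the aggregated IDF over all 100 documents is about 0.78. This gap is the small-shard score inflation that commit 72f9a197b5 reports.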
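
Commits 72f9a197b5 and 9ce1b36206 cite Reciprocal Rank Fusion, already present in merger.rs, as the mitigation. Commit b201f0ff58 then measures it at τ ≈ 0.14. The sketch below applies the standard RRF formula, score(d) = Σ 1/(k + rank), with 1-based ranks. It is not the merger.rs implementation. The constant k = 60 is the value commonly used in the literature and is an assumption here, as is the function name `rrf_merge`.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over per-shard ranked lists of document IDs
/// (hypothetical helper, not miroir's merger.rs).
pub fn rrf_merge(shard_results: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in shard_results {
        for (i, doc) in list.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; break ties by ID so output is deterministic.
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    merged
}
```

Sometimes each document lives on exactly one shard. For example, `rrf_merge(&[vec!["a", "b"], vec!["c", "d"]], 60.0)` yields a, c, b, d. In that case RRF just interleaves shards by within-shard rank and discards score magnitude entirely. That is one plausible reading of why it fared worse than local-IDF score merging on the skewed corpus.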
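
Commit 81155beb0d states two behaviours of the MigrationCoordinator. It refuses a migration when anti-entropy (AE) and the delta pass are both disabled, and it logs a warning when AE is disabled during migration. The sketch below expresses that policy as a pure function. The struct, enum and function names are hypothetical. Allowing the AE-only configuration follows from the commit's wording, which blocks only when both are off; no test result is stated for it.

```rust
/// Hypothetical migration-safety switches (not miroir's actual types).
#[derive(Debug, Clone, Copy)]
pub struct MigrationSafety {
    pub anti_entropy: bool,
    pub delta_pass: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MigrationDecision {
    Proceed,
    /// Allowed, but AE is off, so only the delta pass covers cutover races.
    ProceedWithWarning(&'static str),
    /// Hard refusal: nothing would repair writes that race the cutover.
    Refused(&'static str),
}

pub fn check_migration(s: MigrationSafety) -> MigrationDecision {
    match (s.anti_entropy, s.delta_pass) {
        (false, false) => MigrationDecision::Refused(
            "anti-entropy and delta pass both disabled; cutover writes could be lost",
        ),
        (false, true) => {
            MigrationDecision::ProceedWithWarning("anti-entropy disabled during migration")
        }
        // AE enabled, with or without the delta pass.
        (true, _) => MigrationDecision::Proceed,
    }
}
```

Keeping the check pure makes the refusal rule easy to unit-test. The caller then turns `ProceedWithWarning` into a log line.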
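
Commit 232092ffbb lists "HA requires redis" among the 14 cross-field rules in Config::validate(). The sketch below is a pared-down version of that one rule. It assumes HA means more than one proxy replica, and that the task store is either SQLite or Redis; the merge commit's conflict list includes task_store/sqlite.rs and task_store/redis.rs. The types and field names are hypothetical, not miroir's MiroirConfig or TaskStoreConfig.

```rust
/// Hypothetical task-store choice (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum TaskStoreBackend {
    Sqlite,
    Redis,
}

/// Hypothetical, pared-down config with just the fields this rule needs.
pub struct Config {
    pub replicas: u32,
    pub task_store: TaskStoreBackend,
}

impl Config {
    /// Cross-field rule: several proxy replicas (assumed to mean HA) must share
    /// one task store, which a per-process SQLite file cannot provide.
    pub fn validate(&self) -> Result<(), String> {
        if self.replicas > 1 && self.task_store != TaskStoreBackend::Redis {
            return Err(format!(
                "{} replicas require task_store = redis",
                self.replicas
            ));
        }
        Ok(())
    }
}
```

Returning `Result<(), String>` keeps the rule testable in isolation. A fuller validator would more likely collect every violation or use a dedicated error type.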