jedarden/miroir

Author	SHA1	Message	Date
jedarden	bf07642ba3	feat(bench): add integration benchmarks and fix compilation - Fix missing `warn!` macro import in cdc.rs (was causing compilation error) - Add integration benchmarks for end-to-end performance (tests/integration_bench.rs): - bench_e2e_search_latency: Compares Miroir vs standalone search latency Target: Miroir < 2× standalone (plan §8) - bench_ingest_throughput: Compares Miroir vs standalone ingest throughput Target: Miroir > 80% of standalone (plan §8) - Additional benchmarks: concurrent_search, faceted_search, pagination These benchmarks require a running docker-compose stack: cd examples && docker-compose -f docker-compose-dev.yml up -d Closes: miroir-89x.5	2026-05-24 10:53:48 -04:00
jedarden	304879d32a	feat(tests): add chaos test scenarios and runbooks (plan §8, P9.4) Add comprehensive chaos testing infrastructure for Miroir failure scenarios: - TestCluster harness with chaos helpers: - `kill_meili()` / `restart_meili()` for node failure simulation - `apply_netem()` / `remove_netem()` for network delay injection - `kill_miroir()` / `restart_miroir()` for orchestrator failure - Docker-compose stack lifecycle management - 6 chaos test scenarios (all marked `#[ignore]`): 1. Kill 1 of 3 nodes (RF=2) → continuous search, no degraded header 2. Kill 2 of 3 nodes (RF=2) → 503 or partial results with degraded header 3. Kill 1 of 2 Miroir replicas → zero client-visible downtime 4. tc netem 500ms delay → searches slow but succeed, no errors 5. Restart killed node → Miroir detects recovery within health check interval 6. Kill node mid-rebalance → rebalancer pauses, resumes on recovery - Runbooks in `tests/chaos/runbooks/scenario.md`: - Manual reproduction steps - Expected observables (metrics, headers, errors) - Recovery procedures - HA vs single-instance differences - Operator notes and common causes - Updated docker-compose files: - Added `CAP_NET_ADMIN` to all Meilisearch containers for tc netem support Tests are slow (30+ seconds each) and require docker-compose. Run with: cargo test --test chaos -- --ignored --test-threads=1 Closes: miroir-89x.4	2026-05-24 10:23:24 -04:00
jedarden	158752fe7b	feat(multi-search): implement timeout enforcement and acceptance tests (§13.11) - Add per-query and total timeout enforcement to MultiSearchExecutor - Improve SearchResult with helper methods (ok, err, timeout, is_success) - Fix ModeACoordinator feature-gate compilation issues - Add comprehensive acceptance tests for multi-search: - 5-query batch completion - Slow query doesn't block fast queries (parallel execution) - Partial failure handling - Per-query timeout - Total timeout - 100-query batch completion Closes: miroir-uhj.11	2026-05-24 01:54:20 -04:00
jedarden	7832d1b578	test(integration): Add integration tests per plan §8 Add comprehensive integration tests for Miroir with 3 Meilisearch nodes via docker-compose. Tests cover: - Document round-trip with distribution verification (1000 docs) - Search covers all shards (100 docs with unique keywords) - Facet aggregation across shards (100 docs, 3 colors) - Offset/limit paging consistency (50 docs, 5×paged vs single) - Settings broadcast to all nodes (synonyms test) - Task polling for large batches (500 docs) - Node failure with RF=2 (requires docker-compose-dev-rf2) Also added integration test README with setup and running instructions. Per plan §8: Integration tests validate end-to-end behavior including document distribution, shard coverage, facet aggregation, paging, settings broadcast, task polling, and node failure with RF=2. Closes: miroir-89x (Phase 9 — Testing) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 01:29:45 -04:00
jedarden	7bbf8f1061	P9.2: Integration test harness with docker-compose Add comprehensive integration test infrastructure: - docker-compose-dev.yml: 3 Meilisearch nodes + Miroir (RG=1, RF=1, S=16) - docker-compose-dev-rf2.yml: 6 Meilisearch nodes + Redis + Miroir (RG=2, RF=2) - dev-config.yaml/dev-config-rf2.yaml: Configurations for both stacks - Integration tests in crates/miroir-proxy/tests/docker_compose_integration.rs - Documentation in crates/miroir-proxy/tests/README_integration.md - CI workflow in k8s/argo-workflows/miroir-ci-docker-compose-smoke.yaml Test coverage (plan §8): - Document round-trip (1000 docs) - Search coverage across all 16 shards - Facet aggregation - Offset/limit pagination - Settings broadcast - Task polling - Health checks - Node failure with RF=2 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-23 07:33:34 -04:00
jedarden	32bda26613	P9.2: Integration test harness with docker-compose Implement integration test suite for Miroir with docker-compose: - Updated docker-compose-dev.yml to use Meilisearch v1.37.0 - Created tests/integration.rs with comprehensive test coverage: * Document round-trip (1000 docs) * Search coverage across all shards (unique-keyword test) * Facet aggregation (3 colors, sum = 100) * Offset/limit paging * Settings broadcast * Task polling * Health check * Node failure test with RF=2 - Created docker-compose-dev-rf2.yml for RF=2/HA testing (6 nodes) - Created dev-config-rf2.yaml for RF=2 configuration - Created tests/README.md with documentation Tests run against real Docker Compose stack: cargo test --test integration -- --test-threads=1 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-23 07:26:55 -04:00
jedarden	cf9ae11c3a	P6.2: Fix verification script shebang for NixOS compatibility The script had #!/bin/bash which doesn't exist on NixOS systems. Changed to #!/usr/bin/env bash for portability. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-23 02:47:02 -04:00
jedarden	26c9521ba9	P6.2: Fix peer discovery DNS SRV service name and add POD_IP Fixes the peer discovery service name mismatch that caused SRV lookups to fail. The headless Service is named "<fullname>-headless" but the config was using ".Release.Name-headless", which didn't match. Also adds POD_IP to the Downward API env vars (was missing). Changes: - _helpers.tpl: Use miroir.fullname instead of Release.Name for service_name default - values.yaml: Document service_name default as auto-derived - miroir-deployment.yaml: Add POD_IP env var via Downward API - verify_p6_2_peer_discovery.sh: Add POD_IP verification step Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-23 02:39:28 -04:00
jedarden	e6cdd05f30	P6.2: Fix peer discovery DNS SRV service name and add test - Fix SRV lookup to use `_http._tcp` instead of `_miroir._tcp` (matches headless Service port name) - Add filter to skip empty strings when extracting pod names from SRV targets - Add test coverage for SRV target pod name extraction - Add verification script for P6.2 peer discovery metrics The peer discovery implementation was already complete with: - Headless Service template (miroir-headless.yaml) - Downward API env vars (POD_NAME, POD_NAMESPACE, POD_IP) in Deployment - Background refresh loop in main.rs - miroir_peer_pod_count metric in middleware.rs This commit fixes the SRV service name and adds robustness. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-23 02:29:28 -04:00
jedarden	5f9ee20eeb	P7.1: Core metrics families acceptance tests Add accessor methods for request metrics (duration, total) to enable testing of histogram/counter metrics that require samples to appear in Prometheus output. Fix p7_1_core_metrics.rs test to: - Use new accessor methods to record request metric samples - Check for HELP/TYPE metadata in addition to data lines - Relax histogram bucket format check to verify non-zero count All 18 core plan §10 metrics are verified: - Requests: duration, total, in_flight - Node health: healthy, request_duration, errors_total - Shards: coverage, degraded_shards_total, distribution - Tasks: processing_age, total, registry_size - Scatter-gather: fan_out_size, partial_responses_total, retries_total - Rebalancer: in_progress, documents_migrated_total, duration_seconds Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-23 02:29:28 -04:00
jedarden	064a33ce1c	miroir-zc2.5: Fix dump import compatibility matrix enhancement bead refs The matrix incorrectly referenced miroir-zc2.6/7/8 as dump import enhancement beads, but zc2.6 is actually arm64 support and zc2.7/8 don't exist. Replaced with a descriptive "Future Enhancements" table that maintains traceability without false bead dependencies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Bead-Id: miroir-zc2.5 Bead-Id: miroir-r3j.6 Bead-Id: bf-1p4v	2026-05-20 07:18:56 -04:00
jedarden	360378bde2	P11.8: Amend plan §12 to reflect Rust-idiomatic test layout The plan §12 previously specified tests/ at root with integration/ and chaos/ subdirectories. However, the actual implementation uses the idiomatic Rust convention with tests in crates/*/tests/. This commit: - Updates plan §12 repository structure to document the actual layout - Moves tests/benches/score-comparability to docs/research/ (research artifacts) - Removes the now-empty tests/ directory CI already runs cargo test --all --all-features which correctly discovers and runs all crate-level integration tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 06:49:04 -04:00
jedarden	b23e70656e	P2.2: Implement write path with primary key validation, shard injection, and two-rule quorum Implements POST/PUT /indexes/{uid}/documents and DELETE /indexes/{uid}/documents: - Primary key extraction on hot path with 400 miroir_primary_key_required if missing - _miroir_shard injection into every document before forwarding to nodes - Rejection of _miroir_shard in client-submitted docs (400 miroir_reserved_field) - Two-rule quorum: per-group floor(RF/2)+1 ACKs, success if ≥1 group meets quorum - X-Miroir-Degraded header when any group misses quorum - 503 miroir_no_quorum only when NO group meets quorum - Per-batch grouping by target shard for efficient HTTP fan-out - DELETE by IDs routes each ID independently to its shard - DELETE by filter broadcasts to all nodes Acceptance tests pass: - Primary key validation before any writes - Reserved field rejection - Shard distribution uniformity (17-26 shards/node with 64 shards/3 nodes) - Quorum calculation: floor(RF/2)+1 - Meilisearch-compatible error shape Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 06:48:30 -04:00
jedarden	affb59fff6	P12.OP4: Validate RRF merge quality — τ=0.14 confirms DFS preflight is required RRF merge (k=60) benchmarked against ground truth with 10K queries on skewed 10-shard corpus (93% on shard 1). Result: Kendall τ = 0.1369 (95% CI [0.1339, 0.1399]), far below the 0.95 threshold. 9,998 of 10,000 queries fell below τ=0.95, confirming RRF alone is insufficient for cross-shard ranking quality with skewed distributions. DFS preflight (already implemented) achieves τ = 0.9818, passing the threshold. Add full 10K-query DFS comparison report and fix paths in experiment.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:43:42 -04:00
jedarden	c7be4ccbec	P12.OP4.1: Validate dfs_query_then_fetch benchmark (τ=0.9817) and document latency Re-ran the 10K-query score-comparability benchmark with fresh results: - DFS (global IDF preflight): avg τ = 0.9817, min τ = 0.9523, 0 queries below 0.95 → PASS - Score merge (local IDF): avg τ = 0.7938, 62.9% queries below 0.95 → FAIL - RRF merge: avg τ = 0.1361, 100% queries below 0.95 → CATASTROPHIC Added Criterion latency benchmarks to the research doc: - Global IDF aggregation: 285ns (3 shards) → 3.31µs (50 shards) - Query term extraction: 69ns (1 word) → 726ns (9 words) - IDF computation: ~113ps per term (trivial) - Coordinator-side overhead is sub-microsecond; dominant cost is network round-trip Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:31:13 -04:00
jedarden	068cb5a77f	Phase 1 Core Routing: verify DoD complete, update tracking files All Phase 1 DoD criteria verified: - Rendezvous assignment deterministic (router.rs 100% coverage) - Reshuffle bound on add ≤ 2×(1/4) (proptest + unit test) - 64 shards/3 nodes/RF=1 → 17-26 per node (uniformity test) - write_targets returns RG×RF nodes (acceptance tests) - covering_set with replica rotation (acceptance tests) - merger passes all merge/facet/limit tests - miroir-core ≥ 90% line coverage (90.17% via tarpaulin) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 04:06:34 -04:00
jedarden	096b43ccab	P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability Implements the Elasticsearch dfs_query_then_fetch pattern as a pre-query phase in Miroir to resolve cross-shard score comparability issues caused by differing local IDF values across shards with skewed document distributions. Core changes: - scatter.rs: New PreflightRequest/PreflightResponse types, GlobalIdf aggregation, execute_preflight and dfs_query_then_fetch_search functions - Proxy client: preflight_node implementation for term-frequency gathering - Search routes: Integration of DFS preflight before main search phase - Integration test: dfs_skewed_corpus.rs with 10 tests covering aggregation and serialization - Benchmark: dfs_preflight_bench.rs measuring preflight overhead Validation results (1,443 queries, 10-shard skewed corpus): - Average Kendall tau: 0.9815 (95% CI: [0.9809, 0.9821]) - Min tau: 0.9523 (zero queries below 0.95 threshold) - Per-type: common-term +0.84, single-term +0.11, filtered +0.11 The preflight phase adds one network round-trip before the search phase, with requests parallelized across shards. Estimated overhead: +1-2 RTTs. Resolves bead miroir-yio: Global-IDF preflight implementation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 03:43:10 -04:00
jedarden	b2490ea64d	Phase 1 Core Routing: validate and fix compilation All Phase 1 DoD criteria verified: - Rendezvous assignment deterministic (test_determinism) - Reshuffle bound on add: ≤2×(1/4) edges (test_reshuffle_bound_on_add) - Uniformity: 64/3/RF=1 → 17-26 shards/node (test_uniformity) - RF placement stability on add/remove (test_rf2_placement_stability) - write_targets returns exactly RG×RF nodes, one per group - query_group distributes evenly (chi-square test) - covering_set with intra-group replica rotation - Merger passes merge/facet/limit/stripping tests - miroir-core ≥90% line coverage (92.07% via cargo-tarpaulin --lib) Fixes: - scatter.rs: NodeId::new(&str) → NodeId::new("...".into()) for type mismatch - merger.rs: add P12.OP4 RRF skew validation tests - config.rs: fix test to use redis backend for file loading - proxy: wire up client module, add indexes route stubs Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 03:22:33 -04:00
jedarden	b201f0ff58	P12.OP4: Finalize score normalization validation — RRF τ=0.14, score τ=0.79 Research complete: both score-based and RRF merge fail 0.95 threshold. Updated research doc with full RRF validation results and confidence intervals. Added benchmark result reports and helper tests. Follow-up bead miroir-n6v created for global-IDF preflight (dfs_query_then_fetch pattern). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 02:40:54 -04:00
jedarden	0de5f01d32	P2.2: Pluggable MergeStrategy trait + RRF scoring + full benchmark re-run - Extract MergeStrategy trait with merge()/name() methods - Implement RrfStrategy with configurable k (default 60) - Refactor scatter_gather_search to accept &dyn MergeStrategy - Add RRF simulation to benchmark script (simulate_distributed_search_rrf) - Re-run full benchmark (3989 queries) with updated comparison reports - Add topology unit tests (NodeId, NodeStatus, Node helpers) Benchmark results: Score-based merge: avg tau = 0.798 (FAIL, common-term tau = 0.152) RRF merge: avg tau = 0.134 (FAIL, rank-only loses score signal) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 02:07:39 -04:00
jedarden	baf124b7cf	P2.1: Add scatter-gather RRF integration + benchmark simulation Wire scatter (fan-out) directly into the RRF merger via scatter_gather_search(), completing the full read path: plan → scatter → RRF merge. Add RRF simulation mode to score-comparability benchmark for measuring rank correlation against global BM25 ground truth. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 01:38:10 -04:00
jedarden	612e7ce0ea	P1.5: Implement scatter module with covering-set construction + dispatch trait - Add NodeClient trait for HTTP calls to Meilisearch nodes (seam between pure miroir-core and networked miroir-proxy) - Add ScatterPlan struct containing chosen_group, target_shards, shard_to_node mapping, deadline_ms, hedging_eligible - Implement plan_search_scatter() pure function that constructs the covering set without I/O - Implement execute_scatter() async function that fans out to nodes with partial-failure handling - Add MockNodeClient for testing with pre-programmed responses/errors - Add unit tests for plan construction, query group rotation, shard-to-node mapping, hedging eligibility, and scatter execution Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 00:20:29 -04:00
jedarden	9ce1b36206	P12.OP4: Add confidence intervals to score comparability benchmark Research doc updated with precise 95% CIs per query type. compare.py now computes and reports confidence intervals. Kendall τ = 0.79 (95% CI [0.7873, 0.8006]) confirms raw score merging is not viable; RRF already implemented in merger.rs as mitigation. Follow-up bead created (miroir-zfo) for RRF quality validation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 00:07:42 -04:00
jedarden	72f9a197b5	P12.OP4: Score normalization at scale — research & benchmark infrastructure Completed Plan §15 Open Problem #4 research on cross-shard score comparability. ## Key Finding Average Kendall tau: 0.79 vs. 0.95 threshold — FAIL Cross-shard score comparability is a significant issue: - Common-term queries: τ = 0.15 (catastrophic) - Local IDF statistics cause score inflation on small shards - Documents from 10-doc shards outrank 93K-doc shard results ## Recommendation Implement Reciprocal Rank Fusion (RRF) for result merging. Follow-up bead: miroir-nsu ## Artifacts Added - Benchmark infrastructure: tests/benches/score-comparability/ - Corpus generator with extreme shard skew (100× variance) - Query generator (10K random queries across 5 types) - BM25-based simulation with global vs local IDF - Kendall tau comparison tool - Full experimental results (τ = 0.79 ± 0.01, 95% CI) - Research writeup: docs/research/score-normalization-at-scale.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 23:58:08 -04:00
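
Several of the commits above (0de5f01d32, affb59fff6, c7be4ccbec) refer to an RRF merge with k = 60 and report that it loses the score signal on a skewed corpus. As a rough illustration only, the standard Reciprocal Rank Fusion formula gives each document the sum of 1 / (k + rank) over the lists it appears in. The function name `rrf_merge` and the document ids below are hypothetical and are not taken from `merger.rs`; this is a sketch of the general technique, not the repository's implementation.

```rust
use std::collections::HashMap;

/// Fuse per-shard ranked lists with Reciprocal Rank Fusion (RRF).
/// Each document scores the sum over lists of 1 / (k + rank), with rank starting at 1.
/// NOTE: hypothetical helper for illustration; not the project's actual API.
fn rrf_merge(shard_rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in shard_rankings {
        for (idx, doc_id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by document id so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Assumed toy data: a large shard returning three hits and a tiny shard returning one.
    let big_shard = vec!["a1", "a2", "a3"];
    let tiny_shard = vec!["b1"];
    for (doc, score) in rrf_merge(&[big_shard, tiny_shard], 60.0) {
        println!("{doc}\t{score:.5}");
    }
}
```

When every document lives on exactly one shard, the top hit of each shard receives the same fused score, 1 / (k + 1), regardless of how relevant it is globally. That is consistent with the "rank-only loses score signal" note in commit 0de5f01d32, though the benchmark in the log remains the authoritative evidence.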

24 commits
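
Commit b23e70656e describes a two-rule write quorum: each replica group needs floor(RF/2)+1 ACKs, the write succeeds if at least one group meets that quorum, `X-Miroir-Degraded` is set when any group misses it, and a 503 `miroir_no_quorum` is returned only when no group meets it. The sketch below encodes just that decision logic. The names `WriteOutcome`, `group_quorum`, and `evaluate_write` are assumptions made for illustration and do not come from the repository.

```rust
/// Outcome of a replicated write under the two-rule quorum from commit b23e70656e.
/// NOTE: type and function names here are hypothetical, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum WriteOutcome {
    /// Every replica group met its quorum.
    Success,
    /// At least one group met quorum and at least one missed (X-Miroir-Degraded).
    Degraded,
    /// No group met quorum (503 miroir_no_quorum).
    NoQuorum,
}

/// Rule 1: ACKs required within a single group, floor(RF/2) + 1.
fn group_quorum(rf: u32) -> u32 {
    rf / 2 + 1
}

/// Rule 2: combine the per-group results into one outcome.
/// `acks_per_group[i]` is the number of ACKs received from replica group `i`.
fn evaluate_write(acks_per_group: &[u32], rf: u32) -> WriteOutcome {
    let needed = group_quorum(rf);
    let groups_ok = acks_per_group.iter().filter(|&&acks| acks >= needed).count();
    if groups_ok == 0 {
        WriteOutcome::NoQuorum
    } else if groups_ok == acks_per_group.len() {
        WriteOutcome::Success
    } else {
        WriteOutcome::Degraded
    }
}

fn main() {
    // Assumed topology: RG=2 groups with RF=2 replicas each, so each group needs 2 ACKs.
    println!("{:?}", evaluate_write(&[2, 2], 2));
    println!("{:?}", evaluate_write(&[2, 1], 2));
    println!("{:?}", evaluate_write(&[1, 0], 2));
}
```

Treating an empty group list as `NoQuorum` is a choice made for this sketch; the commit message does not say how that case is handled.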