Add comprehensive integration tests for Miroir with 3 Meilisearch nodes
via docker-compose. Tests cover:
- Document round-trip with distribution verification (1000 docs)
- Search covers all shards (100 docs with unique keywords)
- Facet aggregation across shards (100 docs, 3 colors)
- Offset/limit paging consistency (50 docs, 5×paged vs single)
- Settings broadcast to all nodes (synonyms test)
- Task polling for large batches (500 docs)
- Node failure with RF=2 (requires docker-compose-dev-rf2)
Also added integration test README with setup and running instructions.
Per plan §8: Integration tests validate end-to-end behavior including
document distribution, shard coverage, facet aggregation, paging, settings
broadcast, task polling, and node failure with RF=2.
Closes: miroir-89x (Phase 9 — Testing)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implement integration test suite for Miroir with docker-compose:
- Updated docker-compose-dev.yml to use Meilisearch v1.37.0
- Created tests/integration.rs with comprehensive test coverage:
* Document round-trip (1000 docs)
* Search coverage across all shards (unique-keyword test)
* Facet aggregation (3 colors, sum = 100)
* Offset/limit paging
* Settings broadcast
* Task polling
* Health check
* Node failure test with RF=2
- Created docker-compose-dev-rf2.yml for RF=2/HA testing (6 nodes)
- Created dev-config-rf2.yaml for RF=2 configuration
- Created tests/README.md with documentation
Tests run against real Docker Compose stack:
cargo test --test integration -- --test-threads=1
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The script had #!/bin/bash which doesn't exist on NixOS systems.
Changed to #!/usr/bin/env bash for portability.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes the peer discovery service name mismatch that caused SRV lookups
to fail. The headless Service is named "<fullname>-headless" but the
config was using ".Release.Name-headless", which didn't match.
Also adds POD_IP to the Downward API env vars (was missing).
Changes:
- _helpers.tpl: Use miroir.fullname instead of Release.Name for service_name default
- values.yaml: Document service_name default as auto-derived
- miroir-deployment.yaml: Add POD_IP env var via Downward API
- verify_p6_2_peer_discovery.sh: Add POD_IP verification step
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix SRV lookup to use `_http._tcp` instead of `_miroir._tcp` (matches headless Service port name)
- Add filter to skip empty strings when extracting pod names from SRV targets
- Add test coverage for SRV target pod name extraction
- Add verification script for P6.2 peer discovery metrics
The peer discovery implementation was already complete with:
- Headless Service template (miroir-headless.yaml)
- Downward API env vars (POD_NAME, POD_NAMESPACE, POD_IP) in Deployment
- Background refresh loop in main.rs
- miroir_peer_pod_count metric in middleware.rs
This commit fixes the SRV service name and adds robustness.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add accessor methods for request metrics (duration, total) to enable
testing of histogram/counter metrics that require samples to appear
in Prometheus output.
Fix p7_1_core_metrics.rs test to:
- Use new accessor methods to record request metric samples
- Check for HELP/TYPE metadata in addition to data lines
- Relax histogram bucket format check to verify non-zero count
All 18 core plan §10 metrics are verified:
- Requests: duration, total, in_flight
- Node health: healthy, request_duration, errors_total
- Shards: coverage, degraded_shards_total, distribution
- Tasks: processing_age, total, registry_size
- Scatter-gather: fan_out_size, partial_responses_total, retries_total
- Rebalancer: in_progress, documents_migrated_total, duration_seconds
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The matrix incorrectly referenced miroir-zc2.6/7/8 as dump import
enhancement beads, but zc2.6 is actually arm64 support and zc2.7/8
don't exist. Replaced with a descriptive "Future Enhancements" table
that maintains traceability without false bead dependencies.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bead-Id: miroir-zc2.5
Bead-Id: miroir-r3j.6
Bead-Id: bf-1p4v
The plan §12 previously specified tests/ at root with integration/
and chaos/ subdirectories. However, the actual implementation uses
the idiomatic Rust convention with tests in crates/*/tests/.
This commit:
- Updates plan §12 repository structure to document the actual layout
- Moves tests/benches/score-comparability to docs/research/ (research artifacts)
- Removes the now-empty tests/ directory
CI already runs cargo test --all --all-features which correctly
discovers and runs all crate-level integration tests.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements POST/PUT /indexes/{uid}/documents and DELETE /indexes/{uid}/documents:
- Primary key extraction on hot path with 400 miroir_primary_key_required if missing
- _miroir_shard injection into every document before forwarding to nodes
- Rejection of _miroir_shard in client-submitted docs (400 miroir_reserved_field)
- Two-rule quorum: per-group floor(RF/2)+1 ACKs, success if ≥1 group meets quorum
- X-Miroir-Degraded header when any group misses quorum
- 503 miroir_no_quorum only when NO group meets quorum
- Per-batch grouping by target shard for efficient HTTP fan-out
- DELETE by IDs routes each ID independently to its shard
- DELETE by filter broadcasts to all nodes
Acceptance tests pass:
- Primary key validation before any writes
- Reserved field rejection
- Shard distribution uniformity (17-26 shards/node with 64 shards/3 nodes)
- Quorum calculation: floor(RF/2)+1
- Meilisearch-compatible error shape
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
RRF merge (k=60) benchmarked against ground truth with 10K queries on
skewed 10-shard corpus (93% on shard 1). Result: Kendall τ = 0.1369
(95% CI [0.1339, 0.1399]), far below the 0.95 threshold. 9,998 of 10,000
queries fell below τ=0.95, confirming RRF alone is insufficient for
cross-shard ranking quality with skewed distributions.
DFS preflight (already implemented) achieves τ = 0.9818, passing the
threshold. Add full 10K-query DFS comparison report and fix paths in
experiment.json.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Research complete: both score-based and RRF merge fail 0.95 threshold.
Updated research doc with full RRF validation results and confidence intervals.
Added benchmark result reports and helper tests. Follow-up bead miroir-n6v
created for global-IDF preflight (dfs_query_then_fetch pattern).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wire scatter (fan-out) directly into the RRF merger via scatter_gather_search(),
completing the full read path: plan → scatter → RRF merge. Add RRF simulation
mode to score-comparability benchmark for measuring rank correlation against
global BM25 ground truth.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add NodeClient trait for HTTP calls to Meilisearch nodes (seam between pure miroir-core and networked miroir-proxy)
- Add ScatterPlan struct containing chosen_group, target_shards, shard_to_node mapping, deadline_ms, hedging_eligible
- Implement plan_search_scatter() pure function that constructs the covering set without I/O
- Implement execute_scatter() async function that fans out to nodes with partial-failure handling
- Add MockNodeClient for testing with pre-programmed responses/errors
- Add unit tests for plan construction, query group rotation, shard-to-node mapping, hedging eligibility, and scatter execution
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Research doc updated with precise 95% CIs per query type. compare.py
now computes and reports confidence intervals. Kendall τ = 0.79
(95% CI [0.7873, 0.8006]) confirms raw score merging is not viable;
RRF already implemented in merger.rs as mitigation. Follow-up bead
created (miroir-zfo) for RRF quality validation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Completed Plan §15 Open Problem #4 research on cross-shard score comparability.
## Key Finding
Average Kendall tau: 0.79 vs. 0.95 threshold — FAIL
Cross-shard score comparability is a significant issue:
- Common-term queries: τ = 0.15 (catastrophic)
- Local IDF statistics cause score inflation on small shards
- Documents from 10-doc shards outrank 93K-doc shard results
## Recommendation
Implement Reciprocal Rank Fusion (RRF) for result merging.
Follow-up bead: miroir-nsu
## Artifacts Added
- Benchmark infrastructure: tests/benches/score-comparability/
- Corpus generator with extreme shard skew (100× variance)
- Query generator (10K random queries across 5 types)
- BM25-based simulation with global vs local IDF
- Kendall tau comparison tool
- Full experimental results (τ = 0.79 ± 0.01, 95% CI)
- Research writeup: docs/research/score-normalization-at-scale.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>