jedarden/miroir

Author	SHA1	Message	Date
jedarden	44237eb4e5	P7.5 followup: PII redaction in Debug impls + per-node structured logging in client - Remove raw URI path from middleware span (was leaking index names) - Redact admin_key in AdminLoginRequest Debug impl (session.rs + admin_endpoints.rs) - Redact query/filter fields in SearchRequestBody Debug impl - Add per-node DEBUG structured logging to client.rs (search, write, delete, preflight) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-20 17:04:37 -04:00
jedarden	eb354bc3bb	P7.5: structured JSON logging with request IDs and trace correlation Convert all unstructured format-string logging (tracing::error!("msg: {}", var)) to structured field format (tracing::error!(error = %e, "msg")) across route handlers and key rotation. Strip response text bodies from error messages in scoped key mint/revoke paths to prevent potential PII (key material) from appearing in logs. The core structured JSON logging infrastructure (tracing-subscriber JSON layer, request ID generation via UUIDv7, pod_id from POD_NAME env, telemetry middleware span with request_id/pod_id/method/path) was already in place from prior work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-20 08:28:39 -04:00
jedarden	14852a40ff	P10.7: Admin login rate limiting + exponential backoff - Added record_failure_admin_login to RedisTaskStore for proper consecutive failed attempt tracking - Local rate limiter integration in admin_login flow (backend: local) - record_failure calls on failed login (wrong admin_key) for both backends - Reset on successful login for both backends - Helm schema constraint enforces redis backend when replicas > 1 Acceptance: - 11 login attempts in 60s from same IP → 11th returns 429 - 5 failed attempts → backoff doubles per attempt (10m, 20m, 40m, ...) up to 24h cap - Successful login resets both rate limit counter and backoff state - Multi-pod deployments use shared Redis state for rate limiting Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-20 07:52:06 -04:00
jedarden	ee3ef23133	P10.5: scoped Meilisearch key rotation with multi-pod coordination Implements plan §13.21 leader-based rotation of per-index scoped search keys with zero-403 overlap guarantees: - Leader lease (Redis, Mode B §14.5) serializes rotation across pods - Per-pod beacon with 60s TTL refreshed on every search request - Revocation safety gate: leader checks all live peers observed new generation before DELETE /keys/{previous_uid} - Drain wait (default 120s) for stragglers before revocation - Auto-rotation trigger: scoped_key_rotate_before_expiry_days (30d) before scoped_key_max_age_days (60d) - Manual trigger: POST /_miroir/ui/search/{index}/rotate-scoped-key with force:true to bypass timing gate - Config validation rejects rotate_before >= max_age at startup - Helm _helpers.tpl render-time guard against rotation loop - values.schema.json schema validation for scoped key config fields Also includes session management routes (admin login/logout/session, search UI JWT session) and auth middleware CSRF protection needed by the admin-gated rotation endpoint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-20 07:33:29 -04:00
jedarden	a2a323f33c	P7.5: structured JSON logging with request IDs and trace correlation Enable span context in JSON log output so request_id and pod_id appear on every log line. Downgrade search-handler log to DEBUG to keep INFO volume at ≤1 per request. Fix PII leaks: hash API key identifiers before logging, remove search terms from node error messages. Cast duration_ms from u128 to u64 for clean JSON number serialization. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-20 07:17:14 -04:00
jedarden	43e3367c73	P10.4 followup: log warning on admin session cookie unseal failure Logs a warning with path and error when cookie unseal fails, helping operators diagnose cross-pod ADMIN_SESSION_SEAL_KEY mismatches in HA deployments (acceptance criterion 2). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 17:26:20 -04:00
jedarden	48f7c0aabf	P10.4: ADMIN_SESSION_SEAL_KEY cookie sealing with XChaCha20-Poly1305 Implement admin session cookie sealing per plan §9 and §13.19: - SealKey loaded from ADMIN_SESSION_SEAL_KEY env (base64-encoded 32 bytes), with random fallback and startup warning for multi-pod deployments - Cookie sealed via XChaCha20-Poly1305 AEAD (confidentiality + integrity) - Wire format: base64([24-byte nonce][ciphertext][16-byte tag]) - AuthState initialized with revoked_sessions DashMap + revoked counter - miroir_admin_session_key_generated gauge set at startup (1=random, 0=env) - Revocation cache checked on every cookie-authenticated admin request Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 17:18:39 -04:00
jedarden	6e35e420a9	P10.3: SEARCH_UI_JWT_SECRET dual-secret overlap rotation Implement plan §9 JWT signing-secret rotation with zero-downtime dual-secret overlap window. Primary secret signs new tokens (kid header identifies it), optional previous secret validates old tokens during rotation. Validation tries primary first, falls through to previous on signature mismatch, and propagates Expired immediately when the correct secret is found. Key pieces: - auth.rs: dual-secret JWT validation with kid header, leak response via empty previous, full test coverage (62 tests including e2e rotation scenario) - main.rs: read SEARCH_UI_JWT_SECRET_PREVIOUS, refuse startup without primary - config: jwt_secret_previous_env + jwt_rotation_buffer_s in SearchUiAuthConfig - miroir-ctl: rotate-jwt-secret command (5-step dual-secret overlap procedure) - Helm CronJob: quarterly schedule, suspended by default, Forbid concurrency Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 16:17:33 -04:00
jedarden	26fe2970fc	P10.2: nodeMasterKey zero-downtime rotation flow Add `miroir-ctl key rotate-node-master` command implementing plan §9 4-step zero-downtime rotation: create new admin-scoped key on all Meilisearch nodes, print K8s Secret update instructions, wait for rolling restart confirmation, delete old key. Supports --dry-run, node auto-discovery via topology API, and rollback on step 1 failure. Add `address` field to topology API NodeInfo for CLI node discovery. Add runbooks for both nodeMasterKey (zero-downtime) and startup master key (maintenance window required) rotation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 15:49:40 -04:00
jedarden	3b209e8b66	P10.1: Secret inventory + ESO ExternalSecret wiring Expand eso-external-secret.yaml with full secret inventory (plan §9) — documents all 8 keys with consumer, rotation strategy, and env var mapping. Wire ADMIN_SESSION_SEAL_KEY, SEARCH_UI_JWT_SECRET, SEARCH_UI_JWT_SECRET_PREVIOUS, and SEARCH_UI_SHARED_KEY into the Helm deployment template as optional secretKeyRef env vars. Add startup validation that refuses to start if search_ui is enabled but SEARCH_UI_JWT_SECRET is missing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 15:18:02 -04:00
jedarden	ffe1d63d58	P8: Finalize CI/CD templates, prod ArgoCD app, and CHANGELOG for v0.1.0 - miroir-ci: use cargo fmt --all, add pre-release detection for GitHub releases - miroir-ci-smoke: fix secret ref to github-token - miroir-release: rewrite github-release step with gh CLI, build binaries in release step, add pre-release flag and resource limits - miroir-release-ready: fix serviceAccountName to argo-workflow - miroir-application.yaml: switch prod to Redis backend, 4 Meilisearch replicas - redis.rs: remove unused conn() helper - CHANGELOG: date 0.1.0 release, add missing release/prod entries Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 15:09:14 -04:00
jedarden	dcab90d2c9	Add prod ArgoCD Application manifest for ardenone-cluster Matches the manifest already in declarative-config (commit 3d72934). OCI Helm chart at ghcr.io/jedarden/charts/miroir, automated sync with prune + selfHeal + ServerSideApply. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 14:36:31 -04:00
jedarden	42905066cf	P8: Fix miroir-release Kaniko build — use shell script instead of sprig expressions Replace sprig regex template expressions with a shell script approach for Kaniko destination tags, matching the pattern in miroir-ci.yaml. Pin Kaniko image to v1.23.0-debug. Fix serviceAccountName from argo-runner to argo-workflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 14:33:51 -04:00
jedarden	3e1451dff1	Multi-stage Dockerfile with musl cross-compilation and .dockerignore Builder stage compiles both miroir-proxy and miroir-ctl as static musl binaries, strips them, and copies into a scratch image. Updated .cargo/config.toml to use target-feature=+crt-static instead of incorrect CC/CFLAGS. Added .dockerignore to exclude non-essential files. Image: 4.0 MB compressed (scratch base, single static binary). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 13:47:45 -04:00
jedarden	700bce2bd6	Add Dockerfile for scratch-based miroir-proxy image with musl static binary FROM scratch image copies stripped static musl binary (4 MB compressed). Updated .cargo/config.toml with proper musl cross-compilation settings. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 13:31:37 -04:00
jedarden	9b2f11f71b	P8: Sync CI/CD templates and ArgoCD Application to miroir repo (plan §6/§7) Adds miroir-ci WorkflowTemplate (checkout → lint → test → musl build → Kaniko push + GitHub release, tag-gated), miroir-ci-smoke quick lint+test template, and miroir-dev ArgoCD Application reference. Updates CHANGELOG.md with Phase 8 deployment entries. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 13:28:07 -04:00
jedarden	f415a10a85	P8: Add optional OpenTelemetry tracing deps, fix subscriber init, clean up .gitignore - Add `tracing` feature flag with optional OTel deps to miroir-proxy - Fix tracing subscriber initialization (use .init() instead of set_global_default) - Add pod_id as global span field for structured logging - Improve DF lookup error messages in preflight handler - Add build artifacts to .gitignore Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 13:24:24 -04:00
jedarden	a7540ab060	P7.3: Add §13.1 resharding row to Grafana dashboard, fix y-coordinate overlaps Add collapsed Resharding (§13.1) feature-gated row with phase gauge, in-progress stat, and backfill rate panel. Fix overlapping y=74 on Anti-Entropy and Settings Broadcast rows by shifting subsequent rows. Sync charts/miroir/dashboards/ copy with root dashboard. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 13:18:13 -04:00
jedarden	21748edf5e	P8.7: Conditional Helm templates for CDC PVC, Redis, and ESO integration (plan §6/§9/§13.13) - PVC template conditional on cdc.buffer.primary=="pvc" or cdc.buffer.overflow=="pvc" - Redis deployment conditional on redis.enabled with auth via auto-generated or ESO secret - ESO ExternalSecret example pulling from kv/search/miroir via openbao-backend ClusterSecretStore - Deployment mounts CDC PVC at /data/cdc and injects Redis password when enabled - ConfigMap generates taskStore.url and cdc.buffer.pvc_path from helpers Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 13:16:14 -04:00
jedarden	863bf1c33f	P8.3: Refine schema rejections and add test runner Simplify values.schema.json if/then patterns for rules 3-4 (removed verbose allOf in favor of direct enum constraint in then branch), drop unsupported errorMessage fields, and add run-tests.sh for automated CI validation of all 12 schema/template test cases. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 13:10:58 -04:00
jedarden	c86f50fd76	P7.3: Add Grafana dashboard with 8 core panels and feature-gated rows (plan §10) dashboards/miroir-overview.json — 50-panel dashboard covering: - Core: cluster health, request rate, p50/p95/p99 latency, node comparison, search overhead, task lag, shard distribution, rebalance activity - Feature-gated collapsed rows: multi-search (§13.11), anti-entropy (§13.8), settings broadcast (§13.5), CDC (§13.13), canary tests (§13.18), search UI (§13.21) Helm chart: dashboards.enabled creates a ConfigMap labeled grafana_dashboard=1 for sidecar auto-import. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 13:02:16 -04:00
jedarden	5b9ae4fa02	P8.3: Add values.schema.json rejection rules for incompatible configs Schema-enforced rules (helm lint --strict): - Rule 1: miroir.replicas > 1 requires taskStore.backend=redis - Rule 2: hpa.enabled requires replicas >= 2 AND taskStore.backend=redis - Rule 3: search_ui.rate_limit.backend=local rejected when replicas > 1 - Rule 4: admin_ui.login_rate_limit.backend=local rejected when replicas > 1 Template-enforced rule (helm template): - Rule 5: scoped_key_rotate_before_expiry_days < scoped_key_max_age_days (JSON Schema draft-7 cannot compare sibling properties) 11 test cases: 7 bad configs rejected, 4 good configs pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 12:53:37 -04:00
jedarden	7c13091a27	P7.2: Wire §13.11-21 metric families behind feature flags (plan §10) Register 42 advanced-capabilities metrics gated by config.*.enabled flags. Each metric family is Option<T> — None when disabled, registered only when the corresponding feature flag is on. Includes accessor methods (no-op when disabled), clone support, and three test scenarios: all-on, all-off, and noop accessors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 12:49:20 -04:00
jedarden	c8d5672d78	P8.2: Scaffold Helm chart with dev defaults (plan §6) Full chart structure with 14 templates, values.schema.json, and NOTES.txt. Dev defaults: 1 replica, 64 shards, RF=1, RG=1, sqlite task store, HPA off. Production upgrade path documented in NOTES.txt. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 12:31:36 -04:00
jedarden	ea6be6a339	P7.4: Add ServiceMonitor and PrometheusRule manifests (plan §10 + §14.9) ServiceMonitor scrapes the metrics port (9090) at 30s intervals. PrometheusRule ships all 12 alerts: 7 availability (degraded shards, node down, high latency, stuck tasks, stuck rebalance, settings divergence, anti-entropy mismatch) + 5 resource pressure (memory, request queue, background queue, peer discovery, no leader). Both gated behind serviceMonitor.enabled / prometheusRule.enabled (defaults: false — requires prometheus-operator in cluster). Also adds metrics port to the miroir Service so ServiceMonitor can select it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 11:42:35 -04:00
jedarden	13d4430d2a	P7.1: Register all 18 plan §10 core metric families Register Requests, Node health, Shards, Tasks, Scatter-gather, and Rebalancer metrics on :9090/metrics (pod-internal scrape) and /_miroir/metrics (admin-key gated). Node/shard metrics use GaugeVec/ CounterVec with bounded-cardinality labels (node_id, operation, error_type). Search handler records scatter_fan_out_size and partial_responses. All 111 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 11:35:56 -04:00
jedarden	69e33a6744	P7.6: Implement OpenTelemetry tracing (disabled by default) Add OTel distributed tracing support with zero overhead when disabled. Configuration (plan §10): - tracing.enabled: false (default, zero overhead) - tracing.endpoint: "http://tempo.monitoring.svc:4317" - tracing.service_name: "miroir" - tracing.sample_rate: 0.1 (head-based sampling) Span hierarchy: - Parent: inbound request (POST /indexes/:index/search) - Child: scatter plan construction - Parallel children: one per node in covering set - Child: merge operation Resource attributes: service.name, service.version, host.name When disabled (tracing.enabled: false), no OTel library calls are made. Shutdown handler flushes pending traces before exit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 10:15:39 -04:00
jedarden	2dcfae8822	P8.6: Release mechanics — bump script, release-ready check, PR template, Argo CIs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:54:26 -04:00
jedarden	c5d61b6d17	P8.1: Add scratch-based Dockerfile with OCI labels - Uses FROM scratch for minimal image size (14.2 MB) - Includes OCI labels: source, version, revision, licenses - Exposes ports 7700 (main) and 9090 (metrics) - Static musl binary for zero libc dependency Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 09:44:40 -04:00
jedarden	7a8742375b	P2.6: Complete Phase 2 DoD — dedup, live topology, field stripping, all 14 tests pass - merger: deduplicate hits by primary key when multiple shards map to same node - search: use shared AppState with live topology from health checker - search: strip _miroir_shard always, _rankingScore only when not requested - search: include facetDistribution only when facets were requested - credentials: add mutex guards for env-var test isolation - Add Phase 2 DoD integration tests: shard coverage, dedup, facets, paging, degraded writes, error shape parity, topology shape, auth errors, reserved fields Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 09:29:43 -04:00
jedarden	d171dfb26a	P12.OP4.1: Complete global-IDF preflight (dfs_query_then_fetch pattern) Implementation complete with validation passing all acceptance criteria: - Preflight phase: execute_preflight() gathers term frequencies from all shards - Global IDF aggregation: GlobalIdf::from_preflight_responses() computes corpus-wide statistics - DFS search: dfs_query_then_fetch_search() orchestrates the full pattern - Score merge: ScoreMergeStrategy merges by globally-comparable scores Benchmark validation (10K queries, 100K docs, 10 shards with skewed distribution): - Average Kendall tau: 0.9817 (PASS ≥ 0.95 threshold) - Min tau: 0.9523 (above threshold) - Queries with τ < 0.95: 0 (0%) - All query types pass (common, single, filtered, rare, multi-term) Latency overhead: +1-2 round trips (parallelized across shards), sub-microsecond coordinator-side aggregation per Criterion benchmarks. Closes miroir-n6v Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 07:56:22 -04:00
jedarden	8498d85e58	P2.4: Fix build and test for index lifecycle endpoints Fix middleware module export from lib.rs so the crate compiles as a library. Remove unused settings mock assertions from test_create_index_broadcasts_to_all_nodes (the settings injection flow is already covered by test_miroir_shard_in_filterable_attributes). All 11 acceptance tests pass: - POST /indexes broadcasts to all nodes with rollback on failure - _miroir_shard in filterableAttributes after creation - GET /indexes/{uid}/stats logical doc count (divided by RG*RF) - Settings broadcast sequential with rollback - DELETE /indexes broadcasts to all nodes - PATCH /indexes/{uid} snapshot and rollback - /keys CRUD broadcasts with all-or-nothing semantics Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 07:49:46 -04:00
jedarden	aa1982006e	P2.5: Implement task ID reconciliation and /tasks endpoints Implements plan §3 "Task ID reconciliation": - Every write fan-out collects per-node taskUid values - Generate Miroir task ID mtask-<uuid> - Persist mtask → {node_id: node_task_uid} in in-memory task registry - Return mtask-xxxxx to client as {"taskUid": ...} in Meilisearch shape - GET /tasks/{mtask_id} polls every mapped node task, aggregates status - succeeded: all nodes report succeeded - failed: any node reports failed; includes per-node error detail - processing: otherwise - GET /tasks with Meilisearch-compatible filters (statuses, types, indexUids, from, limit) - DELETE /tasks/{mtask_id} for best-effort cancellation Details: - Polling cadence: exponential backoff (25ms → 50 → 100 → ... → 1s cap) - In-memory registry using Arc<RwLock<HashMap<String, MiroirTask>>> - NodeClient trait extended with get_task_status method - TaskStatusResponse with to_node_status() conversion - Background polling spawned per task with tokio::spawn Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 07:46:49 -04:00
jedarden	b23e70656e	P2.2: Implement write path with primary key validation, shard injection, and two-rule quorum Implements POST/PUT /indexes/{uid}/documents and DELETE /indexes/{uid}/documents: - Primary key extraction on hot path with 400 miroir_primary_key_required if missing - _miroir_shard injection into every document before forwarding to nodes - Rejection of _miroir_shard in client-submitted docs (400 miroir_reserved_field) - Two-rule quorum: per-group floor(RF/2)+1 ACKs, success if ≥1 group meets quorum - X-Miroir-Degraded header when any group misses quorum - 503 miroir_no_quorum only when NO group meets quorum - Per-batch grouping by target shard for efficient HTTP fan-out - DELETE by IDs routes each ID independently to its shard - DELETE by filter broadcasts to all nodes Acceptance tests pass: - Primary key validation before any writes - Reserved field rejection - Shard distribution uniformity (17-26 shards/node with 64 shards/3 nodes) - Quorum calculation: floor(RF/2)+1 - Meilisearch-compatible error shape Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 06:48:30 -04:00
jedarden	8e46312df2	P2.3: Clean up unused import in acceptance test - Remove unused ShardHitPage import from p23_search_read_path.rs - All 10 acceptance tests pass: - Unique-keyword search returns exactly 1 hit (RRF deduplication) - Facet counts sum correctly across shards - Paging with no dupes/gaps (5 pages of 10 = 50 unique results) - Node down with RF=2: search still covers all shards - Group down with fallback: uses other group, not degraded - X-Miroir-Degraded header includes actual shard IDs Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 06:44:39 -04:00
jedarden	ebc300355c	P2.3: Implement scatter-gather search with group fallback Implement the search read path with scatter-gather + merge + group selection: 1. Group-unavailability fallback: When a shard has no available replica in the primary group, the Fallback policy tries other replica groups before failing. This provides full results (not degraded) when an alternate group is healthy. 2. X-Miroir-Degraded header: Now includes actual shard IDs in the format "X-Miroir-Degraded: shards=3,7,11" instead of just "partial". 3. Acceptance tests for P2.3: - Unique-keyword search deduplicates correctly (RRF) - Facet counts sum across shards - Paging with no dupes/gaps - Node down with RF=2 still covers all shards - Group down falls back to other group (not degraded) - Degraded header includes actual shard IDs Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 06:40:04 -04:00
jedarden	1b9dc1d8c3	P2.1: Implement axum server skeleton with health/version/ready/topology/shards/metrics endpoints - Load Config (file + env + CLI args overlay) via MiroirConfig::load() - Initialize tracing with JSON-to-stdout format (plan §10) - Start two axum listeners: :7700 (client API) + :9090 (metrics, unauthenticated) - Signal handlers for graceful shutdown (SIGTERM → drain → exit) - GET /health returns {"status":"available"} immediately (Meilisearch-compatible) - GET /version returns Meilisearch version from healthy node (60s TTL cache) - GET /_miroir/ready returns 503 during startup, 200 once covering quorum reachable - GET /_miroir/topology returns cluster state per plan §10 JSON shape - GET /_miroir/shards returns shard → node mapping table - GET /_miroir/metrics returns admin-key-gated Prometheus metrics - Background health checker promotes nodes to Active when reachable - UnifiedState bundles AuthState, Metrics, and admin_endpoints::AppState Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 06:12:05 -04:00
jedarden	1d486553a6	Fix /_miroir/metrics to require admin key (not exempt) Per plan §10, GET /_miroir/metrics is admin-key-gated so it can be exposed outside the cluster. It was incorrectly marked as dispatch-exempt with comment "admin-key-optional" - changed to require admin authentication. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:57:31 -04:00
jedarden	57e6239d7e	P2.1: Implement axum server skeleton with health/version/ready/topology/shards/metrics endpoints Implemented the minimum-viable endpoints needed for Kubernetes probes and operator inspection: - Config loading: file → env → CLI overlay with validation - JSON structured logging to stdout (plan §10 format) - Two axum listeners: :7700 (client API) + :9090 (metrics, unauthenticated) - Signal handlers for graceful shutdown (SIGTERM drains in-flight requests) Endpoints implemented: - GET /health - Meilisearch-compatible liveness probe (200, no auth, returns {"status":"available"}) - GET /version - Returns Meilisearch version from any healthy node (60s TTL cache) - GET /_miroir/ready - Readiness probe (503 until covering quorum reachable) - GET /_miroir/topology - Full cluster state per plan §10 JSON shape - GET /_miroir/shards - Shard → node mapping table - GET /_miroir/metrics - Admin-key-gated Prometheus metrics mirror Acceptance criteria verified: - curl localhost:7700/health returns 200 within 100ms of process start ✓ - curl localhost:7700/_miroir/ready returns 503 until all nodes reachable ✓ - curl -H "Authorization: Bearer $ADMIN_KEY" localhost:7700/_miroir/topology matches plan §10 shape ✓ - SIGTERM drains in-flight requests ✓ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:52:21 -04:00
jedarden	affb59fff6	P12.OP4: Validate RRF merge quality — τ=0.14 confirms DFS preflight is required RRF merge (k=60) benchmarked against ground truth with 10K queries on skewed 10-shard corpus (93% on shard 1). Result: Kendall τ = 0.1369 (95% CI [0.1339, 0.1399]), far below the 0.95 threshold. 9,998 of 10,000 queries fell below τ=0.95, confirming RRF alone is insufficient for cross-shard ranking quality with skewed distributions. DFS preflight (already implemented) achieves τ = 0.9818, passing the threshold. Add full 10K-query DFS comparison report and fix paths in experiment.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:43:42 -04:00
jedarden	c7be4ccbec	P12.OP4.1: Validate dfs_query_then_fetch benchmark (τ=0.9817) and document latency Re-ran the 10K-query score-comparability benchmark with fresh results: - DFS (global IDF preflight): avg τ = 0.9817, min τ = 0.9523, 0 queries below 0.95 → PASS - Score merge (local IDF): avg τ = 0.7938, 62.9% queries below 0.95 → FAIL - RRF merge: avg τ = 0.1361, 100% queries below 0.95 → CATASTROPHIC Added Criterion latency benchmarks to the research doc: - Global IDF aggregation: 285ns (3 shards) → 3.31µs (50 shards) - Query term extraction: 69ns (1 word) → 726ns (9 words) - IDF computation: ~113ps per term (trivial) - Coordinator-side overhead is sub-microsecond; dominant cost is network round-trip Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:31:13 -04:00
jedarden	fca081e1bd	Integrate MeilisearchError into proxy (IntoResponse, auth middleware) + telemetry - Add axum feature flag to miroir-core with IntoResponse impl for MeilisearchError - Refactor auth middleware to use MeilisearchError::new() + MiroirCode instead of manual JSON construction, ensuring consistent error shape across all auth errors - Add proxy error.rs re-export alias for ApiError - Implement full telemetry middleware with Prometheus metrics (request duration, in-flight gauge, scatter counters, node health) - Reorder middleware layers: auth before telemetry so 401s are also instrumented Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:21:09 -04:00
jedarden	625e414b6c	Implement bearer-token dispatch chain (plan §5 rules 0-5) + X-Admin-Key Add deterministic bearer-token dispatch with five rules: - Rule 0: dispatch-exempt endpoints skip all auth (metrics, locale, login, session, SPA) - Rule 1: JWT-shape probe stub (Phase 5 will add full validation) - Rule 2: admin-path (/_miroir/*) matches only admin_key - Rule 3: non-admin paths match only master_key - Rule 4: mismatch returns 401 miroir_invalid_auth Also adds X-Admin-Key header short-circuit for admin endpoints, constant-time comparison via subtle::ConstantTimeEq, rate-limit hook types (Phase 2 in-memory stub), and 54 unit tests covering all acceptance criteria. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:11:57 -04:00
jedarden	9606af8159	Add Meilisearch-compatible error shape and miroir_* error codes (P2.6) Implement the API error response format from plan §5: - ErrorType enum: invalid_request, auth, internal, system - MiroirCode enum with all 10 miroir_* codes and their HTTP status mappings - MeilisearchError struct with Meilisearch-compatible JSON shape - Forwarding support for Meilisearch-native node errors (verbatim passthrough) - Doc links pointing to docs/errors.md#<code> - 21 unit tests covering every code's JSON shape, HTTP status, and forwarding Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 05:05:32 -04:00
jedarden	de1f37c8b3	Fix clippy warnings, improve test robustness, and clean up proxy code - task_pruner: use poison-aware lock recovery (unwrap_or_else) for GAUGE_LOCK - task_pruner: add spawn_pruner lifecycle tests (run+stop, drop+stop) - proxy/client: remove unused timeout_ms field, suppress dead_code on preflight_url - proxy/search: fix serde rename for rankingScore field - proxy/indexes: fix clippy unnecessary_lazy_evaluations warning Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 04:53:45 -04:00
jedarden	17d02b97f8	Close bead miroir-cdo: Phase 1 Core Routing complete All DoD criteria verified: 233 tests pass (197 unit + 14 cutover + 10 DFS + 12 proptest), 92.72% line coverage (excl benchmarks). Router 100%, topology 100%, scatter 90.2%, merger 94.7%. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 04:24:19 -04:00
jedarden	483f821dc1	Close bead miroir-cdo: Phase 1 Core Routing complete Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 04:07:33 -04:00
jedarden	068cb5a77f	Phase 1 Core Routing: verify DoD complete, update tracking files All Phase 1 DoD criteria verified: - Rendezvous assignment deterministic (router.rs 100% coverage) - Reshuffle bound on add ≤ 2×(1/4) (proptest + unit test) - 64 shards/3 nodes/RF=1 → 17-26 per node (uniformity test) - write_targets returns RG×RF nodes (acceptance tests) - covering_set with replica rotation (acceptance tests) - merger passes all merge/facet/limit tests - miroir-core ≥ 90% line coverage (90.17% via tarpaulin) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 04:06:34 -04:00
jedarden	da2aa18e04	Fix imports in dfs_skewed_corpus integration test Add missing imports for Node and NodeId types to fix compilation error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 03:51:15 -04:00
jedarden	096b43ccab	P12.OP4: Implement dfs_query_then_fetch for cross-shard comparability Implements the Elasticsearch dfs_query_then_fetch pattern as a pre-query phase in Miroir to resolve cross-shard score comparability issues caused by differing local IDF values across shards with skewed document distributions. Core changes: - scatter.rs: New PreflightRequest/PreflightResponse types, GlobalIdf aggregation, execute_preflight and dfs_query_then_fetch_search functions - Proxy client: preflight_node implementation for term-frequency gathering - Search routes: Integration of DFS preflight before main search phase - Integration test: dfs_skewed_corpus.rs with 10 tests covering aggregation and serialization - Benchmark: dfs_preflight_bench.rs measuring preflight overhead Validation results (1,443 queries, 10-shard skewed corpus): - Average Kendall tau: 0.9815 (95% CI: [0.9809, 0.9821]) - Min tau: 0.9523 (zero queries below 0.95 threshold) - Per-type: common-term +0.84, single-term +0.11, filtered +0.11 The preflight phase adds one network round-trip before the search phase, with requests parallelized across shards. Estimated overhead: +1-2 RTTs. Resolves bead miroir-yio: Global-IDF preflight implementation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-19 03:43:10 -04:00

91 commits
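The acceptance criteria in 14852a40ff say that the 11th login attempt within 60 s returns 429, and that from the 5th consecutive failure the lockout doubles from 10 minutes up to a 24 h cap. The sketch below illustrates those rules in a single process under several assumptions. `FixedWindow` and `lockout_after` are hypothetical names. The window is assumed to be fixed, since the log does not say whether it is fixed or sliding. Mapping "5th failure to 10 min" is one reading of the commit text. The Redis backend would keep the same counters in shared state.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-IP fixed-window limiter. With limit = 10 and a 60 s
/// window, the 11th attempt inside the window is rejected (HTTP 429).
pub struct FixedWindow {
    limit: u32,
    window: Duration,
    started: Option<Instant>,
    count: u32,
}

impl FixedWindow {
    pub fn new(limit: u32, window: Duration) -> Self {
        FixedWindow { limit, window, started: None, count: 0 }
    }

    /// Returns `true` if the attempt is allowed, `false` if it should get 429.
    pub fn check(&mut self, now: Instant) -> bool {
        let in_window = matches!(
            self.started,
            Some(start) if now.saturating_duration_since(start) < self.window
        );
        if !in_window {
            // No window yet, or the previous one elapsed: open a fresh one.
            self.started = Some(now);
            self.count = 0;
        }
        self.count = self.count.saturating_add(1);
        self.count <= self.limit
    }

    /// A successful login clears the counter.
    pub fn reset(&mut self) {
        self.started = None;
        self.count = 0;
    }
}

/// Lockout after `consecutive_failures` failed logins (assumed mapping):
/// none below 5, then 10 min doubling per further failure, capped at 24 h.
pub fn lockout_after(consecutive_failures: u32) -> Option<Duration> {
    const THRESHOLD: u32 = 5;
    const BASE_SECS: u64 = 10 * 60;
    const CAP_SECS: u64 = 24 * 60 * 60;
    if consecutive_failures < THRESHOLD {
        return None;
    }
    // Clamp the shift so it cannot overflow; the cap is reached well before 16.
    let doublings = (consecutive_failures - THRESHOLD).min(16);
    Some(Duration::from_secs((BASE_SECS << doublings).min(CAP_SECS)))
}
```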
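Commit ee3ef23133 describes two gates around scoped-key rotation. The first is a timing gate: rotate once the key is within `scoped_key_rotate_before_expiry_days` of `scoped_key_max_age_days`, or immediately when `force: true` is set. The second is a revocation safety gate: every live peer must have observed the new generation, and a drain wait must elapse. The std-only sketch below covers only the decision logic. It assumes that a peer counts as live while its beacon is younger than the 60 s TTL, and that the peer check and the drain wait must both pass. `RotationConfig`, `PeerBeacon` and `safe_to_revoke` are hypothetical names, not Miroir's types.

```rust
use std::time::Duration;

/// Hypothetical mirror of the two config fields named in the commit.
pub struct RotationConfig {
    pub max_age_days: u64,
    pub rotate_before_expiry_days: u64,
}

impl RotationConfig {
    /// Startup check: rotate_before must be strictly below max_age.
    pub fn validate(&self) -> Result<(), String> {
        if self.rotate_before_expiry_days >= self.max_age_days {
            return Err(format!(
                "scoped_key_rotate_before_expiry_days ({}) must be < scoped_key_max_age_days ({})",
                self.rotate_before_expiry_days, self.max_age_days
            ));
        }
        Ok(())
    }

    /// Timing gate: due once the key is within the rotate-before window of
    /// its max age, or unconditionally when `force` is set.
    pub fn rotation_due(&self, key_age_days: u64, force: bool) -> bool {
        let threshold = self.max_age_days.saturating_sub(self.rotate_before_expiry_days);
        force || key_age_days >= threshold
    }
}

/// Last beacon seen from a peer pod.
pub struct PeerBeacon {
    pub observed_generation: u64,
    /// Time since the beacon was last refreshed.
    pub age: Duration,
}

/// Revocation gate: every live peer (beacon younger than the TTL) has seen
/// the new key generation, and the drain wait has elapsed.
pub fn safe_to_revoke(
    peers: &[PeerBeacon],
    new_generation: u64,
    beacon_ttl: Duration,
    since_switch: Duration,
    drain_wait: Duration,
) -> bool {
    let live_peers_caught_up = peers
        .iter()
        .filter(|p| p.age < beacon_ttl) // expired beacon: pod treated as gone
        .all(|p| p.observed_generation >= new_generation);
    live_peers_caught_up && since_switch >= drain_wait
}
```

Because expired beacons are ignored, a pod that dies mid-rotation does not block revocation forever. A lagging pod that is still alive does block it.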
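Commit 48f7c0aabf specifies the sealed-cookie wire format as base64([24-byte nonce][ciphertext][16-byte tag]). The sketch below only splits the already base64-decoded bytes into those three parts and rejects anything shorter than the 40-byte overhead. It does not decrypt. `split_sealed` is a hypothetical helper. In practice, an AEAD crate such as RustCrypto's `chacha20poly1305` (which provides `XChaCha20Poly1305`) would perform the open step. Such APIs usually take the ciphertext and tag together, so the explicit tag split here is for illustration only.

```rust
/// XChaCha20-Poly1305 extended nonce length.
pub const NONCE_LEN: usize = 24;
/// Poly1305 authentication tag length.
pub const TAG_LEN: usize = 16;

/// Borrowed views into a decoded cookie value.
pub struct SealedParts<'a> {
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
    pub tag: &'a [u8],
}

/// Split `[nonce][ciphertext][tag]`; `None` if too short to be valid.
pub fn split_sealed(raw: &[u8]) -> Option<SealedParts<'_>> {
    if raw.len() < NONCE_LEN + TAG_LEN {
        return None;
    }
    let (nonce, rest) = raw.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Some(SealedParts { nonce, ciphertext, tag })
}
```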
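The validation order in 6e35e420a9 can be expressed without any JWT library. Try the primary secret first. Fall through to the previous secret only on a signature mismatch. Once a secret verifies the signature, return `Expired` immediately. In this sketch the `verify` closure stands in for real signature checking, and `kid`-based secret selection is omitted. Treating an empty previous secret as "no fallback" is an assumed reading of "leak response via empty previous". All names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum VerifyError {
    /// The token was not signed with this secret.
    BadSignature,
    /// Signature valid, but the token is past its expiry.
    Expired,
    /// Any other failure is also propagated without fallback.
    Malformed,
}

/// Primary first; only `BadSignature` falls through to the previous secret.
pub fn validate_dual<C, F>(
    token: &str,
    primary: &str,
    previous: Option<&str>,
    verify: F,
) -> Result<C, VerifyError>
where
    F: Fn(&str, &str) -> Result<C, VerifyError>,
{
    match verify(token, primary) {
        Err(VerifyError::BadSignature) => match previous {
            // An empty previous secret disables the fallback entirely.
            Some(prev) if !prev.is_empty() => verify(token, prev),
            _ => Err(VerifyError::BadSignature),
        },
        // Ok(claims), Expired, or any other error from the matching secret.
        other => other,
    }
}
```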
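The stats rule in 8498d85e58 follows from replication. Each document is written to RG×RF nodes (see the `write_targets` criterion in 068cb5a77f), so summing the per-node counts over-counts by exactly that factor. The sketch below uses a hypothetical function name and assumes every node's count is current.

```rust
/// Logical document count for GET /indexes/{uid}/stats.
/// Raw per-node counts sum to logical_count * RG * RF; `rg` and `rf` must be >= 1.
pub fn logical_doc_count(per_node_counts: &[u64], rg: u64, rf: u64) -> u64 {
    let raw: u64 = per_node_counts.iter().sum();
    raw / (rg * rf)
}
```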
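Commit aa1982006e specifies two rules: how per-node task states collapse into one Miroir task status, and the polling cadence. The sketch below implements those two rules in isolation. The enum and function names are hypothetical, only four Meilisearch statuses are modelled, and `attempt` is assumed to be a 0-based poll counter.

```rust
use std::time::Duration;

/// Per-node task state (a subset of Meilisearch task statuses).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NodeStatus {
    Enqueued,
    Processing,
    Succeeded,
    Failed,
}

/// Aggregated status reported for an mtask-... id.
#[derive(Debug, PartialEq)]
pub enum AggregateStatus {
    Succeeded,
    Failed,
    Processing,
}

/// failed if any node failed; succeeded only if every node succeeded;
/// processing otherwise (including the degenerate empty case).
pub fn aggregate(nodes: &[NodeStatus]) -> AggregateStatus {
    if nodes.contains(&NodeStatus::Failed) {
        AggregateStatus::Failed
    } else if !nodes.is_empty() && nodes.iter().all(|s| *s == NodeStatus::Succeeded) {
        AggregateStatus::Succeeded
    } else {
        AggregateStatus::Processing
    }
}

/// Delay before poll number `attempt`: 25 ms, doubling, capped at 1 s.
pub fn poll_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 25;
    const CAP_MS: u64 = 1_000;
    // 25 << 6 already exceeds the cap, so clamping the shift avoids overflow.
    let ms = BASE_MS << attempt.min(6);
    Duration::from_millis(ms.min(CAP_MS))
}
```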
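The two-rule quorum from b23e70656e reduces to a small pure function over per-group acknowledgement counts. In the sketch below, the names are hypothetical. It assumes each replica group holds RF copies and that the caller has already counted ACKs per group.

```rust
/// Outcome of a write fan-out under the two-rule quorum.
#[derive(Debug, PartialEq)]
pub enum WriteOutcome {
    /// Every replica group reached its quorum.
    Ok,
    /// At least one group reached quorum but some did not:
    /// the write succeeds and X-Miroir-Degraded is set.
    Degraded,
    /// No group reached quorum: 503 miroir_no_quorum.
    NoQuorum,
}

/// Per-group quorum: floor(RF/2) + 1 acknowledgements.
pub fn group_quorum(rf: usize) -> usize {
    rf / 2 + 1
}

/// `acks_per_group[g]` is how many nodes in replica group `g` acknowledged.
pub fn evaluate_write(acks_per_group: &[usize], rf: usize) -> WriteOutcome {
    let needed = group_quorum(rf);
    let met = acks_per_group.iter().filter(|&&acks| acks >= needed).count();
    if met == 0 {
        WriteOutcome::NoQuorum
    } else if met < acks_per_group.len() {
        WriteOutcome::Degraded
    } else {
        WriteOutcome::Ok
    }
}
```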
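The Fallback policy and header format in ebc300355c can be sketched as a per-shard replica choice. Try the primary group, then the others, and list any shard with no healthy replica in `X-Miroir-Degraded`. This is a simplification under stated assumptions. `plan_search` and the `replicas[shard][group]` layout are hypothetical. A real covering-set builder would also try to minimise the number of distinct nodes contacted.

```rust
use std::collections::BTreeSet;

/// Returns the chosen node per shard (None if unavailable) and the
/// X-Miroir-Degraded header value, if any shard is missing.
/// `replicas[shard][group]` lists the node ids holding that shard.
pub fn plan_search<'a>(
    replicas: &[Vec<Vec<&'a str>>],
    primary_group: usize,
    healthy: &BTreeSet<&'a str>,
) -> (Vec<Option<String>>, Option<String>) {
    let mut chosen = Vec::with_capacity(replicas.len());
    let mut missing = Vec::new();
    for (shard, groups) in replicas.iter().enumerate() {
        // Visit the primary group first, then every other group in order.
        let order = std::iter::once(primary_group)
            .chain((0..groups.len()).filter(|&g| g != primary_group));
        let pick = order
            .filter_map(|g| groups.get(g))
            .flat_map(|nodes| nodes.iter())
            .find(|n| healthy.contains(*n))
            .map(|n| n.to_string());
        if pick.is_none() {
            missing.push(shard.to_string());
        }
        chosen.push(pick);
    }
    let header = if missing.is_empty() {
        None
    } else {
        Some(format!("shards={}", missing.join(",")))
    };
    (chosen, header)
}
```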
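Commit affb59fff6 benchmarks reciprocal rank fusion with k = 60. The standard RRF score sums 1/(k + rank) over the result lists. RRF uses only ranks, so a shard holding a handful of weak matches contributes its rank-1 hit exactly like a shard holding 93% of the corpus. That is one plausible reading of the τ ≈ 0.14 result. The sketch below is a minimal std-only version, with a hypothetical `rrf_merge` and ties broken by id.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
/// with 1-based ranks. Returns ids sorted by descending fused score.
pub fn rrf_merge(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    // Scores are finite, so partial_cmp never returns None here.
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    merged
}
```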
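The P12.OP4 commits measure quality as Kendall's τ against a ground-truth ranking, with 0.95 as the pass threshold. The log does not state which τ variant was used, or how top-k lists with different members were handled. The sketch below assumes tau-a over two permutations of the same ids.

```rust
use std::collections::HashMap;

/// Kendall tau-a between two rankings of the same document ids.
/// Both slices must be permutations of one another without duplicates.
/// Returns 1.0 for identical order and -1.0 for fully reversed order.
pub fn kendall_tau(reference: &[u32], candidate: &[u32]) -> f64 {
    assert_eq!(reference.len(), candidate.len(), "rankings must be the same length");
    let n = reference.len();
    if n < 2 {
        return 1.0;
    }
    // Position of each id in the candidate ranking.
    let pos: HashMap<u32, usize> = candidate.iter().enumerate().map(|(i, &id)| (id, i)).collect();
    let (mut concordant, mut discordant) = (0i64, 0i64);
    for i in 0..n {
        for j in (i + 1)..n {
            // The reference ranks reference[i] above reference[j];
            // the pair is concordant if the candidate agrees.
            if pos[&reference[i]] < pos[&reference[j]] {
                concordant += 1;
            } else {
                discordant += 1;
            }
        }
    }
    let pairs = (n * (n - 1) / 2) as f64;
    (concordant - discordant) as f64 / pairs
}
```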
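Rules 2 to 4 of the dispatch chain in 625e414b6c can be sketched as a path-scoped key match. The sketch makes several assumptions. The admin prefix is `/_miroir/`, as used by the other admin endpoints in this log. Rules 0 and 1 and the `X-Admin-Key` short-circuit are left out. `ct_eq` is a hand-rolled, std-only stand-in for `subtle::ConstantTimeEq`. Like `subtle`'s slice comparison, it returns early on a length mismatch, so only the key length can leak through timing.

```rust
#[derive(Debug, PartialEq)]
pub enum AuthDecision {
    Admin,
    Master,
    Unauthorized, // 401 miroir_invalid_auth
}

/// Compare equal-length byte strings without an early exit on content.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Rule 2: admin paths accept only admin_key.
/// Rule 3: all other paths accept only master_key.
/// Rule 4: anything else is a 401.
pub fn dispatch(path: &str, bearer: &str, admin_key: &str, master_key: &str) -> AuthDecision {
    if path.starts_with("/_miroir/") {
        if ct_eq(bearer.as_bytes(), admin_key.as_bytes()) {
            AuthDecision::Admin
        } else {
            AuthDecision::Unauthorized
        }
    } else if ct_eq(bearer.as_bytes(), master_key.as_bytes()) {
        AuthDecision::Master
    } else {
        AuthDecision::Unauthorized
    }
}
```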
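Commit 068cb5a77f verifies two properties: rendezvous assignment is deterministic, and adding a node moves only a bounded share of shards. The std-only highest-random-weight (HRW) sketch below shows why. On node addition, a shard's owner changes only if the new node outscores every existing one. FNV-1a serves purely as a deterministic stand-in, because the log does not say which hash Miroir uses. `owners` is a hypothetical name.

```rust
/// 64-bit FNV-1a: deterministic across runs and platforms.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Weight of `node` for `shard`: hash of (shard id, node id).
fn weight(shard: u32, node: &str) -> u64 {
    let mut key = Vec::with_capacity(4 + node.len());
    key.extend_from_slice(&shard.to_le_bytes());
    key.extend_from_slice(node.as_bytes());
    fnv1a(&key)
}

/// Rendezvous hashing: the `rf` highest-weight nodes own the shard.
pub fn owners<'a>(shard: u32, nodes: &[&'a str], rf: usize) -> Vec<&'a str> {
    let mut scored: Vec<(u64, &'a str)> = nodes.iter().map(|n| (weight(shard, n), *n)).collect();
    // Highest weight first; tie-break on node id so the order is total.
    scored.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(b.1)));
    scored.into_iter().take(rf).map(|(_, n)| n).collect()
}
```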
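The preflight in 096b43ccab and d171dfb26a gathers term statistics from every shard and aggregates them into a corpus-wide IDF before scoring. The sketch below covers that aggregation step only. `ShardStats` and `CorpusStats` are hypothetical types, not the commit's `GlobalIdf`, and the BM25-style IDF formula is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical preflight reply from one shard: its document count and the
/// document frequency of each query term.
pub struct ShardStats {
    pub doc_count: u64,
    pub doc_freq: HashMap<String, u64>,
}

/// Corpus-wide statistics obtained by summing every shard's reply.
pub struct CorpusStats {
    pub total_docs: u64,
    pub doc_freq: HashMap<String, u64>,
}

impl CorpusStats {
    pub fn aggregate(shards: &[ShardStats]) -> Self {
        let mut total_docs = 0;
        let mut doc_freq: HashMap<String, u64> = HashMap::new();
        for shard in shards {
            total_docs += shard.doc_count;
            for (term, df) in &shard.doc_freq {
                *doc_freq.entry(term.clone()).or_insert(0) += df;
            }
        }
        CorpusStats { total_docs, doc_freq }
    }

    /// BM25-style IDF, ln(1 + (N - df + 0.5) / (df + 0.5)); an assumed
    /// formula, since the log does not say which IDF Miroir uses.
    pub fn idf(&self, term: &str) -> f64 {
        let n = self.total_docs as f64;
        let df = self.doc_freq.get(term).copied().unwrap_or(0) as f64;
        (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
    }
}
```

On a skewed split, the small shard's local IDF for a common term is far higher than the global one. That gap is what makes local scores incomparable across shards.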