diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index 7185e1b..233fad3 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -52,7 +52,7 @@ {"id":"miroir-mkk.4","title":"P4.4 Replica group addition: initializing → active","description":"## What\n\nImplement the \"Adding a new replica group\" flow from plan §2:\n1. Provision new nodes; assign `replica_group: G_new` in config\n2. Mark new group `initializing`; queries NOT routed here\n3. Background sync: for each shard, copy all docs from **any** healthy existing group to the new group's nodes via `filter=_miroir_shard={id}` pagination; new inbound writes already fan out to the new group immediately\n4. When all shards synced, mark group `active` — queries begin routing in round-robin\n5. Existing groups continue serving queries throughout (zero read interruption)\n\n## Why\n\nPlan §2 \"Adding a new replica group (throughput scaling)\": adding a group multiplies query capacity without touching existing groups' data. This is the primary \"we need more search QPS\" lever. Unlike intra-group rebalance which moves a subset, group-add **copies** every shard to the new group — so the I/O is proportional to total corpus size, not `1/(Ng+1)`.\n\n## Details\n\n**Source group selection**: round-robin across existing `active` groups to spread read load during sync. Per-shard picks a different source so one group isn't hammered.\n\n**Write fan-out during sync**: new group already receives writes from step 3 onward. This is the durability guarantee — only the backfill window of historical data is transient.\n\n**Progress tracking**: per-shard cursor in `jobs` table; can be paused/resumed per Phase 6 Mode C.\n\n**Verification before `active`**: `GET /indexes/{uid}/stats` against new group → docs count within 0.1% of source group (allows for writes landing during sync). If higher variance, delay the flip and investigate.\n\n## Acceptance\n\n- [ ] Integration test: RG=1 → RG=2; during sync, query throughput on original group unchanged (no regression)\n- [ ] After `active`, queries distribute round-robin between the two groups (verified via per-group metrics)\n- [ ] Mid-sync write test: 100 writes landing during the backfill window are all present on both groups when sync completes\n- [ ] Failed sync (source group becomes unavailable mid-copy) pauses without corrupting new group; resumes when source returns","status":"open","priority":0,"issue_type":"task","created_at":"2026-04-18T21:31:43.859158013Z","created_by":"coding","updated_at":"2026-04-18T21:31:48.961616587Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase-4"],"dependencies":[{"issue_id":"miroir-mkk.4","depends_on_id":"miroir-mkk","type":"parent-child","created_at":"2026-04-18T21:31:43.859158013Z","created_by":"coding","metadata":"{}","thread_id":""},{"issue_id":"miroir-mkk.4","depends_on_id":"miroir-mkk.1","type":"blocks","created_at":"2026-04-18T21:31:48.961576914Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-mkk.5","title":"P4.5 Group removal + unplanned node failure","description":"## What\n\nTwo related flows from plan §2:\n\n**Removing a replica group** (decommission a query pool):\n1. Mark group `draining` — queries stop routing immediately\n2. Nodes can be decommissioned; no data migration needed (other groups hold the docs)\n3. Remove nodes from config; operator deletes pods + PVCs\n\n**Unplanned node failure**:\n1. Health check detects failure → mark `failed`, stop routing writes to it\n2. If RF > 1 within the group: surviving replicas serve reads — no immediate migration\n3. For reads: if failed node's shards have no intra-group RF replica, fall back to a healthy group for those shards\n4. Schedule background replication to restore RF within the group; degrade to cross-group fallback until restored\n\n## Why\n\nPlan §2: \"Changes to one group do not affect other groups' data or query routing.\" Group-removal is instant (no data movement) — lets operators shed throughput capacity without a migration window. Unplanned node failure is the most time-sensitive case: readers must not see errors; RF-restore runs in the background.\n\n## Details\n\n**Group-removal preconditions**: refuse to remove a group if it's the last group holding a shard (would be data loss). Require `--force` and document the risk.\n\n**Failure detection**: plan §4 config:\n```yaml\nhealth:\n interval_ms: 5000\n timeout_ms: 2000\n unhealthy_threshold: 3 # 3 consecutive failures → mark degraded\n recovery_threshold: 2 # 2 consecutive OKs → mark healthy again\n```\n\n**Cross-group fallback**: Phase 1 `covering_set` already deterministic per-request; the fallback is a per-shard \"if intra-group has none, check other groups\" decision **inside** the scatter planner (Phase 2).\n\n**RF-restore**: similar to P4.2 node addition but for an existing node that lost its data — re-run `_miroir_shard` filter migration from the best intra-group source.\n\n## Acceptance\n\n- [ ] Remove a group with healthy peer groups → queries route away within one `query_seq` tick; no read errors\n- [ ] `--force`-remove the last group holding shard S → loud warning; operator must re-type the index UID to confirm\n- [ ] RF=2 group with 1 node killed → reads succeed on remaining replica; `X-Miroir-Degraded` absent\n- [ ] RF=1 group with 1 node killed → cross-group fallback kicks in; `X-Miroir-Degraded` absent if fallback succeeds\n- [ ] Restored node re-hydrates from a peer replica within its group; `miroir_rebalance_in_progress` transitions 0→1→0","status":"open","priority":0,"issue_type":"task","created_at":"2026-04-18T21:31:43.887649468Z","created_by":"coding","updated_at":"2026-04-18T21:31:48.981354074Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase-4"],"dependencies":[{"issue_id":"miroir-mkk.5","depends_on_id":"miroir-mkk","type":"parent-child","created_at":"2026-04-18T21:31:43.887649468Z","created_by":"coding","metadata":"{}","thread_id":""},{"issue_id":"miroir-mkk.5","depends_on_id":"miroir-mkk.1","type":"blocks","created_at":"2026-04-18T21:31:48.981335608Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-mkk.6","title":"P4.6 Admin API for topology ops: /_miroir/nodes + /_miroir/rebalance","description":"## What\n\nPlan §4 admin API endpoints for topology (wrap the rebalancer flows):\n- `POST /_miroir/nodes` — add node (P4.2)\n- `DELETE /_miroir/nodes/{id}` — drain + remove\n- `POST /_miroir/nodes/{id}/drain` — drain only (P4.3, plan §6 \"Scaling\" scale-down)\n- `POST /_miroir/rebalance` — manually trigger rebalance (e.g., after config-only topology tweak)\n- `GET /_miroir/rebalance/status` — current progress; returned shape includes per-shard phase + `miroir_task_id` for each migration batch\n\n## Why\n\nThese endpoints are the **operator surface**. Everything in §11 \"Common operations with miroir-ctl\" maps to these; the Admin UI §13.19 topology tab is a visual wrapper around the same endpoints. Keeping them REST-shaped rather than ad-hoc makes `miroir-ctl` a thin wrapper and the Admin UI trivial.\n\n## Details\n\n**Body shape for `POST /_miroir/nodes`**:\n```json\n{\n \"id\": \"meili-4\",\n \"address\": \"http://meili-4.search.svc:7700\",\n \"replica_group\": 0\n}\n```\n\n**Response**: `202 Accepted` with a `miroir_task_id` (the rebalance is async). Client polls `/tasks/{mtask}` for terminal status.\n\n**`GET /_miroir/rebalance/status`** returns:\n```json\n{\n \"in_progress\": true,\n \"triggered_by\": \"POST /_miroir/nodes\",\n \"operation_id\": \"reb-1234\",\n \"started_at\": \"2026-04-18T20:00:00Z\",\n \"phases\": [\n {\"shard\": 12, \"state\": \"MigrationInProgress\", \"pct_complete\": 42, \"source\": \"meili-0\", \"destination\": \"meili-4\"},\n ...\n ],\n \"overall_pct_complete\": 38\n}\n```\n\n**Authentication**: admin-key only (plan §5 bearer dispatch rule 2).\n\n## Acceptance\n\n- [ ] `curl -X POST -H \"Authorization: Bearer $ADMIN_KEY\" .../_miroir/nodes -d '{\"id\":\"meili-4\",\"address\":\"http://...\",\"replica_group\":0}'` returns 202 + miroir_task_id\n- [ ] Invalid `replica_group` (not present in current topology) → 400 with clear message\n- [ ] `POST /_miroir/rebalance` without prior topology change returns 200 and a no-op task (already balanced)\n- [ ] `GET .../rebalance/status` during a rebalance reflects per-shard state in near real time (< 5s staleness)","status":"open","priority":1,"issue_type":"task","created_at":"2026-04-18T21:31:43.916640224Z","created_by":"coding","updated_at":"2026-04-18T21:31:49.023343521Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase-4"],"dependencies":[{"issue_id":"miroir-mkk.6","depends_on_id":"miroir-mkk","type":"parent-child","created_at":"2026-04-18T21:31:43.916640224Z","created_by":"coding","metadata":"{}","thread_id":""},{"issue_id":"miroir-mkk.6","depends_on_id":"miroir-mkk.2","type":"blocks","created_at":"2026-04-18T21:31:48.997646112Z","created_by":"coding","metadata":"{}","thread_id":""},{"issue_id":"miroir-mkk.6","depends_on_id":"miroir-mkk.3","type":"blocks","created_at":"2026-04-18T21:31:49.023268953Z","created_by":"coding","metadata":"{}","thread_id":""}]} -{"id":"miroir-n6v","title":"P12.OP4.1: Global-IDF preflight (dfs_query_then_fetch pattern)","description":"## What\n\nImplement global-IDF preflight query phase for Miroir to solve cross-shard score comparability (Plan §15 OP#4).\n\nResearch validation (bead miroir-zc2.4) confirmed:\n- Score-based merge: Kendall τ = 0.79 vs ground truth (FAIL, threshold 0.95)\n- RRF merge: Kendall τ = 0.14 vs ground truth (CATASTROPHIC)\n- Root cause: local IDF computed per-shard diverges from global IDF on skewed shard distributions\n\n## Approach\n\nElasticsearch `dfs_query_then_fetch` pattern:\n1. Preflight round: scatter term-frequency query to all shards\n2. Aggregate global document frequencies at coordinator\n3. Send global IDF with search query to shards\n4. Shards use global IDF for scoring instead of local\n\n## Acceptance\n\n- [ ] Preflight round implemented in scatter-gather pipeline\n- [ ] Global IDF aggregation at coordinator\n- [ ] Shards accept and use global IDF for scoring\n- [ ] Re-run benchmark: Kendall τ ≥ 0.95 with same skewed corpus\n- [ ] Latency overhead measured and documented\n\n## Reference\n\n- Research doc: docs/research/score-normalization-at-scale.md\n- Benchmark: tests/benches/score-comparability/\n- ES reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch","status":"in_progress","priority":2,"issue_type":"feature","assignee":"bravo","created_at":"2026-04-19T06:31:33.844052667Z","created_by":"coding","updated_at":"2026-04-19T06:34:24.332097977Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["miroir","research","score-normalization"],"dependencies":[{"issue_id":"miroir-n6v","depends_on_id":"miroir-zc2.4","type":"related","created_at":"2026-04-19T06:32:11.786005093Z","created_by":"coding","metadata":"{}","thread_id":""}]} +{"id":"miroir-n6v","title":"P12.OP4.1: Global-IDF preflight (dfs_query_then_fetch pattern)","description":"## What\n\nImplement global-IDF preflight query phase for Miroir to solve cross-shard score comparability (Plan §15 OP#4).\n\nResearch validation (bead miroir-zc2.4) confirmed:\n- Score-based merge: Kendall τ = 0.79 vs ground truth (FAIL, threshold 0.95)\n- RRF merge: Kendall τ = 0.14 vs ground truth (CATASTROPHIC)\n- Root cause: local IDF computed per-shard diverges from global IDF on skewed shard distributions\n\n## Approach\n\nElasticsearch `dfs_query_then_fetch` pattern:\n1. Preflight round: scatter term-frequency query to all shards\n2. Aggregate global document frequencies at coordinator\n3. Send global IDF with search query to shards\n4. Shards use global IDF for scoring instead of local\n\n## Acceptance\n\n- [ ] Preflight round implemented in scatter-gather pipeline\n- [ ] Global IDF aggregation at coordinator\n- [ ] Shards accept and use global IDF for scoring\n- [ ] Re-run benchmark: Kendall τ ≥ 0.95 with same skewed corpus\n- [ ] Latency overhead measured and documented\n\n## Reference\n\n- Research doc: docs/research/score-normalization-at-scale.md\n- Benchmark: tests/benches/score-comparability/\n- ES reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch","status":"in_progress","priority":2,"issue_type":"feature","assignee":"bravo","created_at":"2026-04-19T06:31:33.844052667Z","created_by":"coding","updated_at":"2026-04-19T06:57:46.854857854Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred","miroir","research","score-normalization"],"dependencies":[{"issue_id":"miroir-n6v","depends_on_id":"miroir-zc2.4","type":"related","created_at":"2026-04-19T06:32:11.786005093Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-nsu","title":"RRF Merging Implementation","description":"## Genesis Bead\nTied to plan: /home/coding/miroir/docs/plan/plan.md\n\n## Overview\nImplement Reciprocal Rank Fusion (RRF) for result merging in Miroir to address cross-shard score comparability issues identified in score-normalization-at-scale research.\n\n## Research Context\nExperiments (miroir-zc2.4) showed:\n- Average Kendall tau: 0.79 vs. 0.95 threshold (FAIL)\n- Common-term queries: τ = 0.15 (catastrophic)\n- RRF is the recommended solution (no preflight, production-proven)\n\n## Progress\n- [ ] Phase 1: Update Merger trait and stub\n- [ ] Phase 2: Implement RRF scoring\n- [ ] Phase 3: Benchmark against corpus\n- [ ] Phase 4: Integration with scatter-gather","status":"closed","priority":2,"issue_type":"genesis","assignee":"charlie","created_at":"2026-04-19T03:56:08.747340056Z","created_by":"coding","updated_at":"2026-04-19T06:24:21.290715173Z","closed_at":"2026-04-19T06:24:21.290611796Z","close_reason":"All four phases complete: MergeStrategy trait, RRF scoring (k=60), benchmarks re-run, scatter-gather integration. 26 merger + 15 scatter tests passing. Commits: 2b7f4a0, f5a630d, cec3b81","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred","failure-count:1"]} {"id":"miroir-qjt","title":"Phase 8 — Deployment + CI (§6, §7)","description":"## Phase 8 Epic — Deployment + CI\n\nPackages Miroir: static musl binary → scratch Docker image → Helm chart → ArgoCD Application → Argo Workflows CI template (iad-ci). At phase end, `git tag v0.1.0 && git push origin v0.1.0` produces a signed GitHub Release with both `miroir-proxy` and `miroir-ctl`, a ghcr.io image, and a chart version bump.\n\n## Why This Phase (and Why It Depends On Phase 2)\n\nPlan §6 (Deployment) + §7 (CI/CD) turn the binary into a thing operators can actually install. Helm defaults (plan §6 \"Dev vs. production defaults\") encode the \"single-pod dev, multi-pod prod\" story from Phase 6. ArgoCD app + Argo Workflow template live in `jedarden/declarative-config` (see `/home/coding/CLAUDE.md`) — standard pattern across the fleet.\n\n## Scope\n\n**Dockerfile** (plan §7)\n- `FROM scratch` + static `miroir-proxy` binary\n- Expose 7700 + 9090\n- OCI labels: source, version, revision, licenses=MIT\n- Target size < 15 MB compressed\n\n**Cargo musl build** — `x86_64-unknown-linux-musl` target; `cargo build --release` for both `-p miroir-proxy` and `-p miroir-ctl`\n\n**Argo WorkflowTemplate `miroir-ci`** (plan §7) at `jedarden/declarative-config → k8s/iad-ci/argo-workflows/miroir-ci.yaml`\n- DAG: checkout → lint → test → build-binary → docker-build (tag-gated) → github-release (tag-gated)\n- `cargo fmt --check`, `cargo clippy -D warnings`, `cargo test --all`, musl build\n- Kaniko for image push to `ghcr.io/jedarden/miroir:`, `:latest`, `:`, `:`\n- `gh release create` with both binaries + sha256\n\n**Helm chart `charts/miroir/`** (plan §6)\n- Templates: deployment, service, headless, configmap, secret, HPA, optional PVC (CDC), StatefulSet for meilisearch, meilisearch service, optional Redis deployment, serviceaccount\n- `values.yaml` with dev defaults (replicas=1, SQLite, RF=1, RG=1, HPA off)\n- `values.schema.json` that rejects:\n - `miroir.replicas > 1` with `taskStore.backend: sqlite`\n - `miroir.hpa.enabled: true` without `replicas >= 2 && taskStore.backend: redis`\n - `search_ui.rate_limit.backend: local` when `miroir.replicas > 1`\n - Admin login rate-limit local backend in HA\n - `search_ui.scoped_key_rotate_before_expiry_days >= scoped_key_max_age_days`\n- `_helpers.tpl` for fully-qualified StatefulSet DNS node addresses (plan §6 ConfigMap)\n- `NOTES.txt` with next-step pointers\n\n**ArgoCD Application** (plan §6) — `k8s//miroir//` path in `jedarden/declarative-config`, automated sync + prune + selfHeal\n\n**Release mechanics** (plan §7)\n- `CHANGELOG.md` Keep a Changelog format; CI extracts section for GitHub release notes\n- `Cargo.toml` workspace version bumped before tag\n- `Chart.yaml` `appVersion` bumped before tag\n- Tag format: `v[0-9]+.[0-9]+.[0-9]+*`\n\n## Infrastructure Reference\n\n- Registry: `ghcr.io/jedarden/miroir`\n- Helm chart OCI: `ghcr.io/jedarden/charts/miroir`\n- Pages: `https://jedarden.github.io/miroir`\n- CI secrets on iad-ci: `ghcr-credentials` (argo-workflows/.dockerconfigjson), `github-token` (argo-workflows/token)\n- Argo UI: `https://argo-ci.ardenone.com`\n\n## Definition of Done\n\n- [ ] `kubectl --kubeconfig=$HOME/.kube/iad-ci.kubeconfig apply -f workflow.yaml` completes the full CI pipeline on `main` within ~10 min\n- [ ] Pushing tag `v0.1.0-rc.1` produces a ghcr.io image, a GitHub pre-release, and does NOT update `latest`/float tags\n- [ ] `helm install search charts/miroir --namespace search --wait` stands up a working single-pod cluster\n- [ ] `values.schema.json` rejections tested via `helm lint --strict` with mutating values files\n- [ ] Final image ≤ 15 MB compressed\n- [ ] ArgoCD app syncs cleanly against ardenone-manager read-only proxy","status":"open","priority":0,"issue_type":"epic","created_at":"2026-04-18T21:21:13.608558775Z","created_by":"coding","updated_at":"2026-04-18T21:23:08.690462028Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase","phase-8"],"dependencies":[{"issue_id":"miroir-qjt","depends_on_id":"miroir-9dj","type":"blocks","created_at":"2026-04-18T21:23:08.690406249Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-qjt.1","title":"P8.1 Dockerfile: scratch + static musl miroir-proxy","description":"## What\n\nShip the `Dockerfile` from plan §7:\n```dockerfile\nFROM scratch\nCOPY miroir-proxy-linux-amd64 /miroir-proxy\nEXPOSE 7700 9090\nENTRYPOINT [\"/miroir-proxy\"]\nCMD [\"--config\", \"/etc/miroir/config.yaml\"]\n```\n\nOCI labels (plan §12):\n```\norg.opencontainers.image.source=https://github.com/jedarden/miroir\norg.opencontainers.image.version=\norg.opencontainers.image.revision=\norg.opencontainers.image.licenses=MIT\n```\n\nTarget: compressed image < 15 MB.\n\n## Why\n\nPlan §1 principle 6 + §12: \"scratch base, no libc. Zero OS packages, no shell.\" This is the smallest possible attack surface and the fastest possible pull (one layer, tiny). Makes trivial deploys feasible on edge clusters.\n\n## Details\n\n**Musl build step** (plan §7 `cargo-build` template):\n```bash\napt-get install -qy musl-tools\nrustup target add x86_64-unknown-linux-musl\ncargo build --release --target x86_64-unknown-linux-musl -p miroir-proxy\ncargo build --release --target x86_64-unknown-linux-musl -p miroir-ctl\nsha256sum miroir-proxy-linux-amd64 > miroir-proxy-linux-amd64.sha256\n```\n\n**Layers**: COPY the static binary directly from `/workspace/artifacts/` into `/miroir-proxy` in the scratch image.\n\n**Config mount**: `/etc/miroir/config.yaml` via ConfigMap mount (Helm chart).\n\n**No shell = no `docker exec -it` debugging** — intentional. Debug by logs + metrics + `kubectl describe` only. Operators who need shell can run a sidecar.\n\n## Acceptance\n\n- [ ] `docker build .` on an artifact-equipped workspace produces an image < 15 MB compressed\n- [ ] `docker run --help` returns clap help (binary works from scratch base)\n- [ ] Image labels contain all 4 OCI labels with correct values\n- [ ] Static linkage: `ldd` against the extracted binary prints \"not a dynamic executable\"","status":"open","priority":0,"issue_type":"task","created_at":"2026-04-18T21:43:56.826575101Z","created_by":"coding","updated_at":"2026-04-18T21:43:56.826575101Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase-8"],"dependencies":[{"issue_id":"miroir-qjt.1","depends_on_id":"miroir-qjt","type":"parent-child","created_at":"2026-04-18T21:43:56.826575101Z","created_by":"coding","metadata":"{}","thread_id":""}]} @@ -137,12 +137,12 @@ {"id":"miroir-uyx.4","title":"P11.4 miroir-ctl subcommand docs + runbooks","description":"## What\n\nFor each `miroir-ctl` subcommand listed in plan §4 crate layout + §11 common operations:\n- `clap`-generated `--help` output covers flags + examples\n- A short runbook `docs/ctl/.md` with purpose, preconditions, examples, gotchas\n\nCommands covered:\n- `status`, `node add/drain`, `rebalance status --watch`, `verify`, `task status`\n- `reshard` (§13.1), `alias` (§13.7), `ttl` (§13.14), `cdc` (§13.13)\n- `shadow` (§13.16), `ui` (§13.19/§13.21 — scoped-key rotation, JWT rotation)\n- `tenant` (§13.15), `explain` (§13.20), `dump import` (§13.9), `canary` (§13.18)\n\n## Why\n\nPlan §12: \"`miroir-ctl --help` — all subcommands documented via clap.\" But `--help` alone isn't enough — operators need examples and gotchas. A good runbook is what prevents a 3-AM mis-run.\n\n## Details\n\n**Runbook template**:\n```markdown\n# `miroir-ctl `\n\n## Purpose\n\n\n## Preconditions\n- [ ] ...\n\n## Examples\n```\nmiroir-ctl ... --example\n```\n\n## Gotchas\n- ...\n\n## See also\n- Plan §X.X\n```\n\n**Integration with Admin UI (§13.19)**: many commands have a UI equivalent — runbook should cross-reference both (\"prefer UI for one-off; prefer CLI for scripts / CI\").\n\n## Acceptance\n\n- [ ] Every subcommand in the crate layout has a matching `docs/ctl/*.md` runbook\n- [ ] `miroir-ctl status --help` mentions where to find runbook for more\n- [ ] The runbooks are all under 100 lines each (easy to read before operating)","status":"open","priority":1,"issue_type":"task","created_at":"2026-04-18T21:48:38.832471052Z","created_by":"coding","updated_at":"2026-04-18T21:48:38.832471052Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase-11"],"dependencies":[{"issue_id":"miroir-uyx.4","depends_on_id":"miroir-uyx","type":"parent-child","created_at":"2026-04-18T21:48:38.832471052Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-uyx.5","title":"P11.5 Common issues + troubleshooting","description":"## What\n\nPlan §11 \"Common issues\" as structured troubleshooting docs at `docs/troubleshooting.md`:\n- \"primary key required\" — Miroir requires explicit primary key at index creation\n- \"Search returns fewer results than expected\" — degraded-node cross-reference + `GET /_miroir/topology`\n- \"Task polling stuck at processing\" — per-node task status via `miroir-ctl task status`\n\nPlus others discovered during Phase 9 testing and chaos scenarios.\n\n## Why\n\nEvery production system accumulates a list of \"the 10 things new users hit in their first week.\" Documenting them transparently shortens the mean-time-to-productive-user from hours to minutes.\n\n## Details\n\n**Per-issue structure**:\n```markdown\n## Error: \"primary key required\"\n\n### Symptom\nClient sees: `HTTP 400 { \"code\": \"miroir_primary_key_required\" }`\n\n### Cause\nThe index was created without a primary key. Miroir cannot route without one.\n\n### Fix\n```bash\ncurl -X POST https://miroir/indexes \\\n -H \"Authorization: Bearer $KEY\" \\\n -d '{\"uid\": \"myindex\", \"primaryKey\": \"id\"}'\n```\n\n### Why this differs from Meilisearch\nMeilisearch can infer the primary key from the first document batch. Miroir cannot — it needs to hash the PK *before* any node sees it. Explicit primary_key at index creation is required.\n```\n\n**Diagnostic playbook**: `docs/troubleshooting/diagnostics.md` — first thing to check for any symptom:\n1. `GET /_miroir/topology` — all nodes healthy?\n2. `GET /_miroir/metrics | grep degraded` — any degraded shards?\n3. `kubectl logs miroir-0 --tail=100 | jq 'select(.level==\"ERROR\")'` — recent errors?\n4. `kubectl get pods -n search` — all running?\n\n## Acceptance\n\n- [ ] 3 plan §11 issues documented with the template\n- [ ] At least 5 additional issues discovered in Phase 9 chaos added\n- [ ] Troubleshooting doc cross-linked from README, install guide, each migration guide","status":"open","priority":1,"issue_type":"task","created_at":"2026-04-18T21:48:38.877214633Z","created_by":"coding","updated_at":"2026-04-18T21:48:38.877214633Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase-11"],"dependencies":[{"issue_id":"miroir-uyx.5","depends_on_id":"miroir-uyx","type":"parent-child","created_at":"2026-04-18T21:48:38.877214633Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-uyx.6","title":"P11.6 Helm chart publication: GH Pages + OCI push","description":"## What\n\nPlan §12 delivered artifacts for the Helm chart:\n- **Primary**: `https://jedarden.github.io/miroir` (GitHub Pages, `gh-pages` branch)\n- **OCI**: `ghcr.io/jedarden/charts/miroir` (for air-gapped environments)\n\nExtend the Phase 8 Argo Workflow `miroir-ci` template with:\n- On tag: `helm package charts/miroir -d dist/`\n- Push to gh-pages: update `index.yaml` + copy `.tgz` into the branch, commit via `gh-pages` helper\n- OCI push: `helm push dist/miroir-.tgz oci://ghcr.io/jedarden/charts`\n\n## Why\n\nPlan §12: chart users expect `helm repo add` to work. Without publication, operators have to `helm install charts/miroir/` from a git clone — fine for dev, wrong for prod.\n\n## Details\n\n**gh-pages flow**:\n```bash\ngit worktree add gh-pages gh-pages\nhelm package charts/miroir -d gh-pages/\nhelm repo index gh-pages/ --url https://jedarden.github.io/miroir --merge gh-pages/index.yaml\ngit -C gh-pages add -A\ngit -C gh-pages commit -m \"Release chart v\"\ngit -C gh-pages push origin gh-pages\n```\n\n**OCI push** requires GHCR write token (already have in `ghcr-credentials`):\n```bash\necho $GHCR_TOKEN | helm registry login ghcr.io -u --password-stdin\nhelm push miroir-.tgz oci://ghcr.io/jedarden/charts\n```\n\n**Chart-only fixes**: when a chart change doesn't need an app rebuild, bump only chart version (not appVersion). CI must detect \"chart-only\" change (e.g., by diffing `charts/**` vs. `crates/**`) and skip the binary rebuild.\n\n## Acceptance\n\n- [ ] After `git tag v0.1.0 && git push`, `helm repo add miroir https://jedarden.github.io/miroir && helm repo update` discovers v0.1.0\n- [ ] `helm install ... oci://ghcr.io/jedarden/charts/miroir --version 0.1.0` works identically\n- [ ] Chart-only fix: tagging `v0.1.1` after editing only a template file bumps chart version without new app binary","status":"open","priority":2,"issue_type":"task","created_at":"2026-04-18T21:48:38.909893288Z","created_by":"coding","updated_at":"2026-04-18T21:48:38.909893288Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase-11"],"dependencies":[{"issue_id":"miroir-uyx.6","depends_on_id":"miroir-uyx","type":"parent-child","created_at":"2026-04-18T21:48:38.909893288Z","created_by":"coding","metadata":"{}","thread_id":""}]} -{"id":"miroir-yio","title":"Global-IDF preflight: implement dfs_query_then_fetch for cross-shard comparability","description":"## Context\n\nRRF validation (miroir-zfo) confirmed that RRF merge produces τ = 0.14 against ground truth — catastrophically worse than score-based merge (τ = 0.79). Neither strategy meets the 0.95 threshold.\n\nThe root cause is that shards with different document distributions compute different local IDF values, making scores and rankings incomparable across shards.\n\n## What\n\nImplement the Elasticsearch `dfs_query_then_fetch` pattern as a pre-query phase in Miroir:\n\n1. Coordinator sends a lightweight DFS (Distributed Frequency Search) request to all shards\n2. Each shard returns term-level document frequencies for the query terms\n3. Coordinator aggregates into global IDF values\n4. Coordinator sends the actual search query with global IDF attached\n5. Shards use global IDF for scoring instead of local IDF\n\n## Why\n\nThis is the proven solution. Both score-based merging (τ = 0.79) and RRF (τ = 0.14) fail the τ ≥ 0.95 quality threshold with skewed shards.\n\n## Scope\n\n- New `DfsPhase` in the scatter-gather pipeline\n- Coordinator-side IDF aggregation\n- Shard-side global-IDF scoring override\n- Integration test with skewed corpus\n- Benchmark to measure latency overhead of the preflight round\n\n## Depends on\n\n- miroir-zfo (RRF validation — complete)\n- miroir-zc2.4 (score normalization research — complete)","status":"open","priority":1,"issue_type":"task","created_at":"2026-04-19T06:42:00.808359301Z","created_by":"coding","updated_at":"2026-04-19T06:42:00.808359301Z","source_repo":".","compaction_level":0,"original_size":0} +{"id":"miroir-yio","title":"Global-IDF preflight: implement dfs_query_then_fetch for cross-shard comparability","description":"## Context\n\nRRF validation (miroir-zfo) confirmed that RRF merge produces τ = 0.14 against ground truth — catastrophically worse than score-based merge (τ = 0.79). Neither strategy meets the 0.95 threshold.\n\nThe root cause is that shards with different document distributions compute different local IDF values, making scores and rankings incomparable across shards.\n\n## What\n\nImplement the Elasticsearch `dfs_query_then_fetch` pattern as a pre-query phase in Miroir:\n\n1. Coordinator sends a lightweight DFS (Distributed Frequency Search) request to all shards\n2. Each shard returns term-level document frequencies for the query terms\n3. Coordinator aggregates into global IDF values\n4. Coordinator sends the actual search query with global IDF attached\n5. Shards use global IDF for scoring instead of local IDF\n\n## Why\n\nThis is the proven solution. Both score-based merging (τ = 0.79) and RRF (τ = 0.14) fail the τ ≥ 0.95 quality threshold with skewed shards.\n\n## Scope\n\n- New `DfsPhase` in the scatter-gather pipeline\n- Coordinator-side IDF aggregation\n- Shard-side global-IDF scoring override\n- Integration test with skewed corpus\n- Benchmark to measure latency overhead of the preflight round\n\n## Depends on\n\n- miroir-zfo (RRF validation — complete)\n- miroir-zc2.4 (score normalization research — complete)","status":"in_progress","priority":1,"issue_type":"task","assignee":"alpha","created_at":"2026-04-19T06:42:00.808359301Z","created_by":"coding","updated_at":"2026-04-19T06:55:38.065100933Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["failure-count:2"]} {"id":"miroir-zc2","title":"Phase 12 — Open Problems + Research (§15)","description":"## Phase 12 Epic — Open Problems Tracking\n\nStanding bucket for the plan §15 open problems that are **not** fully resolved by initial implementation. These are research/validation/future-enhancement beads, not blockers for v1.0. This phase does not block the genesis bead's shipping path — it's a parallel track that persists beyond v1.0.\n\n## Why An Epic At All\n\nPlan §15 flags these as \"documented constraints, not blockers. Initial release ships with known limitations.\" Tracking them as beads means they're not forgotten, they have a visible owner, and their resolution status can be surfaced alongside the rest of the work.\n\n## Scope — the 6 Open Problems (plan §15)\n\n1. **Shard migration write safety** — OP#1. **Status: partially addressed.** Dual-write cutover sequencing (Phase 4) + anti-entropy reconciler (§13.8 / Phase 5) catches slipped docs. Remaining work: chaos-test the cutover boundary, document any reproducible window where data could be lost if anti-entropy is disabled.\n\n2. **Task state HA (Raft vs. Redis)** — OP#2. **Status: deferred.** Current: Redis for multi-pod, SQLite for single-pod. Future: lightweight in-process Raft (or equivalent) so Redis is not required in HA. Not v1.x.\n\n3. **Resharding (S change) vs. node scaling (N change)** — OP#3. **Status: addressed by §13.1** (shadow-index dual-hash). Remaining work: empirical validation of the §13.1 \"2× transient storage and write load\" caveat under real corpora; schedule guidance in the CLI for off-peak reshard windows.\n\n4. **Score normalization at scale** — OP#4. **Status: settings-divergence addressed by §13.5 two-phase broadcast + drift reconciler.** Remaining work is purely statistical: validate that `_rankingScore` remains comparable across shards with very different document-count distributions. Requires corpus diversity tests.\n\n5. **Dump import distribution** — OP#5. **Status: addressed by §13.9 streaming routed dump import.** Broadcast mode retained as fallback. Remaining work: identify and enumerate every dump variant `mode: streaming` cannot fully reconstruct; either extend streaming or document the fallback trigger clearly.\n\n6. **arm64 support** — OP#6. **Status: not planned for v0.x.** Wire into CI when K8s ARM node support is actually needed (likely v1.x or later).\n\n## How To Use This Phase\n\n- Each OP becomes a child bead (bug/feature type) under this epic\n- Beads stay open until the status column above says \"fully addressed\"\n- v1.0 release notes should explicitly link to this epic so operators know what's still on the table\n- New open problems discovered during implementation get added here rather than silently accreted elsewhere\n\n## Not In Scope\n\n- Any concrete implementation work already covered by §13.1 / §13.5 / §13.8 / §13.9 — that belongs to Phase 5.","status":"open","priority":2,"issue_type":"epic","created_at":"2026-04-18T21:22:54.403910669Z","created_by":"coding","updated_at":"2026-04-18T21:22:54.403910669Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["phase","phase-12","research"]} {"id":"miroir-zc2.1","title":"P12.OP1 Shard migration write safety — cutover race window analysis","description":"## What\n\nPlan §15 Open Problem #1: \"Dual-write during migration must not lose documents that arrive exactly at the migration cutover boundary.\"\n\n**Status** per plan: partially addressed. Race window mitigated by §13.8 anti-entropy; any slipped doc caught on next reconciliation pass.\n\n**Remaining work**:\n- Chaos-test the cutover boundary — specifically: docs arriving at the instant of `active` transition (step 7 in plan §2 \"Adding a node\")\n- Document any reproducible window where data could be lost if anti-entropy is disabled\n- If found: extend Phase 4 dual-write to hold the window longer OR require anti-entropy to be on (hard-coded policy)\n\n## Why\n\n\"Plan §15 Open Problem 1 closure\" has been claimed in §13.8 — this bead verifies that claim empirically before we ship v1.0 committing to it.\n\n## Details\n\n**Chaos test design**:\n1. Start 3-node cluster, write 1000 docs\n2. Trigger node addition (`POST /_miroir/nodes`)\n3. During dual-write, rapid-fire new writes with tight (1ms) interval\n4. Tight-loop the transition from step 4 (migration complete) to step 7 (old replica deleted)\n5. Assert: every written doc retrievable AFTER step 7\n\n**Variants**:\n- With anti-entropy enabled (default) — expect 100% retrievable\n- With anti-entropy **disabled** — measure loss rate. If > 0, document + add a schema constraint refusing to enable migrations when anti-entropy is off\n\n## Acceptance\n\n- [ ] Chaos test published; runs on every v1.0-gating CI run\n- [ ] Loss rate measured at < 1 per 1M writes with AE on\n- [ ] Loss rate measured without AE; decision documented in `docs/trade-offs.md`\n- [ ] If `anti_entropy.enabled: false` + migration concurrent → loud warning log + (decided) refuse or warn","status":"closed","priority":2,"issue_type":"bug","assignee":"alpha","created_at":"2026-04-18T21:49:47.774525899Z","created_by":"coding","updated_at":"2026-04-19T02:01:02.057461283Z","closed_at":"2026-04-19T02:01:02.057395870Z","close_reason":"done","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred","open-problem","phase-12","research"],"dependencies":[{"issue_id":"miroir-zc2.1","depends_on_id":"miroir-zc2","type":"parent-child","created_at":"2026-04-18T21:49:47.774525899Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-zc2.2","title":"P12.OP2 Task state HA — evaluate lightweight Raft vs. Redis requirement","description":"## What\n\nPlan §15 Open Problem #2: \"SQLite is single-writer. Running 2 Miroir replicas requires Redis. A future enhancement is a lightweight Raft-based in-process consensus so Redis is not required for HA mode.\"\n\n**Status** per plan: deferred. Current solution (Redis) works; Raft would remove an external dependency.\n\n**Research work**:\n- Survey embedded Raft crates: `openraft`, `raft-rs`, `async-raft`\n- Prototype: `TaskStore` trait impl backed by Raft state machine\n- Measure: latency + throughput vs. Redis; memory footprint per plan §14.2\n- Decide: ship in v1.x or never\n\n## Why\n\nRemoving Redis as a hard dependency shrinks the operational surface (one less thing to monitor, backup, rotate secrets for). But Raft adds complexity — a bad Raft impl can eat data in ways Redis doesn't.\n\nNot blocking v0.x or v1.0 — but worth prototyping before v2.0.\n\n## Details\n\n**Decision gate**: the Raft-backed path must be measurably better than Redis on at least one metric (ops simplicity, latency, or memory) without being worse on any of the others, before shipping.\n\n**Output**: `docs/research/raft-task-store.md` with the decision + benchmark data + reasoning. Keep or discard based on findings.\n\n## Acceptance\n\n- [ ] Research doc published with prototype branch linked\n- [ ] Decision recorded: ship / don't ship / revisit when","status":"closed","priority":3,"issue_type":"feature","assignee":"bravo","created_at":"2026-04-18T21:49:47.798646718Z","created_by":"coding","updated_at":"2026-04-19T02:57:16.452177084Z","closed_at":"2026-04-19T02:57:16.452114067Z","close_reason":"P12.OP2 complete. Surveyed openraft/raft-rs/async-raft (recommend openraft if revisited). Built feature-gated Raft state machine prototype at crates/miroir-core/src/raft_proto/ with benchmarks. Decision: do not ship Raft in v0.x/v1.0 -- Redis wins on write latency, throughput, correctness maturity, and operational tooling. Raft only wins on ops simplicity and read latency. Does not pass the decision gate. Revisit before v2.0 when Redis backend is production-stabilized and openraft reaches v1.0. Full analysis in docs/research/raft-task-store.md.","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred","open-problem","phase-12","research"],"dependencies":[{"issue_id":"miroir-zc2.2","depends_on_id":"miroir-zc2","type":"parent-child","created_at":"2026-04-18T21:49:47.798646718Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-zc2.3","title":"P12.OP3 Online resharding — validate 2× transient load caveat under real corpora","description":"## What\n\nPlan §15 Open Problem #3: §13.1 online resharding ships as a remediation, NOT a license to under-provision. Plan: \"doubles transient storage and write load; treat §13.1 as a remediation, not a license to under-provision.\"\n\n**Remaining work**:\n- Empirical validation of the 2× storage + write load estimate under real corpora (varied doc sizes, write rates, settings complexity)\n- CLI schedule guidance: `miroir-ctl reshard --schedule-window off-peak` — refuses to start outside a named window unless `--force`\n\n## Why\n\nOperators will over-commit to resharding if the \"2× transient\" caveat turns out to be 3× or worse in practice. Real numbers prevent that.\n\n## Details\n\n**Test matrix**:\n| Doc size | Corpus | Write rate | RG | RF | Measured peak storage |\n|----------|--------|------------|----|----|-----------------------|\n| 1 KB | 10 GB | 100 dps | 2 | 1 | ? |\n| 10 KB | 100 GB | 1000 dps | 2 | 2 | ? |\n| 1 MB (blobs) | 1 TB | 10 dps | 2 | 1 | ? |\n\nPublish results in `docs/benchmarks/resharding-load.md`.\n\n**CLI window guard**: config knob `resharding.allowed_windows: [\"02:00-06:00 UTC\"]`. CLI refuses outside windows without `--force`.\n\n## Acceptance\n\n- [ ] Benchmark doc published with real numbers\n- [ ] CLI window guard implemented; integration test confirms rejection outside window\n- [ ] Benchmark run in Phase 9 performance suite as part of v1.0 validation","status":"closed","priority":3,"issue_type":"task","assignee":"bravo","created_at":"2026-04-18T21:49:47.828099118Z","created_by":"coding","updated_at":"2026-04-19T02:09:48.450456008Z","closed_at":"2026-04-19T02:09:48.450390357Z","close_reason":"P12.OP3 complete. All acceptance criteria verified:\n\n1. Benchmark doc (docs/benchmarks/resharding-load.md): Published with results for all 3 scenarios (1KB/10GB, 10KB/100GB, 1MB/1TB). Storage amplification confirmed at exactly 2.00x and dual-write amplification at exactly 2.00x across all scenarios.\n\n2. CLI schedule window guard: Implemented in miroir-ctl reshard command. Config knob resharding.allowed_windows restricts resharding to named windows. CLI refuses outside windows unless --force given.\n\n3. Integration tests (window_guard.rs): 4 tests all passing. 24 total resharding tests pass.\n\n4. Benchmark binary (reshard_load.rs): Full simulation using actual routing code, validates invariants.","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred","open-problem","phase-12","research"],"dependencies":[{"issue_id":"miroir-zc2.3","depends_on_id":"miroir-zc2","type":"parent-child","created_at":"2026-04-18T21:49:47.828099118Z","created_by":"coding","metadata":"{}","thread_id":""}]} -{"id":"miroir-zc2.4","title":"P12.OP4 Score normalization at scale — statistical validation of cross-shard comparability","description":"## What\n\nPlan §15 Open Problem #4: \"`_rankingScore` is comparable across shards only when index settings are identical.\" Settings divergence addressed by §13.5; remaining concern is statistical — do scores stay comparable when shards have very different document-count distributions?\n\n**Research work**:\n- Build a test corpus with intentionally skewed shard populations (one shard 100×, another shard 0.01× the median)\n- Submit identical queries; measure score distribution per shard\n- Assert: top-K merged ordering matches a ground-truth single-index version within some ε\n- If large ε, document + possibly introduce a score normalization pass\n\n## Why\n\nElasticsearch (plan research doc §1) hits this exactly: \"BM25 scoring depends on IDF, computed per shard by default using only that shard's local term statistics.\" Meilisearch uses its own ranking pipeline, but the same issue applies — local rank stats can drift from global on skewed shards.\n\n## Details\n\n**Ground truth**: single-index Meilisearch running the same queries against the same corpus.\n\n**Divergence metric**: Kendall τ between Miroir result ordering and single-index result ordering across 10k random queries.\n\n**If τ < 0.95 on average**: investigate whether a global IDF-style preflight is worth adding (plan research §1 \"`dfs_query_then_fetch`\" pattern).\n\n**Output**: `docs/research/score-normalization-at-scale.md`.\n\n## Acceptance\n\n- [ ] Benchmark corpus + query set published in `tests/benches/score-comparability/`\n- [ ] Results reported with confidence intervals\n- [ ] If τ < 0.95: follow-up bead created for a normalization pass\n- [ ] If τ ≥ 0.95: note-of-no-action in the bead's close comment","status":"closed","priority":3,"issue_type":"task","assignee":"charlie","created_at":"2026-04-18T21:49:47.849019120Z","created_by":"coding","updated_at":"2026-04-19T06:41:46.830230930Z","closed_at":"2026-04-19T06:41:46.829923593Z","close_reason":"P12.OP4 score normalization validation complete. Score-based merge τ=0.79, RRF τ=0.14 — both fail 0.95 threshold. Follow-up bead miroir-n6v created for global-IDF preflight (dfs_query_then_fetch pattern). Benchmark corpus in tests/benches/score-comparability/. Research doc at docs/research/score-normalization-at-scale.md.","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred","failure-count:1","open-problem","phase-12","research"],"dependencies":[{"issue_id":"miroir-zc2.4","depends_on_id":"miroir-nsu","type":"blocks","created_at":"2026-04-19T03:56:41.560992652Z","created_by":"coding","metadata":"{}","thread_id":""},{"issue_id":"miroir-zc2.4","depends_on_id":"miroir-zc2","type":"parent-child","created_at":"2026-04-18T21:49:47.849019120Z","created_by":"coding","metadata":"{}","thread_id":""}]} +{"id":"miroir-zc2.4","title":"P12.OP4 Score normalization at scale — statistical validation of cross-shard comparability","description":"## What\n\nPlan §15 Open Problem #4: \"`_rankingScore` is comparable across shards only when index settings are identical.\" Settings divergence addressed by §13.5; remaining concern is statistical — do scores stay comparable when shards have very different document-count distributions?\n\n**Research work**:\n- Build a test corpus with intentionally skewed shard populations (one shard 100×, another shard 0.01× the median)\n- Submit identical queries; measure score distribution per shard\n- Assert: top-K merged ordering matches a ground-truth single-index version within some ε\n- If large ε, document + possibly introduce a score normalization pass\n\n## Why\n\nElasticsearch (plan research doc §1) hits this exactly: \"BM25 scoring depends on IDF, computed per shard by default using only that shard's local term statistics.\" Meilisearch uses its own ranking pipeline, but the same issue applies — local rank stats can drift from global on skewed shards.\n\n## Details\n\n**Ground truth**: single-index Meilisearch running the same queries against the same corpus.\n\n**Divergence metric**: Kendall τ between Miroir result ordering and single-index result ordering across 10k random queries.\n\n**If τ < 0.95 on average**: investigate whether a global IDF-style preflight is worth adding (plan research §1 \"`dfs_query_then_fetch`\" pattern).\n\n**Output**: `docs/research/score-normalization-at-scale.md`.\n\n## Acceptance\n\n- [ ] Benchmark corpus + query set published in `tests/benches/score-comparability/`\n- [ ] Results reported with confidence intervals\n- [ ] If τ < 0.95: follow-up bead created for a normalization pass\n- [ ] If τ ≥ 0.95: note-of-no-action in the bead's close comment","status":"closed","priority":3,"issue_type":"task","assignee":"charlie","created_at":"2026-04-18T21:49:47.849019120Z","created_by":"coding","updated_at":"2026-04-19T06:54:42.282404673Z","closed_at":"2026-04-19T06:54:42.282137259Z","close_reason":"P12.OP4 score normalization validation complete.\n\nResults: Score-based merge Kendall τ=0.79 [95% CI: 0.787-0.801], RRF τ=0.14 [95% CI: 0.134-0.140]. Both fail τ≥0.95 threshold. Common-term queries worst (score τ=0.15, RRF τ=0.11) due to IDF divergence between tiny/large shards. Root cause: shard-local IDF inflates scores from small shards. Follow-up bead miroir-yio created for global-IDF preflight (dfs_query_then_fetch pattern). Artifacts: tests/benches/score-comparability/, docs/research/score-normalization-at-scale.md","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred","failure-count:2","open-problem","phase-12","research"],"dependencies":[{"issue_id":"miroir-zc2.4","depends_on_id":"miroir-nsu","type":"blocks","created_at":"2026-04-19T03:56:41.560992652Z","created_by":"coding","metadata":"{}","thread_id":""},{"issue_id":"miroir-zc2.4","depends_on_id":"miroir-zc2","type":"parent-child","created_at":"2026-04-18T21:49:47.849019120Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-zc2.5","title":"P12.OP5 Dump import variants — enumerate what streaming mode can't handle","description":"## What\n\nPlan §15 Open Problem #5: §13.9 streaming routed dump import addresses the main case; broadcast mode retained as a fallback for dump variants Miroir cannot fully reconstruct via public API.\n\n**Remaining work**:\n- Identify and enumerate every dump variant streaming can't reconstruct\n- Either extend streaming to handle them OR document the fallback trigger clearly in `miroir-ctl dump import --help`\n\n## Why\n\n\"Can't reconstruct\" is vague — operators deserve concrete lists of what works and what doesn't. Without this, the `broadcast` fallback path is a bug waiting to happen.\n\n## Details\n\n**Potential failure modes to investigate**:\n- Dumps from older Meilisearch versions with pre-v1.37 schema\n- Dumps with custom keys (POST /keys) that have indexes list or actions not representable via public API\n- Dumps with snapshot-taken-mid-write where Miroir-injected `_miroir_shard` would conflict with an existing client field\n\n**Deliverable**: `docs/dump-import/compatibility-matrix.md` with columns:\n| Meilisearch version | Dump variant | Streaming works? | Broadcast needed? | Workaround |\n\n## Acceptance\n\n- [ ] Matrix published\n- [ ] Each \"broadcast needed\" row has a workaround or a link to an open enhancement bead\n- [ ] `miroir-ctl dump import` output references the matrix when falling back to broadcast","status":"closed","priority":3,"issue_type":"task","assignee":"bravo","created_at":"2026-04-18T21:49:47.884303207Z","created_by":"coding","updated_at":"2026-04-19T01:09:27.327131515Z","closed_at":"2026-04-19T01:09:27.327067549Z","close_reason":"Compatibility matrix published at docs/dump-import/compatibility-matrix.md\n\n- Matrix enumerates all dump variants that streaming mode can/cannot reconstruct\n- Each broadcast fallback row has workaround or enhancement bead link\n- CLI output reference section documents fallback message\n- Covers: version compatibility, field conflicts, EE features, snapshots, corrupted dumps","source_repo":".","compaction_level":0,"original_size":0,"labels":["failure-count:1","open-problem","phase-12","research"],"dependencies":[{"issue_id":"miroir-zc2.5","depends_on_id":"miroir-zc2","type":"parent-child","created_at":"2026-04-18T21:49:47.884303207Z","created_by":"coding","metadata":"{}","thread_id":""}]} {"id":"miroir-zc2.6","title":"P12.OP6 arm64 support (deferred to v1.x+)","description":"## What\n\nPlan §15 Open Problem #6: \"Not planned for v0.x. Added when K8s ARM node support is required.\"\n\n**Future work when prioritized**:\n- Cross-compile `miroir-proxy` and `miroir-ctl` for `aarch64-unknown-linux-musl` in the CI pipeline\n- Docker image manifest list: `ghcr.io/jedarden/miroir:` spans `linux/amd64` + `linux/arm64`\n- Helm chart: no changes (binary is arch-agnostic at the k8s layer)\n- Phase 9 CI: add arm64 test runs\n\n## Why\n\nARM node support is increasingly common (Hetzner Ampere, AWS Graviton, GCP Tau T2A, Rackspace Spot). But Miroir's fleet is currently all amd64 (iad-ci is amd64; ardenone cluster nodes are amd64). No current demand to justify the CI complexity.\n\nKeep this bead open as a placeholder; promote to in-progress when a concrete use case emerges.\n\n## Details\n\n**When ready**: the Argo Workflow `cargo-build` step needs a matrix over targets:\n```yaml\n- name: cargo-build\n container:\n args:\n - |\n rustup target add x86_64-unknown-linux-musl\n rustup target add aarch64-unknown-linux-musl\n apt-get install -qy musl-tools gcc-aarch64-linux-gnu\n cargo build --release --target x86_64-unknown-linux-musl -p miroir-proxy\n cargo build --release --target aarch64-unknown-linux-musl -p miroir-proxy\n ...\n```\n\nKaniko build needs `--customPlatform=linux/amd64,linux/arm64` or equivalent for multi-arch manifests.\n\n## Acceptance\n\n- [ ] Not to be closed until arm64 is a live deliverable\n- [ ] Cross-reference here when the priority flips","status":"in_progress","priority":4,"issue_type":"feature","assignee":"charlie","created_at":"2026-04-18T21:49:47.917666333Z","created_by":"coding","updated_at":"2026-04-19T00:58:19.767272778Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["open-problem","phase-12","roadmap"],"dependencies":[{"issue_id":"miroir-zc2.6","depends_on_id":"miroir-zc2","type":"parent-child","created_at":"2026-04-18T21:49:47.917666333Z","created_by":"coding","metadata":"{}","thread_id":""}]} -{"id":"miroir-zfo","title":"P12.OP4 follow-up: Validate RRF merging quality with score-comparability benchmark","description":"## Context\n\nScore normalization research (miroir-zc2.4) found that raw _rankingScore merging gives Kendall τ = 0.79 vs ground truth — well below the 0.95 threshold. RRF merging is already implemented in merger.rs as the mitigation.\n\n## What\n\nRe-run the score-comparability benchmark using Miroir's actual RRF merger (instead of the score-based merge in simulate.py) and measure τ against ground truth. This validates that RRF solves the cross-shard comparability problem.\n\n## Steps\n1. Add an RRF merge mode to simulate.py (or write a Rust test that uses the actual merger)\n2. Re-run with the same 10K query set against the skewed corpus\n3. Measure Kendall τ between RRF-merged results and single-index ground truth\n4. If τ ≥ 0.95: close with note-of-no-action\n5. If τ < 0.95: investigate global-IDF preflight (plan §1 dfs_query_then_fetch pattern)\n\n## Acceptance\n- [ ] RRF merge benchmarked against ground truth\n- [ ] τ reported with 95% CI\n- [ ] If τ < 0.95: create bead for global-IDF preflight implementation","status":"in_progress","priority":2,"issue_type":"issue","assignee":"alpha","created_at":"2026-04-19T04:06:52.077073258Z","created_by":"coding","updated_at":"2026-04-19T06:33:48.936667617Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred"]} +{"id":"miroir-zfo","title":"P12.OP4 follow-up: Validate RRF merging quality with score-comparability benchmark","description":"## Context\n\nScore normalization research (miroir-zc2.4) found that raw _rankingScore merging gives Kendall τ = 0.79 vs ground truth — well below the 0.95 threshold. RRF merging is already implemented in merger.rs as the mitigation.\n\n## What\n\nRe-run the score-comparability benchmark using Miroir's actual RRF merger (instead of the score-based merge in simulate.py) and measure τ against ground truth. This validates that RRF solves the cross-shard comparability problem.\n\n## Steps\n1. Add an RRF merge mode to simulate.py (or write a Rust test that uses the actual merger)\n2. Re-run with the same 10K query set against the skewed corpus\n3. Measure Kendall τ between RRF-merged results and single-index ground truth\n4. If τ ≥ 0.95: close with note-of-no-action\n5. If τ < 0.95: investigate global-IDF preflight (plan §1 dfs_query_then_fetch pattern)\n\n## Acceptance\n- [ ] RRF merge benchmarked against ground truth\n- [ ] τ reported with 95% CI\n- [ ] If τ < 0.95: create bead for global-IDF preflight implementation","status":"closed","priority":2,"issue_type":"issue","assignee":"alpha","created_at":"2026-04-19T04:06:52.077073258Z","created_by":"coding","updated_at":"2026-04-19T07:00:18.651855450Z","closed_at":"2026-04-19T07:00:18.651747298Z","close_reason":"RRF validation complete: τ=0.14 (95% CI [0.134, 0.140]), well below 0.95 threshold. RRF performs worse than score-based merge (τ=0.79) on skewed corpus. Follow-up bead miroir-yio created for global-IDF preflight implementation.","source_repo":".","compaction_level":0,"original_size":0,"labels":["deferred"]}