# Deployment Sizing Guide

This guide provides a sizing matrix for provisioning Miroir orchestrator pods based on corpus size and query throughput. Meilisearch node sizing follows the guidance in [plan.md §6](../../plan/plan.md) independently.

For per-feature scaling behavior (which capabilities need Redis, work queues, or nothing), see [Per-Feature Scaling Behavior](per-feature.md).

## Sizing matrix

| Corpus | Peak QPS | Orchestrator pods | Task store |
|--------|----------|-------------------|------------|
| ≤ 10 GB | ≤ 500 | 2 (HA) | Redis (or SQLite if replicas=1) |
| ≤ 50 GB | ≤ 2 k | 2–4 (HPA) | Redis |
| ≤ 200 GB | ≤ 5 k | 4–8 (HPA) | Redis |
| ≤ 1 TB | ≤ 20 k | 8–12 (HPA) | Redis |
| ≤ 5 TB | ≤ 100 k | 12–24 (HPA) | Redis (clustered or Sentinel) |

**Orchestrator pod specification:** 2 vCPU / 3.75 GB per pod.

**Key insight:** Orchestrator count scales with *query throughput*, while Meilisearch node count scales with *corpus size* and replication factor. These are orthogonal axes.

## Task-store memory accounting

When Redis is the task store, it backs shared state for multiple subsystems:

- **Idempotency replay keys**: request deduplication cache
- **Session pinning rows**: per-session sticky routing state
- **Alias cache**: atomic index alias mappings
- **Background job queue**: chunked dump import, reshard backfill jobs
- **Leader lease**: HA coordinator election state
- **CDC overflow buffer**: change-data-capture spill-over (when configured)
- **Search UI rate-limit buckets**: IP-based rate limiter state

Add **~20 MB for search UI rate-limit buckets per 10k active IPs** on top of the task-store baseline when `search_ui.rate_limit.backend: redis`. Bucket rows auto-expire per `redis_ttl_s`, keeping the footprint bounded even under IP-scan or spray attacks.

## Worked example: ≤ 200 GB / ≤ 5 k QPS

**From the table:** 4–8 orchestrator pods (HPA-scaled between these limits), Redis task store.
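The table lookup used in this worked example can be sketched as a small helper. This is an illustrative sketch only, not part of Miroir: the names `Tier`, `SIZING_MATRIX`, `pick_tier`, and `rate_limit_overhead_mb` are hypothetical, the rows are transcribed from the sizing matrix, and 1 TB is assumed to mean 1000 GB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    """One row of the sizing matrix (hypothetical helper type)."""
    max_corpus_gb: float
    max_peak_qps: int
    min_pods: int
    max_pods: int
    task_store: str


# Rows transcribed from the sizing matrix, smallest tier first.
# Assumption: 1 TB is treated as 1000 GB.
SIZING_MATRIX = [
    Tier(10, 500, 2, 2, "Redis (or SQLite if replicas=1)"),
    Tier(50, 2_000, 2, 4, "Redis"),
    Tier(200, 5_000, 4, 8, "Redis"),
    Tier(1_000, 20_000, 8, 12, "Redis"),
    Tier(5_000, 100_000, 12, 24, "Redis (clustered or Sentinel)"),
]


def pick_tier(corpus_gb: float, peak_qps: int) -> Optional[Tier]:
    """Return the smallest tier covering both corpus size and peak QPS.

    Returns None when the workload exceeds the largest tier.
    """
    for tier in SIZING_MATRIX:
        if corpus_gb <= tier.max_corpus_gb and peak_qps <= tier.max_peak_qps:
            return tier
    return None


def rate_limit_overhead_mb(active_ips: int) -> float:
    """Extra Redis memory for rate-limit buckets: ~20 MB per 10k active IPs."""
    return 20.0 * active_ips / 10_000


if __name__ == "__main__":
    tier = pick_tier(corpus_gb=200, peak_qps=5_000)
    print(tier)
    print(rate_limit_overhead_mb(25_000), "MB extra for rate-limit buckets")
```

Because both axes must fit, a small corpus with high throughput (for example 5 GB at 3 k QPS) lands in the 4–8 pod tier, which matches the point that orchestrator count follows query throughput.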
### Memory budget validation

Per [plan.md §14.2](../../plan/plan.md#142-per-pod-memory-budget), steady-state per-pod memory at default config:

| Component | Budget |
|-----------|--------|
| Baseline (runtime + pools) | ~330 MB |
| Request/response buffers (p99 concurrent) | 200 MB |
| Task registry + idempotency + sessions | 250 MB |
| Feature overhead (§13.11–21 features) | ~200 MB |
| Allocator headroom | ~200 MB |
| **Steady-state total (idle background)** | **~1.2 GB** |
| **With one heavy background job active** | **~1.7 GB** |

At 4–8 pods, cluster-wide steady-state memory is 4.8–9.6 GB (1.2 GB × pod count), and the shared task-store state fits within a single Redis instance (allocate 4–8 GB for Redis at this tier). Heavy background jobs run only on the pod that claimed them from the shared queue, not on all pods simultaneously, so the per-pod budget stays within the 3.75 GB envelope.

### Query throughput validation

Per [plan.md §14.3](../../plan/plan.md#143-per-pod-cpu-budget):

- **Small searches** (1 KB responses): ~3 kQPS per pod at 70% CPU
- **Large searches** (10 KB responses): ~1 kQPS per pod at 70% CPU

At 5 kQPS peak with 4–8 pods:

- Each pod handles 625–1250 QPS (5,000 / 8 to 5,000 / 4)
- This is within the 1–3 kQPS per-pod envelope
- HPA will scale up toward 8 pods during sustained peak

### Result

The ≤ 200 GB / ≤ 5 k QPS row is validated: 4–8 orchestrator pods with a Redis task store can handle the workload within the 2 vCPU / 3.75 GB per-pod resource envelope.

## When to escalate

If your workload exceeds the ≤ 5 TB / ≤ 100 k QPS tier, or if you are hitting resource limits despite following this matrix, consider:

1. **Vertical scaling escape valve**: For constrained environments, a single larger pod (e.g., 4 vCPU / 8 GB) is supported but not recommended for production. See [plan.md §14.10](../../plan/plan.md#1410-vertical-scaling-escape-valve).
2. **Redis clustering**: At very high scale, consider Redis Cluster or Sentinel for the task store to improve availability and distribute memory pressure.
3. **Tune feature flags**: Disable unused capabilities (e.g., multi-search, vector search, shadow tee) to reclaim memory. Every feature in [plan.md §13](../../plan/plan.md#13-feature-scaling-modes) has an `enabled: true` knob that can be flipped off.
4. **Consult the metrics**: Use the resource-pressure metrics in [plan.md §14.9](../../plan/plan.md#149-resource-pressure-metrics-and-alerts) to identify the bottleneck (CPU, memory, request queue depth, or background job backlog).
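
Before escalating, it can help to rerun the worked-example arithmetic against your own tier. The following is a minimal sketch under stated assumptions: `check_tier` and `TierCheck` are hypothetical names, and the constants are the approximate figures quoted in this guide (~1.2 GB steady state, ~1.7 GB with one heavy job, ~1 kQPS and ~3 kQPS per pod at 70% CPU), not measured values.

```python
from dataclasses import dataclass

# Approximate figures quoted in this guide (assumptions, not measurements).
POD_MEMORY_LIMIT_GB = 3.75          # per-pod memory envelope
STEADY_STATE_GB = 1.2               # per-pod steady state, idle background
HEAVY_JOB_GB = 1.7                  # per-pod with one heavy background job
SMALL_RESPONSE_QPS_PER_POD = 3_000  # ~1 KB responses at 70% CPU
LARGE_RESPONSE_QPS_PER_POD = 1_000  # ~10 KB responses at 70% CPU


@dataclass(frozen=True)
class TierCheck:
    """Result of the arithmetic check (hypothetical helper type)."""
    per_pod_qps_low: float          # load per pod at max_pods
    per_pod_qps_high: float         # load per pod at min_pods
    cluster_memory_low_gb: float    # steady-state memory at min_pods
    cluster_memory_high_gb: float   # steady-state memory at max_pods
    fits_pod_memory: bool
    fits_small_responses_at_min_pods: bool
    fits_large_responses_at_max_pods: bool


def check_tier(peak_qps: int, min_pods: int, max_pods: int) -> TierCheck:
    """Reproduce the per-pod QPS and memory arithmetic for one tier."""
    qps_low = peak_qps / max_pods
    qps_high = peak_qps / min_pods
    return TierCheck(
        per_pod_qps_low=qps_low,
        per_pod_qps_high=qps_high,
        cluster_memory_low_gb=min_pods * STEADY_STATE_GB,
        cluster_memory_high_gb=max_pods * STEADY_STATE_GB,
        # Worst case per pod: steady state plus one heavy background job.
        fits_pod_memory=HEAVY_JOB_GB <= POD_MEMORY_LIMIT_GB,
        fits_small_responses_at_min_pods=qps_high <= SMALL_RESPONSE_QPS_PER_POD,
        fits_large_responses_at_max_pods=qps_low <= LARGE_RESPONSE_QPS_PER_POD,
    )


if __name__ == "__main__":
    # The ≤ 200 GB / ≤ 5 k QPS tier: 4-8 pods.
    print(check_tier(peak_qps=5_000, min_pods=4, max_pods=8))
```

If any of the three flags comes back false for your tier, the resource-pressure metrics are the place to confirm which resource is actually the bottleneck.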