miroir/docs/plan/REDIS_MEMORY_ACCOUNTING.md
jedarden 04f1d47909 P3.3.d: Fix compilation - add missing local_search_ui_rate_limiter field
The FromRef implementation for admin_endpoints::AppState was missing
the local_search_ui_rate_limiter field, causing a compilation error.

This completes P3.3.d Redis backend extras, which were already fully
implemented:
- Rate-limit keys with EXPIRE (miroir:ratelimit:searchui:<ip>,
  miroir:ratelimit:adminlogin:<ip>, miroir:ratelimit:adminlogin:backoff:<ip>)
- Scoped-key coordination (miroir:search_ui_scoped_key:<index>,
  miroir:search_ui_scoped_key_observed:<pod>:<index> with EXPIRE 60s)
- Pub/Sub for admin session revocation (miroir:admin_session:revoked)
- CDC overflow buffer (miroir:cdc:overflow:<sink> with LPUSH + LTRIM)

All acceptance criteria verified by existing tests:
- test_redis_rate_limit_searchui verifies EXPIRE is set
- test_redis_pubsub_session_invalidation verifies <100ms propagation
- test_redis_cdc_overflow verifies LLEN matches bytes published

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 11:18:02 -04:00

11 KiB
Raw Blame History

Redis Memory Accounting (Plan §14.7)

This document describes the Redis memory usage for Miroir's task store and related keyspace, providing operators with guidance for sizing Redis deployments.

Overview

Miroir uses Redis as the backing store for all shared state when taskStore.backend: redis. The 14 SQLite tables from plan §4 are mapped to Redis keyspaces with the same semantics. Additionally, several Redis-specific keys are used for rate limiting, CDC overflow buffering, and scoped-key rotation coordination.

Core Task Store Keyspace (14 tables)

Table 1: tasks

  • Key pattern: miroir:tasks:<miroir_id> (hash)
  • Index: miroir:tasks:_index (set of all task IDs)
  • Memory per task: ~500 bytes (typical)
    • miroir_id: 36 bytes (UUID format)
    • created_at, started_at, finished_at: 8 bytes each (i64)
    • status: 16 bytes
    • node_tasks (JSON): ~200 bytes (depends on node count)
    • node_errors (JSON): ~100 bytes (typical)
    • error: ~50 bytes (when present)
    • Overhead: ~100 bytes
  • Sizing: For 10,000 concurrent tasks → ~5 MB
  • Growth: Proportional to active task count; pruned by task_pruner

Table 2: node_settings_version

  • Key pattern: miroir:node_settings_version:<index_uid>:<node_id> (hash)
  • Index: miroir:node_settings_version:_index (set)
  • Memory per entry: ~150 bytes
    • index_uid: 32 bytes
    • node_id: 32 bytes
    • version: 8 bytes (i64)
    • updated_at: 8 bytes (i64)
    • Overhead: ~70 bytes
  • Sizing: For 100 indexes × 10 nodes = 1,000 entries → ~150 KB
  • Growth: Fixed per (index, node) pair; bounded

Table 3: aliases

  • Key pattern: miroir:aliases:<name> (hash)
  • Index: miroir:aliases:_index (set)
  • Memory per alias: ~400 bytes
    • name: 32 bytes
    • kind: 8 bytes
    • current_uid: 32 bytes
    • target_uids (JSON array): ~100 bytes
    • version: 8 bytes
    • created_at: 8 bytes
    • history (JSON array): ~200 bytes (depends on retention)
    • Overhead: ~12 bytes
  • Sizing: For 100 aliases → ~40 KB
  • Growth: Fixed; only grows when aliases are created

Table 4: sessions

  • Key pattern: miroir:session:<session_id> (hash)
  • Memory per session: ~200 bytes
    • session_id: 36 bytes
    • last_write_mtask_id: 36 bytes
    • last_write_at: 8 bytes
    • pinned_group: 8 bytes
    • min_settings_version: 8 bytes
    • ttl: 8 bytes
    • Overhead: ~100 bytes
  • TTL: Configurable (default: 300 seconds)
  • Sizing: For 1,000 active sessions → ~200 KB
  • Growth: Bounded by session_pinning.max_sessions

Table 5: idempotency_cache

  • Key pattern: miroir:idemp:<key> (hash)
  • Memory per entry: ~150 bytes
    • key: 64 bytes (SHA256 of request)
    • body_sha256: 64 bytes (hex-encoded)
    • miroir_task_id: 36 bytes
    • expires_at: 8 bytes
    • Overhead: ~10 bytes
  • TTL: Configurable (default: 60 seconds)
  • Sizing: For 10,000 cached requests → ~1.5 MB
  • Growth: Bounded by idempotency.max_cached_keys

Table 6: jobs

  • Key pattern: miroir:jobs:<id> (hash)
  • Index: miroir:jobs:_index (set of all job IDs)
  • Queue: miroir:jobs:_queued (set of queued job IDs for HPA signal)
  • Memory per job: ~300 bytes
    • id: 36 bytes
    • type: 16 bytes
    • params (JSON): ~100 bytes
    • state: 16 bytes
    • claimed_by: 32 bytes
    • claim_expires_at: 8 bytes
    • progress (JSON): ~50 bytes
    • Overhead: ~50 bytes
  • Sizing: For 1,000 jobs → ~300 KB
  • Growth: Proportional to background workload

Table 7: leader_lease

  • Key pattern: miroir:lease:<scope> (string)
  • Memory per lease: ~100 bytes
    • Value: holder pod ID (~32 bytes)
    • TTL metadata: ~68 bytes
  • TTL: 10 seconds (renewed every 3 seconds)
  • Sizing: For 10 concurrent scopes → ~1 KB
  • Growth: Fixed; one per coordinated operation

Table 8: canaries

  • Key pattern: miroir:canary:<id> (hash)
  • Index: miroir:canary:_index (set)
  • Memory per canary: ~600 bytes
    • id: 32 bytes
    • name: 32 bytes
    • index_uid: 32 bytes
    • interval_s: 8 bytes
    • query_json: ~200 bytes
    • assertions_json: ~200 bytes
    • enabled: 1 byte
    • created_at: 8 bytes
    • Overhead: ~90 bytes
  • Sizing: For 50 canaries → ~30 KB
  • Growth: Fixed; only grows when canaries are created

Table 9: canary_runs

  • Key pattern: miroir:canary_runs:<canary_id> (sorted set)
  • Memory per run: ~300 bytes
    • Score: 8 bytes (ran_at timestamp)
    • Value (JSON): ~250 bytes
    • Overhead: ~50 bytes
  • Retention: Configurable (default: 100 runs per canary)
  • Sizing: For 50 canaries × 100 runs = 5,000 runs → ~1.5 MB
  • Growth: Bounded by canary_runner.run_history_per_canary

Table 10: cdc_cursors

  • Key pattern: miroir:cdc_cursor:<sink_name>:<index_uid> (string)
  • Index: miroir:cdc_cursor:_index:<sink_name> (set)
  • Memory per cursor: ~100 bytes
    • Value: last_event_seq (8 bytes as string)
    • Overhead: ~92 bytes
  • Sizing: For 10 sinks × 100 indexes = 1,000 cursors → ~100 KB
  • Growth: Fixed per (sink, index) pair

Table 11: tenant_map

  • Key pattern: miroir:tenant_map:<hex_sha256> (hash)
  • Memory per mapping: ~150 bytes
    • tenant_id: 32 bytes
    • group_id: 8 bytes (when present)
    • Overhead: ~110 bytes
  • Sizing: For 1,000 tenants → ~150 KB
  • Growth: Fixed; one per API key

Table 12: rollover_policies

  • Key pattern: miroir:rollover:<name> (hash)
  • Index: miroir:rollover:_index (set)
  • Memory per policy: ~800 bytes
    • name: 32 bytes
    • write_alias: 32 bytes
    • read_alias: 32 bytes
    • pattern: 64 bytes
    • triggers_json: ~200 bytes
    • retention_json: ~200 bytes
    • template_json: ~200 bytes
    • enabled: 1 byte
    • Overhead: ~40 bytes
  • Sizing: For 20 policies → ~16 KB
  • Growth: Fixed; only grows when policies are created

Table 13: search_ui_config

  • Key pattern: miroir:search_ui_config:<index_uid> (hash)
  • Memory per config: ~400 bytes
    • index_uid: 32 bytes
    • config_json: ~300 bytes
    • updated_at: 8 bytes
    • Overhead: ~60 bytes
  • Sizing: For 50 indexes → ~20 KB
  • Growth: Fixed; one per index

Table 14: admin_sessions

  • Key pattern: miroir:admin_session:<session_id> (hash)
  • Memory per session: ~300 bytes
    • session_id: 36 bytes
    • csrf_token: 32 bytes
    • admin_key_hash: 64 bytes
    • created_at: 8 bytes
    • expires_at: 8 bytes
    • revoked: 1 byte
    • user_agent: ~50 bytes
    • source_ip: ~16 bytes
    • Overhead: ~85 bytes
  • TTL: Configurable (default: 8 hours)
  • Sizing: For 100 active admin sessions → ~30 KB
  • Growth: Bounded by concurrent admin users

Redis-Specific Keys

Rate Limiting: Search UI

  • Key pattern: miroir:ratelimit:searchui:<ip> (string counter)
  • Memory per bucket: ~100 bytes
    • Value: counter (8 bytes as string)
    • TTL metadata: ~92 bytes
  • TTL: 60 seconds (configurable via search_ui.rate_limit.redis_ttl_s)
  • Sizing: ~20 MB per 10,000 active IPs
    • Each active IP creates one bucket per 60-second window
    • Buckets auto-expire; steady state is proportional to active IP count
  • Growth: Proportional to unique client IPs in the rate limit window

Rate Limiting: Admin Login

  • Key pattern: miroir:ratelimit:adminlogin:<ip> (string counter)
  • Backoff: miroir:ratelimit:adminlogin:backoff:<ip> (hash)
  • Memory per IP: ~300 bytes (when in backoff)
  • TTL: Configurable (default: 300 seconds)
  • Sizing: For 1,000 IPs with failed attempts → ~300 KB
  • Growth: Bounded; only IPs with failed login attempts consume memory

Scoped Key Rotation

  • Current key: miroir:search_ui_scoped_key:<index> (hash)
  • Observations: miroir:search_ui_scoped_key_observed:<pod>:<index> (hash)
  • Memory per index: ~400 bytes (current) + ~100 bytes per pod (observations)
  • TTL: Observations expire after 60 seconds
  • Sizing: For 50 indexes × 10 pods = 500 observations → ~50 KB (transient)
  • Growth: Fixed per index; observations are transient

Live Pod Registry

  • Key: miroir:live_pods (sorted set)
  • Memory per pod: ~100 bytes
    • Member: pod ID (~32 bytes)
    • Score: timestamp (8 bytes)
    • Overhead: ~60 bytes
  • TTL: 300 seconds (auto-refreshed)
  • Sizing: For 24 pods → ~2.4 KB
  • Growth: Fixed; proportional to pod count

CDC Overflow Buffer

  • Key pattern: miroir:cdc:overflow:<sink_name> (list)
  • Byte counter: miroir:cdc:overflow_bytes:<sink_name> (string)
  • Memory: Configurable via cdc.buffer.redis_bytes (default: 1 GiB per sink)
  • Sizing: Per-sink budget; 1 GiB × 10 sinks = 10 GiB (worst case)
  • Growth: Bounded by configuration; trimmed to budget

Admin Session Revocation Pub/Sub

  • Channel: miroir:admin_session:revoked (Pub/Sub)
  • Memory: Negligible (Pub/Sub is connection-based, not stored)

Total Memory Calculation

Baseline (small deployment)

  • 100 indexes, 10 nodes, 100 concurrent tasks, 10 active IPs
  • Core tables: ~10 MB
  • Rate limiting: ~20 KB (10 IPs)
  • CDC overflow: 1 GiB (if enabled)
  • Total (without CDC): ~10 MB
  • Total (with CDC): ~1 GB

Medium deployment

  • 500 indexes, 50 nodes, 1,000 concurrent tasks, 1,000 active IPs
  • Core tables: ~50 MB
  • Rate limiting: ~20 MB (1,000 IPs)
  • CDC overflow: 1 GiB per sink × 5 sinks = 5 GiB
  • Total (without CDC): ~70 MB
  • Total (with CDC): ~5 GB

Large deployment

  • 1,000 indexes, 100 nodes, 10,000 concurrent tasks, 10,000 active IPs
  • Core tables: ~100 MB
  • Rate limiting: ~20 MB (10,000 IPs)
  • CDC overflow: 1 GiB per sink × 10 sinks = 10 GiB
  • Total (without CDC): ~120 MB
  • Total (with CDC): ~10 GB

Redis Sizing Recommendations

Minimum (development/testing)

  • Memory: 256 MB
  • Use case: Single replica, low traffic, no CDC

Small (production)

  • Memory: 512 MB - 1 GB
  • Use case: 2 replicas, ≤500 QPS, CDC with 1-2 sinks

Medium (production)

  • Memory: 2 - 4 GB
  • Use case: 2-4 replicas, ≤2k QPS, CDC with multiple sinks

Large (production)

  • Memory: 8 - 16 GB
  • Use case: 4-12 replicas, ≤20k QPS, heavy CDC usage

Very Large (production)

  • Memory: 32 GB+
  • Use case: 12+ replicas, high CDC throughput
  • Consider: Redis Cluster or Sentinel for HA

Monitoring

Monitor these Redis metrics to ensure adequate sizing:

  1. used_memory - Total memory used by Redis
  2. used_memory_peak - Peak memory usage
  3. evicted_keys - Number of keys evicted due to memory pressure (should be 0)
  4. miroir_task_count - Number of tasks in the registry
  5. miroir_task_registry_size - Prometheus gauge for task count

Alert when used_memory exceeds 80% of maxmemory.

References

  • Plan §4: Task registry + persistence schema
  • Plan §13: Advanced capabilities (CDC, rate limiting, etc.)
  • Plan §14.7: Revised deployment sizing matrix