Implements the 14-table task-store schema from plan §4 with both SQLite and Redis backends. Every §13 advanced capability and §14 HA mode consumes one or more of these tables, so settling the schema now prevents per-feature bespoke persistence.

## SQLite Backend (rusqlite)
- All 14 tables created idempotently at startup via migrations
- Schema version tracking with validation (rejects store ahead of binary)
- WAL mode + 5s busy_timeout for concurrent access
- Full TaskStore trait implementation with comprehensive tests
- Property tests for (insert, get) round-trip and (upsert, list) semantics
- Restart resilience test: tasks survive pod restart simulation

## Redis Backend (async via tokio)
- Mirrors the same 14-table API as SQLite (TaskStore trait)
- Keyspace mapping per plan §4 "Redis mode (HA)"
- Uses _index secondary sets for O(cardinality) list-wide queries (no SCAN)
- TTL-based auto-expiration for sessions, idempotency, rate-limits
- Leader election via SET NX EX with heartbeat renewal
- Pub/Sub for instant admin session revocation propagation
- CDC overflow buffer bounded by byte budget with auto-trim
- Rate limiting for search UI and admin login with exponential backoff
- Search UI scoped-key rotation coordination

## Schema Migrations
- 001_initial.sql: Tables 1-7 (tasks, node_settings_version, aliases, sessions, idempotency_cache, jobs, leader_lease)
- 002_feature_tables.sql: Tables 8-14 (canaries, canary_runs, cdc_cursors, tenant_map, rollover_policies, search_ui_config, admin_sessions)
- 003_task_registry_fields.sql: No-op (node_errors already present)

## Tests
- SQLite: 36 tests passing (unit + property + restart resilience)
- Redis: Integration tests using testcontainers (25+ async tests)
- Helm schema validation: enforces replicas > 1 + taskStore.backend: redis

## Definition of Done
✓ rusqlite-backed store with idempotent migrations
✓ Redis-backed store mirroring the same API (trait TaskStore)
✓ Migrations/versioning with schema version validation
✓ Property tests on SQLite backend (7 proptests passing)
✓ Integration test: task survives restart (task_survives_store_reopen)
✓ Redis-backend integration tests (testcontainers)
✓ miroir:tasks:_index-style iteration (no SCAN)
✓ Helm values.schema.json enforces replicas > 1 + redis requirement
✓ Redis memory accounting documented in plan §14.7

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Redis Memory Usage and Capacity Planning
This document describes Redis memory usage patterns for the Miroir task store and provides guidance for capacity planning (plan §14.7).
Overview
Miroir uses Redis as an optional task store backend for multi-replica deployments. The keyspace is organized into 14 table-like structures plus auxiliary keys for rate limiting, CDC overflow buffering, and Pub/Sub.
Redis Keyspace Organization
All keys use the miroir: prefix to avoid collisions with other applications using the same Redis instance.
Table 1: tasks (Miroir task registry)
Key pattern: miroir:tasks:<miroir_id> (hash)
Index: miroir:tasks:_index (set)
| Field | Type | Example Size | Notes |
|---|---|---|---|
| miroir_id | string | ~40 bytes | UUIDv4 with "mtask-" prefix (36-character UUID plus 6-character prefix) |
| created_at | string | ~13 bytes | Millisecond timestamp as string (13 decimal digits) |
| status | string | ~10 bytes | "enqueued", "processing", "succeeded", "failed", "canceled" |
| node_tasks | string | ~50 bytes | JSON: {"node-0":123} (varies by node count) |
| node_errors | string | ~10 bytes | JSON object, often empty {} |
| error | string | 0-100 bytes | Optional error message |
| started_at | string | 0-10 bytes | Optional timestamp |
| finished_at | string | 0-10 bytes | Optional timestamp |
| index_uid | string | 0-50 bytes | Optional index identifier |
| task_type | string | 0-50 bytes | Optional task type identifier |
Estimated per-task memory: ~200-300 bytes (including Redis hash overhead)
Index overhead: ~40 bytes per task in the _index set
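The key pattern and index above can be sketched as a pair of key builders. This is a minimal illustration, not the store's actual code: the function names task_key, tasks_index_key, and keys_from_index are hypothetical, and only the key strings come from this document.

```rust
/// Shared prefix for every Miroir key, per the keyspace description above.
const PREFIX: &str = "miroir";

/// Key of the hash that stores one task, e.g. `miroir:tasks:mtask-1234`.
/// (Hypothetical helper name; only the key layout comes from the document.)
fn task_key(miroir_id: &str) -> String {
    format!("{}:tasks:{}", PREFIX, miroir_id)
}

/// Key of the secondary set that lists every task ID.
fn tasks_index_key() -> String {
    format!("{}:tasks:_index", PREFIX)
}

/// Expands the members of the index set into the hash keys to fetch,
/// so a list-wide query never needs to SCAN the keyspace.
fn keys_from_index(members: &[&str]) -> Vec<String> {
    members.iter().map(|id| task_key(id)).collect()
}
```

A list-wide query would read the members of tasks_index_key() and then fetch each hash named by keys_from_index, which keeps the cost proportional to the number of tasks rather than to the size of the whole keyspace.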
Table 2: node_settings_version
Key pattern: miroir:node_settings_version:<index_uid>:<node_id> (hash)
Index: miroir:node_settings_version:_index (set)
| Field | Type | Example Size |
|---|---|---|
| index_uid | string | ~20 bytes |
| node_id | string | ~20 bytes |
| version | string | ~10 bytes |
| updated_at | string | ~10 bytes |
Estimated per-entry memory: ~100-150 bytes
Table 3: aliases
Key pattern: miroir:aliases:<name> (hash)
Index: miroir:aliases:_index (set)
| Field | Type | Example Size |
|---|---|---|
| name | string | ~30 bytes |
| kind | string | ~10 bytes |
| current_uid | string | 0-40 bytes |
| target_uids | string | 0-100 bytes |
| version | string | ~10 bytes |
| created_at | string | ~10 bytes |
| history | string | ~50 bytes |
Estimated per-entry memory: ~200-300 bytes
Table 4: sessions
Key pattern: miroir:session:<session_id> (hash with EXPIRE)
| Field | Type | Example Size |
|---|---|---|
| session_id | string | ~40 bytes |
| last_write_mtask_id | string | 0-40 bytes |
| last_write_at | string | 0-10 bytes |
| pinned_group | string | 0-10 bytes |
| min_settings_version | string | ~10 bytes |
| ttl | string | ~10 bytes |
Estimated per-entry memory: ~150-200 bytes
Note: Sessions have TTL set via Redis EXPIRE and are automatically garbage-collected.
Table 5: idempotency_cache
Key pattern: miroir:idemp:<key> (hash with EXPIRE)
| Field | Type | Example Size |
|---|---|---|
| key | string | ~50 bytes |
| body_sha256 | string | ~64 bytes |
| miroir_task_id | string | ~40 bytes |
| expires_at | string | ~10 bytes |
Estimated per-entry memory: ~200-250 bytes
Note: Entries have TTL set via Redis EXPIRE and are automatically garbage-collected.
Table 6: jobs
Key pattern: miroir:jobs:<id> (hash)
Index: miroir:jobs:_index (set)
Queued: miroir:jobs:_queued (set)
| Field | Type | Example Size |
|---|---|---|
| id | string | ~40 bytes |
| type | string | ~30 bytes |
| params | string | ~100 bytes |
| state | string | ~20 bytes |
| claimed_by | string | 0-20 bytes |
| claim_expires_at | string | 0-10 bytes |
| progress | string | ~50 bytes |
Estimated per-entry memory: ~300-400 bytes
Table 7: leader_lease
Key pattern: miroir:lease:<scope> (string with EXPIRE)
Estimated per-entry memory: ~50-100 bytes (simple key-value with TTL)
Note: Leases use Redis SET NX EX for distributed coordination.
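To make the SET NX EX semantics concrete, the following sketch simulates them in memory with a logical clock. LeaseTable, try_acquire, and renew are hypothetical names, and the renewal rule (extend only while the caller still owns a live lease) is an assumption about how heartbeat renewal behaves, not a description of Miroir's code.

```rust
use std::collections::HashMap;

/// In-memory stand-in for Redis string keys with expiry.
/// `now` is a logical clock in seconds supplied by the caller.
#[derive(Default)]
struct LeaseTable {
    /// scope -> (holder, expires_at)
    entries: HashMap<String, (String, u64)>,
}

impl LeaseTable {
    /// Mimics `SET miroir:lease:<scope> <holder> NX EX <ttl>`:
    /// succeeds only when no unexpired lease exists for the scope.
    fn try_acquire(&mut self, scope: &str, holder: &str, ttl: u64, now: u64) -> bool {
        let live = matches!(self.entries.get(scope), Some((_, exp)) if *exp > now);
        if live {
            return false;
        }
        self.entries
            .insert(scope.to_string(), (holder.to_string(), now + ttl));
        true
    }

    /// Heartbeat renewal (assumed rule): extend the expiry only if
    /// `holder` still owns a lease that has not yet expired.
    fn renew(&mut self, scope: &str, holder: &str, ttl: u64, now: u64) -> bool {
        match self.entries.get_mut(scope) {
            Some((owner, exp)) if owner.as_str() == holder && *exp > now => {
                *exp = now + ttl;
                true
            }
            _ => false,
        }
    }
}
```

If a leader stops sending heartbeats, its lease simply expires and the next try_acquire from another pod succeeds, so no explicit release step is needed for failover.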
Table 8: canaries
Key pattern: miroir:canary:<id> (hash)
Index: miroir:canary:_index (set)
| Field | Type | Example Size |
|---|---|---|
| id | string | ~30 bytes |
| name | string | ~40 bytes |
| index_uid | string | ~30 bytes |
| interval_s | string | ~10 bytes |
| query_json | string | ~50 bytes |
| assertions_json | string | ~50 bytes |
| enabled | string | ~5 bytes |
| created_at | string | ~10 bytes |
Estimated per-entry memory: ~250-350 bytes
Table 9: canary_runs
Key pattern: miroir:canary_runs:<canary_id> (sorted set, ZADD with score=ran_at)
Value: JSON serialization of run data (~100 bytes)
Score: ran_at timestamp
Estimated per-run memory: ~150-200 bytes (including ZSET overhead)
Auto-pruning: Sorted set is trimmed to run_history_per_canary (default 100) on each insert.
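The trim-on-insert behaviour can be illustrated with an in-memory stand-in for the sorted set. RunHistory and record are hypothetical names; against Redis, a ZADD followed by a rank-based trim such as ZREMRANGEBYRANK would be one plausible equivalent, but this document does not say which commands the store issues.

```rust
/// In-memory stand-in for one `miroir:canary_runs:<canary_id>` sorted set.
struct RunHistory {
    /// Corresponds to run_history_per_canary (default 100 per the document).
    max_runs: usize,
    /// (ran_at score, JSON payload), kept in ascending score order.
    runs: Vec<(u64, String)>,
}

impl RunHistory {
    fn new(max_runs: usize) -> Self {
        Self { max_runs, runs: Vec::new() }
    }

    /// Inserts a run at its sorted position, then drops the oldest
    /// entries so that at most `max_runs` remain.
    fn record(&mut self, ran_at: u64, payload: &str) {
        let pos = self.runs.partition_point(|(score, _)| *score <= ran_at);
        self.runs.insert(pos, (ran_at, payload.to_string()));
        if self.runs.len() > self.max_runs {
            let excess = self.runs.len() - self.max_runs;
            self.runs.drain(..excess);
        }
    }
}
```

Because trimming happens on every insert, the memory used per canary is bounded by max_runs times the per-run estimate, independent of how long the canary has been running.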
Table 10: cdc_cursors
Key pattern: miroir:cdc_cursor:<sink_name>:<index_uid> (hash)
Index: miroir:cdc_cursor:_index:<sink_name> (set)
| Field | Type | Example Size |
|---|---|---|
| sink_name | string | ~30 bytes |
| index_uid | string | ~30 bytes |
| last_event_seq | string | ~10 bytes |
| updated_at | string | ~10 bytes |
Estimated per-entry memory: ~120-150 bytes
Table 11: tenant_map
Key pattern: miroir:tenant_map:<hex_encoded_api_key_hash> (hash)
| Field | Type | Example Size |
|---|---|---|
| tenant_id | string | ~40 bytes |
| group_id | string | 0-10 bytes |
Estimated per-entry memory: ~80-120 bytes
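As an illustration of the key pattern, the sketch below builds a tenant_map key from raw digest bytes. The helper names are hypothetical, lowercase hex is an assumption, and the hash function that produces the digest is not specified here, so the code takes the bytes as given.

```rust
/// Hex encoding of raw digest bytes (lowercase is an assumption).
fn hex_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Builds `miroir:tenant_map:<hex_encoded_api_key_hash>` from a digest
/// computed elsewhere; the hash algorithm is outside this sketch.
fn tenant_map_key(api_key_hash: &[u8]) -> String {
    format!("miroir:tenant_map:{}", hex_encode(api_key_hash))
}
```

Keying by the hash rather than the API key itself means the raw key never appears in the Redis keyspace.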
Table 12: rollover_policies
Key pattern: miroir:rollover:<name> (hash)
Index: miroir:rollover:_index (set)
| Field | Type | Example Size |
|---|---|---|
| name | string | ~30 bytes |
| write_alias | string | ~30 bytes |
| read_alias | string | ~30 bytes |
| pattern | string | ~30 bytes |
| triggers_json | string | ~100 bytes |
| retention_json | string | ~100 bytes |
| template_json | string | ~200 bytes |
| enabled | string | ~5 bytes |
Estimated per-entry memory: ~400-600 bytes
Table 13: search_ui_config
Key pattern: miroir:search_ui_config:<index_uid> (hash)
| Field | Type | Example Size |
|---|---|---|
| index_uid | string | ~30 bytes |
| config_json | string | ~200 bytes |
| updated_at | string | ~10 bytes |
Estimated per-entry memory: ~250-300 bytes
Table 14: admin_sessions
Key pattern: miroir:admin_session:<session_id> (hash with EXPIRE)
| Field | Type | Example Size |
|---|---|---|
| session_id | string | ~40 bytes |
| csrf_token | string | ~40 bytes |
| admin_key_hash | string | ~64 bytes |
| created_at | string | ~10 bytes |
| expires_at | string | ~10 bytes |
| revoked | string | ~5 bytes |
| user_agent | string | 0-100 bytes |
| source_ip | string | 0-20 bytes |
Estimated per-entry memory: ~200-300 bytes
Note: Sessions have TTL set via Redis EXPIRE and are automatically garbage-collected.
Auxiliary Keys
Rate Limiting: Search UI
Key pattern: miroir:ratelimit:searchui:<ip> (string with EXPIRE)
Estimated per-entry memory: ~30-50 bytes (simple counter)
Rate Limiting: Admin Login
Key pattern: miroir:ratelimit:adminlogin:<ip> (string with EXPIRE)
Backoff pattern: miroir:ratelimit:adminlogin:backoff:<ip> (hash with EXPIRE)
Estimated per-entry memory: ~30-100 bytes
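The backoff key implies a delay that grows with repeated failures. A minimal sketch of an exponential schedule follows; the function name, the base delay, and the cap are all hypothetical parameters, since this document does not state the values Miroir uses.

```rust
/// Lockout in seconds after `failures` consecutive failed logins.
/// Doubles with each failure, starting at `base_secs`, capped at `cap_secs`.
/// (Base and cap are illustrative parameters, not Miroir's settings.)
fn backoff_secs(failures: u32, base_secs: u64, cap_secs: u64) -> u64 {
    if failures == 0 {
        return 0;
    }
    // Clamp the exponent so the shift cannot overflow a u64.
    let exp = (failures - 1).min(63);
    base_secs.saturating_mul(1u64 << exp).min(cap_secs)
}
```

With a base of 1 second and a cap of 300 seconds, the fourth failure yields an 8-second lockout and the delay stops growing once it reaches the cap.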
CDC Overflow Buffer
Key pattern: miroir:cdc:overflow:<sink_name> (list)
Byte counter: miroir:cdc:overflow_bytes:<sink_name> (string)
Memory budget: Configurable per sink (default 1 GiB)
Elements: Variable-size JSON blobs
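How a byte-budgeted buffer with a separate byte counter can behave is sketched below in memory. OverflowBuffer and push are hypothetical names, and dropping the oldest events first is an assumption: the document says only that the buffer is bounded by a byte budget and auto-trimmed.

```rust
use std::collections::VecDeque;

/// In-memory stand-in for one sink's overflow list and byte counter.
struct OverflowBuffer {
    budget_bytes: usize,
    /// Mirrors the `miroir:cdc:overflow_bytes:<sink_name>` counter.
    used_bytes: usize,
    /// Mirrors the `miroir:cdc:overflow:<sink_name>` list.
    events: VecDeque<String>,
}

impl OverflowBuffer {
    fn new(budget_bytes: usize) -> Self {
        Self { budget_bytes, used_bytes: 0, events: VecDeque::new() }
    }

    /// Appends an event, then evicts from the head (oldest first, an
    /// assumption) until usage is within budget. Returns events dropped.
    fn push(&mut self, event: String) -> usize {
        self.used_bytes += event.len();
        self.events.push_back(event);
        let mut dropped = 0;
        while self.used_bytes > self.budget_bytes {
            match self.events.pop_front() {
                Some(old) => {
                    self.used_bytes -= old.len();
                    dropped += 1;
                }
                None => break,
            }
        }
        dropped
    }
}
```

Tracking bytes in a separate counter avoids re-measuring the whole list on every append, which is presumably why the keyspace has a dedicated overflow_bytes key.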
Search UI Scoped Keys
Key pattern: miroir:search_ui_scoped_key:<index_uid> (hash)
Observation: miroir:search_ui_scoped_key_observed:<pod_id>:<index_uid> (hash with EXPIRE, TTL 60s)
Estimated per-entry memory: ~200-300 bytes
Live Pod Registry
Key pattern: miroir:live_pods (sorted set, ZADD with score=timestamp)
Estimated per-pod memory: ~50 bytes
Pub/Sub: Session Revocation
Channel: miroir:admin_session:revoked
Memory overhead: Negligible (Pub/Sub is not persisted)
Capacity Planning
Memory Budget Estimation
For a typical production deployment with the following characteristics:
- 10,000 active tasks (in-flight or recently completed)
- 1,000 concurrent sessions (search UI + admin)
- 1,000 idempotency cache entries (recent deduplication)
- 100 background jobs (queued/in-progress)
- 10 leader leases (coordinating reshard/rollover operations)
- 5 canaries with 100-run history each
- 50 CDC cursors (per-sink, per-index)
- 10 rollover policies
- 20 search UI configs
Estimated memory usage:
| Component | Count | Size per Item | Subtotal |
|---|---|---|---|
| Tasks | 10,000 | 250 bytes | ~2.5 MB |
| Tasks index | 10,000 | 40 bytes | ~400 KB |
| Sessions | 1,000 | 175 bytes | ~175 KB |
| Idempotency | 1,000 | 225 bytes | ~225 KB |
| Jobs | 100 | 350 bytes | ~35 KB |
| Leases | 10 | 75 bytes | ~1 KB |
| Canaries | 5 | 300 bytes | ~1.5 KB |
| Canary runs | 500 | 175 bytes | ~88 KB |
| CDC cursors | 50 | 135 bytes | ~7 KB |
| Rollover policies | 10 | 500 bytes | ~5 KB |
| Search UI configs | 20 | 275 bytes | ~5.5 KB |
| Scoped keys | 20 | 250 bytes | ~5 KB |
| Rate limiting | 2,000 | 40 bytes | ~80 KB |
| Total | | | ~3.5 MB |
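The total can be reproduced by summing count times size over the rows of the table. The sketch below hard-codes the table's figures; the names ROWS and raw_total_bytes are illustrative.

```rust
/// (component, count, bytes per item), copied from the estimate table.
const ROWS: [(&str, u64, u64); 13] = [
    ("Tasks", 10_000, 250),
    ("Tasks index", 10_000, 40),
    ("Sessions", 1_000, 175),
    ("Idempotency", 1_000, 225),
    ("Jobs", 100, 350),
    ("Leases", 10, 75),
    ("Canaries", 5, 300),
    ("Canary runs", 500, 175),
    ("CDC cursors", 50, 135),
    ("Rollover policies", 10, 500),
    ("Search UI configs", 20, 275),
    ("Scoped keys", 20, 250),
    ("Rate limiting", 2_000, 40),
];

/// Raw data size in bytes, before any Redis overhead is applied.
fn raw_total_bytes(rows: &[(&str, u64, u64)]) -> u64 {
    rows.iter().map(|(_, count, size)| count * size).sum()
}
```

For these rows the sum is 3,527,000 bytes, which rounds to the ~3.5 MB shown in the table. Tasks and their index account for over 80% of it, so task retention is the main lever on memory.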
Redis Memory Overhead
Redis adds memory overhead for:
- Hash table overhead: ~20-30% of raw data size
- Per-key pointers: Each key/value pair carries pointers and object headers in addition to its payload
- Memory allocator fragmentation: Varies by allocator
Conservative estimate: Multiply the raw data size by 1.5 to cover this overhead.
Recommended minimum for above workload: ~6 MB (3.5 MB × 1.5 ≈ 5.3 MB, rounded up)
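The overhead adjustment is a single multiplication followed by rounding up. The sketch below uses integer arithmetic with the factor expressed in tenths (15 means 1.5x). Both function names are illustrative, and treating 1 MB as 1,000,000 bytes is an assumption.

```rust
/// Applies an overhead factor given in tenths (15 = 1.5x) to a raw size.
fn with_overhead(raw_bytes: u64, factor_tenths: u64) -> u64 {
    raw_bytes * factor_tenths / 10
}

/// Rounds a byte count up to whole megabytes (1 MB = 1,000,000 bytes here).
fn to_mb_rounded_up(bytes: u64) -> u64 {
    (bytes + 999_999) / 1_000_000
}
```

Applied to the 3,527,000-byte raw total, this gives 5,290,500 bytes, which rounds up to 6 MB.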
Per-Pod Memory Growth
In multi-replica deployments:
- Live pod registry: ~50 bytes per pod
- Scoped key observations: ~250 bytes per pod per index with scoped keys
For 10 replicas with 20 scoped-key indexes: ~50 KB
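The ~50 KB figure follows from the two per-pod terms above. A small sketch of the arithmetic, with illustrative constant and function names:

```rust
/// Approximate bytes per entry in the live pod registry.
const LIVE_POD_ENTRY_BYTES: u64 = 50;
/// Approximate bytes per scoped-key observation (one per pod per index).
const SCOPED_KEY_OBSERVATION_BYTES: u64 = 250;

/// Memory that scales with replica count: registry entries plus
/// one observation per (pod, scoped-key index) pair.
fn per_pod_growth_bytes(pods: u64, scoped_key_indexes: u64) -> u64 {
    pods * LIVE_POD_ENTRY_BYTES
        + pods * scoped_key_indexes * SCOPED_KEY_OBSERVATION_BYTES
}
```

For 10 pods and 20 indexes this gives 500 + 50,000 = 50,500 bytes. The observation term dominates and grows with the product of pods and indexes.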
Monitoring
Monitor miroir_cdc_redis_memory_bytes (a Prometheus metric exported by Miroir), which tracks the used_memory value from the Redis INFO command.
Alert thresholds (plan §14.7):
- Warning: > 500 MB
- Critical: > 1 GB
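The thresholds above map onto a simple classification of the used_memory value. In this sketch the enum and function names are hypothetical, and interpreting MB and GB as decimal units (10^6 and 10^9 bytes) is an assumption. In practice the comparison would more likely live in Prometheus alerting rules than in application code.

```rust
#[derive(Debug, PartialEq)]
enum MemoryAlert {
    Ok,
    Warning,
    Critical,
}

/// Classifies a `used_memory` reading against the plan §14.7 thresholds.
/// Decimal units are assumed: 500 MB = 500_000_000 bytes, 1 GB = 10^9 bytes.
fn classify(used_memory_bytes: u64) -> MemoryAlert {
    const WARNING_BYTES: u64 = 500_000_000;
    const CRITICAL_BYTES: u64 = 1_000_000_000;
    if used_memory_bytes > CRITICAL_BYTES {
        MemoryAlert::Critical
    } else if used_memory_bytes > WARNING_BYTES {
        MemoryAlert::Warning
    } else {
        MemoryAlert::Ok
    }
}
```

Both thresholds sit far above the ~6 MB estimate for the sample workload, so reaching either suggests unbounded growth somewhere, such as the CDC overflow buffer.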
If memory usage grows beyond thresholds:
- Increase Redis memory limit
- Review task pruning policy (reduce retention period)
- Reduce idempotency cache TTL
- Check for CDC overflow buffer growth (may indicate sink is down)
Redis Configuration Recommendations
maxmemory-policy
Recommended: allkeys-lru (evict least-recently-used keys when memory limit is reached)
This is safe for Miroir because:
- Tasks are eventually pruned to a retention window
- Sessions and idempotency entries have TTL and will naturally expire first
- Critical data (leader leases) is refreshed frequently, so LRU eviction is unlikely to select it, although allkeys-lru does not guarantee that any particular key is spared
Persistence
For production deployments:
- RDB snapshots: Every 5-10 minutes is sufficient (tasks are source-of-truth in Meilisearch)
- AOF: Not required (acceptable to lose last few seconds of task updates on failover)
Connection Pooling
Miroir uses redis-rs with connection-manager for automatic connection pooling and reconnection. No additional configuration needed.
High Availability
For production multi-replica deployments:
- Use Redis Sentinel or Redis Cluster for HA
- Configure taskStore.url with Sentinel master name or Cluster endpoints
- Miroir's connection-manager handles failover automatically
Testing
Run the integration test suite to verify memory usage under load:
cargo test -p miroir-core --features redis-store test_redis_memory_budget -- --test-threads=1
This test inserts 10k tasks, 1k idempotency entries, and 1k sessions, verifying that the workload can be created successfully. In production, monitor actual RSS via docker stats or Kubernetes metrics.