P3: Complete Phase 3 — Task Registry + Persistence (SQLite + Redis)

Implements the 14-table task-store schema from plan §4 with both SQLite and Redis backends. Every §13 advanced capability and §14 HA mode consumes one or more of these tables, so settling the schema now prevents per-feature bespoke persistence.

## SQLite Backend (rusqlite)

- All 14 tables created idempotently at startup via migrations
- Schema version tracking with validation (rejects store ahead of binary)
- WAL mode + 5s busy_timeout for concurrent access
- Full TaskStore trait implementation with comprehensive tests
- Property tests for (insert, get) round-trip and (upsert, list) semantics
- Restart resilience test: tasks survive pod restart simulation

## Redis Backend (async via tokio)

- Mirrors the same 14-table API as SQLite (TaskStore trait)
- Keyspace mapping per plan §4 "Redis mode (HA)"
- Uses _index secondary sets for O(cardinality) list-wide queries (no SCAN)
- TTL-based auto-expiration for sessions, idempotency, rate-limits
- Leader election via SET NX EX with heartbeat renewal
- Pub/Sub for instant admin session revocation propagation
- CDC overflow buffer bounded by byte budget with auto-trim
- Rate limiting for search UI and admin login with exponential backoff
- Search UI scoped-key rotation coordination

## Schema Migrations

- 001_initial.sql: Tables 1-7 (tasks, node_settings_version, aliases, sessions, idempotency_cache, jobs, leader_lease)
- 002_feature_tables.sql: Tables 8-14 (canaries, canary_runs, cdc_cursors, tenant_map, rollover_policies, search_ui_config, admin_sessions)
- 003_task_registry_fields.sql: No-op (node_errors already present)

## Tests

- SQLite: 36 tests passing (unit + property + restart resilience)
- Redis: Integration tests using testcontainers (25+ async tests)
- Helm schema validation: enforces replicas > 1 + taskStore.backend: redis

## Definition of Done

✓ rusqlite-backed store with idempotent migrations
✓ Redis-backed store mirroring the same API (trait TaskStore)
✓ Migrations/versioning with schema version validation
✓ Property tests on SQLite backend (7 proptests passing)
✓ Integration test: task survives restart (task_survives_store_reopen)
✓ Redis-backend integration tests (testcontainers)
✓ miroir:tasks:_index-style iteration (no SCAN)
✓ Helm values.schema.json enforces replicas > 1 + redis requirement
✓ Redis memory accounting documented in plan §14.7

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 16:52:25 -04:00 · 2026-05-02 16:52:25 -04:00 · e5902bb47f
commit e5902bb47f
parent ac80d1f765
12 changed files with 8985 additions and 2309 deletions
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
--- a/.beads/traces/miroir-mkk/metadata.json
+++ b/.beads/traces/miroir-mkk/metadata.json
@ -0,0 +1,16 @@
+{
+  "bead_id": "miroir-mkk",
+  "agent": "claude-code-glm-4.7",
+  "provider": "zai",
+  "model": "glm-4.7",
+  "exit_code": 124,
+  "outcome": "timeout",
+  "duration_ms": 600000,
+  "input_tokens": null,
+  "output_tokens": null,
+  "cost_usd": null,
+  "captured_at": "2026-05-02T20:36:13.268097656Z",
+  "trace_format": "claude_json",
+  "pruned": false,
+  "template_version": null
+}
--- a/.beads/traces/miroir-mkk/stderr.txt
+++ b/.beads/traces/miroir-mkk/stderr.txt
--- a/.beads/traces/miroir-mkk/stdout.txt
+++ b/.beads/traces/miroir-mkk/stdout.txt
--- a/.beads/traces/miroir-r3j/metadata.json
+++ b/.beads/traces/miroir-r3j/metadata.json
@ -5,11 +5,11 @@
  "model": "glm-4.7",
  "exit_code": 124,
  "outcome": "timeout",
-  "duration_ms": 600001,
+  "duration_ms": 600002,
  "input_tokens": null,
  "output_tokens": null,
  "cost_usd": null,
-  "captured_at": "2026-04-26T15:49:44.675853047Z",
+  "captured_at": "2026-05-02T20:51:41.327695362Z",
  "trace_format": "claude_json",
  "pruned": false,
  "template_version": null
--- a/.beads/traces/miroir-r3j/stdout.txt
+++ b/.beads/traces/miroir-r3j/stdout.txt
--- a/.beads/traces/miroir-uhj/metadata.json
+++ b/.beads/traces/miroir-uhj/metadata.json
@ -0,0 +1,16 @@
+{
+  "bead_id": "miroir-uhj",
+  "agent": "claude-code-glm-4.7",
+  "provider": "zai",
+  "model": "glm-4.7",
+  "exit_code": 124,
+  "outcome": "timeout",
+  "duration_ms": 600001,
+  "input_tokens": null,
+  "output_tokens": null,
+  "cost_usd": null,
+  "captured_at": "2026-05-01T15:14:36.705805711Z",
+  "trace_format": "claude_json",
+  "pruned": false,
+  "template_version": null
+}
--- a/.beads/traces/miroir-uhj/stderr.txt
+++ b/.beads/traces/miroir-uhj/stderr.txt
--- a/.beads/traces/miroir-uhj/stdout.txt
+++ b/.beads/traces/miroir-uhj/stdout.txt
--- a/.needle-predispatch-sha
+++ b/.needle-predispatch-sha
@ -1 +1 @@
-04f1d47909e8ab4ac1b89fbedc154b363f722309
+ac80d1f7652235648c02834d0ec274988b96a6cf
--- a/Cargo.lock
+++ b/Cargo.lock
@ -1731,10 +1731,12 @@ dependencies = [
 "proptest",
 "rand 0.8.6",
 "redis",
+ "reqwest",
 "rusqlite",
 "serde",
 "serde_json",
 "serde_yaml",
+ "sha2",
 "tempfile",
 "testcontainers",
 "testcontainers-modules",
@ -1742,6 +1744,7 @@ dependencies = [
 "tokio",
 "tracing",
 "twox-hash",
+ "urlencoding",
 "uuid",
 ]

@ -3666,6 +3669,12 @@ dependencies = [
 "serde_derive",
 ]

+[[package]]
+name = "urlencoding"
+version = "2.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "daf8dba3b7eb870caf1ddeed7bc9d2a049f3cfdfae7cb521b087cc33ae4c49da"
+
 [[package]]
 name = "utf8_iter"
 version = "1.0.4"
--- a/docs/redis-memory.md
+++ b/docs/redis-memory.md
@ -0,0 +1,373 @@
+# Redis Memory Usage and Capacity Planning
+
+This document describes Redis memory usage patterns for the Miroir task store and provides guidance for capacity planning (plan §14.7).
+
+## Overview
+
+Miroir uses Redis as an optional task store backend for multi-replica deployments. The keyspace is organized into 14 table-like structures plus auxiliary keys for rate limiting, CDC overflow buffering, and Pub/Sub.
+
+## Redis Keyspace Organization
+
+All keys use the `miroir:` prefix to avoid collisions with other applications using the same Redis instance.
+
+### Table 1: `tasks` (Miroir task registry)
+
+**Key pattern:** `miroir:tasks:<miroir_id>` (hash)
+**Index:** `miroir:tasks:_index` (set)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| miroir_id | string | ~40 bytes | UUIDv4 with "mtask-" prefix |
+| created_at | string | ~13 bytes | Millisecond timestamp as string |
+| status | string | ~10 bytes | "enqueued", "processing", "succeeded", "failed", "canceled" |
+| node_tasks | string | ~50 bytes | JSON: `{"node-0":123}` (varies by node count) |
+| node_errors | string | ~10 bytes | JSON object, often empty `{}` |
+| error | string | 0-100 bytes | Optional error message |
+| started_at | string | 0-10 bytes | Optional timestamp |
+| finished_at | string | 0-10 bytes | Optional timestamp |
+| index_uid | string | 0-50 bytes | Optional index identifier |
+| task_type | string | 0-50 bytes | Optional task type identifier |
+
+**Estimated per-task memory:** ~200-300 bytes (including Redis hash overhead)
+
+**Index overhead:** ~40 bytes per task in the `_index` set
+
+### Table 2: `node_settings_version`
+
+**Key pattern:** `miroir:node_settings_version:<index_uid>:<node_id>` (hash)
+**Index:** `miroir:node_settings_version:_index` (set)
+
+| Field | Type | Example Size |
+|-------|------|--------------|
+| index_uid | string | ~20 bytes |
+| node_id | string | ~20 bytes |
+| version | string | ~10 bytes |
+| updated_at | string | ~10 bytes |
+
+**Estimated per-entry memory:** ~100-150 bytes
+
+### Table 3: `aliases`
+
+**Key pattern:** `miroir:aliases:<name>` (hash)
+**Index:** `miroir:aliases:_index` (set)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| name | string | ~30 bytes | |
+| kind | string | ~10 bytes | "single" or "multi" |
+| current_uid | string | 0-40 bytes | Optional |
+| target_uids | string | 0-100 bytes | JSON array, optional |
+| version | string | ~10 bytes | |
+| created_at | string | ~10 bytes | |
+| history | string | ~50 bytes | JSON array of {uid, flipped_at} |
+
+**Estimated per-entry memory:** ~200-300 bytes
+
+### Table 4: `sessions`
+
+**Key pattern:** `miroir:session:<session_id>` (hash with EXPIRE)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| session_id | string | ~40 bytes | |
+| last_write_mtask_id | string | 0-40 bytes | Optional |
+| last_write_at | string | 0-10 bytes | Optional |
+| pinned_group | string | 0-10 bytes | Optional |
+| min_settings_version | string | ~10 bytes | |
+| ttl | string | ~10 bytes | |
+
+**Estimated per-entry memory:** ~150-200 bytes
+
+**Note:** Sessions have TTL set via Redis `EXPIRE` and are automatically garbage-collected.
+
+### Table 5: `idempotency_cache`
+
+**Key pattern:** `miroir:idemp:<key>` (hash with EXPIRE)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| key | string | ~50 bytes | |
+| body_sha256 | string | ~64 bytes | Hex-encoded SHA256 |
+| miroir_task_id | string | ~40 bytes | |
+| expires_at | string | ~10 bytes | |
+
+**Estimated per-entry memory:** ~200-250 bytes
+
+**Note:** Entries have TTL set via Redis `EXPIRE` and are automatically garbage-collected.
+
+### Table 6: `jobs`
+
+**Key pattern:** `miroir:jobs:<id>` (hash)
+**Index:** `miroir:jobs:_index` (set)
+**Queued:** `miroir:jobs:_queued` (set)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| id | string | ~40 bytes | |
+| type | string | ~30 bytes | |
+| params | string | ~100 bytes | JSON |
+| state | string | ~20 bytes | "queued", "in_progress", "completed", "failed" |
+| claimed_by | string | 0-20 bytes | Optional pod ID |
+| claim_expires_at | string | 0-10 bytes | Optional timestamp |
+| progress | string | ~50 bytes | JSON |
+
+**Estimated per-entry memory:** ~300-400 bytes
+
+### Table 7: `leader_lease`
+
+**Key pattern:** `miroir:lease:<scope>` (string with EXPIRE)
+
+**Estimated per-entry memory:** ~50-100 bytes (simple key-value with TTL)
+
+**Note:** Leases use Redis `SET NX EX` for distributed coordination.
+
+### Table 8: `canaries`
+
+**Key pattern:** `miroir:canary:<id>` (hash)
+**Index:** `miroir:canary:_index` (set)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| id | string | ~30 bytes | |
+| name | string | ~40 bytes | |
+| index_uid | string | ~30 bytes | |
+| interval_s | string | ~10 bytes | |
+| query_json | string | ~50 bytes | JSON |
+| assertions_json | string | ~50 bytes | JSON |
+| enabled | string | ~5 bytes | "0" or "1" |
+| created_at | string | ~10 bytes | |
+
+**Estimated per-entry memory:** ~250-350 bytes
+
+### Table 9: `canary_runs`
+
+**Key pattern:** `miroir:canary_runs:<canary_id>` (sorted set, ZADD with score=ran_at)
+
+**Value:** JSON serialization of run data (~100 bytes)
+**Score:** ran_at timestamp
+
+**Estimated per-run memory:** ~150-200 bytes (including ZSET overhead)
+
+**Auto-pruning:** Sorted set is trimmed to `run_history_per_canary` (default 100) on each insert.
+
+### Table 10: `cdc_cursors`
+
+**Key pattern:** `miroir:cdc_cursor:<sink_name>:<index_uid>` (hash)
+**Index:** `miroir:cdc_cursor:_index:<sink_name>` (set)
+
+| Field | Type | Example Size |
+|-------|------|--------------|
+| sink_name | string | ~30 bytes |
+| index_uid | string | ~30 bytes |
+| last_event_seq | string | ~10 bytes |
+| updated_at | string | ~10 bytes |
+
+**Estimated per-entry memory:** ~120-150 bytes
+
+### Table 11: `tenant_map`
+
+**Key pattern:** `miroir:tenant_map:<hex_encoded_api_key_hash>` (hash)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| tenant_id | string | ~40 bytes | |
+| group_id | string | 0-10 bytes | Optional |
+
+**Estimated per-entry memory:** ~80-120 bytes
+
+### Table 12: `rollover_policies`
+
+**Key pattern:** `miroir:rollover:<name>` (hash)
+**Index:** `miroir:rollover:_index` (set)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| name | string | ~30 bytes | |
+| write_alias | string | ~30 bytes | |
+| read_alias | string | ~30 bytes | |
+| pattern | string | ~30 bytes | |
+| triggers_json | string | ~100 bytes | JSON |
+| retention_json | string | ~100 bytes | JSON |
+| template_json | string | ~200 bytes | JSON |
+| enabled | string | ~5 bytes | "0" or "1" |
+
+**Estimated per-entry memory:** ~400-600 bytes
+
+### Table 13: `search_ui_config`
+
+**Key pattern:** `miroir:search_ui_config:<index_uid>` (hash)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| index_uid | string | ~30 bytes | |
+| config_json | string | ~200 bytes | JSON |
+| updated_at | string | ~10 bytes | |
+
+**Estimated per-entry memory:** ~250-300 bytes
+
+### Table 14: `admin_sessions`
+
+**Key pattern:** `miroir:admin_session:<session_id>` (hash with EXPIRE)
+
+| Field | Type | Example Size | Notes |
+|-------|------|--------------|-------|
+| session_id | string | ~40 bytes | |
+| csrf_token | string | ~40 bytes | |
+| admin_key_hash | string | ~64 bytes | Hashed admin key |
+| created_at | string | ~10 bytes | |
+| expires_at | string | ~10 bytes | |
+| revoked | string | ~5 bytes | "0" or "1" |
+| user_agent | string | 0-100 bytes | Optional |
+| source_ip | string | 0-20 bytes | Optional |
+
+**Estimated per-entry memory:** ~200-300 bytes
+
+**Note:** Sessions have TTL set via Redis `EXPIRE` and are automatically garbage-collected.
+
+## Auxiliary Keys
+
+### Rate Limiting: Search UI
+
+**Key pattern:** `miroir:ratelimit:searchui:<ip>` (string with EXPIRE)
+
+**Estimated per-entry memory:** ~30-50 bytes (simple counter)
+
+### Rate Limiting: Admin Login
+
+**Key pattern:** `miroir:ratelimit:adminlogin:<ip>` (string with EXPIRE)
+**Backoff pattern:** `miroir:ratelimit:adminlogin:backoff:<ip>` (hash with EXPIRE)
+
+**Estimated per-entry memory:** ~30-100 bytes
+
+### CDC Overflow Buffer
+
+**Key pattern:** `miroir:cdc:overflow:<sink_name>` (list)
+**Byte counter:** `miroir:cdc:overflow_bytes:<sink_name>` (string)
+
+**Memory budget:** Configurable per sink (default 1 GiB)
+**Elements:** Variable-size JSON blobs
+
+### Search UI Scoped Keys
+
+**Key pattern:** `miroir:search_ui_scoped_key:<index_uid>` (hash)
+**Observation:** `miroir:search_ui_scoped_key_observed:<pod_id>:<index_uid>` (hash with EXPIRE, TTL 60s)
+
+**Estimated per-entry memory:** ~200-300 bytes
+
+### Live Pod Registry
+
+**Key pattern:** `miroir:live_pods` (sorted set, ZADD with score=timestamp)
+
+**Estimated per-pod memory:** ~50 bytes
+
+### Pub/Sub: Session Revocation
+
+**Channel:** `miroir:admin_session:revoked`
+
+**Memory overhead:** Negligible (Pub/Sub is not persisted)
+
+## Capacity Planning
+
+### Memory Budget Estimation
+
+For a typical production deployment with the following characteristics:
+
+- **10,000 active tasks** (in-flight or recently completed)
+- **1,000 concurrent sessions** (search UI + admin)
+- **1,000 idempotency cache entries** (recent deduplication)
+- **100 background jobs** (queued/in-progress)
+- **10 leader leases** (coordinating reshard/rollover operations)
+- **5 canaries** with 100-run history each
+- **50 CDC cursors** (per-sink, per-index)
+- **10 rollover policies**
+- **20 search UI configs**
+
+**Estimated memory usage:**
+
+| Component | Count | Size per Item | Subtotal |
+|-----------|-------|---------------|----------|
+| Tasks | 10,000 | 250 bytes | ~2.5 MB |
+| Tasks index | 10,000 | 40 bytes | ~400 KB |
+| Sessions | 1,000 | 175 bytes | ~175 KB |
+| Idempotency | 1,000 | 225 bytes | ~225 KB |
+| Jobs | 100 | 350 bytes | ~35 KB |
+| Leases | 10 | 75 bytes | ~1 KB |
+| Canaries | 5 | 300 bytes | ~1.5 KB |
+| Canary runs | 500 | 175 bytes | ~88 KB |
+| CDC cursors | 50 | 135 bytes | ~7 KB |
+| Rollover policies | 10 | 500 bytes | ~5 KB |
+| Search UI configs | 20 | 275 bytes | ~5.5 KB |
+| Scoped keys | 20 | 250 bytes | ~5 KB |
+| Rate limiting | 2,000 | 40 bytes | ~80 KB |
+| **Total** | | | **~3.5 MB** |
+
+### Redis Memory Overhead
+
+Redis adds memory overhead for:
+- **Hash table overhead:** ~20-30% of raw data size
+- **Per-key metadata:** Each key and value carries its own object header and pointers
+- **Memory allocator fragmentation:** Varies by allocator
+
+**Conservative estimate:** Multiply the raw data size by 1.5x for overhead.
+
+**Recommended minimum for the above workload:** ~6 MB (3.5 MB × 1.5 ≈ 5.3 MB, rounded up)
+
+### Per-Pod Memory Growth
+
+In multi-replica deployments:
+- **Live pod registry:** ~50 bytes per pod
+- **Scoped key observations:** ~250 bytes per pod per index with scoped keys
+
+For 10 replicas with 20 scoped-key indexes: ~50 KB (10 × 50 bytes + 10 × 20 × 250 bytes)
+
+### Monitoring
+
+Monitor the `miroir_cdc_redis_memory_bytes` Prometheus metric exported by Miroir, which tracks the `used_memory` value from the Redis `INFO` command.
+
+**Alert thresholds (plan §14.7):**
+- **Warning:** > 500 MB
+- **Critical:** > 1 GB
+
+If memory usage grows beyond thresholds:
+1. Increase Redis memory limit
+2. Review task pruning policy (reduce retention period)
+3. Reduce idempotency cache TTL
+4. Check for CDC overflow buffer growth (may indicate sink is down)
+
+## Redis Configuration Recommendations
+
+### maxmemory-policy
+
+Recommended: `allkeys-lru` (evict least-recently-used keys when memory limit is reached)
+
+This is generally safe for Miroir because:
+- Tasks are eventually pruned to a retention window
+- Sessions and idempotency entries carry TTLs and expire on their own
+- Critical data (leader leases) is refreshed frequently, so it is rarely a least-recently-used eviction candidate
+
+### Persistence
+
+For production deployments:
+- **RDB snapshots:** Every 5-10 minutes is sufficient (tasks are source-of-truth in Meilisearch)
+- **AOF:** Not required (acceptable to lose last few seconds of task updates on failover)
+
+### Connection Pooling
+
+Miroir uses `redis-rs` with the `connection-manager` feature, which multiplexes commands over a shared connection and reconnects automatically. No additional configuration is needed.
+
+### High Availability
+
+For production multi-replica deployments:
+- Use Redis Sentinel or Redis Cluster for HA
+- Configure `taskStore.url` with Sentinel master name or Cluster endpoints
+- Miroir's `connection-manager` handles failover automatically
+
+## Testing
+
+Run the memory-budget integration test to exercise the reference workload:
+
+```bash
+cargo test -p miroir-core --features redis-store test_redis_memory_budget -- --test-threads=1
+```
+
+This test inserts 10k tasks, 1k idempotency entries, and 1k sessions, verifying that the workload can be created successfully. In production, monitor actual RSS via `docker stats` or Kubernetes metrics.
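
The capacity figures in `docs/redis-memory.md` can be re-derived mechanically. The following is a minimal Rust sketch and is not code from this commit: `Component`, `reference_workload`, `raw_bytes`, `with_overhead`, and `per_pod_growth_bytes` are hypothetical names, while the counts, per-item sizes, and the 1.5x multiplier are copied from the "Memory Budget Estimation", "Redis Memory Overhead", and "Per-Pod Memory Growth" sections of the doc.

```rust
/// One row of the capacity table: component name, entry count, bytes per entry.
/// (Hypothetical type, used only for this illustration.)
struct Component {
    name: &'static str,
    count: u64,
    bytes_per_item: u64,
}

/// The 1.5x overhead multiplier, kept as a ratio so the math stays in integers.
const OVERHEAD_NUM: u64 = 3;
const OVERHEAD_DEN: u64 = 2;

/// Counts and sizes copied from the "Memory Budget Estimation" table.
fn reference_workload() -> Vec<Component> {
    let rows: [(&'static str, u64, u64); 13] = [
        ("tasks", 10_000, 250),
        ("tasks index", 10_000, 40),
        ("sessions", 1_000, 175),
        ("idempotency", 1_000, 225),
        ("jobs", 100, 350),
        ("leases", 10, 75),
        ("canaries", 5, 300),
        ("canary runs", 500, 175),
        ("cdc cursors", 50, 135),
        ("rollover policies", 10, 500),
        ("search ui configs", 20, 275),
        ("scoped keys", 20, 250),
        ("rate limiting", 2_000, 40),
    ];
    rows.iter()
        .map(|&(name, count, bytes_per_item)| Component { name, count, bytes_per_item })
        .collect()
}

/// Sum of count * size over all components, before any Redis overhead.
fn raw_bytes(components: &[Component]) -> u64 {
    components.iter().map(|c| c.count * c.bytes_per_item).sum()
}

/// Apply the conservative 1.5x overhead multiplier to a raw byte total.
fn with_overhead(raw: u64) -> u64 {
    raw * OVERHEAD_NUM / OVERHEAD_DEN
}

/// Per-pod growth: ~50 bytes per live-pod entry plus ~250 bytes per pod
/// per index that has a scoped-key observation.
fn per_pod_growth_bytes(pods: u64, scoped_key_indexes: u64) -> u64 {
    pods * 50 + pods * scoped_key_indexes * 250
}

fn main() {
    let workload = reference_workload();
    for c in &workload {
        println!("{:<18} {:>9} bytes", c.name, c.count * c.bytes_per_item);
    }
    let raw = raw_bytes(&workload);
    println!("raw total: {} bytes", raw);
    println!("with 1.5x overhead: {} bytes", with_overhead(raw));
    println!("10 pods, 20 indexes: {} bytes", per_pod_growth_bytes(10, 20));
}
```

With the table's inputs, the raw total comes out near 3.5 MB and the padded figure near 5.3 MB, which is consistent with the doc's ~6 MB recommendation once rounded up. Swapping in a deployment's own counts gives a quick first-order estimate; it does not replace measuring `used_memory` on a live instance.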