jedarden 263a2eb635 P12.OP2 (miroir-zc2.2): Verify Raft research — findings confirmed

The comprehensive research document at docs/research/raft-task-store.md
already exists with a complete analysis of openraft vs raft-rs vs async-raft,
a prototype design, analytical benchmarks, and a clear decision.

Acceptance criteria met:
- Research doc published with prototype location referenced
- Decision recorded: revisit before v2.0, do not ship in v0.x or v1.0

No new research work was needed — this bead verified existing findings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-08 20:38:59 -04:00

3.8 KiB


P12.OP2: Raft vs Redis Research Verification

Bead: miroir-zc2.2
Date: 2026-05-08
Status: Research verified; no changes needed

Summary

The research document docs/research/raft-task-store.md already exists and is comprehensive. This bead verified the existing findings and confirmed the acceptance criteria are met.

Research Document Status

Contents Verified

Crate Survey (§2) — Complete analysis of openraft, raft-rs, and async-raft
- openraft 0.9.20 recommended (async-native, split traits, active maintenance)
- async-raft eliminated (abandoned since 2023)
- raft-rs not recommended (sync-only API)
Prototype Design (§3) — Architecture documented
- RaftTaskStore trait implementation design
- Storage layout (raft_log, raft_state, state machine tables)
- Command protocol (TaskStoreCommand enum)
- Read path (local SQLite with optional read_index)
- Network transport (pod-to-pod over headless Service)
Analytical Benchmarks (§4) — Measured data included
- State machine apply path: ~1.0x overhead vs direct HashMap
- Write latency: Raft 2-5ms vs Redis 0.3-0.8ms (3-8x slower)
- Read latency: Raft 0.05-0.2ms vs Redis 0.2-0.5ms (2-5x faster)
- Memory footprint: +90-185 MB per pod for Raft
Decision Matrix (§5) — Clear verdict
- Raft wins on: ops simplicity (no external dep), read latency, read throughput
- Raft loses on: write latency, write throughput, memory per pod, correctness maturity
- Does not pass the decision gate: Raft is worse on several metrics instead of better on all of them
Decision (§6) — Recorded
- Ship: No (do not ship in v0.x or v1.0)
- Revisit: Before v2.0 (when Redis is production-stabilized and operational cost is empirically measured)
Additional Sections
- LiteFS alternative considered and eliminated
- rrqlite reference project analyzed
- Crate deep-dive with API details
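The prototype design above (a command protocol, an in-memory state machine, and local reads with an optional read_index check) can be sketched roughly as follows. This is a minimal, std-only illustration, not the prototype's code. The `TaskStoreCommand` variants, `TaskStatus`, `Task`, `ReadError`, and the `apply`, `read_local`, and `read_at` methods are all hypothetical, and openraft's own storage traits are deliberately left out. The sketch shows two properties the design relies on. First, committed entries are applied deterministically in log order. Second, a read_index read is served from local state only after the replica's last applied index has reached the index the leader confirmed.

```rust
use std::collections::HashMap;

/// Lifecycle state of a task (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TaskStatus {
    Queued,
    Running,
    Done,
}

/// Hypothetical command set replicated through the Raft log.
#[derive(Clone, Debug, PartialEq)]
pub enum TaskStoreCommand {
    Enqueue { task_id: u64, payload: String },
    SetStatus { task_id: u64, status: TaskStatus },
    Delete { task_id: u64 },
}

#[derive(Clone, Debug)]
pub struct Task {
    pub payload: String,
    pub status: TaskStatus,
}

#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// This replica has not yet applied the entry at `required`; retry later.
    NotCaughtUp { applied: u64, required: u64 },
}

/// In-memory state machine: a map of tasks plus the last applied log index.
#[derive(Default)]
pub struct TaskStateMachine {
    tasks: HashMap<u64, Task>,
    last_applied: u64,
}

impl TaskStateMachine {
    /// Apply one committed entry. The logic is deterministic, so every replica
    /// that applies the same entries in the same order reaches the same state.
    pub fn apply(&mut self, index: u64, cmd: &TaskStoreCommand) {
        assert!(index > self.last_applied, "entries must be applied in log order");
        match cmd {
            TaskStoreCommand::Enqueue { task_id, payload } => {
                let task = Task { payload: payload.clone(), status: TaskStatus::Queued };
                self.tasks.insert(*task_id, task);
            }
            TaskStoreCommand::SetStatus { task_id, status } => {
                if let Some(task) = self.tasks.get_mut(task_id) {
                    task.status = *status;
                }
            }
            TaskStoreCommand::Delete { task_id } => {
                self.tasks.remove(task_id);
            }
        }
        self.last_applied = index;
    }

    /// Plain local read: fast, but it may return stale data on a lagging follower.
    pub fn read_local(&self, task_id: u64) -> Option<TaskStatus> {
        self.tasks.get(&task_id).map(|t| t.status)
    }

    /// ReadIndex-style read. `read_index` is the commit index that the leader
    /// confirmed when the read began. Local state is served only once it is applied.
    pub fn read_at(&self, task_id: u64, read_index: u64) -> Result<Option<TaskStatus>, ReadError> {
        if self.last_applied < read_index {
            return Err(ReadError::NotCaughtUp { applied: self.last_applied, required: read_index });
        }
        Ok(self.read_local(task_id))
    }
}
```

A caller that receives `NotCaughtUp` would typically wait for the apply loop to advance and then retry. `read_local` skips that check entirely, which corresponds to the "optional" part of the read path.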

Prototype Code Status

The prototype code exists at crates/miroir-core/src/raft_proto/:

- mod.rs: RaftTaskRegistry implementation
- state_machine.rs: In-memory TaskStateMachine
- command.rs: TaskStoreCommand enum
- benchmark.rs: Benchmark harness
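The ~1.0x apply-path figure compares applying commands through a state machine with writing directly into a HashMap. Below is a rough, std-only sketch of that kind of measurement. `Cmd`, `StateMachine`, `BenchResult`, and `measure` are hypothetical stand-ins, not the contents of benchmark.rs. The sketch times only the in-memory apply loop. It therefore says nothing about Raft's write latency, which also includes log persistence and a replication round-trip.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical single-variant command used only for this measurement.
enum Cmd {
    Put { key: u64, value: u64 },
}

/// Minimal state machine: a HashMap plus the index of the last applied entry.
#[derive(Default)]
struct StateMachine {
    map: HashMap<u64, u64>,
    last_applied: u64,
}

impl StateMachine {
    fn apply(&mut self, index: u64, cmd: &Cmd) {
        match cmd {
            Cmd::Put { key, value } => {
                self.map.insert(*key, *value);
            }
        }
        self.last_applied = index;
    }
}

struct BenchResult {
    direct: Duration,
    via_state_machine: Duration,
    entries_applied: u64,
    final_len: usize,
}

/// Apply `n` puts straight into a HashMap, then apply the same puts through the
/// state machine, timing each loop. No log I/O, replication, or network is involved.
fn measure(n: u64) -> BenchResult {
    let cmds: Vec<Cmd> = (0..n).map(|i| Cmd::Put { key: i, value: i * 2 }).collect();

    let start = Instant::now();
    let mut direct: HashMap<u64, u64> = HashMap::new();
    for cmd in &cmds {
        match cmd {
            Cmd::Put { key, value } => {
                direct.insert(*key, *value);
            }
        }
    }
    black_box(&direct); // discourage the optimizer from discarding the loop
    let direct_time = start.elapsed();

    let start = Instant::now();
    let mut sm = StateMachine::default();
    for (i, cmd) in cmds.iter().enumerate() {
        sm.apply(i as u64 + 1, cmd); // Raft log indices start at 1
    }
    let sm_time = start.elapsed();

    BenchResult {
        direct: direct_time,
        via_state_machine: sm_time,
        entries_applied: sm.last_applied,
        final_len: sm.map.len(),
    }
}
```

To get a ratio, compare `via_state_machine.as_secs_f64() / direct.as_secs_f64()` across several runs of a `--release` build. Single-shot `Instant` timings are noisy, and a harness such as criterion is better suited to numbers meant for publication.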

Note: The raft-proto feature is commented out in Cargo.toml because openraft 0.9.20 fails to compile on stable Rust 1.87. Its dependency validit 0.2.5 uses the unstable let_chains feature. The research doc itself records this compilation failure as a near-term data point against Raft.
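Placing the prototype behind a Cargo feature means it is compiled only when that feature is enabled, so the default build does not depend on it. Below is a minimal sketch of that gating, assuming the feature is named raft-proto as above. The inline `raft_proto` module, `backend_name`, and `compiled_backends` are hypothetical. The "sqlite" and "redis" entries stand in for the backends that are always built.

```rust
#[cfg(feature = "raft-proto")]
mod raft_proto {
    /// Name reported for the Raft-backed task store (hypothetical).
    pub fn backend_name() -> &'static str {
        "raft"
    }
}

/// List the task store backends compiled into this binary.
pub fn compiled_backends() -> Vec<&'static str> {
    // Hypothetical always-built backends.
    #[allow(unused_mut)]
    let mut backends = vec!["sqlite", "redis"];
    // When the feature is off, this block is stripped before name resolution
    // and type checking, so `raft_proto` does not need to exist.
    #[cfg(feature = "raft-proto")]
    {
        backends.push(raft_proto::backend_name());
    }
    backends
}
```

With the feature off, `compiled_backends()` reports only sqlite and redis, and nothing inside `raft_proto` is compiled.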

Acceptance Criteria

Research doc published with prototype location referenced
- docs/research/raft-task-store.md exists and is comprehensive
- Prototype location: crates/miroir-core/src/raft_proto/ (feature-gated)
Decision recorded: ship / don't ship / revisit when
- Decision: "Revisit before v2.0, do not ship in v0.x or v1.0"
- Rationale documented in §6 with 5 points supporting the decision

Key Findings

Redis is the right choice for v1.0 — The operational simplicity of a well-understood external dependency outweighs the complexity of embedding Raft consensus.
Raft write latency is material — 2-5ms per write vs <1ms for Redis. This is on the critical path for document mutations.
Memory cost is non-trivial — +90-185 MB per pod for Raft, which is 5-10% of the 3.75 GB envelope (plan §14.2).
Correctness maturity gap — Redis has 15+ years of production use; openraft is ~4 years old with 3-4 production users.
Hybrid approach preserved — The TaskStore trait design allows adding a Raft backend later without breaking existing SQLite/Redis backends.
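The hybrid point depends on callers using only the TaskStore abstraction. Below is a minimal sketch of that shape. The trait is heavily simplified, synchronous, and hypothetical; the real miroir-core trait and its methods are not reproduced here. `MemStore` is a HashMap-backed stand-in for the existing SQLite and Redis backends, and `open_store` and `mark_done` are illustrative names.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down task store interface.
pub trait TaskStore {
    fn backend(&self) -> &'static str;
    fn put_status(&mut self, task_id: u64, status: &str);
    fn get_status(&self, task_id: u64) -> Option<String>;
}

/// Stand-in backend: stores rows in memory and reports the name it was opened as.
pub struct MemStore {
    name: &'static str,
    rows: HashMap<u64, String>,
}

impl MemStore {
    pub fn new(name: &'static str) -> Self {
        Self { name, rows: HashMap::new() }
    }
}

impl TaskStore for MemStore {
    fn backend(&self) -> &'static str {
        self.name
    }
    fn put_status(&mut self, task_id: u64, status: &str) {
        self.rows.insert(task_id, status.to_string());
    }
    fn get_status(&self, task_id: u64) -> Option<String> {
        self.rows.get(&task_id).cloned()
    }
}

/// Select a backend from configuration. A future "raft" arm would go here.
pub fn open_store(kind: &str) -> Result<Box<dyn TaskStore>, String> {
    match kind {
        "sqlite" => Ok(Box::new(MemStore::new("sqlite"))),
        "redis" => Ok(Box::new(MemStore::new("redis"))),
        other => Err(format!("unknown task store backend: {other}")),
    }
}

/// Caller code sees only the trait object, so adding a backend leaves it unchanged.
pub fn mark_done(store: &mut dyn TaskStore, task_id: u64) {
    store.put_status(task_id, "done");
}
```

In this shape, adding a Raft backend would require one new `impl TaskStore` and one new arm in `open_store`. `mark_done` and other callers would stay as they are.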

No Action Required

The existing research is complete, and the decision is clearly recorded. The prototype code exists and is documented, although it is currently feature-gated off because openraft does not build on stable Rust 1.87. No additional work is needed for this bead beyond verification.
