miroir/crates
jedarden cff90a3ff1 P6.5: Mode C work-queued chunked jobs (plan §14.5)
Implement job chunking for dump import and reshard backfill with
claim TTL and heartbeat renewal for pod crash recovery.

Changes:
- jobs table (Phase 3) with states: queued | in_progress | completed | failed
- Atomic compare-and-swap job claiming (claimed_by IS NULL → claimed_by = pod_id)
- Claim TTL: 30s timeout with 10s heartbeat interval
- Large jobs split into chunks on input boundaries by first pod
- Per-chunk progress persisted for idempotent resume
- Queue depth metric (miroir_background_queue_depth) for HPA

Applied to:
- §13.9 streaming dump import — chunks on NDJSON line boundaries (256 MiB default)
- §13.1 reshard backfill — partitions by shard-id range

TaskStore implementations:
- SQLite: job CRUD with CAS claim, renewal, expired claim reclamation
- Redis: same with _queued set for O(1) queue depth (HPA metric)

Mode C coordinator:
- enqueue_job(), claim_job(), renew_claim(), split_job_into_chunks()
- reclaim_expired_claims() for pod crash recovery
- queue_depth() for HPA external metric

Mode C worker:
- Poll-and-claim loop with heartbeat renewal
- Chunking logic for dump import and reshard backfill
- Per-chunk processing with progress tracking

Acceptance tests:
- 1GB dump splits into 4× 256 MiB chunks
- Claim expires after 30s, another pod reclaims and resumes
- HPA on queue depth > 10 triggers scale-up
- Two concurrent dumps interleave chunks
- 3 pods claim chunks in parallel

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 06:11:12 -04:00
..
miroir-core P6.5: Mode C work-queued chunked jobs (plan §14.5) 2026-05-23 06:11:12 -04:00
miroir-ctl P3: Phase 3 Task Registry + Persistence — COMPLETE 2026-05-02 16:50:42 -04:00
miroir-proxy P6.4: Mode B leader-only singleton coordinator (plan §14.5) 2026-05-23 04:26:27 -04:00