Implement job chunking for dump import and reshard backfill with
claim TTL and heartbeat renewal for pod crash recovery.
Changes:
- jobs table (Phase 3) with states: queued | in_progress | completed | failed
- Atomic compare-and-swap job claiming (claimed_by IS NULL → claimed_by = pod_id)
- Claim TTL of 30s, renewed by a heartbeat every 10s
- The first pod to claim a large job splits it into chunks on input boundaries
- Per-chunk progress persisted for idempotent resume
- Queue depth metric (miroir_background_queue_depth) for HPA
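Claim timing, as a minimal sketch. It assumes claims are modeled as an absolute deadline checked against an injected clock. `Claim` and its methods are hypothetical illustrations, not the real schema. With a TTL three times the heartbeat interval, one late or missed heartbeat does not lose the claim. A crashed pod's job becomes reclaimable at most one TTL after its last successful renewal.

```python
from dataclasses import dataclass

# Timing from this change: 30 s claim TTL, 10 s heartbeat interval.
CLAIM_TTL_S = 30.0
HEARTBEAT_INTERVAL_S = 10.0


@dataclass
class Claim:
    """Hypothetical in-memory model of one job claim (not the real schema)."""

    job_id: str
    claimed_by: str
    expires_at: float  # absolute deadline, in seconds on an injected clock

    @classmethod
    def new(cls, job_id: str, pod_id: str, now: float) -> "Claim":
        # A fresh claim is valid for one full TTL.
        return cls(job_id, pod_id, now + CLAIM_TTL_S)

    def renew(self, now: float) -> None:
        # Each heartbeat pushes the deadline one full TTL past "now".
        self.expires_at = now + CLAIM_TTL_S

    def is_expired(self, now: float) -> bool:
        # Once expired, another pod may reclaim the job.
        return now >= self.expires_at


# pod-a heartbeats at t=10 and t=20, then crashes before t=30.
claim = Claim.new("job-1", "pod-a", now=0.0)
claim.renew(now=10.0)
claim.renew(now=20.0)
# The claim lapses at t=50, 30 s after the last renewal.
print(claim.is_expired(45.0), claim.is_expired(50.0))  # prints: False True
```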
Applied to:
- §13.9 streaming dump import: chunked on NDJSON line boundaries (256 MiB chunks by default)
- §13.1 reshard backfill: partitioned by shard-id range
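Chunking on NDJSON line boundaries can be sketched as byte-range computation over a seekable file. This is a minimal sketch: the function name `ndjson_chunk_ranges` is hypothetical, and it assumes chunks are described as half-open byte ranges. Each range aims for the target size. Its end is then pushed forward past the next newline, so no record straddles two chunks. A single line longer than the target simply yields one oversized chunk.

```python
import io
from typing import BinaryIO, List, Tuple

DEFAULT_CHUNK_BYTES = 256 * 1024 * 1024  # 256 MiB default from this change


def ndjson_chunk_ranges(f: BinaryIO, total_size: int,
                        target: int = DEFAULT_CHUNK_BYTES) -> List[Tuple[int, int]]:
    """Split [0, total_size) into byte ranges that each end on a line boundary."""
    ranges: List[Tuple[int, int]] = []
    start = 0
    while start < total_size:
        tentative_end = start + target
        if tentative_end >= total_size:
            ranges.append((start, total_size))  # final (possibly short) chunk
            break
        # Back up one byte: if it is b"\n", readline() consumes only that byte
        # and the boundary stays at tentative_end; otherwise it finishes the line.
        f.seek(tentative_end - 1)
        f.readline()
        end = f.tell()
        ranges.append((start, end))
        start = end
    return ranges


data = b'{"a":1}\n{"b":22}\n{"c":333}\n'
print(ndjson_chunk_ranges(io.BytesIO(data), len(data), target=10))
# prints: [(0, 17), (17, 27)]
```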
TaskStore implementations:
- SQLite: job CRUD with CAS claim, renewal, expired claim reclamation
- Redis: same operations, plus a _queued set for O(1) queue depth (HPA metric)
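The SQLite claim operations can be sketched with plain conditional UPDATEs, using the row count to tell whether the compare-and-swap won. This is a sketch under stated assumptions. The schema is hypothetical, and only `claimed_by` comes from this change. The function names echo the coordinator methods, but their signatures are made up for illustration. Returning reclaimed jobs to 'queued' is also an assumption.

```python
import sqlite3

CLAIM_TTL_S = 30.0

# Hypothetical schema; column names other than claimed_by are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id               TEXT PRIMARY KEY,
    state            TEXT NOT NULL DEFAULT 'queued',
    claimed_by       TEXT,
    claim_expires_at REAL
)
"""


def claim_job(conn: sqlite3.Connection, job_id: str, pod_id: str, now: float) -> bool:
    # Compare-and-swap: the UPDATE only matches while claimed_by IS NULL,
    # so exactly one competing pod can change the row.
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ?, claim_expires_at = ?, state = 'in_progress' "
            "WHERE id = ? AND claimed_by IS NULL AND state = 'queued'",
            (pod_id, now + CLAIM_TTL_S, job_id),
        )
    return cur.rowcount == 1


def renew_claim(conn: sqlite3.Connection, job_id: str, pod_id: str, now: float) -> bool:
    # Heartbeat: extend only a claim this pod still holds and that has not lapsed.
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET claim_expires_at = ? "
            "WHERE id = ? AND claimed_by = ? AND claim_expires_at > ?",
            (now + CLAIM_TTL_S, job_id, pod_id, now),
        )
    return cur.rowcount == 1


def reclaim_expired_claims(conn: sqlite3.Connection, now: float) -> int:
    # Crash recovery: release lapsed claims so another pod can claim the job.
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = NULL, claim_expires_at = NULL, state = 'queued' "
            "WHERE state = 'in_progress' AND claim_expires_at <= ?",
            (now,),
        )
    return cur.rowcount


def queue_depth(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM jobs WHERE state = 'queued'").fetchone()[0]
```

If `renew_claim()` returns False, the pod has lost the job and should stop working on it. Persisted chunk progress lets the next claimant resume.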
Mode C coordinator:
- enqueue_job(), claim_job(), renew_claim(), split_job_into_chunks()
- reclaim_expired_claims() for pod crash recovery
- queue_depth() for HPA external metric
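For reshard backfill, split_job_into_chunks() partitions by shard-id range. One way to do that is to cut the shard-id space into contiguous, near-equal half-open ranges. Both helpers are hypothetical, and so is the chunk-id format. Deterministic chunk ids are a design assumption: if the splitting pod crashes mid-split, the next pod that repeats the split produces the same ids, so enqueueing can be made idempotent.

```python
from typing import Dict, List, Tuple


def split_shard_range(first: int, last_exclusive: int,
                      n_chunks: int) -> List[Tuple[int, int]]:
    """Partition shard ids [first, last_exclusive) into at most n_chunks
    contiguous half-open ranges whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    total = last_exclusive - first
    if total <= 0:
        return []
    n = min(n_chunks, total)  # never emit empty chunks
    base, extra = divmod(total, n)
    ranges = []
    start = first
    for i in range(n):
        size = base + (1 if i < extra else 0)  # first `extra` ranges get one more
        ranges.append((start, start + size))
        start += size
    return ranges


def chunk_jobs(parent_id: str, ranges: List[Tuple[int, int]]) -> List[Dict]:
    # Deterministic ids: re-running the split yields identical chunk records.
    return [
        {"id": f"{parent_id}/chunk-{i}", "parent": parent_id, "shard_lo": lo, "shard_hi": hi}
        for i, (lo, hi) in enumerate(ranges)
    ]


print(split_shard_range(0, 10, 3))  # prints: [(0, 4), (4, 7), (7, 10)]
```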
Mode C worker:
- Poll-and-claim loop with heartbeat renewal
- Chunking logic for dump import and reshard backfill
- Per-chunk processing with progress tracking
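The worker's poll-and-claim loop with heartbeat renewal might look roughly like this. It is a sketch against a hypothetical in-memory store, not the real TaskStore API. The heartbeat runs on its own thread so a long chunk cannot starve renewal. A chunk whose renewal failed is not marked completed, because another pod may have reclaimed it. The `max_idle_polls` knob exists only so the sketch terminates, and `process` stands in for per-chunk work that would persist its own progress.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Set


class InMemoryStore:
    """Hypothetical stand-in for a TaskStore, guarded by a single lock."""

    def __init__(self, chunk_ids: List[str]) -> None:
        self.lock = threading.Lock()
        self.queued: List[str] = list(chunk_ids)
        self.owner: Dict[str, str] = {}
        self.done: Set[str] = set()
        self.renewals = 0

    def claim_next(self, pod_id: str) -> Optional[str]:
        with self.lock:
            if not self.queued:
                return None
            chunk = self.queued.pop(0)
            self.owner[chunk] = pod_id
            return chunk

    def renew(self, chunk: str, pod_id: str) -> bool:
        with self.lock:
            self.renewals += 1
            return self.owner.get(chunk) == pod_id

    def complete(self, chunk: str, pod_id: str) -> None:
        with self.lock:
            # Only the current owner may mark a chunk completed.
            if self.owner.get(chunk) == pod_id:
                del self.owner[chunk]
                self.done.add(chunk)


def heartbeat(store, chunk, pod_id, interval, stop, lost) -> None:
    # Renew every `interval` seconds until `stop` is set; flag a lost claim.
    while not stop.wait(interval):
        if not store.renew(chunk, pod_id):
            lost.set()
            return


def run_worker(store, pod_id: str, process: Callable[[str], None],
               heartbeat_interval: float = 10.0, poll_interval: float = 1.0,
               max_idle_polls: int = 1) -> None:
    # A real worker would poll forever; max_idle_polls lets the sketch exit.
    idle = 0
    while idle < max_idle_polls:
        chunk = store.claim_next(pod_id)
        if chunk is None:
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        stop, lost = threading.Event(), threading.Event()
        hb = threading.Thread(
            target=heartbeat,
            args=(store, chunk, pod_id, heartbeat_interval, stop, lost),
            daemon=True,
        )
        hb.start()
        try:
            process(chunk)  # should persist per-chunk progress as it goes
        finally:
            stop.set()
            hb.join()
        # If a renewal failed, another pod may own the chunk now.
        if not lost.is_set():
            store.complete(chunk, pod_id)
```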
Acceptance tests:
- 1GB dump splits into 4× 256 MiB chunks
- Claim expires after 30s; another pod reclaims it and resumes from persisted progress
- Queue depth > 10 triggers an HPA scale-up
- Two concurrent dumps interleave chunks
- 3 pods claim chunks in parallel
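Here is the arithmetic behind two of the acceptance tests. By ceiling division, both 1 GiB and 1 GB (10^9 bytes) yield 4 chunks at 256 MiB. For scaling, the sketch uses the Kubernetes HPA core formula, desired = ceil(current × metric / target), which is skipped when the ratio is within the default 0.1 tolerance. The target value of 10 mirrors the "> 10" test and is an assumption about the HPA spec. The helper names are hypothetical, and min/max replica bounds and scaling policies are ignored.

```python
import math

MIB = 1024 * 1024
DEFAULT_CHUNK_BYTES = 256 * MIB
HPA_TOLERANCE = 0.1  # Kubernetes default --horizontal-pod-autoscaler-tolerance


def expected_chunks(size_bytes: int, chunk_bytes: int = DEFAULT_CHUNK_BYTES) -> int:
    # Ceiling division: a partial trailing chunk still counts as a chunk.
    return -(-size_bytes // chunk_bytes)


def hpa_desired_replicas(current_replicas: int, metric_value: float,
                         target_value: float) -> int:
    # desired = ceil(current * metric / target), unless within tolerance of 1.0.
    # Ignores minReplicas/maxReplicas and scaling behavior policies.
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= HPA_TOLERANCE:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(expected_chunks(1024 * MIB), hpa_desired_replicas(1, 40, 10))  # prints: 4 4
```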
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>