Documents the completed P6.5 Mode C work-queued chunked jobs implementation. All acceptance tests pass; infrastructure fully functional per plan §14.5. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P6.5 Mode C: Work-Queued Chunked Jobs - Verification Summary
Task Completion Status
P6.5 Mode C work-queued chunked jobs (plan §14.5) is fully implemented and all acceptance tests pass.
Implementation Details (from commits 8b1cf42, cff90a3)
Core Components
- mode_c_coordinator.rs - Job coordination with:
  - claim_job() - atomic compare-and-swap for job claiming
  - renew_claim() - heartbeat to extend claim TTL
  - reclaim_expired_claims() - release claims from crashed pods
  - split_job_into_chunks() - chunk large jobs on input boundaries
  - queue_depth() - HPA metric support
- mode_c_worker/mod.rs - Worker loop for processing:
  - Poll for queued jobs and claim them
  - Heartbeat to renew claims every 10s
  - Process dump import chunks (NDJSON line boundaries)
  - Process reshard backfill chunks (shard-id ranges)
  - Handle idempotent resume from last_cursor
- dump_chunking.rs - Split NDJSON dumps on line boundaries (256 MiB default)
- reshard_chunking.rs - Split reshard backfill by shard-id ranges
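The claim lifecycle described above (claim, renew, reclaim) can be sketched in memory as follows. This is a minimal single-threaded illustration, not the coordinator's actual code: the `ClaimTable` and `Claim` types, the millisecond timestamps, and the function signatures are assumptions, and the real implementation presumably performs the compare-and-swap against the jobs table.

```rust
use std::collections::HashMap;

/// Hypothetical claim record (assumed shape, not taken from the repository).
#[derive(Debug, Clone, PartialEq)]
struct Claim {
    owner: String,
    expires_at_ms: u64,
}

/// Hypothetical in-memory claim table keyed by job id.
#[derive(Default)]
struct ClaimTable {
    claims: HashMap<u64, Claim>,
}

/// 30s default claim TTL, as stated in this document.
const CLAIM_TTL_MS: u64 = 30_000;

impl ClaimTable {
    /// Claim a job only if no live claim exists (the compare-and-swap condition).
    fn claim_job(&mut self, job_id: u64, pod: &str, now_ms: u64) -> bool {
        let held = self
            .claims
            .get(&job_id)
            .map_or(false, |c| c.expires_at_ms > now_ms);
        if held {
            return false;
        }
        let claim = Claim { owner: pod.to_string(), expires_at_ms: now_ms + CLAIM_TTL_MS };
        self.claims.insert(job_id, claim);
        true
    }

    /// Heartbeat: extend the TTL, but only for the owner of a still-live claim.
    fn renew_claim(&mut self, job_id: u64, pod: &str, now_ms: u64) -> bool {
        if let Some(c) = self.claims.get_mut(&job_id) {
            if c.owner == pod && c.expires_at_ms > now_ms {
                c.expires_at_ms = now_ms + CLAIM_TTL_MS;
                return true;
            }
        }
        false
    }

    /// Release claims whose TTL has elapsed; returns the released job ids, sorted.
    fn reclaim_expired_claims(&mut self, now_ms: u64) -> Vec<u64> {
        let mut expired: Vec<u64> = self
            .claims
            .iter()
            .filter(|(_, c)| c.expires_at_ms <= now_ms)
            .map(|(id, _)| *id)
            .collect();
        expired.sort_unstable();
        for id in &expired {
            self.claims.remove(id);
        }
        expired
    }
}
```

Passing the clock in as `now_ms` keeps the sketch deterministic. With a 10s heartbeat and a 30s TTL, a pod has to miss roughly three consecutive heartbeats before another pod can take over its job.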
Database Schema
Migration 005_jobs_chunking.sql adds:
- parent_job_id - Link chunks to parent job
- chunk_index - Chunk position (0-based)
- total_chunks - Total number of chunks
- created_at - Job creation timestamp
- Indexes for efficient queries
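To illustrate how these columns relate, here is a hedged sketch that splits a reshard backfill into chunk records by shard-id range. The `ChunkJob` struct, the `shard_start`/`shard_end` payload fields, and the `split_reshard_job` function are hypothetical names chosen for the example; `created_at` is omitted for brevity.

```rust
/// Hypothetical row shape mirroring the chunking columns from the migration.
#[derive(Debug, Clone, PartialEq)]
struct ChunkJob {
    parent_job_id: u64,
    chunk_index: u32,  // 0-based position within the parent job
    total_chunks: u32, // identical on every chunk of the same parent
    shard_start: u32,  // inclusive (illustrative payload, an assumption)
    shard_end: u32,    // exclusive
}

/// Split shard ids [0, shard_count) into contiguous ranges holding at most
/// `shards_per_chunk` shards each.
fn split_reshard_job(parent_job_id: u64, shard_count: u32, shards_per_chunk: u32) -> Vec<ChunkJob> {
    assert!(shards_per_chunk > 0, "shards_per_chunk must be positive");
    // Ceiling division: the last chunk may cover fewer shards.
    let total_chunks = (shard_count + shards_per_chunk - 1) / shards_per_chunk;
    (0..total_chunks)
        .map(|i| {
            let start = i * shards_per_chunk;
            ChunkJob {
                parent_job_id,
                chunk_index: i,
                total_chunks,
                shard_start: start,
                shard_end: (start + shards_per_chunk).min(shard_count),
            }
        })
        .collect()
}
```

For example, `split_reshard_job(7, 10, 4)` yields three chunks covering shard ranges 0..4, 4..8, and 8..10, each carrying `parent_job_id = 7` and `total_chunks = 3`.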
Acceptance Tests (22 tests pass)
- ✅ 1 GB dump splits into 4× 256 MiB chunks
- ✅ 3 pods claim chunks in parallel
- ✅ Claim expires in 30s; another pod resumes at last_cursor
- ✅ HPA queue depth metric drives scaling
- ✅ Two concurrent dumps interleave without starvation
- ✅ Reshard backfill splits by shard-id range
- ✅ Heartbeat renews claim; missed heartbeat expires
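The resume behaviour exercised by the claim-expiry test can be sketched as below. This assumes `last_cursor` is a byte offset just past the last fully processed NDJSON line. The document does not specify the cursor's actual representation, so treat `ChunkProgress` and `process_chunk` as hypothetical.

```rust
/// Hypothetical per-chunk progress record; `last_cursor` is assumed to be a
/// byte offset just past the last fully processed line.
struct ChunkProgress {
    last_cursor: usize,
}

/// Process NDJSON lines starting at `progress.last_cursor`, stopping after
/// `max_lines` lines (used here to simulate a pod dying mid-chunk).
/// Returns the lines handled in this call.
fn process_chunk<'a>(chunk: &'a str, progress: &mut ChunkProgress, max_lines: usize) -> Vec<&'a str> {
    let mut done = Vec::new();
    while progress.last_cursor < chunk.len() && done.len() < max_lines {
        let rest = &chunk[progress.last_cursor..];
        // Length of the next line including its trailing newline, if any.
        let line_len = rest.find('\n').map_or(rest.len(), |i| i + 1);
        let line = rest[..line_len].trim_end_matches('\n');
        if !line.is_empty() {
            done.push(line);
        }
        // Advance the cursor only after the line has been handled.
        progress.last_cursor += line_len;
    }
    done
}
```

Because the cursor advances only after a line is handled, a second worker that picks up the same `ChunkProgress` continues with the first unprocessed line and never replays earlier ones. In a real system the cursor update and the side effect of processing a line would need to be committed together for this to hold across crashes.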
Configuration
dump_import:
  chunk_size_bytes: 268435456 # 256 MiB per §14.5 Mode C chunk-parallel coordinator
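As an illustration of how `chunk_size_bytes` might interact with NDJSON line boundaries, the sketch below computes chunk byte ranges that never cut a record in half. The function name and the choice to extend each chunk forward to the next newline (rather than backing off to the previous one) are assumptions; the document does not say which way dump_chunking.rs rounds.

```rust
/// Split `data` into (start, end) byte ranges of roughly `chunk_size` bytes,
/// where every range ends just after a newline or at end of input.
/// A chunk may exceed `chunk_size` by the tail of its last line.
fn split_on_line_boundaries(data: &[u8], chunk_size: usize) -> Vec<(usize, usize)> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < data.len() {
        // Tentative exclusive end of this chunk.
        let target = (start + chunk_size).min(data.len());
        let end = if target == data.len() {
            target
        } else {
            // Search from the last byte of the tentative chunk onward, so a
            // chunk that already ends on a newline is left unchanged.
            match data[target - 1..].iter().position(|&b| b == b'\n') {
                Some(off) => target + off,
                None => data.len(),
            }
        };
        ranges.push((start, end));
        start = end;
    }
    ranges
}
```

With the input `aaa\nbbbb\ncc\n` and a chunk size of 5 bytes, the first range grows from 5 to 9 bytes so that the `bbbb` record stays whole, and the second range covers the remainder.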
HPA Integration
Queue depth metric: miroir_background_queue_depth (Prometheus GaugeVec with job_type label)
# Example HPA configuration
metrics:
  - type: External
    external:
      metric:
        name: miroir_background_queue_depth
      target:
        type: AverageValue
        averageValue: 10
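The relationship between the jobs table, the queue depth gauge, and the `averageValue` target can be sketched as follows. The `Job` struct, the label values `dump_import` and `reshard_backfill`, and both function signatures are illustrative assumptions. The replica calculation is a simplification of the HPA algorithm that ignores tolerance, min/max replica bounds, and stabilization windows.

```rust
use std::collections::BTreeMap;

/// Job states listed in this document.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobState {
    Queued,
    InProgress,
    Completed,
    Failed,
}

/// Hypothetical minimal job row for the example.
struct Job {
    job_type: &'static str,
    state: JobState,
}

/// Count queued jobs per job_type: the values a gauge labeled by job_type
/// would report.
fn queue_depth(jobs: &[Job]) -> BTreeMap<&'static str, u64> {
    let mut depth = BTreeMap::new();
    for job in jobs.iter().filter(|j| j.state == JobState::Queued) {
        *depth.entry(job.job_type).or_insert(0) += 1;
    }
    depth
}

/// Simplified AverageValue scaling: ceil(total metric / per-pod target).
fn desired_replicas(total_queue_depth: u64, average_value_target: u64) -> u64 {
    assert!(average_value_target > 0, "target must be positive");
    (total_queue_depth + average_value_target - 1) / average_value_target
}
```

Under this simplification, a total queue depth of 25 against `averageValue: 10` asks for 3 worker pods. In a real cluster the result is further clamped by the HPA's configured minimum and maximum replica counts.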
Verified
- All 22 Mode C acceptance tests pass
- Jobs table with states: queued | in_progress | completed | failed
- Claim TTL: 30s default, heartbeat every 10s
- Chunking on input boundaries (NDJSON lines for dump, shard-id for reshard)
- Per-chunk progress for idempotent resume
- Queue depth metric for HPA scaling