# Structured JSON Logging Implementation (P7.5, §10)

## Overview

Miroir uses `tracing-subscriber` with JSON output to produce structured logs that can be parsed by log aggregators (Loki, Elasticsearch, Splunk, CloudWatch).

## Implementation Location

- **Main initialization**: `crates/miroir-proxy/src/main.rs` (lines 284-320)
- **Middleware**: `crates/miroir-proxy/src/middleware.rs` (lines 1528-1635)
- **Tests**: `crates/miroir-proxy/src/middleware.rs` (lines 2721-2815)

## Configuration

```rust
// main.rs
let json_layer = tracing_subscriber::fmt::layer()
    .json()                   // emit one JSON object per log line
    .flatten_event(true)      // put event fields at the top level
    .with_target(true)        // include the `target` field
    .with_current_span(true)  // include the current span and its fields
    .with_span_list(false);   // omit the list of parent spans
```

## Log Format

Every log line is a JSON object with the following fields:

### Base fields (present on every log line)

- `timestamp`: ISO 8601 datetime (automatic from tracing-subscriber)
- `level`: One of `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE`
- `target`: Log target (e.g., `miroir.request`, `miroir.search_coalesced`)
- `message`: Human-readable description
- `pod_id`: From `POD_NAME` env var (global span field)

### Per-request fields

- `request_id`: 8-character hex hash of UUIDv7 (from `X-Request-Id` header)

### Optional fields (context-specific)

- `index`: Index name
- `duration_ms`: Request duration in milliseconds
- `node_count`: Number of nodes queried
- `estimated_hits`: Search result count
- `degraded`: Boolean indicating partial results

## Example Output

```json
{
  "timestamp": "2026-05-01T12:00:00.000Z",
  "level": "INFO",
  "target": "miroir.request",
  "message": "GET /indexes/products/search 200",
  "pod_id": "miroir-7d9f8c4b5-x2kpq",
  "request_id": "deadbeef",
  "duration_ms": 42,
  "status": 200,
  "method": "GET",
  "path_template": "/indexes/{uid}/search"
}
```

## Request ID Propagation

1. `request_id_middleware` generates a UUIDv7, hashes it to 8 hex chars, and sets the `X-Request-Id` header
2. `telemetry_middleware` reads the header and creates a tracing span with a `request_id` field
3. All child log events inherit the `request_id` field via `with_current_span(true)`

## Log Levels

- `ERROR`: Orchestrator-side internal failures
- `WARN`: Degraded responses, fallbacks, soft failures
- `INFO`: One line per request with summary fields
- `DEBUG`: Per-node calls, per-sub-query in multi-search
- `TRACE`: Fan-out buffer contents, scatter plan internals

## PII Audit

The codebase has been audited to ensure no PII is logged:

1. **API keys**: Never logged. Only `key_hash` (SHA-256) appears in logs.
2. **Document content**: Never logged. Only metadata such as `index_uid` and `primary_key` appears.
3. **User queries**: Never logged. Only `index` and `duration_ms` appear in search logs.
4. **Session IDs**: Truncated to an 8-character prefix when logged (`session_prefix`).

## Acceptance Criteria

- ✅ `jq` parses every log line (JSON layer configured)
- ✅ `request_id` appears in logs (span field with `with_current_span(true)`)
- ✅ No API keys, document fields, or user queries appear in logs (audit verified)
- ✅ Log volume ≤ 1 entry per client request at INFO level (`telemetry_middleware` logs once)

## Testing

Unit tests verify:

- JSON subscriber configuration compiles correctly
- All log levels are available
- Required fields are defined and compile

Integration testing (manual) verifies:

- Log output is valid JSON parseable by `jq`
- `request_id` appears in every log line for a given request
- No sensitive data appears in logs
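
For illustration, the two identifier-shortening steps this document mentions (the 8-character `request_id` derived from a UUIDv7, and the 8-character `session_prefix`) could look roughly like the sketch below. It is not the Miroir implementation: the function names `short_request_id` and `session_prefix`, the use of the standard library's `DefaultHasher`, and the sample identifiers are all assumptions made for the example, since the document does not say which hash function the middleware uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: reduce a full identifier (e.g. a UUIDv7 string)
/// to 8 lowercase hex characters. `DefaultHasher` is an assumption; its
/// output is not guaranteed to be stable across Rust releases.
fn short_request_id(full_id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    full_id.hash(&mut hasher);
    // Keep the low 32 bits and zero-pad to exactly 8 hex digits.
    format!("{:08x}", hasher.finish() as u32)
}

/// Hypothetical helper: return at most the first 8 characters of a
/// session ID, cutting on a character boundary so non-ASCII input
/// cannot cause a panic.
fn session_prefix(session_id: &str) -> &str {
    match session_id.char_indices().nth(8) {
        Some((idx, _)) => &session_id[..idx],
        None => session_id, // shorter than 8 characters: keep as-is
    }
}

fn main() {
    // Sample values, made up for this example.
    let uuid = "018f3c2a-7b1e-7c3d-9a4f-2b6d8e0f1a2c";
    println!("request_id={}", short_request_id(uuid));
    // Prints "session_prefix=sess_012".
    println!("session_prefix={}", session_prefix("sess_0123456789abcdef"));
}
```

Hashing rather than truncating the UUIDv7 matters because the leading characters of a UUIDv7 encode a timestamp, so IDs generated close together would share a plain 8-character prefix.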