miroir/docs/search_ui_analytics_beacon.md
jedarden 7ea7d0ed52 feat(search-ui): add analytics beacon CDC integration tests and docs
Add comprehensive test coverage for the beacon → CDC pipeline:

Test file (p13_21_beacon_cdc_integration.rs):
- Beacon request structure validation (click, latency events)
- CDC manager stores analytics events correctly
- Analytics event serialization includes all fields
- Analytics events map to correct CDC operation types
- Beacon event_id is used for idempotency
- Config validation for analytics settings
- Session response structure validation

Documentation (docs/search_ui_analytics_beacon.md):
- Beacon endpoint specification and request schema
- Event types (click, latency, impression) and required fields
- Idempotency mechanism using event_id
- CDC integration details and event schema
- Configuration examples for enabling/disabling analytics
- Client integration examples (JavaScript)
- Security considerations and rate limiting
- Metrics and troubleshooting guide

This completes the beacon → CDC integration verification for plan §13.21.

Closes: bf-51eg8
2026-05-26 18:23:52 -04:00


Search UI Analytics Beacon (plan §13.21)

Overview

The Search UI Analytics Beacon is an HTTP endpoint that collects end-user interaction data from the embedded search UI. Beacon events are published to the CDC (Change Data Capture) system for downstream analytics processing.

Beacon Endpoint

URL: POST /_miroir/ui/search/{index}/beacon

Authentication: JWT session token (obtained from /_miroir/ui/search/{index}/session), sent in the Authorization: Bearer header

Request Body: JSON BeaconRequest

Response: 202 Accepted (event accepted, including duplicates) or 403 Forbidden (invalid session)

Beacon Request Schema

{
  "event_id": "string",                // Client-generated unique identifier for idempotency
  "event_type": "string",              // "click", "latency", or "impression"
  "index_uid": "string",               // Index UID
  "query": "string (optional)",        // Query string that produced the results
  "result_count": "number (optional)", // Number of results returned for the query
  "latency_ms": "number (optional)",   // Search latency in milliseconds (latency events)
  "document_id": "string (optional)",  // Primary key of the clicked document (click events)
  "position": "number (optional)"      // Position of the clicked result in the list (click events)
}

Event Types

| Event type | Description | Required fields | Optional fields |
|---|---|---|---|
| click | User clicked a search result | document_id, position | query, result_count |
| latency | Search latency measurement | latency_ms | query, result_count |
| impression | Search results impression | result_count | query |
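A client or test harness can run a pre-flight check against the table above before sending a beacon. The sketch below encodes only the documented required fields. It is written in JavaScript to match the client examples, and the function name validateBeacon is hypothetical.

The server's actual validation rules may differ. event_id is deliberately not checked here, because the orchestrator can derive one when it is absent.

```javascript
// Required fields per event type, taken from the Event Types table.
const REQUIRED_FIELDS = new Map([
  ["click", ["document_id", "position"]],
  ["latency", ["latency_ms"]],
  ["impression", ["result_count"]],
]);

// Optional numeric fields; if present they must be finite and non-negative (assumption).
const NUMERIC_FIELDS = ["result_count", "latency_ms", "position"];

function isMissing(value) {
  return value === undefined || value === null || value === "";
}

// Returns a list of problems; an empty list means the payload matches the table.
function validateBeacon(req) {
  const problems = [];
  for (const field of ["event_type", "index_uid"]) {
    if (isMissing(req[field])) problems.push(`missing ${field}`);
  }
  const required = REQUIRED_FIELDS.get(req.event_type);
  if (required === undefined) {
    problems.push(`unknown event_type: ${req.event_type}`);
    return problems;
  }
  for (const field of required) {
    if (isMissing(req[field])) problems.push(`missing ${field} for ${req.event_type} event`);
  }
  for (const field of NUMERIC_FIELDS) {
    const v = req[field];
    if (v !== undefined && v !== null && (!Number.isFinite(v) || v < 0)) {
      problems.push(`${field} must be a non-negative number`);
    }
  }
  return problems;
}
```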

Idempotency

Beacon events are idempotent based on the event_id field:

  1. The orchestrator checks whether event_id has already been processed (Redis-backed)
  2. Duplicate events are acknowledged with 202 Accepted but not processed again (no-op)
  3. Each unique event_id is processed exactly once
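The check-then-process steps above can be sketched with an in-memory map standing in for Redis. In Redis, the same "first writer wins" step is typically a single SET key value NX PX ttl call.

The names IdempotencyGuard and handleBeacon, and the TTL, are illustrative assumptions rather than Miroir's code. In this sketch, "exactly once" holds only within the TTL window.

```javascript
// In-memory stand-in for the Redis-backed "already processed?" check.
class IdempotencyGuard {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.seen = new Map(); // event_id -> expiry timestamp (ms)
  }

  // Records event_id and returns true only for the first claim within the TTL.
  // The Redis equivalent would be one SET <key> 1 NX PX <ttlMs> round trip.
  claim(eventId) {
    const t = this.now();
    const expiry = this.seen.get(eventId);
    if (expiry !== undefined && expiry > t) return false;
    this.seen.set(eventId, t + this.ttlMs);
    return true;
  }
}

// New and duplicate events both get 202; only new ones are published.
function handleBeacon(guard, event, publish) {
  if (guard.claim(event.event_id)) publish(event);
  return 202;
}
```

Returning 202 for duplicates means a client retry after a lost response looks like success, which is what lets retries be safe.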

For clients that cannot generate event_id, the orchestrator generates a deterministic hash from:

  • Session ID
  • Query string
  • Document ID (if present)
  • Position (if present)
  • Minute-bucketed latency (for latency events)
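The fallback can be sketched as a SHA-256 digest over the fields listed above. This uses Node's built-in crypto module and is written in JavaScript only for illustration; the orchestrator's real hashing may differ.

Assumptions:

- The fields are serialized as a JSON array, so adjacent values cannot run together.
- Absent fields become null.
- "Minute-bucketed" is read as the time the latency beacon was received, truncated to the minute.

The function name fallbackEventId is hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Deterministic event_id for beacons that arrive without one.
function fallbackEventId(sessionId, beacon, receivedAtMs) {
  const parts = [
    sessionId,
    beacon.query ?? null,
    beacon.document_id ?? null,
    beacon.position ?? null,
    // Assumption: the minute bucket applies to latency events only.
    beacon.event_type === "latency" ? Math.floor(receivedAtMs / 60_000) : null,
  ];
  return createHash("sha256").update(JSON.stringify(parts)).digest("hex");
}
```

Because the inputs are coarse, two genuinely different events can map to the same id. This is why client-generated event_id values are preferred.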

CDC Integration

Publishing to CDC

When search_ui.analytics.enabled: true, beacon events are published to the CDC internal queue:

  1. Beacon event received at /_miroir/ui/search/{index}/beacon
  2. Event converted to AnalyticsEvent with event_type mapped:
    • "click" → "click_through" in CDC
    • "latency" → "latency" in CDC
    • "impression" → "impression" in CDC
  3. Event stored in CDC internal queue with monotonically increasing sequence number
  4. Event available via GET /_miroir/changes?since={cursor}&index={uid}
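Step 2 can be sketched as a plain conversion function. The field renames follow the example document in the CDC Event Schema below: document_id becomes result_id, position becomes result_position, and index_uid becomes index.

Assumptions:

- sessionId comes from the verified JWT.
- The timestamp is the server's receive time rather than a client clock.
- The function name toAnalyticsEvent is hypothetical.

```javascript
// Beacon event_type -> AnalyticsEvent event_type, per step 2 above.
const CDC_EVENT_TYPE = new Map([
  ["click", "click_through"],
  ["latency", "latency"],
  ["impression", "impression"],
]);

function toAnalyticsEvent(beacon, sessionId, receivedAtMs) {
  const eventType = CDC_EVENT_TYPE.get(beacon.event_type);
  if (eventType === undefined) {
    throw new Error(`unsupported event_type: ${beacon.event_type}`);
  }
  return {
    event_type: eventType,
    event_id: beacon.event_id,
    session_id: sessionId,
    index: beacon.index_uid,
    query: beacon.query ?? null,
    result_id: beacon.document_id ?? null,
    result_position: beacon.position ?? null,
    latency_ms: beacon.latency_ms ?? null,
    timestamp: receivedAtMs, // assumption: server receive time
  };
}
```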

CDC Event Schema

Analytics events in the CDC stream have this structure:

{
  "mtask_id": "analytics:{event_id}",      // Prefixed with "analytics:"
  "index": "products",                     // Index UID
  "operation": "ClickThrough",             // CdcOperation enum
  "primary_keys": ["doc-123"],             // Clicked document ID
  "shard_ids": [],                         // Empty for analytics events
  "settings_version": 0,                   // Not applicable
  "timestamp": 1234567890,                 // UNIX timestamp (ms)
  "document": {                            // Original analytics event
    "event_type": "click_through",
    "event_id": "evt-abc-123",
    "session_id": "session-xyz-789",
    "index": "products",
    "query": "laptop",
    "result_id": "doc-123",
    "result_position": 3,
    "latency_ms": null,
    "timestamp": 1234567890
  },
  "origin": null,                          // No origin for user events
  "event_id": "evt-abc-123"                // Stable deduplication key
}
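The envelope above can be assembled from an AnalyticsEvent as sketched below. The constant values (the "analytics:" prefix, empty shard_ids, settings_version 0, origin null) come straight from the schema.

Assumptions:

- primary_keys is empty when no document was clicked.
- Impressions are rejected rather than guessed at, because the CDC Operation Types table lists no operation for them.
- The function name toCdcEvent is hypothetical.

```javascript
// AnalyticsEvent.event_type -> CdcOperation variant, per the CDC Operation Types table.
const CDC_OPERATION = new Map([
  ["click_through", "ClickThrough"],
  ["latency", "Latency"],
]);

function toCdcEvent(analytics) {
  const operation = CDC_OPERATION.get(analytics.event_type);
  if (operation === undefined) {
    throw new Error(`no CdcOperation documented for ${analytics.event_type}`);
  }
  return {
    mtask_id: `analytics:${analytics.event_id}`,
    index: analytics.index,
    operation,
    // Assumption: only click events carry a primary key.
    primary_keys: analytics.result_id ? [analytics.result_id] : [],
    shard_ids: [],        // empty for analytics events
    settings_version: 0,  // not applicable
    timestamp: analytics.timestamp,
    document: analytics,  // original analytics event
    origin: null,         // no origin for user events
    event_id: analytics.event_id,
  };
}
```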

CDC Operation Types

| Operation | Description | When emitted |
|---|---|---|
| ClickThrough | User clicked a search result | event_type: "click" |
| Latency | Search latency measurement | event_type: "latency" |
| Add | Document added | Document write |
| Update | Document updated | Document write |
| Delete | Document deleted | Document delete |

CDC Configuration

Analytics events respect the cdc.emit_internal_writes configuration:

  • Click events (click_through): Always emitted (user-initiated)
  • Latency events (latency): Emitted only when emit_internal_writes is true

cdc:
  enabled: true
  emit_internal_writes: false  # Suppresses latency events if false
  sinks: []
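The two emission rules above reduce to a small predicate. It also returns false when cdc.enabled is false, matching the troubleshooting checklist. The rules do not say how impression events are treated, so this hypothetical shouldEmit helper throws for them instead of inventing a behavior.

```javascript
// Decides whether an AnalyticsEvent reaches the CDC stream.
// cdcConfig mirrors the YAML: { enabled, emit_internal_writes }.
function shouldEmit(analyticsEventType, cdcConfig) {
  if (!cdcConfig.enabled) return false;
  switch (analyticsEventType) {
    case "click_through":
      return true; // user-initiated: always emitted
    case "latency":
      return cdcConfig.emit_internal_writes === true;
    default:
      throw new Error(`emission rule not documented for ${analyticsEventType}`);
  }
}
```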

Configuration

Enable Analytics

search_ui:
  enabled: true
  analytics:
    enabled: true              # Enable beacon collection
    sink: "cdc"                # Publish to CDC internal queue

Disable Analytics

search_ui:
  enabled: true
  analytics:
    enabled: false             # Disable beacon collection
    sink: "cdc"

Client Integration

JavaScript Example

// After obtaining the JWT session token from /_miroir/ui/search/{index}/session
const token = sessionStorage.getItem('miroir_search_token');

// Send click-through event. keepalive lets the request finish even if the
// click navigates away from the results page.
fetch(`/_miroir/ui/search/products/beacon`, {
  method: 'POST',
  keepalive: true,
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    event_id: crypto.randomUUID(),   // unique per event; reuse it on retry
    event_type: 'click',
    index_uid: 'products',
    query: 'laptop',
    document_id: 'prod-123',
    position: 3
  })
}).catch((err) => console.warn('beacon failed', err));

Latency Measurement

// Measure search latency. `search` stands for the UI's own search call,
// and `token` is the session token from the previous example.
const searchStart = performance.now();

// Perform search...
const results = await search(query);

const searchEnd = performance.now();
const latencyMs = Math.round(searchEnd - searchStart);

// Send latency beacon
fetch(`/_miroir/ui/search/products/beacon`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    event_id: crypto.randomUUID(),
    event_type: 'latency',
    index_uid: 'products',
    query: query,
    result_count: results.hits.length,
    latency_ms: latencyMs
  })
}).catch((err) => console.warn('beacon failed', err));
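The two examples repeat the same request boilerplate. A shared helper can also handle retries correctly: it generates event_id once and reuses it on every attempt, so the orchestrator's idempotency check collapses retries into a single event.

Assumptions (none of these is a Miroir API):

- The helper name sendBeacon, which also avoids clashing with the browser's navigator.sendBeacon.
- The attempt count and exponential backoff.
- Treating network errors and 5xx responses as retryable, while any 4xx (such as 403) is final.

fetchImpl is injectable so the helper can be exercised outside a browser.

```javascript
async function sendBeacon(indexUid, token, payload, {
  attempts = 3,
  baseDelayMs = 200,
  fetchImpl = globalThis.fetch,
} = {}) {
  // Build the body once so every retry carries the same event_id.
  const body = JSON.stringify({
    ...payload,
    index_uid: indexUid,
    event_id: payload.event_id ?? crypto.randomUUID(),
  });
  const url = `/_miroir/ui/search/${encodeURIComponent(indexUid)}/beacon`;

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const res = await fetchImpl(url, {
        method: "POST",
        keepalive: true, // survive navigation after a click
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${token}`,
        },
        body,
      });
      if (res.status === 202) return true;
      if (res.status < 500) return false; // 4xx (e.g. 403): retrying won't help
    } catch (err) {
      // Network error: fall through to the backoff and retry.
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
  return false;
}
```

In a page, the click example becomes `sendBeacon('products', token, { event_type: 'click', query: 'laptop', document_id: 'prod-123', position: 3 })`.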

Metrics

The following Prometheus metrics are exposed by the search UI:

  • miroir_search_ui_sessions_total - Total sessions minted
  • miroir_search_ui_queries_total{index} - Total search queries
  • miroir_search_ui_zero_hits_total{index} - Total zero-result queries
  • miroir_search_ui_click_through_total{index} - Total click events (when analytics enabled)
  • miroir_search_ui_p95_ms{index} - P95 end-user latency
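This document does not specify how miroir_search_ui_p95_ms is computed. To cross-check it against your own latency beacons, one common definition is the nearest-rank percentile over a window of latency_ms samples, which the sketch below assumes.

Prometheus-side estimates such as histogram_quantile interpolate within buckets, so their results can differ slightly. The function name percentile is hypothetical.

```javascript
// Nearest-rank percentile (p in 0..100) over latency_ms samples.
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; integer arithmetic avoids floating-point rounding surprises.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}
```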

Rate Limiting

The beacon endpoint is rate-limited per source IP:

  • Default: 60 requests per minute
  • Backend: Redis (distributed) or local (per-pod)
  • Scope: Per-IP limit, not per-session

Rate limit configuration:

search_ui:
  rate_limit:
    per_ip: "60/minute"
    backend: "redis"           # or "local"
    redis_key_prefix: "miroir:ratelimit:searchui:"
    redis_ttl_s: 60
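The configuration above suggests a fixed-window counter keyed by IP: the first hit creates the key with a TTL of redis_ttl_s, and later hits increment it. In Redis this is the familiar INCR plus EXPIRE pattern.

The sketch below mirrors that pattern in memory. Whether Miroir uses a fixed window is an assumption, as is the set of rate units parseRate accepts. The class and function names are hypothetical.

```javascript
// Parses specs like "60/minute" into { limit, windowMs }.
function parseRate(spec) {
  const m = /^(\d+)\/(second|minute|hour)$/.exec(spec);
  if (!m) throw new Error(`unrecognised rate: ${spec}`);
  const unitMs = { second: 1_000, minute: 60_000, hour: 3_600_000 }[m[2]];
  return { limit: Number(m[1]), windowMs: unitMs };
}

// Fixed-window limiter per source IP.
class FixedWindowLimiter {
  constructor(spec, keyPrefix = "miroir:ratelimit:searchui:", now = () => Date.now()) {
    const { limit, windowMs } = parseRate(spec);
    this.limit = limit;
    this.windowMs = windowMs;
    this.keyPrefix = keyPrefix;
    this.now = now;
    this.windows = new Map(); // key -> { count, expiresAt }
  }

  allow(ip) {
    const key = this.keyPrefix + ip;
    const t = this.now();
    let w = this.windows.get(key);
    if (w === undefined || w.expiresAt <= t) {
      w = { count: 0, expiresAt: t + this.windowMs }; // like EXPIRE key ttl
      this.windows.set(key, w);
    }
    w.count += 1; // like INCR key
    return w.count <= this.limit;
  }
}
```

The local backend behaves like one instance of this limiter per pod. With N pods behind a load balancer, a client can therefore get up to N times the per-IP allowance, which is why the Redis backend is the distributed option.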

Security Considerations

  1. Authentication Required: Beacon endpoint requires valid JWT session token
  2. No PII in Events: Beacon events should not contain personally identifiable information
  3. Idempotency: Duplicate events are ignored to prevent replay attacks
  4. Origin Validation: Session token idx claim must match request index_uid
  5. Scope Validation: JWT scope claim must include beacon action
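Checks 4 and 5 above operate on the claims of a session token whose signature and expiry have already been verified by a JWT library. The sketch below covers only the claim comparison.

Assumptions:

- The claim names idx and scope are taken from the list above.
- scope may be either a space-separated string (OAuth style) or an array.
- The function name authorizeBeacon is hypothetical.

A failed check corresponds to the endpoint's 403 Forbidden response.

```javascript
// Compares verified JWT claims against a beacon request.
function authorizeBeacon(claims, beacon) {
  if (claims.idx !== beacon.index_uid) {
    return { ok: false, reason: "idx claim does not match index_uid" };
  }
  const scopes = Array.isArray(claims.scope)
    ? claims.scope
    : String(claims.scope ?? "").split(/\s+/).filter(Boolean);
  if (!scopes.includes("beacon")) {
    return { ok: false, reason: "scope does not include beacon" };
  }
  return { ok: true };
}
```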

Troubleshooting

Events Not Appearing in CDC

  1. Check search_ui.analytics.enabled: true
  2. Verify CDC is enabled: cdc.enabled: true
  3. Check cdc.emit_internal_writes for latency events
  4. Verify the JWT scope claim includes beacon
  5. Check browser console for network errors

Duplicate Events

If duplicate events appear in CDC:

  1. Verify event_id is unique per event
  2. Check Redis idempotency cache is functioning
  3. Ensure the client reuses the same event_id when retrying a failed request (a retry with a new event_id is counted as a new event)

High Cardinality

To reduce CDC stream cardinality:

  1. Avoid including unique user IDs in beacon events
  2. Use session IDs instead of user IDs
  3. Aggregate high-frequency events before sending

References

  • Plan §13.21: Default search interface (end-user search UI)
  • Plan §13.13: CDC (Change Data Capture)
  • crates/miroir-proxy/src/routes/search_ui.rs - Beacon endpoint implementation
  • crates/miroir-core/src/cdc.rs - CDC manager and analytics event types