Add comprehensive test coverage for the beacon → CDC pipeline:

Test file (p13_21_beacon_cdc_integration.rs):
- Beacon request structure validation (click, latency events)
- CDC manager stores analytics events correctly
- Analytics event serialization includes all fields
- Analytics events map to correct CDC operation types
- Beacon event_id is used for idempotency
- Config validation for analytics settings
- Session response structure validation

Documentation (docs/search_ui_analytics_beacon.md):
- Beacon endpoint specification and request schema
- Event types (click, latency, impression) and required fields
- Idempotency mechanism using event_id
- CDC integration details and event schema
- Configuration examples for enabling/disabling analytics
- Client integration examples (JavaScript)
- Security considerations and rate limiting
- Metrics and troubleshooting guide

This completes the beacon → CDC integration verification for plan §13.21.

Closes: bf-51eg8
Search UI Analytics Beacon (plan §13.21)
Overview
The Search UI Analytics Beacon is an HTTP endpoint that collects end-user interaction data from the embedded search UI. Beacon events are published to the CDC (Change Data Capture) system for downstream analytics processing.
Beacon Endpoint
URL: POST /_miroir/ui/search/{index}/beacon
Authentication: JWT session token (from /ui/search/{index}/session)
Request Body: JSON BeaconRequest
Response: 202 Accepted (event processed) or 403 Forbidden (invalid session)
Beacon Request Schema
{
  "event_id": "string",                  // Client-generated unique identifier for idempotency
  "event_type": "string",                // Event type: "click", "latency", "impression"
  "index_uid": "string",                 // Index UID
  "query": "string (optional)",          // Query string for search/latency events
  "result_count": "number (optional)",   // Number of results for search events
  "latency_ms": "number (optional)",     // Latency in milliseconds
  "document_id": "string (optional)",    // Primary key of clicked document (for click events)
  "position": "number (optional)"        // Click position in results (for click events)
}
Event Types
| Event Type | Description | Required Fields | Optional Fields |
|---|---|---|---|
| click | User clicked a search result | document_id, position | query, result_count |
| latency | Search latency measurement | latency_ms | query, result_count |
| impression | Search results impression | result_count | query |
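
The required-field rules in the table can be checked before a beacon is sent or accepted. The sketch below is illustrative only: `validateBeacon` and `REQUIRED_FIELDS` are hypothetical names, and the real endpoint may validate differently.

```javascript
// Required fields per event type, copied from the Event Types table.
const REQUIRED_FIELDS = {
  click: ['document_id', 'position'],
  latency: ['latency_ms'],
  impression: ['result_count'],
};

// Hypothetical helper: returns a list of problems; an empty list means the
// payload satisfies the documented schema.
function validateBeacon(payload) {
  const problems = [];
  // Fields the schema does not mark as optional.
  for (const field of ['event_id', 'event_type', 'index_uid']) {
    if (typeof payload[field] !== 'string' || payload[field] === '') {
      problems.push(`missing ${field}`);
    }
  }
  const required = REQUIRED_FIELDS[payload.event_type];
  if (!required) {
    problems.push(`unknown event_type: ${payload.event_type}`);
    return problems;
  }
  for (const field of required) {
    if (payload[field] === undefined || payload[field] === null) {
      problems.push(`missing ${field} for ${payload.event_type} event`);
    }
  }
  return problems;
}
```

Returning a list rather than throwing lets a client log every problem at once instead of fixing them one request at a time.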
Idempotency
Beacon events are idempotent based on the event_id field:
- The orchestrator checks whether event_id was already processed (Redis-backed)
- Duplicate events are acknowledged with 202 Accepted but not reprocessed (no-op)
- Each unique event_id is processed exactly once
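
The check-then-process behavior can be sketched with an in-memory map standing in for Redis. `IdempotencyCache`, `handleBeacon`, and the TTL parameter are assumptions made for illustration; the document does not state how long processed IDs are remembered.

```javascript
// In-memory stand-in for the Redis-backed idempotency cache.
// Assumption: entries expire after a TTL; the real retention is not documented.
class IdempotencyCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.seen = new Map(); // event_id -> expiry timestamp (ms)
  }

  // Returns true the first time an event_id is claimed within the TTL,
  // false for a duplicate.
  claim(eventId, now = Date.now()) {
    const expiry = this.seen.get(eventId);
    if (expiry !== undefined && expiry > now) return false;
    this.seen.set(eventId, now + this.ttlMs);
    return true;
  }
}

// Hypothetical handler: publishes only first deliveries, but answers 202
// for duplicates as well, so clients cannot tell the two cases apart.
function handleBeacon(cache, event, publish) {
  if (cache.claim(event.event_id)) publish(event);
  return 202;
}
```

A single-process map only deduplicates within one pod; a shared store such as Redis is what makes the guarantee hold across replicas.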
For clients that cannot generate event_id, the orchestrator generates a deterministic hash from:
- Session ID
- Query string
- Document ID (if present)
- Position (if present)
- Minute-bucketed latency (for latency events)
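
A deterministic fallback ID over these inputs might look like the sketch below. The hash algorithm (SHA-256), the JSON encoding of the inputs, and the name `fallbackEventId` are assumptions; the bucketing rule is not specified here, so the caller passes an already-computed `latencyBucket`.

```javascript
const { createHash } = require('crypto');

// Hypothetical fallback: hash the listed inputs in a fixed order.
// Absent fields are encoded as null so that "no position" and "position 0"
// never collide.
function fallbackEventId({ sessionId, query, documentId, position, latencyBucket }) {
  const parts = [
    sessionId,
    query ?? null,
    documentId ?? null,
    position ?? null,
    latencyBucket ?? null, // bucketing rule is not documented; supplied by the caller
  ];
  // JSON.stringify gives an unambiguous encoding (no separator collisions).
  return createHash('sha256').update(JSON.stringify(parts)).digest('hex');
}
```

Because the output depends only on the inputs, two identical interactions in the same session produce the same ID and are deduplicated, which is the trade-off of not having a client-generated `event_id`.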
CDC Integration
Publishing to CDC
When search_ui.analytics.enabled: true, beacon events are published to the CDC internal queue:
- Beacon event received at /ui/search/{index}/beacon
- Event converted to AnalyticsEvent with event_type mapped:
  - "click" → "click_through" in CDC
  - "latency" → "latency" in CDC
  - "impression" → "impression" in CDC
- Event stored in CDC internal queue with monotonically increasing sequence number
- Event available via GET /_miroir/changes?since={cursor}&index={uid}
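
The publish path can be modelled as a small in-memory queue. This is a sketch, not the CDC manager itself: `InternalQueue` is a hypothetical name, and treating `since` as "strictly greater than the cursor" is an assumption about the changes endpoint.

```javascript
// event_type mapping from beacon to CDC, as listed in the steps.
const CDC_EVENT_TYPES = {
  click: 'click_through',
  latency: 'latency',
  impression: 'impression',
};

// Hypothetical in-memory model of the CDC internal queue.
class InternalQueue {
  constructor() {
    this.entries = [];
    this.nextSeq = 1; // monotonically increasing sequence number
  }

  // Converts a beacon to a queue entry and returns its sequence number.
  publish(beacon) {
    const eventType = CDC_EVENT_TYPES[beacon.event_type];
    if (!eventType) throw new Error(`unmapped event_type: ${beacon.event_type}`);
    const entry = {
      seq: this.nextSeq++,
      index: beacon.index_uid,
      event_type: eventType,
      event_id: beacon.event_id,
    };
    this.entries.push(entry);
    return entry.seq;
  }

  // Models GET /_miroir/changes?since={cursor}&index={uid}.
  // Assumption: the cursor is exclusive.
  changes(since, index) {
    return this.entries.filter((e) => e.seq > since && e.index === index);
  }
}
```

A consumer would remember the highest `seq` it has seen and pass it back as the next cursor.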
CDC Event Schema
Analytics events in the CDC stream have this structure:
{
  "mtask_id": "analytics:{event_id}",   // Prefixed with "analytics:"
  "index": "products",                  // Index UID
  "operation": "ClickThrough",          // CdcOperation enum
  "primary_keys": ["doc-123"],          // Clicked document ID
  "shard_ids": [],                      // Empty for analytics events
  "settings_version": 0,                // Not applicable
  "timestamp": 1234567890,              // UNIX timestamp (ms)
  "document": {                         // Original analytics event
    "event_type": "click_through",
    "event_id": "evt-abc-123",
    "session_id": "session-xyz-789",
    "index": "products",
    "query": "laptop",
    "result_id": "doc-123",
    "result_position": 3,
    "latency_ms": null,
    "timestamp": 1234567890
  },
  "origin": null,                       // No origin for user events
  "event_id": "evt-abc-123"             // Stable deduplication key
}
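
To make the envelope concrete, the sketch below wraps an analytics event in the structure shown above. `toCdcRecord` is a hypothetical helper; using an empty `primary_keys` array for events without a clicked document is an assumption, and only the two analytics operations this document names are mapped.

```javascript
// Analytics operations named in this document.
const CDC_OPERATIONS = {
  click_through: 'ClickThrough',
  latency: 'Latency',
};

// Hypothetical helper: wraps an analytics event in the CDC envelope.
function toCdcRecord(analyticsEvent) {
  const operation = CDC_OPERATIONS[analyticsEvent.event_type];
  if (!operation) {
    throw new Error(`no CDC operation documented for ${analyticsEvent.event_type}`);
  }
  return {
    mtask_id: `analytics:${analyticsEvent.event_id}`,
    index: analyticsEvent.index,
    operation,
    // Assumption: empty when the event has no clicked document.
    primary_keys: analyticsEvent.result_id ? [analyticsEvent.result_id] : [],
    shard_ids: [],
    settings_version: 0,
    timestamp: analyticsEvent.timestamp,
    document: analyticsEvent,
    origin: null,
    event_id: analyticsEvent.event_id,
  };
}
```

Keeping `event_id` at the top level as well as inside `document` lets consumers deduplicate without parsing the nested payload.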
CDC Operation Types
| Operation | Description | When Emitted |
|---|---|---|
| ClickThrough | User clicked a search result | event_type: "click" |
| Latency | Search latency measurement | event_type: "latency" |
| Add | Document added | Document write |
| Update | Document updated | Document write |
| Delete | Document deleted | Document delete |
CDC Configuration
Analytics events respect the cdc.emit_internal_writes configuration:
- Click events (click_through): Always emitted (user-initiated)
- Latency events (latency): Subject to emit_internal_writes flag
cdc:
  enabled: true
  emit_internal_writes: false  # Suppresses latency events if false
  sinks: []
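
The two emission rules can be read as a small predicate over the `cdc` config. This is a sketch with a hypothetical name (`shouldEmit`); it deliberately refuses to guess for event types whose treatment is not stated here.

```javascript
// Hypothetical predicate: decides whether an analytics event reaches the
// CDC stream, given the parsed `cdc` config section.
function shouldEmit(eventType, cdc) {
  if (!cdc.enabled) return false; // nothing is emitted when CDC is off
  switch (eventType) {
    case 'click_through':
      return true; // user-initiated, always emitted
    case 'latency':
      return cdc.emit_internal_writes === true;
    default:
      // The rules above cover only click_through and latency.
      throw new Error(`emission rule not documented for ${eventType}`);
  }
}
```

With the config shown above (`emit_internal_writes: false`), click events pass and latency events are suppressed.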
Configuration
Enable Analytics
search_ui:
  enabled: true
  analytics:
    enabled: true  # Enable beacon collection
    sink: "cdc"    # Publish to CDC internal queue
Disable Analytics
search_ui:
  enabled: true
  analytics:
    enabled: false  # Disable beacon collection
    sink: "cdc"
Client Integration
JavaScript Example
// After obtaining JWT session token from /ui/search/{index}/session
const token = sessionStorage.getItem('miroir_search_token');

// Send click-through event
fetch(`/_miroir/ui/search/products/beacon`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    event_id: crypto.randomUUID(),
    event_type: 'click',
    index_uid: 'products',
    query: 'laptop',
    document_id: 'prod-123',
    position: 3
  })
});
Latency Measurement
// Measure search latency
const searchStart = performance.now();
// Perform search...
const results = await search(query);
const searchEnd = performance.now();
const latencyMs = Math.round(searchEnd - searchStart);

// Send latency beacon
fetch(`/_miroir/ui/search/products/beacon`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    event_id: crypto.randomUUID(),
    event_type: 'latency',
    index_uid: 'products',
    query: query,
    result_count: results.hits.length,
    latency_ms: latencyMs
  })
});
Metrics
The following Prometheus metrics are exposed for beacon events:
- miroir_search_ui_sessions_total - Total sessions minted
- miroir_search_ui_queries_total{index} - Total search queries
- miroir_search_ui_zero_hits_total{index} - Total zero-result queries
- miroir_search_ui_click_through_total{index} - Total click events (when analytics enabled)
- miroir_search_ui_p95_ms{index} - P95 end-user latency
Rate Limiting
The beacon endpoint is rate-limited per source IP:
- Default: 60 requests per minute
- Backend: Redis (distributed) or local (per-pod)
- Scope: Per-IP limit, not per-session
Rate limit configuration:
search_ui:
  rate_limit:
    per_ip: "60/minute"
    backend: "redis"  # or "local"
    redis_key_prefix: "miroir:ratelimit:searchui:"
    redis_ttl_s: 60
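
For intuition, a per-IP limit such as "60/minute" can be enforced with a fixed-window counter, sketched below for the "local" backend. The algorithm is an assumption (the actual one is not stated), as are the names `parseLimit` and `LocalRateLimiter`; only the "minute" unit appears in this document.

```javascript
// Window lengths in milliseconds. Only "minute" appears in the config above;
// the other units are included as an assumption.
const WINDOW_MS = { second: 1000, minute: 60000, hour: 3600000 };

// Parses a spec such as "60/minute" into { max, windowMs }.
function parseLimit(spec) {
  const match = /^(\d+)\/(second|minute|hour)$/.exec(spec);
  if (!match) throw new Error(`unsupported rate limit: ${spec}`);
  return { max: Number(match[1]), windowMs: WINDOW_MS[match[2]] };
}

// Hypothetical per-process fixed-window limiter keyed by source IP.
class LocalRateLimiter {
  constructor(spec) {
    const { max, windowMs } = parseLimit(spec);
    this.max = max;
    this.windowMs = windowMs;
    this.counters = new Map(); // ip -> { window, count }
  }

  // Returns true if the request is within the limit for the current window.
  allow(ip, now = Date.now()) {
    const window = Math.floor(now / this.windowMs);
    let entry = this.counters.get(ip);
    if (!entry || entry.window !== window) {
      entry = { window, count: 0 }; // new window: reset the counter
      this.counters.set(ip, entry);
    }
    if (entry.count >= this.max) return false;
    entry.count += 1;
    return true;
  }
}
```

Because the limit is per IP rather than per session, users behind a shared NAT draw from the same budget; a per-pod limiter also multiplies the effective limit by the number of pods, which is why a distributed backend is offered.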
Security Considerations
- Authentication Required: Beacon endpoint requires valid JWT session token
- No PII in Events: Beacon events should not contain personally identifiable information
- Idempotency: Duplicate events are ignored to prevent replay attacks
- Origin Validation: Session token idx claim must match request index_uid
- Scope Validation: JWT scope claim must include beacon action
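
The origin and scope checks can be sketched as a function over already-verified JWT claims. Signature and expiry verification are out of scope here, `authorizeBeacon` is a hypothetical name, and the shape of the `scope` claim (array or space-separated string) is an assumption.

```javascript
// Hypothetical check run after the JWT signature has been verified.
// Returns the HTTP status the beacon endpoint would answer with.
function authorizeBeacon(claims, indexUid) {
  // Assumption: scope is either an array of actions or a space-separated string.
  const scopes = Array.isArray(claims.scope)
    ? claims.scope
    : String(claims.scope ?? '').split(/\s+/);
  const indexMatches = claims.idx === indexUid; // origin validation
  const scopeAllowsBeacon = scopes.includes('beacon'); // scope validation
  return indexMatches && scopeAllowsBeacon ? 202 : 403;
}
```

Binding the token to a single index means a session minted for one index cannot be replayed to write analytics into another.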
Troubleshooting
Events Not Appearing in CDC
- Check search_ui.analytics.enabled: true
- Verify CDC is enabled: cdc.enabled: true
- Check cdc.emit_internal_writes for latency events
- Verify JWT token includes beacon in scope
- Check browser console for network errors
Duplicate Events
If duplicate events appear in CDC:
- Verify event_id is unique per event
- Check Redis idempotency cache is functioning
- Ensure client is not retrying failed requests
High Cardinality
To reduce CDC stream cardinality:
- Avoid including unique user IDs in beacon events
- Use session IDs instead of user IDs
- Aggregate high-frequency events before sending
References
- Plan §13.21: Default search interface (end-user search UI)
- Plan §13.13: CDC (Change Data Capture)
- crates/miroir-proxy/src/routes/search_ui.rs - Beacon endpoint implementation
- crates/miroir-core/src/cdc.rs - CDC manager and analytics event types