miroir/docs/search_ui_analytics_beacon.md
jedarden 7ea7d0ed52 feat(search-ui): add analytics beacon CDC integration tests and docs
Add comprehensive test coverage for the beacon → CDC pipeline:

Test file (p13_21_beacon_cdc_integration.rs):
- Beacon request structure validation (click, latency events)
- CDC manager stores analytics events correctly
- Analytics event serialization includes all fields
- Analytics events map to correct CDC operation types
- Beacon event_id is used for idempotency
- Config validation for analytics settings
- Session response structure validation

Documentation (docs/search_ui_analytics_beacon.md):
- Beacon endpoint specification and request schema
- Event types (click, latency, impression) and required fields
- Idempotency mechanism using event_id
- CDC integration details and event schema
- Configuration examples for enabling/disabling analytics
- Client integration examples (JavaScript)
- Security considerations and rate limiting
- Metrics and troubleshooting guide

This completes the beacon → CDC integration verification for plan §13.21.

Closes: bf-51eg8
2026-05-26 18:23:52 -04:00

# Search UI Analytics Beacon (plan §13.21)
## Overview
The Search UI Analytics Beacon is an HTTP endpoint that collects end-user interaction data from the embedded search UI. Beacon events are published to the CDC (Change Data Capture) system for downstream analytics processing.
## Beacon Endpoint
**URL:** `POST /_miroir/ui/search/{index}/beacon`
**Authentication:** JWT session token (from `/ui/search/{index}/session`)
**Request Body:** JSON `BeaconRequest`
**Response:** `202 Accepted` (event processed) or `403 Forbidden` (invalid session)
## Beacon Request Schema
```json
{
  "event_id": "string",                // Client-generated unique identifier for idempotency
  "event_type": "string",              // Event type: "click", "latency", "impression"
  "index_uid": "string",               // Index UID
  "query": "string (optional)",        // Query string for search/latency events
  "result_count": "number (optional)", // Number of results for search events
  "latency_ms": "number (optional)",   // Latency in milliseconds
  "document_id": "string (optional)",  // Primary key of clicked document (for click events)
  "position": "number (optional)"      // Click position in results (for click events)
}
```
### Event Types
| Event Type | Description | Required Fields | Optional Fields |
|------------|-------------|-----------------|-----------------|
| `click` | User clicked a search result | `document_id`, `position` | `query`, `result_count` |
| `latency` | Search latency measurement | `latency_ms` | `query`, `result_count` |
| `impression` | Search results impression | `result_count` | `query` |
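The required-field rules in the table can be checked before a beacon is sent. The following is a minimal sketch, not the server's validation logic: `validateBeaconRequest` and `REQUIRED_FIELDS` are hypothetical names, and treating `index_uid` as mandatory is an assumption based on the schema not marking it optional.

```javascript
// Required fields per event type, copied from the Event Types table.
const REQUIRED_FIELDS = {
  click: ['document_id', 'position'],
  latency: ['latency_ms'],
  impression: ['result_count'],
};

// Hypothetical helper: returns a list of problems, empty when the request looks valid.
function validateBeaconRequest(req) {
  const errors = [];
  if (typeof req.event_type !== 'string' || !Object.hasOwn(REQUIRED_FIELDS, req.event_type)) {
    errors.push(`unknown event_type: ${String(req.event_type)}`);
    return errors;
  }
  // Assumption: index_uid is mandatory because the schema does not mark it optional.
  if (typeof req.index_uid !== 'string' || req.index_uid === '') {
    errors.push('missing index_uid');
  }
  for (const field of REQUIRED_FIELDS[req.event_type]) {
    if (req[field] === undefined || req[field] === null) {
      errors.push(`missing ${field} for ${req.event_type} event`);
    }
  }
  return errors;
}
```

A client could call this right before `fetch` and skip sending when the returned list is non-empty.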
## Idempotency
Beacon events are idempotent based on the `event_id` field:
1. The orchestrator checks whether `event_id` was already processed (Redis-backed)
2. Duplicate events are acknowledged with `202 Accepted` but treated as a no-op
3. Each unique `event_id` is processed exactly once
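The three steps above can be sketched with an in-memory set standing in for the Redis-backed store. This is an illustration under stated assumptions, not the orchestrator's code: `handleBeacon` and `seenEventIds` are hypothetical names, and key expiry is not modelled.

```javascript
// In-memory stand-in for the Redis-backed idempotency store (assumption:
// a real store would also expire keys, which this sketch does not model).
const seenEventIds = new Set();

// Hypothetical handler: always answers 202, but processes a given event_id only once.
function handleBeacon(event, process) {
  if (seenEventIds.has(event.event_id)) {
    return { status: 202, processed: false }; // duplicate: acknowledged, no-op
  }
  seenEventIds.add(event.event_id);
  process(event);
  return { status: 202, processed: true };
}
```

Because duplicates receive the same status as first deliveries, a client cannot tell them apart from the response alone.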
For clients that cannot generate `event_id`, the orchestrator generates a deterministic hash from:
- Session ID
- Query string
- Document ID (if present)
- Position (if present)
- Minute-bucketed latency (for latency events)
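A deterministic fallback ID built from these fields might look like the sketch below. Several parts are assumptions: the source does not name the hash function (FNV-1a over UTF-16 code units is swapped in here purely for illustration), `fallbackEventId` and the `gen-` prefix are hypothetical, and "minute-bucketed latency" is read as the latency event's timestamp floored to the minute.

```javascript
// 32-bit FNV-1a over UTF-16 code units (illustrative choice, not the source's hash).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical helper deriving a stable ID from the fields listed above.
function fallbackEventId({ sessionId, query, documentId, position, eventType, timestampMs }) {
  const parts = [`sid:${sessionId}`, `q:${query ?? ''}`];
  if (documentId != null) parts.push(`doc:${documentId}`);
  if (position != null) parts.push(`pos:${position}`);
  if (eventType === 'latency') {
    // Assumed reading of "minute-bucketed": floor the timestamp to the minute.
    parts.push(`min:${Math.floor(timestampMs / 60000)}`);
  }
  return `gen-${fnv1a(parts.join('\u0000'))}`;
}
```

Under this reading, two latency beacons for the same session and query within one minute collapse into a single event.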
## CDC Integration
### Publishing to CDC
When `search_ui.analytics.enabled: true`, beacon events are published to the CDC internal queue:
1. Beacon event received at `/_miroir/ui/search/{index}/beacon`
2. Event converted to `AnalyticsEvent` with `event_type` mapped:
   - `"click"` → `"click_through"` in CDC
   - `"latency"` → `"latency"` in CDC
   - `"impression"` → `"impression"` in CDC
3. Event stored in CDC internal queue with monotonically increasing sequence number
4. Event available via `GET /_miroir/changes?since={cursor}&index={uid}`
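The publishing steps above can be sketched as a small in-memory queue. This is a toy model, not the CDC manager: `InMemoryCdcQueue` is a hypothetical name, and treating the `since` cursor as an exclusive sequence number is an assumption.

```javascript
// Event type mapping from step 2.
const CDC_EVENT_TYPE = { click: 'click_through', latency: 'latency', impression: 'impression' };

// Hypothetical in-memory model of the CDC internal queue.
class InMemoryCdcQueue {
  constructor() {
    this.nextSeq = 1; // monotonically increasing sequence number (step 3)
    this.entries = [];
  }

  // Converts the beacon's event_type and appends the event; returns its sequence number.
  publish(index, beacon) {
    const entry = {
      seq: this.nextSeq++,
      index,
      event_type: CDC_EVENT_TYPE[beacon.event_type],
      event_id: beacon.event_id,
    };
    this.entries.push(entry);
    return entry.seq;
  }

  // Models GET /_miroir/changes?since={cursor}&index={uid}.
  // Assumption: `since` is exclusive, so only entries with seq > since are returned.
  changes(since, index) {
    return this.entries.filter((e) => e.seq > since && e.index === index);
  }
}
```

A consumer would remember the highest `seq` it has seen and pass it as `since` on the next poll.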
### CDC Event Schema
Analytics events in the CDC stream have this structure:
```json
{
  "mtask_id": "analytics:{event_id}", // Prefixed with "analytics:"
  "index": "products",                // Index UID
  "operation": "ClickThrough",        // CdcOperation enum
  "primary_keys": ["doc-123"],        // Clicked document ID
  "shard_ids": [],                    // Empty for analytics events
  "settings_version": 0,              // Not applicable
  "timestamp": 1234567890,            // UNIX timestamp (ms)
  "document": {                       // Original analytics event
    "event_type": "click_through",
    "event_id": "evt-abc-123",
    "session_id": "session-xyz-789",
    "index": "products",
    "query": "laptop",
    "result_id": "doc-123",
    "result_position": 3,
    "latency_ms": null,
    "timestamp": 1234567890
  },
  "origin": null,                     // No origin for user events
  "event_id": "evt-abc-123"           // Stable deduplication key
}
```
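To make the field mapping concrete, the sketch below builds this structure from a beacon request. It is an illustration only: `toCdcEvent` is a hypothetical name, the session ID and timestamp are passed in by the caller, and only the `click` and `latency` operations named in this document are handled.

```javascript
// Beacon event_type to CdcOperation, for the two analytics operations documented here.
const CDC_OPERATION = { click: 'ClickThrough', latency: 'Latency' };

// Hypothetical helper: builds the CDC envelope shown above from a beacon request.
function toCdcEvent(beacon, sessionId, nowMs) {
  const operation = CDC_OPERATION[beacon.event_type];
  if (operation === undefined) {
    throw new Error(`no CDC operation documented for event_type: ${beacon.event_type}`);
  }
  return {
    mtask_id: `analytics:${beacon.event_id}`,
    index: beacon.index_uid,
    operation,
    primary_keys: beacon.document_id != null ? [beacon.document_id] : [],
    shard_ids: [],
    settings_version: 0,
    timestamp: nowMs,
    document: {
      event_type: beacon.event_type === 'click' ? 'click_through' : beacon.event_type,
      event_id: beacon.event_id,
      session_id: sessionId,
      index: beacon.index_uid,
      query: beacon.query ?? null,
      result_id: beacon.document_id ?? null,
      result_position: beacon.position ?? null,
      latency_ms: beacon.latency_ms ?? null,
      timestamp: nowMs,
    },
    origin: null,
    event_id: beacon.event_id,
  };
}
```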
### CDC Operation Types
| Operation | Description | When Emitted |
|-----------|-------------|--------------|
| `ClickThrough` | User clicked a search result | `event_type: "click"` |
| `Latency` | Search latency measurement | `event_type: "latency"` |
| `Add` | Document added | Document write |
| `Update` | Document updated | Document write |
| `Delete` | Document deleted | Document delete |
### CDC Configuration
Analytics events respect the `cdc.emit_internal_writes` configuration:
- **Click events** (`click_through`): Always emitted (user-initiated)
- **Latency events** (`latency`): Subject to `emit_internal_writes` flag
```yaml
cdc:
  enabled: true
  emit_internal_writes: false # Suppresses latency events if false
  sinks: []
```
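The emission rule described above can be expressed as a small predicate. This is a sketch of the documented behaviour, not the CDC manager's logic: `shouldEmit` is a hypothetical name, and because the text covers only click and latency events, other event types are rejected instead of guessed at.

```javascript
// Hypothetical predicate: decides whether an analytics event reaches the CDC stream.
// `cdcConfig` mirrors the `cdc:` YAML block (enabled, emit_internal_writes).
function shouldEmit(cdcEventType, cdcConfig) {
  if (!cdcConfig.enabled) return false; // nothing is emitted when CDC is off
  switch (cdcEventType) {
    case 'click_through':
      return true; // user-initiated: always emitted
    case 'latency':
      return cdcConfig.emit_internal_writes === true;
    default:
      // The document does not state the rule for other event types.
      throw new Error(`emission rule not documented for: ${cdcEventType}`);
  }
}
```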
## Configuration
### Enable Analytics
```yaml
search_ui:
  enabled: true
  analytics:
    enabled: true # Enable beacon collection
    sink: "cdc"   # Publish to CDC internal queue
```
### Disable Analytics
```yaml
search_ui:
  enabled: true
  analytics:
    enabled: false # Disable beacon collection
    sink: "cdc"
```
## Client Integration
### JavaScript Example
```javascript
// After obtaining JWT session token from /ui/search/{index}/session
const token = sessionStorage.getItem('miroir_search_token');

// Send click-through event
fetch(`/_miroir/ui/search/products/beacon`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    event_id: crypto.randomUUID(),
    event_type: 'click',
    index_uid: 'products',
    query: 'laptop',
    document_id: 'prod-123',
    position: 3
  })
});
```
### Latency Measurement
```javascript
// Assumes `token` (session JWT, see the previous example), `query`, and an
// async `search(query)` function are already in scope.

// Measure search latency
const searchStart = performance.now();
// Perform search...
const results = await search(query);
const searchEnd = performance.now();
const latencyMs = Math.round(searchEnd - searchStart);

// Send latency beacon
fetch(`/_miroir/ui/search/products/beacon`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    event_id: crypto.randomUUID(),
    event_type: 'latency',
    index_uid: 'products',
    query: query,
    result_count: results.hits.length,
    latency_ms: latencyMs
  })
});
```
## Metrics
The following Prometheus metrics are exposed for the search UI, including beacon events:
- `miroir_search_ui_sessions_total` - Total sessions minted
- `miroir_search_ui_queries_total{index}` - Total search queries
- `miroir_search_ui_zero_hits_total{index}` - Total zero-result queries
- `miroir_search_ui_click_through_total{index}` - Total click events (when analytics enabled)
- `miroir_search_ui_p95_ms{index}` - P95 end-user latency
## Rate Limiting
The beacon endpoint is rate-limited per source IP:
- **Default:** 60 requests per minute
- **Backend:** Redis (distributed) or local (per-pod)
- **Scope:** Per-IP limit, not per-session
Rate limit configuration:
```yaml
search_ui:
  rate_limit:
    per_ip: "60/minute"
    backend: "redis" # or "local"
    redis_key_prefix: "miroir:ratelimit:searchui:"
    redis_ttl_s: 60
```
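To illustrate what a per-IP limit such as `"60/minute"` means in practice, here is a minimal sketch of the `local` (per-pod) case. The fixed-window algorithm is an assumption (the document does not say which algorithm is used), `parseLimit` and `LocalRateLimiter` are hypothetical names, and the `second` and `hour` units are guesses beyond the documented `minute`.

```javascript
// Parses a spec like "60/minute" into a request cap and a window length.
// Assumption: only "minute" appears in this document; other units are illustrative.
function parseLimit(spec) {
  const match = /^(\d+)\/(second|minute|hour)$/.exec(spec);
  if (!match) throw new Error(`unsupported rate limit: ${spec}`);
  const windowMs = { second: 1000, minute: 60000, hour: 3600000 }[match[2]];
  return { max: Number(match[1]), windowMs };
}

// Hypothetical fixed-window limiter keyed by source IP.
class LocalRateLimiter {
  constructor(spec) {
    const { max, windowMs } = parseLimit(spec);
    this.max = max;
    this.windowMs = windowMs;
    this.windows = new Map(); // ip -> { windowStart, count }
  }

  // Returns true when the request is within the limit for the current window.
  allow(ip, nowMs) {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const current = this.windows.get(ip);
    if (!current || current.windowStart !== windowStart) {
      this.windows.set(ip, { windowStart, count: 1 });
      return true;
    }
    if (current.count >= this.max) return false;
    current.count += 1;
    return true;
  }
}
```

Since the limit is per IP and not per session, all sessions behind one shared address draw from the same budget.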
## Security Considerations
1. **Authentication Required:** Beacon endpoint requires valid JWT session token
2. **No PII in Events:** Beacon events should not contain personally identifiable information
3. **Idempotency:** Duplicate events are ignored to prevent replay attacks
4. **Index Validation:** Session token `idx` claim must match request `index_uid`
5. **Scope Validation:** JWT `scope` claim must include `beacon` action
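The two claim checks in items 4 and 5 can be sketched as follows. This assumes the JWT signature and expiry were already verified and the payload decoded into `claims`; `authorizeBeacon` is a hypothetical name, and accepting `scope` either as an array or as a space-separated string is an assumption about the token format.

```javascript
// Hypothetical check run on an already-verified JWT payload.
function authorizeBeacon(claims, request) {
  // Item 4: the token is bound to one index.
  if (claims.idx !== request.index_uid) {
    return { ok: false, reason: 'idx claim does not match index_uid' };
  }
  // Item 5: the token must grant the beacon action.
  // Assumption: scope is either an array or a space-separated string.
  const scope = Array.isArray(claims.scope)
    ? claims.scope
    : String(claims.scope ?? '').split(/\s+/);
  if (!scope.includes('beacon')) {
    return { ok: false, reason: 'scope does not include beacon' };
  }
  return { ok: true };
}
```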
## Troubleshooting
### Events Not Appearing in CDC
1. Check `search_ui.analytics.enabled: true`
2. Verify CDC is enabled: `cdc.enabled: true`
3. Check `cdc.emit_internal_writes` for latency events
4. Verify JWT token includes `beacon` in scope
5. Check browser console for network errors
### Duplicate Events
If duplicate events appear in CDC:
1. Verify `event_id` is unique per event
2. Check Redis idempotency cache is functioning
3. Ensure the client reuses the same `event_id` when it retries a failed request, since a retry with a new `event_id` is treated as a new event
### High Cardinality
To reduce CDC stream cardinality:
1. Avoid including unique user IDs in beacon events
2. Use session IDs instead of user IDs
3. Aggregate high-frequency events before sending
## References
- Plan §13.21: Default search interface (end-user search UI)
- Plan §13.13: CDC (Change Data Capture)
- `crates/miroir-proxy/src/routes/search_ui.rs` - Beacon endpoint implementation
- `crates/miroir-core/src/cdc.rs` - CDC manager and analytics event types