Add comprehensive test coverage for the beacon → CDC pipeline:

Test file (p13_21_beacon_cdc_integration.rs):
- Beacon request structure validation (click, latency events)
- CDC manager stores analytics events correctly
- Analytics event serialization includes all fields
- Analytics events map to correct CDC operation types
- Beacon event_id is used for idempotency
- Config validation for analytics settings
- Session response structure validation

Documentation (docs/search_ui_analytics_beacon.md):
- Beacon endpoint specification and request schema
- Event types (click, latency, impression) and required fields
- Idempotency mechanism using event_id
- CDC integration details and event schema
- Configuration examples for enabling/disabling analytics
- Client integration examples (JavaScript)
- Security considerations and rate limiting
- Metrics and troubleshooting guide

This completes the beacon → CDC integration verification for plan §13.21.

Closes: bf-51eg8
266 lines
8.1 KiB
Markdown
# Search UI Analytics Beacon (plan §13.21)

## Overview

The Search UI Analytics Beacon is an HTTP endpoint that collects end-user interaction data from the embedded search UI. Beacon events are published to the CDC (Change Data Capture) system for downstream analytics processing.

## Beacon Endpoint

**URL:** `POST /_miroir/ui/search/{index}/beacon`

**Authentication:** JWT session token (from `/ui/search/{index}/session`)

**Request Body:** JSON `BeaconRequest`

**Response:** `202 Accepted` (event processed) or `403 Forbidden` (invalid session)

## Beacon Request Schema

```json
{
  "event_id": "string",                // Client-generated unique identifier for idempotency
  "event_type": "string",              // Event type: "click", "latency", "impression"
  "index_uid": "string",               // Index UID
  "query": "string (optional)",        // Query string for search/latency events
  "result_count": "number (optional)", // Number of results for search events
  "latency_ms": "number (optional)",   // Latency in milliseconds
  "document_id": "string (optional)",  // Primary key of clicked document (for click events)
  "position": "number (optional)"      // Click position in results (for click events)
}
```

### Event Types

| Event Type | Description | Required Fields | Optional Fields |
|------------|-------------|-----------------|-----------------|
| `click` | User clicked a search result | `document_id`, `position` | `query`, `result_count` |
| `latency` | Search latency measurement | `latency_ms` | `query`, `result_count` |
| `impression` | Search results impression | `result_count` | `query` |

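The required-field rules in the table can be checked on the client before a beacon is sent. The following is a minimal sketch, not the server's validation logic: `validateBeacon` and `REQUIRED_FIELDS` are hypothetical names, and `event_id` is deliberately left unchecked because the orchestrator can derive one when the client omits it.

```javascript
// Required fields per event type, copied from the Event Types table.
const REQUIRED_FIELDS = new Map([
  ['click', ['document_id', 'position']],
  ['latency', ['latency_ms']],
  ['impression', ['result_count']],
]);

// Hypothetical helper: returns a list of problems, empty when the payload looks valid.
function validateBeacon(payload) {
  const errors = [];
  if (typeof payload.index_uid !== 'string' || payload.index_uid === '') {
    errors.push('missing index_uid');
  }
  const required = REQUIRED_FIELDS.get(payload.event_type);
  if (required === undefined) {
    errors.push(`unknown event_type: ${payload.event_type}`);
    return errors;
  }
  for (const field of required) {
    if (payload[field] === undefined || payload[field] === null) {
      errors.push(`missing ${field}`);
    }
  }
  return errors;
}
```

Calling `validateBeacon(payload)` before `fetch` lets the client drop malformed events locally instead of spending rate-limit budget on them.
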
## Idempotency

Beacon events are idempotent based on the `event_id` field:

1. The orchestrator checks whether `event_id` was already processed (Redis-backed)
2. Duplicate events are acknowledged with `202 Accepted` but treated as a no-op
3. Each unique `event_id` is processed exactly once

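The three steps above can be sketched with an in-memory set standing in for Redis. This is an illustration under stated assumptions: `SeenEvents` and `handleBeacon` are hypothetical names, and the real store's key layout and expiry are not described in this document.

```javascript
// In-memory stand-in for the Redis-backed idempotency store (assumption:
// the real store, its keys, and its TTL are not specified here).
class SeenEvents {
  constructor() {
    this.seen = new Set();
  }

  // Returns true the first time an event_id is observed, false afterwards.
  markIfNew(eventId) {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

// Hypothetical handler: both outcomes answer 202, only the first one publishes.
function handleBeacon(store, event, publish) {
  if (store.markIfNew(event.event_id)) {
    publish(event);
  }
  return 202;
}
```

Because the duplicate path returns the same status as the first delivery, a client cannot tell the two apart, which is what makes resending safe.
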
For clients that cannot generate `event_id`, the orchestrator generates a deterministic hash from:

- Session ID
- Query string
- Document ID (if present)
- Position (if present)
- Minute-bucketed latency (for latency events)

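A deterministic fallback ID of this kind could look like the sketch below. The hash function is an assumption: this document does not say which one the orchestrator uses, so 32-bit FNV-1a (over UTF-16 code units) is swapped in purely for illustration. `fallbackEventId` is a hypothetical name, and `latencyBucket` is taken as an input because the bucketing rule is not specified.

```javascript
// 32-bit FNV-1a over UTF-16 code units. Chosen only for illustration;
// the orchestrator's actual hash function is not documented here.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical helper: joins the listed inputs with a separator that is
// unlikely to occur in user text, then hashes the result.
function fallbackEventId({ sessionId, query, documentId, position, latencyBucket }) {
  const parts = [sessionId, query ?? '', documentId ?? '', position ?? '', latencyBucket ?? ''];
  return fnv1a(parts.map(String).join('\u001f'));
}
```

The same inputs always yield the same ID, so a repeated click on the same result in the same session collapses into one event. A production implementation would likely want a wider hash to keep collisions between distinct events rare.
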
## CDC Integration

### Publishing to CDC

When `search_ui.analytics.enabled: true`, beacon events are published to the CDC internal queue:

1. Beacon event received at `POST /_miroir/ui/search/{index}/beacon`
2. Event converted to `AnalyticsEvent` with `event_type` mapped:
   - `"click"` → `"click_through"` in CDC
   - `"latency"` → `"latency"` in CDC
   - `"impression"` → `"impression"` in CDC
3. Event stored in CDC internal queue with monotonically increasing sequence number
4. Event available via `GET /_miroir/changes?since={cursor}&index={uid}`

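The conversion and queueing steps above can be sketched as follows. This is a simplified model rather than the code in `cdc.rs`: `toAnalyticsEvent` and `InternalQueue` are hypothetical names, the output field names follow the `document` object shown in the CDC event schema, and treating `since` as an exclusive cursor is an assumption.

```javascript
// event_type mapping from beacon to CDC, as listed in step 2.
const CDC_EVENT_TYPE = new Map([
  ['click', 'click_through'],
  ['latency', 'latency'],
  ['impression', 'impression'],
]);

// Hypothetical conversion of a beacon payload into an analytics event.
function toAnalyticsEvent(beacon, sessionId, nowMs) {
  const eventType = CDC_EVENT_TYPE.get(beacon.event_type);
  if (eventType === undefined) {
    throw new Error(`unsupported event_type: ${beacon.event_type}`);
  }
  return {
    event_type: eventType,
    event_id: beacon.event_id,
    session_id: sessionId,
    index: beacon.index_uid,
    query: beacon.query ?? null,
    result_id: beacon.document_id ?? null,
    result_position: beacon.position ?? null,
    latency_ms: beacon.latency_ms ?? null,
    timestamp: nowMs,
  };
}

// In-memory stand-in for the CDC internal queue.
class InternalQueue {
  constructor() {
    this.nextSeq = 1;
    this.entries = [];
  }

  // Stores the event and returns its monotonically increasing sequence number.
  push(event) {
    const seq = this.nextSeq++;
    this.entries.push({ seq, event });
    return seq;
  }

  // Assumption: `since` is exclusive, mirroring ?since={cursor}&index={uid}.
  since(cursor, index) {
    return this.entries.filter((e) => e.seq > cursor && e.event.index === index);
  }
}
```

A consumer that remembers the highest `seq` it has seen and passes it back as the cursor will receive each event once, in order.
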
### CDC Event Schema

Analytics events in the CDC stream have this structure:

```json
{
  "mtask_id": "analytics:{event_id}",  // Prefixed with "analytics:"
  "index": "products",                 // Index UID
  "operation": "ClickThrough",         // CdcOperation enum
  "primary_keys": ["doc-123"],         // Clicked document ID
  "shard_ids": [],                     // Empty for analytics events
  "settings_version": 0,               // Not applicable
  "timestamp": 1234567890,             // UNIX timestamp (ms)
  "document": {                        // Original analytics event
    "event_type": "click_through",
    "event_id": "evt-abc-123",
    "session_id": "session-xyz-789",
    "index": "products",
    "query": "laptop",
    "result_id": "doc-123",
    "result_position": 3,
    "latency_ms": null,
    "timestamp": 1234567890
  },
  "origin": null,                      // No origin for user events
  "event_id": "evt-abc-123"            // Stable deduplication key
}
```

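A downstream consumer can separate analytics events from document writes by the `analytics:` prefix on `mtask_id`. The sketch below assumes the consumer already holds an array of change objects shaped like the schema above; `isAnalyticsEvent` and `clicksByDocument` are hypothetical helpers, and how the array is fetched from `/_miroir/changes` is left out because the response envelope is not described here.

```javascript
const ANALYTICS_PREFIX = 'analytics:';

// True when a change record was produced by the beacon pipeline.
function isAnalyticsEvent(change) {
  return typeof change.mtask_id === 'string' && change.mtask_id.startsWith(ANALYTICS_PREFIX);
}

// Counts click-through events per clicked document ID.
function clicksByDocument(changes) {
  const counts = new Map();
  for (const change of changes) {
    if (!isAnalyticsEvent(change) || change.operation !== 'ClickThrough') continue;
    for (const key of change.primary_keys) {
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}
```
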
### CDC Operation Types

| Operation | Description | When Emitted |
|-----------|-------------|--------------|
| `ClickThrough` | User clicked a search result | `event_type: "click"` |
| `Latency` | Search latency measurement | `event_type: "latency"` |
| `Add` | Document added | Document write |
| `Update` | Document updated | Document write |
| `Delete` | Document deleted | Document delete |

### CDC Configuration

Analytics events respect the `cdc.emit_internal_writes` configuration:

- **Click events** (`click_through`): Always emitted (user-initiated)
- **Latency events** (`latency`): Subject to `emit_internal_writes` flag

```yaml
cdc:
  enabled: true
  emit_internal_writes: false  # Suppresses latency events if false
  sinks: []
```

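The emission rule can be expressed as a small predicate. This is a sketch of the behaviour described above, not the CDC manager's code: `shouldEmit` is a hypothetical name, it assumes nothing is emitted when `cdc.enabled` is false, and it treats every operation other than `Latency` as always emitted, which this document only confirms for click events.

```javascript
// Decides whether an analytics operation reaches the CDC stream.
// `cdcConfig` mirrors the `cdc:` block of the YAML configuration.
function shouldEmit(operation, cdcConfig) {
  if (!cdcConfig.enabled) return false;
  if (operation === 'Latency') {
    // Latency events are gated by emit_internal_writes.
    return cdcConfig.emit_internal_writes === true;
  }
  // Assumption: all other operations behave like ClickThrough.
  return true;
}
```

With the configuration shown above, `shouldEmit('ClickThrough', cfg)` is true and `shouldEmit('Latency', cfg)` is false.
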
## Configuration

### Enable Analytics

```yaml
search_ui:
  enabled: true
  analytics:
    enabled: true   # Enable beacon collection
    sink: "cdc"     # Publish to CDC internal queue
```

### Disable Analytics

```yaml
search_ui:
  enabled: true
  analytics:
    enabled: false  # Disable beacon collection
    sink: "cdc"
```

## Client Integration

### JavaScript Example

```javascript
// After obtaining JWT session token from /ui/search/{index}/session
const token = sessionStorage.getItem('miroir_search_token');

// Send click-through event
fetch(`/_miroir/ui/search/products/beacon`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    event_id: crypto.randomUUID(),
    event_type: 'click',
    index_uid: 'products',
    query: 'laptop',
    document_id: 'prod-123',
    position: 3
  })
});
```

### Latency Measurement

```javascript
// Assumes `token` (the session JWT), `query`, and an async `search()` helper
// are already defined by the surrounding page code.

// Measure search latency
const searchStart = performance.now();

// Perform search...
const results = await search(query);

const searchEnd = performance.now();
const latencyMs = Math.round(searchEnd - searchStart);

// Send latency beacon
fetch(`/_miroir/ui/search/products/beacon`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({
    event_id: crypto.randomUUID(),
    event_type: 'latency',
    index_uid: 'products',
    query: query,
    result_count: results.hits.length,
    latency_ms: latencyMs
  })
});
```

## Metrics

The following Prometheus metrics are exposed for the search UI, including beacon events:

- `miroir_search_ui_sessions_total` - Total sessions minted
- `miroir_search_ui_queries_total{index}` - Total search queries
- `miroir_search_ui_zero_hits_total{index}` - Total zero-result queries
- `miroir_search_ui_click_through_total{index}` - Total click events (when analytics enabled)
- `miroir_search_ui_p95_ms{index}` - P95 end-user latency

## Rate Limiting

The beacon endpoint is rate-limited per source IP:

- **Default:** 60 requests per minute
- **Backend:** Redis (distributed) or local (per-pod)
- **Scope:** Per-IP limit, not per-session

Rate limit configuration:

```yaml
search_ui:
  rate_limit:
    per_ip: "60/minute"
    backend: "redis"  # or "local"
    redis_key_prefix: "miroir:ratelimit:searchui:"
    redis_ttl_s: 60
```

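To make the per-IP limit concrete, here is a minimal sketch of what a `local` backend might do. It is an assumption-heavy illustration: the document does not say whether the limiter uses a fixed window, a sliding window, or a token bucket, so a fixed window is used for simplicity. `parseRate` and `LocalRateLimiter` are hypothetical names, and units other than `minute` are guesses at the rate syntax.

```javascript
// Parses a rate such as "60/minute" into a limit and a window length.
// Only "60/minute" appears in this document; other units are assumptions.
function parseRate(spec) {
  const match = /^(\d+)\/(second|minute|hour)$/.exec(spec);
  if (!match) throw new Error(`unsupported rate: ${spec}`);
  const windowMs = { second: 1000, minute: 60000, hour: 3600000 }[match[2]];
  return { limit: Number(match[1]), windowMs };
}

// Fixed-window counter keyed by source IP (per-process, like a "local" backend).
class LocalRateLimiter {
  constructor(spec) {
    Object.assign(this, parseRate(spec));
    this.windows = new Map(); // ip -> { start, count }
  }

  // Returns true if the request is allowed at time `nowMs`.
  allow(ip, nowMs) {
    const current = this.windows.get(ip);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      this.windows.set(ip, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

Because the counter is keyed by IP, many sessions behind one NAT share a single budget, which matches the per-IP scope described above.
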
## Security Considerations

1. **Authentication Required:** Beacon endpoint requires valid JWT session token
2. **No PII in Events:** Beacon events should not contain personally identifiable information
3. **Idempotency:** Duplicate events are ignored to prevent replay attacks
4. **Origin Validation:** Session token `idx` claim must match request `index_uid`
5. **Scope Validation:** JWT `scope` claim must include `beacon` action

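The `idx` and `scope` checks in items 4 and 5 could be sketched as below. The function operates on claims that have already been decoded from a signature-verified token; signature and expiry verification are out of scope here. `checkBeaconClaims` is a hypothetical name, the encoding of `scope` (array or space-separated string) is not stated in this document so both are accepted, and answering claim mismatches with `403` is an assumption based on the endpoint's documented responses.

```javascript
// Validates decoded JWT claims against the beacon request's index_uid.
// Assumes the token's signature and expiry were verified beforehand.
function checkBeaconClaims(claims, indexUid) {
  if (claims.idx !== indexUid) {
    return { ok: false, status: 403, reason: 'idx claim does not match index_uid' };
  }
  // Accept either an array of actions or a space-separated string.
  const scope = Array.isArray(claims.scope)
    ? claims.scope
    : String(claims.scope ?? '').split(/\s+/);
  if (!scope.includes('beacon')) {
    return { ok: false, status: 403, reason: 'scope does not include beacon' };
  }
  return { ok: true };
}
```
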
## Troubleshooting

### Events Not Appearing in CDC

1. Check that `search_ui.analytics.enabled` is `true`
2. Verify that CDC is enabled: `cdc.enabled: true`
3. Check `cdc.emit_internal_writes` if latency events are missing
4. Verify that the JWT token includes `beacon` in its scope
5. Check the browser console for network errors

### Duplicate Events

If duplicate events appear in CDC:

1. Verify that `event_id` is unique per event
2. Check that the Redis idempotency cache is functioning
3. Ensure the client is not retrying failed requests with a newly generated `event_id`

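One way to keep client retries from producing duplicates is to fix the `event_id` before the first attempt and reuse it on every retry, so the idempotency check can recognise the resend. The sketch below is illustrative: `sendWithRetry` is a hypothetical helper, `send` is any caller-supplied async function that performs the actual `fetch`, and the attempt count is arbitrary.

```javascript
// Retries a beacon send while keeping the same event_id across attempts.
async function sendWithRetry(send, payload, maxAttempts = 3) {
  // The ID is chosen once, up front, so every retry carries the same key.
  const body = { ...payload, event_id: payload.event_id ?? crypto.randomUUID() };
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(body);
    } catch (err) {
      lastError = err; // Remember the failure and try again.
    }
  }
  throw lastError;
}
```
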
### High Cardinality

To reduce CDC stream cardinality:

1. Avoid including unique user IDs in beacon events
2. Use session IDs instead of user IDs
3. Aggregate high-frequency events before sending

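As an illustration of the third point, a client could collect latency samples locally and send one summarising beacon per interval. This is only a sketch of one possible approach: `LatencyAggregator` is a hypothetical class, and reporting the rounded mean is an assumption that trades away the shape of the distribution for fewer events.

```javascript
// Collects latency samples and collapses them into a single beacon payload.
class LatencyAggregator {
  constructor(indexUid) {
    this.indexUid = indexUid;
    this.samples = [];
  }

  record(latencyMs) {
    this.samples.push(latencyMs);
  }

  // Returns one latency payload covering all recorded samples and resets,
  // or null when nothing was recorded since the last flush.
  flush(eventId) {
    if (this.samples.length === 0) return null;
    const total = this.samples.reduce((sum, ms) => sum + ms, 0);
    const payload = {
      event_id: eventId,
      event_type: 'latency',
      index_uid: this.indexUid,
      latency_ms: Math.round(total / this.samples.length),
    };
    this.samples = [];
    return payload;
  }
}
```

A page might call `record()` after each search and `flush(crypto.randomUUID())` on a timer, posting the result only when it is not `null`.
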
## References

- Plan §13.21: Default search interface (end-user search UI)
- Plan §13.13: CDC (Change Data Capture)
- `crates/miroir-proxy/src/routes/search_ui.rs` - Beacon endpoint implementation
- `crates/miroir-core/src/cdc.rs` - CDC manager and analytics event types