miroir/docs/dump-import/compatibility-matrix.md
jedarden ff5ab041b9 miroir-zc2.5: Fix dump import compatibility matrix enhancement bead refs
The matrix incorrectly referenced miroir-zc2.6/7/8 as dump import
enhancement beads, but zc2.6 is actually arm64 support and zc2.7/8
don't exist. Replaced with a descriptive "Future Enhancements" table
that maintains traceability without false bead dependencies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 07:16:06 -04:00


Dump Import Compatibility Matrix

Overview

Miroir's streaming dump import (mode: streaming) reconstructs indexes by routing documents through the public API (POST /indexes/{uid}/documents) rather than sending the raw dump file to nodes. This approach enables horizontal scalability but cannot reproduce every possible dump variant.

This matrix identifies which dump variants are fully compatible with streaming mode, which require the broadcast fallback, and what workarounds exist.

Streaming Mode Capabilities

Streaming mode can reconstruct:

| Component | How it's reconstructed | Notes |
| --- | --- | --- |
| Documents | NDJSON parsed and routed via POST /indexes/{uid}/documents | Primary key extracted, shard calculated, _miroir_shard injected |
| Index settings | Two-phase settings broadcast (§13.5) via PATCH /indexes/{uid}/settings | Verified by hash comparison |
| Primary key | Set via PUT /indexes/{uid}/settings primaryKey | Applied before document streaming |
| API keys | Broadcast via POST /keys | Actions/indexes recreated from dump metadata |
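
The document path in the first row (parse, extract primary key, calculate shard, inject _miroir_shard) can be sketched as follows. This is a minimal illustration, not Miroir's implementation: the function names `shard_for` and `route_line`, and the SHA-256-modulo hashing scheme, are assumptions made for the example.

```python
import hashlib
import json

SHARD_FIELD = "_miroir_shard"  # field name taken from this document


def shard_for(primary_key_value, num_shards):
    """Map a primary key value to a shard index.

    Assumption: SHA-256 modulo the shard count. The hash Miroir actually
    uses is not specified in this document.
    """
    digest = hashlib.sha256(str(primary_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route_line(ndjson_line, primary_key, num_shards):
    """Parse one NDJSON line and return (shard, document with shard field)."""
    doc = json.loads(ndjson_line)
    if primary_key not in doc:
        raise KeyError(f"document is missing primary key {primary_key!r}")
    shard = shard_for(doc[primary_key], num_shards)
    routed = dict(doc)  # copy so the parsed input is left untouched
    routed[SHARD_FIELD] = shard
    return shard, routed
```

Each routed document would then be sent to its shard through POST /indexes/{uid}/documents. Routing is deterministic, so the same primary key always lands on the same shard.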

Compatibility Matrix

Fully Compatible (Streaming Works)

| Meilisearch Version | Dump Variant | Streaming Works? | Notes |
| --- | --- | --- | --- |
| v1.0+ | Standard documents NDJSON | Yes | Core use case |
| v1.0+ | Index settings (ranking rules, synonyms, etc.) | Yes | Applied via two-phase broadcast |
| v1.0+ | Primary key configuration | Yes | Set before document ingest |
| v1.0+ | Custom API keys (actions, indexes) | Yes | Recreated via POST /keys |
| v1.5+ | Filterable/sortable attributes | Yes | Standard settings |
| v1.12+ | Dictionary settings | Yes | Standard settings |
| v1.19+ | Proximity precision settings | Yes | Standard settings |
| v1.26+ | Embedders (vector search) | Yes | Standard settings |
| v1.30+ | Faceting settings | Yes | Standard settings |
| v1.37+ | Pagination settings | Yes | Standard settings |
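
Settings applied through the two-phase broadcast are verified by hash comparison. One way such a check could work is sketched below, assuming a SHA-256 digest over canonical JSON. The helper names `settings_fingerprint` and `settings_match` are hypothetical, and the canonicalization Miroir actually uses is not specified here.

```python
import hashlib
import json


def settings_fingerprint(settings):
    """Hash a settings object in a key-order-independent way.

    Assumption: canonical JSON (sorted keys, compact separators) fed to
    SHA-256. List order is preserved because it is significant for
    settings such as ranking rules.
    """
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def settings_match(expected, settings_by_node):
    """Return {node_name: True/False} for whether each node's settings match."""
    want = settings_fingerprint(expected)
    return {
        name: settings_fingerprint(actual) == want
        for name, actual in settings_by_node.items()
    }
```

A node whose entry is False would need the settings re-applied before document streaming starts.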

Requires Broadcast Fallback

| Meilisearch Version | Dump Variant | Streaming Works? | Broadcast Needed? | Workaround |
| --- | --- | --- | --- | --- |
| Any | Tasks history | No | Yes | Tasks are transient and not critical for reconstruction. Use broadcast if task UID preservation is required. |
| Any | Dumps with existing _miroir_shard field | ⚠️ Conflict | Yes | Conflict: Miroir injects its own _miroir_shard. If the dump already contains this field from a previous Miroir instance, the injected value conflicts. |
| < v1.0 | Pre-v1.0 dump format | ⚠️ Maybe | Yes | Old dump formats may have an incompatible NDJSON structure. Use Meilisearch to upgrade dumps first: restore to vanilla Meilisearch, then create a new dump. |
| Any | Internal LMDB state | No | Yes | Streaming reconstructs at the API level; internal LMDB state (e.g., cache warming) is not reproducible. Not functionally significant. |
| Any | Snapshot-based dumps (.ms.snapshot) | No | Yes | Snapshots are binary LMDB copies, not NDJSON. Convert to a dump first via Meilisearch: POST /dumps, then import. |
| Any | Enterprise edition features (sharding, replication) | No | Yes | EE-only dump metadata cannot be reconstructed via the CE API. Use broadcast or downgrade to a CE dump first. |
| v1.0 - v1.2 | Old-style settings format | ⚠️ Maybe | Yes | The settings format of early Meilisearch releases may have changed. Test with a small dump first. |
| Any | Large single-document payloads | ⚠️ Risk | Yes | Documents exceeding memory_buffer_bytes may cause OOM. Broadcast has the same limitation but fails more gracefully. |
| Any | Corrupted or partial dumps | No | No | Neither mode handles corruption. Validate and repair the source by importing it into vanilla Meilisearch (meilisearch --import-dump). |
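
The large-payload risk in the table above can be checked before an import starts. The sketch below flags NDJSON lines larger than the configured buffer. It assumes the serialized line length is a reasonable proxy for the memory a document needs, and `oversized_lines` is a hypothetical helper, not part of miroir-ctl.

```python
def oversized_lines(ndjson_lines, memory_buffer_bytes):
    """Yield (line_number, size_in_bytes) for lines larger than the buffer.

    Assumption: one NDJSON line holds one document, and its UTF-8 byte
    length approximates the memory needed to buffer it.
    """
    for line_number, line in enumerate(ndjson_lines, start=1):
        size = len(line.encode("utf-8"))
        if size > memory_buffer_bytes:
            yield line_number, size


# Usage: pass any iterable of strings, such as an open text file.
sample = ['{"id": 1}', '{"id": 2, "body": "' + "x" * 100 + '"}']
for line_number, size in oversized_lines(sample, memory_buffer_bytes=64):
    print(f"line {line_number}: {size} bytes exceeds the buffer")
```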

Version-Specific Notes

Meilisearch v1.37.0 (Current Target)

  • Sharding/Replication metadata: EE-only features in dumps cannot be reconstructed via CE API
  • API key format: Stable; fully reconstructible
  • Settings schema: Stable; fully reconstructible

Meilisearch v1.19.0 - v1.36.x

  • No EE sharding metadata in dumps from CE
  • All settings reconstructible via public API

Meilisearch v1.0.0 - v1.18.x

  • Older dump formats: the NDJSON structure is stable, but the settings schema may have changed
  • Recommendation: Test with a small subset first

Meilisearch < v1.0.0

  • Not officially supported for streaming import
  • Workaround: Restore to vanilla Meilisearch, create v1.0+ dump

Field Conflicts

_miroir_shard Field Collision

Problem: Miroir injects _miroir_shard into every document for routing. If the dump already contains this field (from a previous Miroir instance or from user data), the injected value collides with the existing one.

Detection: When streaming import detects an existing _miroir_shard field, it:

  1. Logs a warning
  2. Falls back to broadcast mode automatically (controlled by fallback_on_conflict, which the sample configuration sets to true)
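
The detection step can be sketched as a scan over the dump's NDJSON documents. The function names below are hypothetical. The behavior when fallback is disabled is not described in this document, so raising an error in that case is an assumption.

```python
import json
import logging

log = logging.getLogger("dump_import_sketch")


def has_shard_field_conflict(ndjson_lines, field="_miroir_shard"):
    """Return True if any document already carries the shard field."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between documents
        doc = json.loads(line)
        if isinstance(doc, dict) and field in doc:
            return True
    return False


def choose_mode(ndjson_lines, fallback_on_conflict=True):
    """Pick an import mode for a list of NDJSON lines."""
    if not has_shard_field_conflict(ndjson_lines):
        return "streaming"
    log.warning("existing _miroir_shard field detected in dump")
    if fallback_on_conflict:
        return "broadcast"
    # Assumption: without automatic fallback, the import is refused.
    raise ValueError("_miroir_shard conflict and fallback is disabled")
```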

Workaround: If you control the schema:

  1. Rename the existing field before creating the dump
  2. Or, once it is available, use a custom shard_field config (future enhancement)

Future enhancement: A configurable shard metadata field name would allow using a field other than _miroir_shard to avoid conflicts.

Decision Tree: Use Streaming or Broadcast?

Is the dump a standard Meilisearch .dump file?
├─ No → Not supported (convert to .dump first)
└─ Yes → Does it contain `_miroir_shard` field?
    ├─ Yes → Use broadcast (or rename field)
    └─ No → Is it from Meilisearch v1.0+?
        ├─ No → Test with small subset first (may work)
        └─ Yes → Does it require EE features?
            ├─ Yes → Use broadcast
            └─ No → Use streaming (recommended)
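
The decision tree above can also be written as a small function, which is convenient for pre-flight scripts. This is a sketch: the function name, its boolean parameters, and the returned labels are illustrative and do not correspond to a miroir-ctl API.

```python
def recommend_import_mode(
    is_dump_file,
    has_miroir_shard_field,
    is_v1_or_later,
    requires_ee_features,
):
    """Walk the decision tree and return a recommendation label."""
    if not is_dump_file:
        return "unsupported"  # convert to .dump first
    if has_miroir_shard_field:
        return "broadcast"  # or rename the field and re-dump
    if not is_v1_or_later:
        return "needs_review"  # pre-v1.0 dump: test or upgrade it first
    if requires_ee_features:
        return "broadcast"
    return "streaming"  # recommended path


# Example: a v1.x CE dump without a conflicting field.
print(recommend_import_mode(True, False, True, False))
```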

Configuration

Force broadcast mode for specific imports:

# miroir-ctl dump import --mode broadcast --file products.dump --index products

Or in config:

miroir:
  dump_import:
    mode: streaming          # Default: streaming
    fallback_on_conflict: true  # Auto-fallback to broadcast on _miroir_shard conflict

Metrics and Observability

When streaming import falls back to broadcast, the following metrics are emitted:

  • miroir_dump_import_mode{mode="streaming"|"broadcast"} (gauge)
  • miroir_dump_import_fallback_total{reason="conflict"|"unsupported"|"manual"} (counter)
  • miroir_dump_import_conflict_field_detected_total{field} (counter)
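
To show how these three metrics relate, the sketch below records a fallback with an in-memory stand-in for a metrics client. The `ImportMetrics` class is hypothetical. Treating the mode gauge as 1 for the active mode and 0 for the other is also an assumption, since this document does not define the gauge's value.

```python
from collections import Counter

ALLOWED_REASONS = {"conflict", "unsupported", "manual"}


class ImportMetrics:
    """In-memory stand-in for a real metrics client (illustration only)."""

    def __init__(self):
        self.counters = Counter()
        self.mode_gauge = {"streaming": 0, "broadcast": 0}

    def set_mode(self, mode):
        # Assumption: the gauge is 1 for the active mode, 0 otherwise.
        for name in self.mode_gauge:
            self.mode_gauge[name] = 1 if name == mode else 0

    def record_fallback(self, reason, conflict_field=None):
        if reason not in ALLOWED_REASONS:
            raise ValueError(f"unknown fallback reason: {reason!r}")
        self.counters[("miroir_dump_import_fallback_total", reason)] += 1
        if conflict_field is not None:
            key = ("miroir_dump_import_conflict_field_detected_total", conflict_field)
            self.counters[key] += 1
        self.set_mode("broadcast")
```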

CLI Output Reference

When miroir-ctl dump import falls back to broadcast mode, it prints:

⚠️  Falling back to broadcast mode
Reason: _miroir_shard field conflict detected
Impact: Transient 2× storage overhead during import
See: docs/dump-import/compatibility-matrix.md

Future Enhancements

| Enhancement | Description | Priority |
| --- | --- | --- |
| Configurable shard metadata field name | Allow operators to customize the field name used for shard metadata (default: _miroir_shard) to avoid conflicts with existing data schemas | P3 |
| Pre-import validation and field conflict detection | Analyze dump files before import to detect incompatibilities early (e.g., _miroir_shard conflicts, version issues) | P2 |
| EE-to-CE dump conversion tool | Convert Enterprise Edition dumps to Community Edition-compatible format for streaming import | P4 |