Kafka in Production

Multi-Region Replication

Ravinder · 6 min read
Kafka · Streaming · Distributed Systems · MirrorMaker · Multi-Region

Multi-region Kafka is one of those topics where the architecture diagram looks clean and the production reality is anything but. The diagram shows two clusters, arrows in both directions, and the word "active-active" written with misplaced confidence. What the diagram omits: offset translation complexity, replication lag that appears only under load, and the fundamental CAP problem that means your "active-active" system makes a consistency choice whether you designed it to or not.

This post covers the two main replication tools, the honest limitations of each, and what active-active actually costs.

The Core Problem: Offsets Don't Transfer

Kafka's internal replication (leader-follower within a cluster) is offset-preserving. Records land at the same offset on both leader and followers. Cross-cluster replication is different—MirrorMaker and Cluster Linking both produce records as new writes to the destination cluster. Offsets diverge immediately.

flowchart LR
    subgraph Region A
        TA["topic: orders\noffset 0..5000"]
    end
    subgraph Region B
        TB["topic: us-east.orders\noffset 0..4990"]
    end
    TA -->|replication lag ~10 records| TB
    note1["Consumer group offsets\nmust be translated separately"]

This offset divergence is the root of most cross-region complexity. A consumer group committed to offset 4500 in Region A knows nothing about where that corresponds in Region B.
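To make the divergence concrete, here is a toy simulation (all offsets and records invented): retention has already deleted the source partition's oldest records, and the replicator re-produces the survivors as brand-new writes, so the destination numbers them from zero.

```python
# Toy model of cross-cluster replication as re-production (not offset-preserving).

# Source partition: retention deleted offsets 0..999, so records start at 1000.
source_log = {offset: f"record-{offset}" for offset in range(1000, 1010)}

# MirrorMaker consumes each record and RE-PRODUCES it; the destination broker
# assigns fresh offsets starting at 0, like any other producer's writes.
dest_log = {new_off: rec for new_off, (_, rec) in enumerate(sorted(source_log.items()))}

# A consumer group committed at source offset 1005; the same record now lives
# at destination offset 5. Seeking to 1005 on the destination would be wrong.
translated_offset = 1005 - min(source_log)  # works only in this toy case
print(dest_log[translated_offset] == source_log[1005])  # True
```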

MirrorMaker 2

MirrorMaker 2 (MM2) is the open-source answer, built on Kafka Connect. It replicates topics, consumer group offsets (translated), and ACLs between clusters.

# mm2.properties — core configuration
clusters = us-east, eu-west
us-east.bootstrap.servers = broker1-us.example.com:9092,broker2-us.example.com:9092
eu-west.bootstrap.servers = broker1-eu.example.com:9092,broker2-eu.example.com:9092
 
# Replicate from us-east to eu-west
us-east->eu-west.enabled = true
us-east->eu-west.topics = orders, inventory, user-events
us-east->eu-west.groups = order-processor, inventory-consumer
 
# Topic naming: destination gets prefixed with source cluster alias
# orders in us-east → us-east.orders in eu-west
replication.factor = 3
tasks.max = 4
# Start MirrorMaker 2
./bin/connect-mirror-maker.sh mm2.properties

MM2 emits offset checkpoints to a <source-alias>.checkpoints.internal topic on the destination, mapping each group's source offsets to destination offsets; since Kafka 2.7, setting sync.group.offsets.enabled=true also writes the translated offsets into the destination's __consumer_offsets. This is what makes failover to a secondary region meaningful: consumers can resume from approximately where they left off rather than from the beginning.
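Kafka's connect-mirror-client library exposes this mapping via RemoteClusterUtils.translateOffsets; the logic itself reduces to picking the most recent checkpoint at or below the committed source offset. A minimal Python sketch, with invented checkpoint pairs:

```python
# Sketch of MM2-style offset translation. `checkpoints` holds invented
# (upstream_offset, downstream_offset) pairs for one group/topic/partition;
# MM2 emits these periodically, so the mapping is only approximate.

def translate(checkpoints, committed_upstream):
    """Return the destination offset for a committed source offset, using the
    most recent checkpoint at or below the commit. Conservative: may rewind,
    which is why failover consumers must tolerate at-least-once delivery."""
    best = None
    for upstream, downstream in checkpoints:
        if upstream <= committed_upstream and (best is None or upstream > best[0]):
            best = (upstream, downstream)
    return best[1] if best else 0  # no checkpoint yet: start from the mirror's beginning

checkpoints = [(4000, 3990), (4400, 4389), (4490, 4478)]
# Committed at 4500 on the source; the latest checkpoint is (4490, 4478),
# so the consumer resumes at destination offset 4478 and re-reads a few records.
print(translate(checkpoints, 4500))  # 4478
```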

MM2 Operational Realities

Replication lag: MM2 is just another consumer of the source cluster, so it competes with application consumers for fetch bandwidth; under heavy traffic, replication lag can reach minutes.

Topic prefix collision: The default us-east.orders naming convention is the right choice: it avoids collisions when you run bidirectional replication (both clusters replicating to each other). MM2's cycle detection is based on topic name patterns, so never strip or rename the prefix; a topic named us-east.orders must never be replicated from Region B back to Region A.
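A sketch of a bidirectional configuration (cluster aliases reused from the earlier example); with the default DefaultReplicationPolicy, the alias prefix is what lets MM2 recognize remote topics and break the cycle:

```properties
# Hypothetical bidirectional mm2.properties fragment
clusters = us-east, eu-west
us-east->eu-west.enabled = true
eu-west->us-east.enabled = true
us-east->eu-west.topics = orders
eu-west->us-east.topics = orders
# DefaultReplicationPolicy treats "us-east.orders" on eu-west as a remote topic
# and does not mirror it back to us-east -- that prefix check is the cycle detection.
```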

Consumer group offset checkpoint lag: MM2 checkpoints consumer group offsets on a schedule (emit.checkpoints.interval.seconds, default 60s). If Region A fails and 60 seconds of checkpoints haven't synced, consumers in Region B will re-process up to 60 seconds of data. Design for at-least-once across the failover.
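Since up to one checkpoint interval of records can be re-delivered after failover, consumers must be idempotent. A minimal sketch, assuming each record carries a unique ID (a hypothetical field in your payload, not something MM2 adds):

```python
processed = set()  # in production this would be a durable store, not process memory

def handle(record):
    """At-least-once-safe processing: skip IDs already seen."""
    if record["id"] in processed:
        return False  # duplicate from the re-processed failover window
    processed.add(record["id"])
    # ... apply side effects here ...
    return True

# Records 101 and 102 arrive twice after failover; side effects run only once each.
results = [handle({"id": i}) for i in [101, 102, 101, 102, 103]]
print(results)  # [True, True, False, False, True]
```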

Confluent Cluster Linking

Cluster Linking is Confluent's proprietary alternative. Unlike MM2 (which is a Kafka Connect application), Cluster Linking runs inside the broker as a first-class primitive. The key difference: it preserves offsets.

# Create a cluster link from source to destination
kafka-cluster-links --bootstrap-server dest-broker:9092 \
  --create \
  --link my-link \
  --config bootstrap.servers=source-broker:9092
 
# Mirror a topic (offsets are preserved)
kafka-mirrors --create \
  --mirror-topic orders \
  --link my-link \
  --bootstrap-server dest-broker:9092

With offset preservation, a consumer group that committed offset 4500 on the source cluster can resume from offset 4500 on the destination cluster. No offset translation, no "approximately where they left off." This eliminates the primary complexity of MM2-based failover—but requires Confluent Cloud or Confluent Platform.

sequenceDiagram
    participant SA as Source Cluster (us-east)
    participant CL as Cluster Link
    participant DA as Dest Cluster (eu-west)
    participant C as Consumer
    SA->>CL: Records at offset 4500–5000
    CL->>DA: Records at offset 4500–5000 (preserved)
    Note over DA: Same offsets as source
    C->>DA: Resume from committed offset 4498
    DA-->>C: Records from 4498 onwards, no gap

Active-Active: The Hard Parts

Active-active means producers write to the nearest region, and both regions are simultaneously accepting writes to the same logical topic. This is the highest-availability configuration and the most complex.

The fundamental problem: you have two sources of truth. If a producer in Region A writes order-123 and a producer in Region B writes order-123 (an update) before A's write has replicated to B, you have a write conflict that Kafka has no mechanism to resolve. Kafka is not a database with conflict resolution—it's a log. Both writes land, in different orders, on each cluster.

flowchart TD
    PA["Producer A writes order-123 v1\noffset 100 in us-east"] --> UA[us-east cluster]
    PB["Producer B writes order-123 v2\noffset 200 in eu-west"] --> UE[eu-west cluster]
    UA -->|replication| UE
    UE -->|replication| UA
    UE --> CA["Consumer in eu-west sees:\norder-123 v2 then order-123 v1"]
    UA --> CE["Consumer in us-east sees:\norder-123 v1 then order-123 v2"]
    note1["Different ordering\nper region"]

Practical active-active requires one of:

  1. Entity partitioning by region: Each entity (user, order, account) is pinned to one region. Region A owns even-ID entities, Region B owns odd-ID entities. No write conflicts because no entity has two writers.

  2. Conflict-tolerant data models: Append-only events (activity logs, audit trails) can be merged without conflict. State derived from events can be rebuilt.

  3. Last-writer-wins with timestamps: Accept that conflicts will happen, include a timestamp in every record, consumers take the latest. Works for some use cases (profile updates), fails for others (inventory counts).
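Options 1 and 3 fit in a few lines of code; the entity IDs, region names, and timestamps below are invented for illustration:

```python
# Option 1: entity partitioning -- each entity has exactly one writing region,
# so no entity ever has two writers and no conflict can arise.
def owning_region(entity_id: int) -> str:
    return "us-east" if entity_id % 2 == 0 else "eu-west"

# Option 3: last-writer-wins -- every record carries a timestamp; consumers in
# BOTH regions apply the same rule, so they converge despite different arrival order.
def lww_merge(records):
    state = {}
    for rec in records:
        key = rec["key"]
        if key not in state or rec["ts"] > state[key]["ts"]:
            state[key] = rec
    return state

v1 = {"key": "order-123", "ts": 100, "value": "v1"}
v2 = {"key": "order-123", "ts": 200, "value": "v2"}
# us-east sees v1 then v2; eu-west sees v2 then v1 -- same final state either way.
assert lww_merge([v1, v2]) == lww_merge([v2, v1])
print(lww_merge([v2, v1])["order-123"]["value"])  # v2
```

Note what last-writer-wins gives up: both regions converge, but any state that must be counted rather than overwritten (inventory, balances) silently loses one of the two writes.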

Active-active with Kafka is a distributed systems problem that Kafka's replication doesn't solve for you. If you find yourself needing active-active without entity partitioning, reconsider whether a database with multi-master replication (CockroachDB, DynamoDB Global Tables) is the right primary store instead.

Failover Runbook Pattern

# Failover from us-east to eu-west
steps:
  - name: Verify replication lag
    command: |
      kafka-consumer-groups --bootstrap-server eu-west-broker:9092 \
        --describe --group mirrormaker-source-connector
    check: lag < 10000 records per partition
 
  - name: Stop producers in us-east
    action: Scale down producer deployments to 0 replicas
 
  - name: Wait for MM2 to drain
    check: lag == 0 across all mirrored topics
 
  - name: Translate consumer group offsets
    command: |
      kafka-consumer-groups --bootstrap-server eu-west-broker:9092 \
        --reset-offsets --group order-processor \
        --topic us-east.orders --to-latest --execute
 
  - name: Start consumers against eu-west
    action: Scale up consumer deployments in eu-west
 
  - name: Start producers against eu-west
    action: Scale up producer deployments in eu-west
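The runbook's lag checks can be scripted by parsing kafka-consumer-groups --describe output. The sample below is hand-written for illustration (real output also appends CONSUMER-ID, HOST, and CLIENT-ID columns, which header-based indexing tolerates):

```python
# Enforce the runbook's lag threshold by parsing `kafka-consumer-groups --describe`.
# SAMPLE_OUTPUT is hand-written for illustration, not captured from a cluster.
SAMPLE_OUTPUT = """\
GROUP            TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
order-processor  us-east.orders  0          4498            4510            12
order-processor  us-east.orders  1          5120            9000            3880
"""

def max_lag(describe_output: str) -> int:
    """Largest per-partition lag; locate the LAG column via the header row."""
    lines = describe_output.strip().splitlines()
    lag_idx = lines[0].split().index("LAG")
    return max(int(line.split()[lag_idx]) for line in lines[1:])

print(max_lag(SAMPLE_OUTPUT) < 10_000)  # True: safe to proceed with failover
```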

Key Takeaways

  • Cross-cluster replication does not preserve offsets in MirrorMaker 2—consumer groups must translate offsets on failover; design for at-least-once reprocessing during the failover window.
  • Cluster Linking preserves offsets but requires Confluent Platform—for teams on open-source Kafka, MM2 is the only option; evaluate whether the simpler failover story justifies the licensing cost.
  • MM2 replication lag grows under source cluster load—monitor the MirrorMaker consumer group lag as a leading indicator of RPO; 60-second checkpoint intervals set a floor on data loss during failover.
  • Active-active requires write partitioning by region or conflict-tolerant data models—Kafka provides no conflict resolution; two writers to the same entity produce different orderings on each cluster.
  • The default MM2 topic prefix (e.g., us-east.orders) is not cosmetic—it prevents replication cycles in bidirectional setups and must be consistent across all cluster configurations.
  • Failover is a runbook, not a switch—document and rehearse the exact sequence of stopping producers, draining replication, translating offsets, and starting consumers; undocumented failovers during incidents are where data loss happens.