
Multi-Tenant Event Flows

Ravinder · 6 min read
Event-Driven Architecture · Distributed Systems · Architecture · Multi-Tenancy · Kafka

Building an event-driven system for a single tenant is hard enough. Adding multi-tenancy multiplies the complexity: you now need to ensure that one tenant's event storm does not degrade another's latency, that events from tenant A never reach tenant B's consumers, and that you can offer different SLAs to different tiers of customers—all while keeping the infrastructure manageable.

The decisions you make here are expensive to reverse. A system designed with topic-per-tenant from day one can migrate to shared topics later. Splitting a system that started with a single shared topic into per-tenant topics at scale, by contrast, is a major operational undertaking.

The Three Isolation Models

Dedicated infrastructure per tenant: each tenant gets their own Kafka cluster, topics, and consumer groups.

Topic-per-tenant on shared infrastructure: a shared Kafka cluster, but each tenant gets their own topics.

Shared topics with tenant filtering: all tenants share topics; tenant ID is a message attribute and consumers filter.

graph TD
  subgraph "Model 1: Dedicated Infrastructure"
    T1[Tenant A Cluster] --> C1[Tenant A Consumers]
    T2[Tenant B Cluster] --> C2[Tenant B Consumers]
  end
  subgraph "Model 2: Topic per Tenant"
    Broker1[Shared Broker]
    Broker1 --> TA[orders.placed.tenant-a]
    Broker1 --> TB[orders.placed.tenant-b]
    TA --> CA[Consumer Group A]
    TB --> CB[Consumer Group B]
  end
  subgraph "Model 3: Shared Topics"
    Broker2[Shared Broker]
    Broker2 --> Shared[orders.placed]
    Shared --> CF[Consumer with tenant filter]
  end
| Model | Isolation | Cost | Noisy Neighbor Risk | Data Separation |
| --- | --- | --- | --- | --- |
| Dedicated infrastructure | Complete | High | None | Physical |
| Topic per tenant | Logical | Medium | Partition-level | Logical |
| Shared topics | None | Low | High | None |
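The filtering mechanic of model 3 can be sketched as a pure predicate over message headers. The dict-based message shape and the `tenant-id` header name here are illustrative assumptions, not a real client API:

```python
def is_for_tenant(headers: list[tuple[str, bytes]], tenant_id: str) -> bool:
    """Return True when the message's tenant-id header matches this consumer's tenant."""
    header_map = dict(headers)
    return header_map.get("tenant-id", b"").decode("utf-8") == tenant_id

def consume_filtered(messages, tenant_id: str):
    """Yield only the messages belonging to the given tenant; silently skip the rest."""
    for msg in messages:
        if is_for_tenant(msg["headers"], tenant_id):
            yield msg
```

Note that filtered-out messages are still fetched and deserialized by every consumer, which is exactly why model 3 offers no throughput isolation.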

Partitioning Strategy

For models 2 and 3, the partitioning strategy determines both ordering guarantees and isolation quality.

Partition key = tenant ID: all events from a tenant land on the same partition. This provides per-tenant ordering but creates noisy neighbor risk—a high-volume tenant dominates a partition.

Partition key = {tenantId}:{aggregateId}: distributes events across partitions while maintaining per-aggregate ordering within a tenant. This is the default recommendation for most multi-tenant systems.

import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka:9092"})

def publish_event(event: dict, tenant_id: str, aggregate_id: str) -> None:
    partition_key = f"{tenant_id}:{aggregate_id}"
    
    producer.produce(
        topic="orders.placed",
        key=partition_key.encode("utf-8"),
        value=json.dumps(event).encode("utf-8"),
        headers=[
            ("tenant-id", tenant_id.encode("utf-8")),
            ("event-type", event["type"].encode("utf-8"))
        ]
    )

Headers carry tenant metadata for consumer-side filtering without embedding it in the business payload.

Noisy Neighbor Mitigation

A noisy neighbor is a tenant whose event volume degrades the throughput or latency experienced by other tenants on the same partition or consumer group.

Rate limiting at the producer: throttle event production per tenant before events hit the broker.

import time
from collections import defaultdict
 
class TenantRateLimiter:
    def __init__(self, events_per_second_per_tenant: int):
        self.limit = events_per_second_per_tenant
        self.counts = defaultdict(int)
        self.window_start = defaultdict(float)
    
    def allow(self, tenant_id: str) -> bool:
        now = time.monotonic()
        if now - self.window_start[tenant_id] >= 1.0:
            self.counts[tenant_id] = 0
            self.window_start[tenant_id] = now
        
        self.counts[tenant_id] += 1
        return self.counts[tenant_id] <= self.limit
 
class TenantRateLimitExceeded(Exception):
    """Raised when a tenant exceeds its per-second publish budget."""

rate_limiter = TenantRateLimiter(events_per_second_per_tenant=1000)
 
def publish_order_event(event: dict, tenant_id: str) -> None:
    if not rate_limiter.allow(tenant_id):
        raise TenantRateLimitExceeded(f"Tenant {tenant_id} exceeded rate limit")
    publish_event(event, tenant_id, event["data"]["orderId"])

Dedicated partitions for high-volume tenants: assign specific partitions to large tenants and ensure their consumer groups only read those partitions.

from confluent_kafka import Consumer, TopicPartition
 
def create_dedicated_consumer(tenant_id: str, partitions: list[int]) -> Consumer:
    consumer = Consumer({
        "bootstrap.servers": "kafka:9092",
        "group.id": f"order-processor-{tenant_id}"
    })
    
    # Manually assign specific partitions
    consumer.assign([
        TopicPartition("orders.placed", p) for p in partitions
    ])
    return consumer

Separate topics for premium tiers: high-SLA tenants get dedicated topics with dedicated consumer groups and monitoring. This is operationally expensive but provides complete isolation.
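The routing decision for premium tiers can be kept in one place at publish time. This is a minimal sketch; the tier registry and topic naming convention are assumptions (in practice the tier would come from a tenant registry or config service):

```python
# Hypothetical tier registry; in production this lookup would hit a tenant service.
PREMIUM_TENANTS = {"tenant-a"}

def topic_for(event_type: str, tenant_id: str) -> str:
    """Premium tenants get a dedicated topic; all other tenants share one."""
    if tenant_id in PREMIUM_TENANTS:
        return f"{event_type}.{tenant_id}"
    return event_type
```

A producer then calls `topic_for("orders.placed", tenant_id)` instead of hard-coding the topic, so promoting a tenant to a dedicated topic is a registry change rather than a code change.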

Data Isolation and Compliance

For regulated industries (healthcare, finance), logical isolation is often not enough. Events containing PII or sensitive data must not exist on shared topics.

Envelope encryption per tenant: encrypt the event payload with a tenant-specific key. A consumer that decrypts with the wrong tenant key gets garbage.

import json

from cryptography.fernet import Fernet

# load_key fetches the tenant's data key from a KMS or secret store (not shown).
TENANT_KEYS = {
    "tenant-a": Fernet(load_key("tenant-a")),
    "tenant-b": Fernet(load_key("tenant-b"))
}
 
def encrypt_payload(payload: dict, tenant_id: str) -> bytes:
    fernet = TENANT_KEYS[tenant_id]
    return fernet.encrypt(json.dumps(payload).encode())
 
def decrypt_payload(encrypted: bytes, tenant_id: str) -> dict:
    fernet = TENANT_KEYS[tenant_id]
    return json.loads(fernet.decrypt(encrypted))
 
# Publishing
event_bytes = encrypt_payload(event["data"], tenant_id)
producer.produce(
    topic="orders.placed",
    key=f"{tenant_id}:{aggregate_id}".encode(),
    value=event_bytes,
    headers=[("tenant-id", tenant_id.encode())]
)

Consumer-side tenant validation: consumers should validate that the tenant ID in the event matches the tenant context they are operating in.

def handle_order_placed(message) -> None:
    tenant_id = dict(message.headers() or []).get("tenant-id", b"").decode("utf-8")
    
    if tenant_id != EXPECTED_TENANT_ID:
        logger.error(f"Tenant mismatch: expected {EXPECTED_TENANT_ID}, got {tenant_id}")
        raise SecurityError("Tenant isolation violation detected")
    
    payload = decrypt_payload(message.value(), tenant_id)
    process_order(payload)

Consumer Group Isolation

Consumer group naming should include the tenant context to prevent cross-tenant offset interference:

graph TD
  Topic[orders.placed topic] --> CGa[group: inventory-tenant-a]
  Topic --> CGb[group: inventory-tenant-b]
  Topic --> CGshared[group: analytics-all-tenants]
  CGa --> FilterA[Filter: tenant-a only]
  CGb --> FilterB[Filter: tenant-b only]
  CGshared --> AllData[Process all tenants]

Each tenant-specific consumer group maintains independent offsets. A replay for tenant A does not affect tenant B's processing position.

Tenant Onboarding and Topic Provisioning

For topic-per-tenant models, automate topic creation as part of tenant onboarding:

from confluent_kafka.admin import AdminClient, NewTopic
 
def provision_tenant_topics(tenant_id: str, retention_days: int = 30) -> None:
    admin = AdminClient({"bootstrap.servers": "kafka:9092"})
    
    topics = [
        NewTopic(
            f"orders.placed.{tenant_id}",
            num_partitions=6,
            replication_factor=3,
            config={
                "retention.ms": str(retention_days * 86400 * 1000),
                "cleanup.policy": "delete"
            }
        ),
        NewTopic(
            f"orders.shipped.{tenant_id}",
            num_partitions=6,
            replication_factor=3
        )
    ]
    
    result = admin.create_topics(topics)
    for topic, future in result.items():
        future.result()  # Raises on failure
        logger.info(f"Provisioned topic: {topic}")

Key Takeaways

  • Choose isolation model based on compliance requirements and cost tolerance: dedicated infrastructure for regulated tenants, topic-per-tenant for logical isolation, shared topics only for low-sensitivity data with tenant filtering.
  • Partition key {tenantId}:{aggregateId} distributes events across partitions while maintaining per-aggregate ordering within a tenant.
  • Rate limiting at the producer prevents noisy neighbors from reaching the broker; dedicated partitions provide infrastructure-level isolation for high-volume tenants.
  • Envelope encryption per tenant provides data isolation on shared topics without dedicated infrastructure.
  • Consumer group names must include tenant context to prevent cross-tenant offset interference and enable independent replay.
  • Automate topic provisioning as part of tenant onboarding—manual topic creation does not scale beyond a handful of tenants.