# Multi-Tenant Event Flows
Building an event-driven system for a single tenant is hard enough. Adding multi-tenancy multiplies the complexity: you now need to ensure that one tenant's event storm does not degrade another's latency, that events from tenant A never reach tenant B's consumers, and that you can offer different SLAs to different tiers of customers—all while keeping the infrastructure manageable.
The decisions you make here are expensive to reverse. A system designed with topic-per-tenant from day one can consolidate onto shared topics later. Splitting a single shared topic by tenant once you are at scale, however, is a major operational undertaking: every consumer, offset, and replay procedure has to change.
## The Three Isolation Models
1. **Dedicated infrastructure per tenant**: each tenant gets their own Kafka cluster, topics, and consumer groups.
2. **Topic-per-tenant on shared infrastructure**: a shared Kafka cluster, but each tenant gets their own topics.
3. **Shared topics with tenant filtering**: all tenants share topics; tenant ID is a message attribute and consumers filter.
| Model | Isolation | Cost | Noisy Neighbor Risk | Data Separation |
|---|---|---|---|---|
| Dedicated infrastructure | Complete | High | None | Physical |
| Topic per tenant | Logical | Medium | Partition-level | Logical |
| Shared topics | None | Low | High | None |
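In the shared-topic model, isolation rests entirely on a consumer-side header check. A minimal, library-agnostic sketch (the function name and header layout are illustrative, not from a specific client library):

```python
def belongs_to_tenant(headers: list[tuple[str, bytes]], tenant_id: str) -> bool:
    """Return True if the message's tenant-id header matches this consumer's tenant."""
    header_map = dict(headers or [])
    return header_map.get("tenant-id", b"").decode() == tenant_id
```

In the consumer's poll loop, messages that fail this check are skipped before any deserialization or business processing happens.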
## Partitioning Strategy
For models 2 and 3, the partitioning strategy determines both ordering guarantees and isolation quality.
**Partition key = tenant ID**: all events from a tenant land on the same partition. This provides per-tenant ordering but creates noisy neighbor risk—a high-volume tenant dominates a partition.

**Partition key = `{tenantId}:{aggregateId}`**: distributes events across partitions while maintaining per-aggregate ordering within a tenant. This is the default recommendation for most multi-tenant systems.
```python
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka:9092"})


def publish_event(event: dict, tenant_id: str, aggregate_id: str) -> None:
    partition_key = f"{tenant_id}:{aggregate_id}"
    producer.produce(
        topic="orders.placed",
        key=partition_key.encode("utf-8"),
        value=json.dumps(event).encode("utf-8"),
        headers=[
            ("tenant-id", tenant_id.encode("utf-8")),
            ("event-type", event["type"].encode("utf-8")),
        ],
    )
```

Headers carry tenant metadata for consumer-side filtering without embedding it in the business payload.
## Noisy Neighbor Mitigation
A noisy neighbor is a tenant whose event volume degrades the throughput or latency experienced by other tenants on the same partition or consumer group.
**Rate limiting at the producer**: throttle event production per tenant before events hit the broker.
```python
import time
from collections import defaultdict


class TenantRateLimitExceeded(Exception):
    """Raised when a tenant exceeds its per-second event budget."""


class TenantRateLimiter:
    """Fixed-window rate limiter keyed by tenant ID."""

    def __init__(self, events_per_second_per_tenant: int):
        self.limit = events_per_second_per_tenant
        self.counts = defaultdict(int)
        self.window_start = defaultdict(float)

    def allow(self, tenant_id: str) -> bool:
        now = time.monotonic()
        # Reset the tenant's count once its one-second window has elapsed
        if now - self.window_start[tenant_id] >= 1.0:
            self.counts[tenant_id] = 0
            self.window_start[tenant_id] = now
        self.counts[tenant_id] += 1
        return self.counts[tenant_id] <= self.limit


rate_limiter = TenantRateLimiter(events_per_second_per_tenant=1000)


def publish_order_event(event: dict, tenant_id: str) -> None:
    if not rate_limiter.allow(tenant_id):
        raise TenantRateLimitExceeded(f"Tenant {tenant_id} exceeded rate limit")
    publish_event(event, tenant_id, event["data"]["orderId"])
```

**Dedicated partitions for high-volume tenants**: assign specific partitions to large tenants and ensure their consumer groups only read those partitions.
```python
from confluent_kafka import Consumer, TopicPartition


def create_dedicated_consumer(tenant_id: str, partitions: list[int]) -> Consumer:
    consumer = Consumer({
        "bootstrap.servers": "kafka:9092",
        "group.id": f"order-processor-{tenant_id}",
    })
    # Manually assign specific partitions instead of relying on group rebalancing
    consumer.assign([
        TopicPartition("orders.placed", p) for p in partitions
    ])
    return consumer
```

**Separate topics for premium tiers**: high-SLA tenants get dedicated topics with dedicated consumer groups and monitoring. This is operationally expensive but provides complete isolation.
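The producer-side routing for tiered topics can be sketched in a few lines. A minimal example, assuming a `PREMIUM_TENANTS` set and an `<event>.<tenant>` naming scheme (both illustrative, not a fixed convention):

```python
# Illustrative assumptions: which tenants are premium, and how dedicated topics are named
PREMIUM_TENANTS = {"tenant-gold", "tenant-platinum"}


def topic_for(event_name: str, tenant_id: str) -> str:
    """Route premium tenants to dedicated topics; standard tenants share a topic."""
    if tenant_id in PREMIUM_TENANTS:
        return f"{event_name}.{tenant_id}"
    return event_name
```

The rest of the publish path stays unchanged; only the `topic` argument passed to the producer varies by tier.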
## Data Isolation and Compliance
For regulated industries (healthcare, finance), logical isolation is often not enough. Events containing PII or sensitive data must not exist on shared topics.
**Envelope encryption per tenant**: encrypt the event payload with a tenant-specific key. A consumer that decrypts with the wrong tenant key gets garbage.
```python
import json

from cryptography.fernet import Fernet

# load_key(tenant_id) fetches the tenant's Fernet key from your KMS or secret store
TENANT_KEYS = {
    "tenant-a": Fernet(load_key("tenant-a")),
    "tenant-b": Fernet(load_key("tenant-b")),
}


def encrypt_payload(payload: dict, tenant_id: str) -> bytes:
    fernet = TENANT_KEYS[tenant_id]
    return fernet.encrypt(json.dumps(payload).encode())


def decrypt_payload(encrypted: bytes, tenant_id: str) -> dict:
    fernet = TENANT_KEYS[tenant_id]
    return json.loads(fernet.decrypt(encrypted))


# Publishing: only the encrypted payload leaves the producer
event_bytes = encrypt_payload(event["data"], tenant_id)
producer.produce(
    topic="orders.placed",
    key=f"{tenant_id}:{aggregate_id}".encode(),
    value=event_bytes,
    headers=[("tenant-id", tenant_id.encode())],
)
```

**Consumer-side tenant validation**: consumers should validate that the tenant ID in the event matches the tenant context they are operating in.
```python
# EXPECTED_TENANT_ID, logger, SecurityError, and process_order come from
# the consuming service's runtime context.
def handle_order_placed(message) -> None:
    tenant_id = dict(message.headers()).get("tenant-id", b"").decode()
    if tenant_id != EXPECTED_TENANT_ID:
        logger.error(f"Tenant mismatch: expected {EXPECTED_TENANT_ID}, got {tenant_id}")
        raise SecurityError("Tenant isolation violation detected")
    payload = decrypt_payload(message.value(), tenant_id)
    process_order(payload)
```

## Consumer Group Isolation
Consumer group naming should include the tenant context to prevent cross-tenant offset interference.
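A minimal sketch of the convention (the `order-processor` service name is illustrative):

```python
def consumer_config(service: str, tenant_id: str) -> dict:
    # Embedding the tenant in group.id gives each tenant an independent offset track
    return {
        "bootstrap.servers": "kafka:9092",
        "group.id": f"{service}-{tenant_id}",
    }
```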
Each tenant-specific consumer group maintains independent offsets. A replay for tenant A does not affect tenant B's processing position.
## Tenant Onboarding and Topic Provisioning
For topic-per-tenant models, automate topic creation as part of tenant onboarding:
```python
import logging

from confluent_kafka.admin import AdminClient, NewTopic

logger = logging.getLogger(__name__)


def provision_tenant_topics(tenant_id: str, retention_days: int = 30) -> None:
    admin = AdminClient({"bootstrap.servers": "kafka:9092"})
    topics = [
        NewTopic(
            f"orders.placed.{tenant_id}",
            num_partitions=6,
            replication_factor=3,
            config={
                "retention.ms": str(retention_days * 86400 * 1000),
                "cleanup.policy": "delete",
            },
        ),
        NewTopic(
            f"orders.shipped.{tenant_id}",
            num_partitions=6,
            replication_factor=3,
        ),
    ]
    result = admin.create_topics(topics)
    for topic, future in result.items():
        future.result()  # Raises on failure
        logger.info(f"Provisioned topic: {topic}")
```

## Key Takeaways
- Choose isolation model based on compliance requirements and cost tolerance: dedicated infrastructure for regulated tenants, topic-per-tenant for logical isolation, shared topics only for low-sensitivity data with tenant filtering.
- A partition key of `{tenantId}:{aggregateId}` distributes events across partitions while maintaining per-aggregate ordering within a tenant.
- Rate limiting at the producer prevents noisy neighbors from reaching the broker; dedicated partitions provide infrastructure-level isolation for high-volume tenants.
- Envelope encryption per tenant provides data isolation on shared topics without dedicated infrastructure.
- Consumer group names must include tenant context to prevent cross-tenant offset interference and enable independent replay.
- Automate topic provisioning as part of tenant onboarding—manual topic creation does not scale beyond a handful of tenants.