Kafka in Production

Partition Strategy

Ravinder · 6 min read
Kafka · Streaming · Distributed Systems · Partitioning · Data Modeling

The question that comes up in every Kafka design review: "How many partitions should this topic have?" The honest answer is that partition count is a bet you're making about the future throughput of that topic. Get it wrong and you'll pay a painful tax—either under-parallelized consumers or a repartitioning migration that requires double the infrastructure and a careful cutover window.

Partitioning decisions are almost never purely technical. They're a product of what you know about your data distribution, your consumer parallelism requirements, and your tolerance for operational complexity. This post gives you a framework for making that bet deliberately.

What Partitions Actually Control

Before picking a number, be precise about what you're optimizing for.

Partitions control three things simultaneously:

  1. Consumer parallelism — Max concurrent consumers in a group equals the partition count. 10 partitions = max 10 active consumers.
  2. Ordering scope — Kafka guarantees order only within a partition. Your key choice determines which messages share a partition.
  3. Physical data distribution — Partitions spread across brokers. More partitions = finer-grained load distribution.
graph TD
    P[Producer] -->|key hash mod N| Part0[Partition 0]
    P -->|key hash mod N| Part1[Partition 1]
    P -->|key hash mod N| Part2[Partition 2]
    Part0 --> B0[Broker 0 - Leader]
    Part1 --> B1[Broker 1 - Leader]
    Part2 --> B2[Broker 2 - Leader]
    B0 --> C0[Consumer 0]
    B1 --> C1[Consumer 1]
    B2 --> C2[Consumer 2]
    style Part0 fill:#4a90d9
    style Part1 fill:#4a90d9
    style Part2 fill:#4a90d9
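The parallelism ceiling from point 1 is easy to see with a toy assignment function. This is only an illustration, not Kafka's actual partition assignor:

```python
def assign_round_robin(num_partitions: int, num_consumers: int) -> dict[int, list[int]]:
    """Toy round-robin assignment: partition p goes to consumer p % num_consumers."""
    assignment: dict[int, list[int]] = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        assignment[p % num_consumers].append(p)
    return assignment

# 10 partitions, 12 consumers: two consumers are left with nothing to do
idle = [c for c, parts in assign_round_robin(10, 12).items() if not parts]
print(idle)  # → [10, 11]
```

No rebalancing trick changes this: with 10 partitions, the 11th and 12th consumers sit idle no matter which assignor you use.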

The tension: you want enough partitions for consumer parallelism and load distribution, but not so many that partition overhead dominates (metadata, leader elections, file handles).

Key Selection: The Most Consequential Decision

Your partition key determines your ordering guarantee and your data distribution. Choose a bad key and you get hot partitions. Choose no key and you get round-robin distribution with no ordering at all.

Good key properties:

  • High cardinality (many distinct values)
  • Roughly uniform distribution across values
  • Meaningful for your ordering requirement

Common patterns by use case:

| Use Case | Key Choice | Why |
| --- | --- | --- |
| Order events | order_id | All events for one order land in one partition |
| User activity | user_id | Per-user ordering, good cardinality |
| IoT sensor data | device_id | Per-device stream |
| Payment transactions | account_id | Not transaction_id; you want account-level ordering |
| Log aggregation | None (null) | No ordering requirement, maximize throughput |
// Explicit key in Java producer
ProducerRecord<String, OrderEvent> record = new ProducerRecord<>(
    "order-events",
    order.getOrderId(),   // partition key
    orderEvent            // value
);
producer.send(record);
# Python producer with explicit key
producer.produce(
    topic="order-events",
    key=order_id.encode("utf-8"),
    value=serialize(order_event),
    on_delivery=delivery_callback,
)

The Hot Partition Problem

Even with a good key, you'll hit hot partitions if your key distribution is skewed. The classic case: a multi-tenant SaaS platform keying on tenant_id where one enterprise customer generates 80% of the traffic.
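A quick simulation makes the skew concrete. Here md5 stands in for Kafka's murmur2 hash, and the tenant mix (one tenant producing 80% of messages) is hypothetical:

```python
import hashlib
import random
from collections import Counter

NUM_PARTITIONS = 12

def partition_for(key: str) -> int:
    # md5 as a deterministic stand-in for Kafka's murmur2 partitioner
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

# Hypothetical mix: "tenant-0" sends 80% of traffic, 20 small tenants share the rest
population = ["tenant-0"] * 80 + [f"tenant-{i}" for i in range(1, 21)]

counts: Counter[int] = Counter()
for _ in range(10_000):
    counts[partition_for(random.choice(population))] += 1

hottest = max(counts.values())
print(f"hottest partition holds {hottest / 10_000:.0%} of all messages")
```

The hottest partition ends up holding roughly 80% of the traffic regardless of how many partitions you add, which is why re-keying, not adding partitions, is the fix.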

Symptoms of a hot partition:

  • One broker's BytesInPerSec is 5× the others
  • Consumer lag accumulating on one partition only
  • ReplicaFetcherManager.MaxLag elevated on one broker

Mitigation strategies:

// Strategy 1: Salted key for high-volume keys
String effectiveKey = isHighVolumeKey(tenantId)
    ? tenantId + "-" + (System.nanoTime() % SALT_BUCKETS)
    : tenantId;
// Downside: breaks per-tenant ordering
// Strategy 2: Custom partitioner
public class TenantAwarePartitioner implements Partitioner {
    // Example values: the tenants known to dominate traffic, and the size
    // of the partition pool they share
    private static final Set<String> HOT_TENANTS = Set.of("tenant-acme");
    private static final int HOT_TENANT_PARTITION_POOL = 4;

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        String tenantId = (String) key;
        int numPartitions = cluster.partitionCountForTopic(topic);
        if (HOT_TENANTS.contains(tenantId)) {
            // Spread hot tenants across a small pool of partitions
            int pool = Math.min(numPartitions, HOT_TENANT_PARTITION_POOL);
            return Utils.toPositive(tenantId.hashCode()
                   + ThreadLocalRandom.current().nextInt(4)) % pool;
        }
        // Everyone else gets Kafka's default behavior: murmur2 hash of the key bytes
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void close() {}
}

Calculating Partition Count

The formula most teams use:

partitions = max(target_throughput / throughput_per_partition, max_consumer_parallelism)

A single partition on a modern broker handles roughly 10–50 MB/s depending on message size and compression. Start conservative—20 MB/s per partition—then benchmark your actual setup.

import math

def min_partitions(
    target_throughput_mbs: float,
    throughput_per_partition_mbs: float = 20.0,
    max_consumers_planned: int = 0,
) -> int:
    from_throughput = math.ceil(target_throughput_mbs / throughput_per_partition_mbs)
    return max(from_throughput, max_consumers_planned)
 
# Example: 100 MB/s topic, planning for 12 consumers
print(min_partitions(100, max_consumers_planned=12))  # → 12 (throughput needs only 5)

Round up to a number that is evenly divisible by your planned consumer count. 12 partitions and 4 consumers = 3 partitions per consumer. Uneven splits mean some consumers carry more load than others.
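The final rounding step can be sketched as follows; the 20% growth buffer is a rule of thumb, not a Kafka constant:

```python
import math

def sized_partitions(min_parts: int, consumer_count: int, growth_buffer: float = 0.2) -> int:
    """Apply a growth buffer, then round up to a multiple of the consumer count."""
    with_buffer = math.ceil(min_parts * (1 + growth_buffer))
    return math.ceil(with_buffer / consumer_count) * consumer_count

print(sized_partitions(12, 4))  # → 16 (12 + 20% buffer = 15, rounded up to a multiple of 4)
```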

flowchart LR
    A[Target Throughput MB/s] --> B[÷ 20 MB/s per partition]
    C[Max Consumer Count] --> D[Take MAX of both]
    B --> D
    D --> E[Round to multiple of consumer count]
    E --> F[Add 20% buffer for future growth]
    F --> G[Final Partition Count]

The Repartitioning Problem

Here's the thing nobody tells you upfront: you cannot change a topic's partition count without consequences. Adding partitions changes the key-to-partition mapping for existing keys. Messages for key user-123 that used to land in partition 2 may now land in partition 7. Any consumer maintaining per-partition state (like a running aggregate) breaks.
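You can see the remapping directly. Again using md5 as a stand-in for Kafka's murmur2 partitioner, compare where each key lands at 12 versus 16 partitions:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # md5 as a deterministic stand-in for Kafka's murmur2 partitioner
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_partitions

keys = [f"user-{i}" for i in range(1_000)]
moved = sum(1 for k in keys if partition_for(k, 12) != partition_for(k, 16))
print(f"{moved / len(keys):.0%} of keys changed partition")
```

With a uniform hash, roughly three quarters of keys move to a different partition. Any stateful consumer keyed on partition assignment has to cope with that.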

Your options when you genuinely need more partitions:

Option A: New topic, dual-write period

Week 1: Create order-events-v2 with new partition count
Week 1-2: Producers write to both order-events and order-events-v2
Week 2: Consumers migrate to order-events-v2
Week 3: Retire order-events

This is safe but operationally expensive. You need twice the storage during the window.

Option B: Kafka Streams repartition

Use a Kafka Streams topology to read from the old topic and write to a new topic with the desired partitioning:

StreamsBuilder builder = new StreamsBuilder();
builder.stream("order-events", Consumed.with(Serdes.String(), orderSerde))
       .selectKey((oldKey, value) -> value.getNewPartitionKey())
       .to("order-events-v2", Produced.with(Serdes.String(), orderSerde));

Option C: Accept the ordering break

For topics where ordering is not a hard requirement (metrics, logs, analytics events), just add partitions and document the behavior change.

Key Takeaways

  • Partition count controls consumer parallelism ceiling—max active consumers in a group equals partition count; you cannot parallelize beyond it without adding partitions.
  • Key selection is more important than partition count—a high-cardinality, uniformly distributed key prevents hot partitions that no amount of partitions can fix.
  • Repartitioning is painful by design—changing partition count breaks the key-to-partition mapping, so size generously upfront with a 20% growth buffer.
  • Hot partitions reveal themselves in broker-level metrics—watch BytesInPerSec per broker and consumer lag per partition, not just cluster-wide averages.
  • Round partition count to a multiple of your consumer count—uneven splits give some consumers 33% more work than others, creating artificial lag imbalances.
  • Null keys are a valid choice for order-agnostic topics—round-robin distribution maximizes throughput and eliminates hot partitions when per-key ordering is not required.