Series · 10 parts · ~64 min total

Kafka in Production

  1. Cluster Sizing and Disk Math

    How to calculate broker count, disk requirements, and replication overhead before you write a single line of producer code.

    6 min

    May 1, 2025

  2. Partition Strategy

    How to choose partition keys, calculate partition counts, and handle the painful reality of repartitioning a live topic.

    6 min

    May 8, 2025

  3. Producer Tuning and Idempotency

    How to configure acks, batching, compression, and idempotent producers to get both throughput and delivery guarantees without guessing at defaults.

    6 min

    May 15, 2025

  4. Consumer Group Patterns

    How rebalancing works, when it destroys throughput, and the offset commit strategies that determine whether you replay or lose messages.

    6 min

    May 22, 2025

  5. Schema Evolution

    How to evolve Kafka message schemas safely using compatibility modes, Schema Registry, and the discipline to never break a downstream consumer.

    6 min

    May 29, 2025

  6. Exactly-Once, the Honest Version

    What Kafka's exactly-once semantics actually guarantee, where they end, and why your application logic still needs to handle duplicates.

    6 min

    Jun 5, 2025

  7. Multi-Region Replication

    How MirrorMaker 2 and Cluster Linking work, where active-active falls apart, and the operational discipline required to run Kafka across regions.

    6 min

    Jun 12, 2025

  8. Upgrades Without Downtime

    How to roll Kafka broker upgrades safely, manage version skew between brokers and clients, and navigate the KRaft migration without a maintenance window.

    6 min

    Jun 19, 2025

  9. Cost Optimization

    How tiered storage, compression, retention tuning, and cross-AZ traffic patterns determine your Kafka bill—and how to cut it without sacrificing reliability.

    6 min

    Jun 26, 2025

  10. Incident Library

    Ten real Kafka production incidents—the signs you missed, why they happened, and the exact response that resolved each one.

    10 min

    Jul 3, 2025