Kafka in Production

Exactly-Once, the Honest Version

Ravinder · 6 min read
Kafka · Streaming · Distributed Systems · Exactly-Once · Transactions

"We use Kafka's exactly-once semantics" is one of the most confidently incorrect statements in distributed systems engineering. Teams enable enable.idempotence=true or isolation.level=read_committed and assume the problem is solved. Then they hit a scenario the guarantee doesn't cover and spend a week debugging duplicates that shouldn't exist.

Exactly-once semantics (EOS) in Kafka is real, it works, and it's genuinely useful—but it covers a specific, bounded scope. Understanding that scope precisely is the difference between using it correctly and having false confidence.

What the Guarantee Actually Covers

Kafka EOS is a guarantee about Kafka-to-Kafka processing. Specifically:

  • A record written by a transactional producer will appear in the topic exactly once (not zero times, not twice).
  • A consumer reading with isolation.level=read_committed will see only records from committed transactions.
  • A read-process-write pipeline (Kafka Streams is the canonical example) can consume from one topic and produce to another with exactly-once delivery end-to-end—within Kafka.
flowchart LR
  subgraph EOS["EOS Scope"]
    direction LR
    A[Input Topic] -->|read| B[Kafka Streams / Processor]
    B -->|write| C[Output Topic]
  end
  D[External DB] -.->|NOT covered| B
  E[REST API] -.->|NOT covered| B
  F[File System] -.->|NOT covered| B
  style EOS fill:#e8f5e9,stroke:#4caf50

The moment your processing touches anything outside Kafka—a database write, an HTTP call, a file write—EOS provides no guarantee. That external side effect may happen zero, one, or multiple times regardless of what Kafka does.

The Transactional Producer

EOS starts at the producer. The transactional producer groups records into atomic units: either all records in the transaction are committed, or none are.

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "order-processor-1");
// transactional.id must be unique and stable per producer instance
 
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
 
try {
    producer.beginTransaction();
 
    producer.send(new ProducerRecord<>("order-confirmed", orderId, confirmPayload));
    producer.send(new ProducerRecord<>("inventory-reserved", itemId, reservePayload));
 
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Another producer with the same transactional.id took over — do not retry
    producer.close();
} catch (KafkaException e) {
    producer.abortTransaction();
    // Retry the entire transaction
}

transactional.id is the stable identity of the producer. If a producer crashes mid-transaction and restarts with the same transactional.id, Kafka fences the old instance and aborts its incomplete transaction. Only one live instance per transactional.id at a time.
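When you drive the producer and consumer yourself, the input offsets must be committed inside the same transaction as the output records, via sendOffsetsToTransaction. Here is a minimal sketch of a consume-transform-produce loop, assuming the transactional producer above, a consumer with auto-commit disabled and isolation.level=read_committed; the topic names and the transform() helper are illustrative:

// Minimal read-process-write loop: offsets commit atomically with the outputs
consumer.subscribe(List.of("raw-orders"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (records.isEmpty()) continue;

    try {
        producer.beginTransaction();

        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            producer.send(new ProducerRecord<>("enriched-orders",
                record.key(), transform(record.value())));
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
        }

        // Input offsets are part of the transaction, not a separate consumer commit
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    } catch (ProducerFencedException e) {
        producer.close();               // another instance took over, do not retry
        break;
    } catch (KafkaException e) {
        producer.abortTransaction();    // inputs are re-read and re-processed
    }
}

This is essentially the loop Kafka Streams runs for you when the processing guarantee is exactly-once, which is why the Streams section below recommends it over hand-rolled transaction handling.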

Read-Committed Isolation on the Consumer

Without read_committed, consumers see records from transactions that were later aborted; read_uncommitted is the default isolation level.

props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

With read_committed, aborted records are filtered, and the consumer's visible offset only advances past committed transaction markers. This adds a small latency: the consumer must wait for transaction markers before advancing.

sequenceDiagram
  participant P as Transactional Producer
  participant B as Broker
  participant C as Consumer (read_committed)
  P->>B: BeginTransaction
  P->>B: Record A (offset 0)
  P->>B: Record B (offset 1)
  P->>B: AbortTransaction (marker at offset 2)
  P->>B: BeginTransaction
  P->>B: Record C (offset 3)
  P->>B: CommitTransaction (marker at offset 4)
  C->>B: Fetch
  B-->>C: Record C (offset 3) only
  note over C: A and B are filtered (aborted transaction)
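For completeness, a minimal read_committed consumer setup; the group id and topic name are illustrative:

// Minimal read_committed consumer
Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "order-confirmations");
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
consumer.subscribe(List.of("order-confirmed"));
// poll() now returns only records from committed transactions; fetches are bounded
// by the last stable offset (LSO), which is where the extra latency comes from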

Where EOS Ends: The External World Problem

Here is the scenario that bites teams:

// Reads from Kafka, writes to PostgreSQL — EOS does NOT apply
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        db.insert(record.value());   // Side effect outside Kafka
    }
    consumer.commitSync();           // Commit after successful DB write
}

This is at-least-once, not exactly-once. If the DB insert succeeds but the Kafka commit fails, the record is processed again. If you want exactly-once delivery to the DB, you need idempotent writes at the DB level—not EOS.

Patterns for making external writes idempotent:

-- Idempotent upsert using Kafka offset as idempotency key
INSERT INTO order_events (kafka_offset, order_id, payload, created_at)
VALUES ($1, $2, $3, now())
ON CONFLICT (kafka_offset) DO NOTHING;
// Or use a dedupe table (assumes db.execute returns the number of rows affected)
int rowsAffected = db.execute("""
    INSERT INTO processed_offsets (topic, partition, offset_val)
    VALUES (?, ?, ?)
    ON CONFLICT DO NOTHING
""", topic, partition, offset);

if (rowsAffected > 0) {
    // First time seeing this offset, process it
    db.insert(payload);
}
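To make the whole poll loop replay-safe, the offset marker and the payload write can share a single database transaction, so a replayed record is detected and skipped atomically. A hedged JDBC sketch of that pattern; the table names follow the snippets above and processRecord is a hypothetical helper called once per record from the poll loop:

// One DB transaction per record: the offset insert is the idempotency check,
// the payload insert is the real work; both commit or roll back together
void processRecord(Connection conn, ConsumerRecord<String, String> record) throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement dedupe = conn.prepareStatement(
             "INSERT INTO processed_offsets (topic, partition, offset_val) " +
             "VALUES (?, ?, ?) ON CONFLICT DO NOTHING");
         PreparedStatement insert = conn.prepareStatement(
             "INSERT INTO order_events (payload) VALUES (?)")) {

        dedupe.setString(1, record.topic());
        dedupe.setInt(2, record.partition());
        dedupe.setLong(3, record.offset());

        if (dedupe.executeUpdate() > 0) {   // first time this offset is seen
            insert.setString(1, record.value());
            insert.executeUpdate();
        }
        conn.commit();                       // marker and payload commit together
    } catch (SQLException e) {
        conn.rollback();                     // Kafka offset stays uncommitted; record is retried
        throw e;
    }
}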

Kafka Streams: The Right Level of Abstraction for EOS

If your pipeline is genuinely Kafka-to-Kafka, Kafka Streams provides EOS with minimal configuration:

Properties streamsProps = new Properties();
streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enricher");
streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
streamsProps.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
    StreamsConfig.EXACTLY_ONCE_V2);  // requires brokers on 2.5 or newer
 
StreamsBuilder builder = new StreamsBuilder();
KStream<String, OrderEvent> orders = builder.stream("raw-orders");
KTable<String, CustomerProfile> customers = builder.table("customer-profiles");
 
orders.join(customers,
    (order, customer) -> enrich(order, customer))
    .to("enriched-orders");
 
KafkaStreams streams = new KafkaStreams(builder.build(), streamsProps);
streams.start();

EXACTLY_ONCE_V2 (KIP-447) uses one transactional producer per stream thread rather than one per task (the v1 behavior), so each commit interval produces far fewer transactions. It's more efficient and the recommended choice whenever your brokers are on 2.5 or newer.
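One related knob: with exactly-once enabled, Kafka Streams lowers the default commit.interval.ms from 30 seconds to 100 ms, so each thread commits a transaction roughly ten times per second. Raising it (the value below is only illustrative) trades end-to-end latency for fewer, larger transactions:

// With EOS the default commit.interval.ms is 100 ms; a larger value means fewer
// transactions per second at the cost of higher end-to-end latency
streamsProps.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);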

The Overhead of EOS

EOS is not free. Understand the costs before enabling it blindly:

  • Transaction coordinator overhead: roughly 10–20% throughput reduction at the producer.
  • read_committed consumer latency: the consumer waits for transaction markers before advancing.
  • Additional broker state: the transaction log (__transaction_state) uses disk and memory.
  • Zombie fencing: extra latency when producers restart with the same transactional.id.

For pipelines where at-least-once delivery with idempotent writes is acceptable, the simpler approach is usually the right one. EOS earns its complexity cost only when the processing itself is expensive to deduplicate—financial aggregations, state mutations that are hard to reverse.

Key Takeaways

  • Kafka EOS covers Kafka-to-Kafka processing only—the moment you write to a database, call an API, or touch any external system, EOS provides no delivery guarantee for that side effect.
  • The transactional producer makes multi-topic writes atomic—either all topics in the transaction receive the records, or none do; this is the foundation of exactly-once pipelines.
  • read_committed consumers skip aborted records—without it, consumers see partial transactions and aborted records; always pair transactional producers with isolation.level=read_committed consumers.
  • Idempotent external writes are the correct solution for Kafka-to-DB pipelines—use Kafka offsets as idempotency keys and upsert semantics to make replays safe; this is simpler and more reliable than trying to extend EOS outside Kafka.
  • Kafka Streams EXACTLY_ONCE_V2 is the right abstraction for stream-processing pipelines—it handles transaction management automatically; rolling your own transactional producer logic is error-prone and rarely necessary.
  • EOS costs 10–20% throughput at the producer—measure the impact in your environment before enabling it for every topic; at-least-once with deduplication is often the better tradeoff for high-throughput pipelines.