
Outbox and Inbox

Ravinder · 5 min read
Event-Driven Architecture · Distributed Systems · Architecture · Outbox Pattern · Reliability

Every event-driven system eventually hits the dual-write problem. You save an order to the database and publish an OrderPlaced event to Kafka. What happens when the database write succeeds and the Kafka publish fails? Or the publish succeeds but the database transaction rolls back? You either lose events or produce phantom events—both are silent consistency failures.

The standard advice is "don't do distributed transactions." The transactional outbox pattern is the practical alternative: it gives you the atomicity a distributed transaction would, without two-phase commit.

The Dual-Write Problem

The naive approach:

def place_order(order: Order) -> None:
    db.save(order)                           # Step 1: persist
    kafka.publish("orders.placed", order)    # Step 2: publish

Failure scenarios:

  • DB succeeds, Kafka publish fails → event lost, order exists but nothing reacts
  • Kafka publishes, DB transaction rolls back → phantom event, no corresponding order
  • Network partition between steps → unpredictable

Even with retry logic on the Kafka publish, you can't retry a rolled-back transaction. The two operations are not atomic.

The Transactional Outbox

The outbox pattern makes event publication part of the database transaction. Instead of publishing directly to the broker, write a record to an outbox table in the same transaction as your business data.

-- outbox table
CREATE TABLE outbox (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type  VARCHAR(100) NOT NULL,
    aggregate_id    VARCHAR(255) NOT NULL,
    event_type      VARCHAR(255) NOT NULL,
    payload         JSONB        NOT NULL,
    created_at      TIMESTAMPTZ  DEFAULT NOW(),
    published_at    TIMESTAMPTZ,
    status          VARCHAR(20)  DEFAULT 'PENDING'
);

def place_order(order: Order) -> None:
    with db.transaction():
        db.save(order)
        db.insert("outbox", {
            "aggregate_type": "Order",
            "aggregate_id": order.id,
            "event_type": "OrderPlaced",
            "payload": order.to_dict(),
            "status": "PENDING"
        })
    # Transaction commits both atomically.
    # If it rolls back, neither the order nor the outbox record exists.

A separate relay process reads unpublished outbox records and publishes them to the broker:

sequenceDiagram
    participant App as Order Service
    participant DB as Database
    participant Relay as Outbox Relay
    participant Broker as Kafka
    App->>DB: BEGIN TRANSACTION
    App->>DB: INSERT order
    App->>DB: INSERT outbox record (PENDING)
    App->>DB: COMMIT
    Relay->>DB: SELECT * FROM outbox WHERE status='PENDING'
    DB-->>Relay: [outbox record]
    Relay->>Broker: publish OrderPlaced
    Broker-->>Relay: ACK
    Relay->>DB: UPDATE outbox SET status='PUBLISHED', published_at=NOW()

If the relay crashes after publishing but before marking the record published, it will re-publish on restart. This means the broker receives the event at least once. Consumers must be idempotent (covered in Post 7).

Relay Implementation Strategies

Polling relay: a background job runs on a schedule, queries the outbox table, and publishes pending records. It is simple to implement, but it adds query load to the database, and pending events wait up to one polling interval before they are published.

# Polling relay (simplified)
import logging
import time

logger = logging.getLogger(__name__)

def run_relay():
    while True:
        pending = db.query(
            "SELECT * FROM outbox WHERE status='PENDING' ORDER BY created_at LIMIT 100"
        )
        for record in pending:
            try:
                kafka.publish(
                    topic=topic_for(record["event_type"]),
                    key=record["aggregate_id"],
                    value=record["payload"]
                )
                db.execute(
                    "UPDATE outbox SET status='PUBLISHED', published_at=NOW() WHERE id=%s",
                    [record["id"]]
                )
            except Exception as e:
                # Record stays PENDING and will be retried on the next poll.
                logger.error(f"Failed to publish {record['id']}: {e}")
        time.sleep(0.5)
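
One refinement worth noting: if you run more than one relay instance for availability, two workers can pick up the same PENDING row and publish it twice. A common PostgreSQL-specific fix, sketched below with the same hypothetical db and kafka helpers as above, is to claim rows with FOR UPDATE SKIP LOCKED so that concurrent workers skip each other's in-flight batches.

# Multi-instance polling relay (sketch; PostgreSQL-specific, reuses the
# hypothetical db/kafka helpers from above)
def run_relay_worker():
    while True:
        with db.transaction():
            # Rows selected here stay locked until the transaction commits,
            # so a concurrent worker skips them instead of re-publishing.
            pending = db.query(
                "SELECT * FROM outbox WHERE status='PENDING' "
                "ORDER BY created_at LIMIT 100 "
                "FOR UPDATE SKIP LOCKED"
            )
            for record in pending:
                kafka.publish(
                    topic=topic_for(record["event_type"]),
                    key=record["aggregate_id"],
                    value=record["payload"]
                )
                db.execute(
                    "UPDATE outbox SET status='PUBLISHED', published_at=NOW() WHERE id=%s",
                    [record["id"]]
                )
            # Any exception rolls the whole batch back to PENDING.
        time.sleep(0.5)

A worker that crashes after publishing but before committing still produces a duplicate on restart; that is the at-least-once behaviour the inbox pattern below absorbs.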

CDC relay (Change Data Capture): a CDC tool like Debezium tails the database transaction log and streams outbox record insertions directly to Kafka. No polling queries, near-real-time latency, and only the marginal cost of logical decoding on the database.

graph LR
    App[Order Service] -->|write| DB[(PostgreSQL)]
    DB -->|WAL stream| Debezium[Debezium CDC]
    Debezium -->|OrderPlaced| Kafka[Kafka]
    Kafka --> Consumers[Consumers]

CDC is operationally more complex (a Kafka Connect cluster to run Debezium, connector configuration, WAL retention settings) but is the right choice for high-throughput systems where polling latency or database load is a concern.

Inbox Pattern: Deduplication Downstream

The outbox guarantees at-least-once delivery. The inbox pattern handles the consumer side: deduplicate events before processing to achieve effectively-once semantics.

The inbox table records processed event IDs:

CREATE TABLE inbox (
    event_id      UUID         PRIMARY KEY,
    event_type    VARCHAR(255) NOT NULL,
    processed_at  TIMESTAMPTZ  DEFAULT NOW()
);

Consumer processing:

def handle_order_placed(event: dict) -> None:
    event_id = event["id"]
    
    with db.transaction():
        # Attempt to claim the event
        inserted = db.execute(
            "INSERT INTO inbox (event_id, event_type) VALUES (%s, %s) ON CONFLICT DO NOTHING",
            [event_id, event["type"]]
        )
        
        if inserted.rowcount == 0:
            # Already processed — skip idempotently
            logger.info(f"Duplicate event {event_id}, skipping")
            return
        
        # Process the event within the same transaction
        reserve_inventory(event["data"]["orderId"], event["data"]["items"])
        # Transaction commits inbox record + business operation atomically

The ON CONFLICT DO NOTHING combined with the transaction gives effectively-once processing: either the inbox record and the business operation both commit, or neither does. For deduplication to work, the event id must be generated by the producer (the outbox record's id is a natural choice) and carried in the message, so every redelivery of the same event presents the same key.
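
To make the flow concrete, here is a minimal consumer loop that feeds handle_order_placed. It is a sketch only: it assumes the kafka-python client, a topic named orders.placed, and an event envelope with id, type, and data fields; none of that is prescribed by the pattern itself.

# Consumer loop (sketch; assumes kafka-python and the envelope fields used above)
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders.placed",                       # assumed topic name
    bootstrap_servers="localhost:9092",
    group_id="inventory-service",
    enable_auto_commit=False,              # commit offsets only after processing
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # A crash before commit() means the event is redelivered; the inbox
    # check above turns that redelivery into a no-op.
    handle_order_placed(message.value)
    consumer.commit()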

Inbox Retention and Cleanup

Inbox tables grow unbounded without cleanup. Events have a natural expiry based on your broker's retention policy—if an event cannot be replayed after 7 days, there's no point keeping inbox records older than 7 days.

-- Run daily via pg_cron or a scheduled job
DELETE FROM inbox
WHERE processed_at < NOW() - INTERVAL '7 days';

Match the inbox retention to your broker's message retention. If your Kafka topics retain events for 30 days (to allow replay), keep inbox records for at least 30 days so that replayed events are still recognized as duplicates.

Outbox vs Saga

The outbox ensures that a committed event is reliably published (at least once). A saga coordinates a multi-step workflow across services. These are complementary: each saga step uses the outbox pattern to reliably emit the event that triggers the next step.

graph LR
    A[Order Service] -->|outbox| B[OrderPlaced]
    B --> C[Inventory Service]
    C -->|outbox| D[InventoryReserved]
    D --> E[Payment Service]
    E -->|outbox| F[PaymentProcessed]
    F --> G[Fulfillment Service]

Each arrow that crosses a service boundary goes through an outbox. No step can silently drop an event.
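
In code, a saga step is just the inbox and outbox back to back in one transaction: claim the incoming event, apply the local state change, and insert the outgoing event for the next step. A sketch of the inventory step, reusing the hypothetical helpers and tables from above (the InventoryReserved payload shape is illustrative):

# One saga step: consume OrderPlaced, reserve inventory, emit InventoryReserved.
def handle_order_placed_step(event: dict) -> None:
    with db.transaction():
        # Inbox: claim the incoming event, skip duplicates idempotently.
        inserted = db.execute(
            "INSERT INTO inbox (event_id, event_type) VALUES (%s, %s) ON CONFLICT DO NOTHING",
            [event["id"], event["type"]]
        )
        if inserted.rowcount == 0:
            return

        # Local state change for this step.
        reserve_inventory(event["data"]["orderId"], event["data"]["items"])

        # Outbox: the event that triggers the next step commits atomically
        # with the inbox record and the inventory change.
        db.insert("outbox", {
            "aggregate_type": "Inventory",
            "aggregate_id": event["data"]["orderId"],
            "event_type": "InventoryReserved",
            "payload": {"orderId": event["data"]["orderId"]},
            "status": "PENDING"
        })

The relay then publishes InventoryReserved exactly as it publishes OrderPlaced, and the next service repeats the same shape.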

Key Takeaways

  • Dual-write (save to DB + publish to broker in separate operations) creates silent consistency failures when either step fails.
  • The transactional outbox writes the event to the same database transaction as the business data, making publication atomic with persistence.
  • Polling relays are simple to implement; CDC relays (Debezium) are faster and impose less database load at higher throughput.
  • The outbox delivers at-least-once; consumers use the inbox pattern to deduplicate and achieve effectively-once processing.
  • Inbox table retention must match or exceed broker message retention to handle replay scenarios correctly.
  • Outbox and saga patterns are complementary—each saga step emits its trigger event via the outbox.