Choosing Between Sagas and 2PC in 2026

Ravinder·June 17, 2025·7 min read

Architecture · Distributed Transactions · Sagas · 2PC · Microservices

Every time someone says "we use sagas because 2PC doesn't scale," I ask them how many transactions per second they actually run. Usually the answer is a few hundred. 2PC handles a few hundred transactions per second without blinking. The real reason they chose sagas is that their databases don't share a coordinator — which is a completely valid reason, just not the one they stated.

The choice between sagas, 2PC, and idempotent eventual consistency is not about scale. It is about failure semantics, operational complexity, and what your business actually requires when things go wrong.

Where 2PC Still Applies

Two-phase commit requires a coordinator and a set of participants that all implement the XA protocol (or equivalent). The coordinator asks participants to prepare, waits for all confirmations, then commits. If any participant fails to prepare, the coordinator aborts all.

sequenceDiagram
    participant C as Coordinator
    participant P1 as Participant 1
    participant P2 as Participant 2
    C->>P1: PREPARE
    C->>P2: PREPARE
    P1-->>C: READY
    P2-->>C: READY
    C->>P1: COMMIT
    C->>P2: COMMIT
    P1-->>C: ACK
    P2-->>C: ACK
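The prepare/commit exchange can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production coordinator: the `Participant` class and `two_phase_commit` function are hypothetical names, and a real participant would wrap a database connection and persist its vote.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant; a real one would wrap a database connection."""
    name: str
    can_prepare: bool = True
    state: str = "idle"  # idle -> prepared -> committed | aborted

    def prepare(self) -> bool:
        # Phase 1: vote. A prepared participant holds its locks until told the outcome.
        if self.can_prepare:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Return True if every participant committed, False if the transaction aborted."""
    prepared = []
    for p in participants:
        if p.prepare():
            prepared.append(p)
        else:
            # One "no" vote aborts every participant that already prepared.
            for q in prepared:
                q.rollback()
            return False
    # Phase 2: all voted yes, so the decision is commit.
    for p in participants:
        p.commit()
    return True
```

The sketch leaves out the hard part: the coordinator must durably record its decision before phase 2, otherwise a crash at that point leaves prepared participants waiting with no way to learn the outcome.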

2PC is still the right choice when:

All participants are under your operational control and support XA (Postgres, MySQL, Oracle all do).
You cannot tolerate even temporary inconsistency — financial ledger entries, stock reservations.
Your transaction volume fits within a single coordinator's capacity (typically hundreds to low thousands per second).
The participants share a network trust boundary (same datacenter, not across cloud regions).

Where 2PC breaks down is not scale — it is the coordinator becoming a single point of failure, the blocking nature of the protocol (a participant cannot release locks until the coordinator says commit or abort), and the requirement that all participants implement the same protocol.

Practical 2PC with Postgres

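-- Requires max_prepared_transactions > 0 on each server (the Postgres default is 0).
-- The coordinator is the application holding both connections; each connection is a participant.
-- This example assumes the two connections go to different Postgres servers.

-- Connection 1 (participant: database A)
BEGIN;
-- do local work on conn 1
PREPARE TRANSACTION 'txn-order-payment-001';

-- Connection 2 (participant: database B)
BEGIN;
-- do local work on conn 2
PREPARE TRANSACTION 'txn-order-payment-001';
-- Reusing the GID works only because A and B are separate servers;
-- a GID must be unique among the prepared transactions on one server.

-- If both prepared, run on each server (any session, outside a transaction block):
COMMIT PREPARED 'txn-order-payment-001';
-- If either PREPARE failed, run on each server that did prepare:
ROLLBACK PREPARED 'txn-order-payment-001';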

Operationally, you need to monitor pg_prepared_xacts: a crash between PREPARE and COMMIT leaves orphaned prepared transactions that keep their locks, and hold back VACUUM, until someone commits or rolls them back by hand.
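A monitoring check along these lines might look like the sketch below. The query reads real columns of the pg_prepared_xacts view, but the `find_stale` helper and the five-minute threshold are assumptions made for illustration; rows are passed in as plain tuples so the logic does not depend on a particular database driver.

```python
from datetime import datetime, timedelta, timezone

# gid, prepared, owner and database are columns of the pg_prepared_xacts system view.
STALE_QUERY = "SELECT gid, prepared, owner, database FROM pg_prepared_xacts ORDER BY prepared"


def find_stale(rows, now, max_age=timedelta(minutes=5)):
    """Return the gids of prepared transactions older than max_age.

    rows: iterable of (gid, prepared_at, owner, database) tuples, as returned by STALE_QUERY.
    """
    return [gid for gid, prepared_at, _owner, _db in rows if now - prepared_at > max_age]
```

In practice you would run `STALE_QUERY` through your driver's cursor on a schedule and alert on any non-empty result. The right threshold depends on how long your longest legitimate transaction stays prepared.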

Sagas: Orchestration vs Choreography

A saga breaks a distributed transaction into a sequence of local transactions, each with a compensating transaction that undoes its effect if a later step fails.

Orchestration

A central orchestrator owns the saga state and calls each participant service directly.

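import logging

class StepFailure(Exception):
    """A step's forward action failed."""

class CompensationFailure(Exception):
    """A step's compensating action failed."""

class SagaFailed(Exception):
    """The saga aborted; completed steps were compensated where possible."""

def alert_on_call(saga_id, step, context):
    # Placeholder: a real system would page whoever is on call.
    logging.critical("compensation failed: saga=%s step=%r", saga_id, step)

class OrderSagaOrchestrator:
    def __init__(self, saga_id: str, steps: list):
        self.saga_id = saga_id
        self.steps = steps
        self.completed_steps = []

    def run(self, context: dict):
        for step in self.steps:
            try:
                result = step.execute(context)
            except StepFailure as e:
                self._compensate(context)
                raise SagaFailed(self.saga_id, e) from e
            self.completed_steps.append(step)
            context.update(result or {})  # a step may return None

    def _compensate(self, context: dict):
        # Undo completed steps in reverse order; keep going if one undo fails.
        for step in reversed(self.completed_steps):
            try:
                step.compensate(context)
            except CompensationFailure:
                # Log and alert: manual intervention required
                alert_on_call(self.saga_id, step, context)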

Orchestration advantages: easier to trace, single place to add logging and monitoring, saga state is explicit.

Orchestration disadvantages: the orchestrator becomes a bottleneck; if it crashes mid-saga, you need recovery logic to resume from the correct step.
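The recovery logic mentioned above usually comes down to checkpointing progress after each step. Here is a minimal sketch, assuming steps are plain callables that take the context and return a dict; `InMemorySagaStore` and `run_resumable` are hypothetical names, and the in-memory store stands in for a durable table keyed by saga ID.

```python
class InMemorySagaStore:
    """Hypothetical stand-in for a durable table keyed by saga_id."""

    def __init__(self):
        self._rows = {}

    def load(self, saga_id):
        # Default: nothing completed yet, empty context.
        return self._rows.get(saga_id, {"next_step": 0, "context": {}})

    def save(self, saga_id, next_step, context):
        self._rows[saga_id] = {"next_step": next_step, "context": dict(context)}


def run_resumable(saga_id, steps, store):
    """Run steps in order, checkpointing after each one so a restart skips finished work."""
    row = store.load(saga_id)
    context = dict(row["context"])
    for index in range(row["next_step"], len(steps)):
        context.update(steps[index](context) or {})
        store.save(saga_id, index + 1, context)
    return context
```

A crash between a step finishing and its checkpoint being saved means that step runs again on resume, so each step still has to be idempotent.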

Choreography

Each service listens to events and publishes its own events. No central coordinator.

# Inventory service
@event_handler("order.created")
def on_order_created(event: dict):
    order_id = event["order_id"]
    if reserve_inventory(order_id, event["items"]):
        publish("inventory.reserved", {"order_id": order_id})
    else:
        publish("inventory.reservation_failed", {"order_id": order_id})
 
# Payment service
@event_handler("inventory.reserved")
def on_inventory_reserved(event: dict):
    order_id = event["order_id"]
    if charge_payment(order_id):
        publish("payment.charged", {"order_id": order_id})
    else:
        publish("payment.failed", {"order_id": order_id})
 
# Inventory service (compensation listener)
@event_handler("payment.failed")
def on_payment_failed(event: dict):
    release_inventory(event["order_id"])
    publish("inventory.released", {"order_id": event["order_id"]})

Choreography advantages: loose coupling, each service evolves independently, no orchestrator bottleneck.

Choreography disadvantages: saga state is implicit and distributed — debugging a stuck saga requires correlating events across multiple services' logs. This is brutal in production.
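One way to make that correlation less painful is to collect every service's events in one place and look for sagas that never reached a terminal event. The sketch below assumes each event is a dict with `order_id`, `type`, and a sortable `ts`. The set of terminal event types is an assumption based on the handlers shown earlier, and `find_stuck_sagas` is a hypothetical helper.

```python
from collections import defaultdict

# Event types treated as the end of a saga in this sketch (an assumption).
TERMINAL_EVENTS = {"payment.charged", "inventory.released", "inventory.reservation_failed"}


def find_stuck_sagas(events):
    """Group events by order_id; return {order_id: last_event_type} for sagas with no terminal event."""
    by_order = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        by_order[event["order_id"]].append(event["type"])
    return {
        order_id: types[-1]
        for order_id, types in by_order.items()
        if not TERMINAL_EVENTS.intersection(types)
    }
```

The last event type tells you which service to look at first: a saga stuck on `payment.failed`, for example, points at the inventory compensation listener.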

Practical advice: use orchestration when the saga has more than 3 steps or its compensation paths are complex (branching, or dependent on which steps completed). Use choreography only for short, linear sagas of two or three steps where you can afford the debuggability cost.

The Third Option: Idempotent Eventual Consistency

Both 2PC and sagas assume you need transactional atomicity across services. Often you do not. You need the outcome to be consistent eventually, not the operations to be atomic.

The pattern: design every state transition to be idempotent and produce a complete desired-state event, not a delta.

from datetime import datetime, timezone
from decimal import Decimal

# Instead of: "deduct 100 from account" (delta: ordering matters)
# Publish: "account balance should be 900" (desired state: idempotent)

def publish_account_state(account_id: str, expected_version: int, new_balance: Decimal):
    """
    expected_version is the version this event carries. Consumers apply the
    event only if their current version < expected_version, so multiple
    deliveries of the same event are safe.
    """
    # publish() is the event-bus client, assumed to be defined elsewhere.
    publish("account.state", {
        "account_id":       account_id,
        "balance":          str(new_balance),
        "version":          expected_version,
        "as_of":            datetime.now(timezone.utc).isoformat(),
        "idempotency_key":  f"{account_id}:v{expected_version}"
    })

Consumers apply the event only if it advances their version:

UPDATE accounts
SET balance = $1, version = $2
WHERE account_id = $3
  AND version < $2;  -- idempotent: no-op if version already applied

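This removes cross-service coordination entirely: there is no coordinator and nothing to compensate. It does assume a single writer per account assigns the versions in increasing order. It works when: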

The business operation can be expressed as a desired state, not a delta.
You can tolerate a window of inconsistency (even milliseconds to seconds).
All consumers are idempotent.

It does not work for strict financial operations where intermediate states matter (overdraft detection requires real-time balance, not eventual balance).
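The version-guarded UPDATE can be exercised end to end with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for Postgres (so placeholders are `?` rather than `$1`), the table layout is invented for the example, and `apply_account_state` is a hypothetical consumer function.

```python
import sqlite3


def apply_account_state(conn, account_id, balance, version):
    """Apply a desired-state event; return True if it advanced the row, False if it was a no-op."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = ? WHERE account_id = ? AND version < ?",
        (balance, version, account_id, version),
    )
    conn.commit()
    return cur.rowcount == 1


# Illustrative schema; balance is stored as a decimal string, as in the event payload.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id TEXT PRIMARY KEY, balance TEXT, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('acct-1', '1000', 1)")
```

Calling `apply_account_state(conn, "acct-1", "900", 3)` twice changes the row once. A late-arriving version 2 event is then ignored, because the guard compares versions rather than arrival order.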

Decision Matrix

flowchart TD
    Q1{All participants\nsupport XA\nand same operator?}
    Q1 -->|Yes| Q2{Transaction\nvolume < 5k/s?}
    Q1 -->|No| Q3{Need atomic\nrollback on\nfailure?}
    Q2 -->|Yes| Use2PC["Use 2PC"]
    Q2 -->|No| Q3
    Q3 -->|Yes| Q4{More than\n3 steps or\ncomplex compensation?}
    Q3 -->|No| UseIdempotent["Use Idempotent\nEventual Consistency"]
    Q4 -->|Yes| UseOrchestration["Use Saga\nOrchestration"]
    Q4 -->|No| UseChoreography["Use Saga\nChoreography"]
    style Use2PC fill:#2b6cb0,color:#e2e8f0
    style UseIdempotent fill:#276749,color:#e2e8f0
    style UseOrchestration fill:#744210,color:#e2e8f0
    style UseChoreography fill:#553c9a,color:#e2e8f0
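The same decision flow can be written as a small function, which is handy for design-review checklists. `choose_pattern` and its parameter names are hypothetical; the thresholds are the ones in the flowchart.

```python
def choose_pattern(all_xa_same_operator, tx_per_second, needs_atomic_rollback,
                   steps, complex_compensation=False):
    """Walk the decision flowchart and return the suggested pattern name."""
    # Q1 and Q2: XA-capable participants under one operator, below 5k transactions/s.
    if all_xa_same_operator and tx_per_second < 5000:
        return "2PC"
    # Q3: without a need for atomic rollback, idempotent desired-state events suffice.
    if not needs_atomic_rollback:
        return "idempotent eventual consistency"
    # Q4: long or complex sagas get an orchestrator.
    if steps > 3 or complex_compensation:
        return "saga orchestration"
    return "saga choreography"
```

For example, `choose_pattern(False, 300, True, 2)` lands on saga choreography, while the same call with `complex_compensation=True` lands on orchestration.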

Compensations Are Not Rollbacks

The most dangerous misunderstanding about sagas: a compensation is not a rollback. A rollback is atomic and leaves no trace. A compensation is a new business operation that may fail, may be partially applied, and leaves its own audit trail.

If your compensation is "refund the charge," the customer may have already seen the charge on their statement. The charge is real. The refund is also real. They are two separate financial events. Design your compensations as first-class business operations, not cleanup code.

class CompensatePayment:
    def compensate(self, context: dict):
        refund_id = create_refund(
            charge_id=context["charge_id"],
            reason="saga_compensation",
            idempotency_key=f"refund:{context['saga_id']}"
        )
        # Log the refund as a business event, not just a system event
        audit_log.write({
            "type": "REFUND_ISSUED",
            "saga_id": context["saga_id"],
            "refund_id": refund_id,
            "amount": context["charged_amount"]
        })
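Why the idempotency key matters is easiest to see with a retry. The sketch below uses a hypothetical in-memory stand-in for a payment provider that deduplicates on the key; `InMemoryRefunds` and the refund ID format are assumptions for illustration, not any real provider's API.

```python
class InMemoryRefunds:
    """Hypothetical payment-provider stand-in that deduplicates on idempotency key."""

    def __init__(self):
        self._by_key = {}

    def create_refund(self, charge_id, reason, idempotency_key):
        # A repeated key returns the original refund instead of issuing a second one.
        if idempotency_key not in self._by_key:
            refund_id = f"re_{len(self._by_key) + 1}"
            self._by_key[idempotency_key] = {
                "refund_id": refund_id,
                "charge_id": charge_id,
                "reason": reason,
            }
        return self._by_key[idempotency_key]["refund_id"]

    def count(self):
        return len(self._by_key)
```

If the compensation step is retried after a timeout, the second `create_refund` call with the same `refund:{saga_id}` key returns the first refund's ID, and the customer is refunded once.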

Key Takeaways

2PC is not dead: it is correct for XA-capable participants under your control at moderate volume. Its real limitations are the coordinator as a single point of failure and the blocking protocol, not throughput.
Saga orchestration is preferable to choreography for anything with more than 3 steps; choreography's distributed state makes production debugging expensive.
Idempotent eventual consistency eliminates coordination entirely and is the right answer when you can express operations as desired-state events rather than deltas.
Compensating transactions are new business operations, not rollbacks — design them as first-class operations with their own idempotency keys and audit trails.
Monitor pg_prepared_xacts when using 2PC — orphaned prepared transactions hold locks until manually resolved.
The choice is not about scale; it is about failure semantics and what your business tolerates when partial completion occurs.