CQRS Without the Cargo Cult
I have reviewed systems where a team added separate read and write models, projection workers, an event bus, and a dedicated read database — for a service handling 200 writes per day. The operational cost of that infrastructure dwarfs the salary cost of the engineers maintaining it. The codebase is harder to change, not easier.
CQRS solves a specific problem: when your read access patterns and write access patterns have fundamentally different shapes, optimizing for one degrades the other. If your reads and writes have similar shapes and similar volumes, you do not have that problem. A shared model with careful indexing will outperform a CQRS setup in velocity, cost, and debuggability.
What CQRS Actually Is
Command Query Responsibility Segregation means routing write operations (commands) through one model and read operations (queries) through another. That is the whole pattern. The models can be:
- Two different database schemas in the same database.
- Two separate databases (write = normalized SQL, read = denormalized read replicas or Elasticsearch).
- Event sourcing on the write side with projections on the read side.
The pattern does not require event sourcing, eventual consistency, or microservices. Those are orthogonal choices that are often added on top, usually when they should not be.
When Monolith CRUD Is Right
Start here. CRUD with a well-indexed relational database handles:
- Services with < 10k writes/day and < 100k reads/day — trivially.
- Services where reads and writes touch similar sets of columns.
- Services where the query complexity is low (list, filter, sort by indexed columns).
- Teams with fewer than 5 engineers owning the service.
The test: can you serve your most expensive read query under load with a single SQL query against your write database (with appropriate indexes)? If yes, you do not need CQRS.
```sql
-- This is fine on a normalized schema with proper indexes:
SELECT
    o.id, o.status, o.total_amount,
    c.name AS customer_name,
    COUNT(oi.id) AS item_count
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
GROUP BY o.id, c.name
ORDER BY o.created_at DESC
LIMIT 50;
```

With indexes on (status, created_at) and appropriate foreign-key indexes, this query runs in single-digit milliseconds up to tens of millions of rows. That covers most applications.
When CQRS Pays Off
The signal that you need read/write split:
Signal 1: Write shape is normalized, read shape is highly denormalized. Your orders table is normalized across 8 tables, but your dashboard needs to render a flat summary card per order with 30 fields. Joining 8 tables on every page load under high read traffic is expensive. A pre-materialized read model solves this.
Signal 2: Read and write throughput scale independently. Writes come from a handful of internal processes; reads come from millions of end users. Scaling the write database to serve read load is wasteful.
Signal 3: Different consistency requirements. Writes need strong consistency (financial records, inventory). Reads can tolerate seconds of staleness (reporting, analytics).
Signal 4: Reads need a different query engine. Full-text search (Elasticsearch), graph traversal (Neo4j), or time-series (InfluxDB) — the write database cannot serve these efficiently regardless of indexing.
Projection Design
A projection is a read model derived from commands or events. Its schema is optimized for a single query or a small family of queries, not for normalization.
```python
from dataclasses import dataclass


@dataclass
class OrderSummaryProjection:
    """
    Optimized for: GET /orders?status=X&customer_id=Y
    Rebuilt from: OrderCreated, OrderStatusChanged, PaymentCharged events
    """
    order_id: str
    customer_id: str
    customer_name: str  # denormalized from CustomerUpdated events
    status: str
    total_amount: int  # in cents
    item_count: int
    last_updated_at: str


class OrderSummaryProjector:
    def __init__(self, read_store):
        self.store = read_store

    def on_order_created(self, event: dict):
        self.store.upsert("order_summary", {
            "order_id": event["order_id"],
            "customer_id": event["customer_id"],
            "customer_name": event["customer_name"],  # included in the event
            "status": "pending",
            "total_amount": event["total_amount"],
            "item_count": len(event["items"]),
            "last_updated_at": event["occurred_at"],
        })

    def on_order_status_changed(self, event: dict):
        self.store.update(
            "order_summary",
            key=event["order_id"],
            patch={
                "status": event["new_status"],
                "last_updated_at": event["occurred_at"],
            },
        )
```

Include denormalized data in the event payload. If you want the order summary to show the customer name, include the customer name in the OrderCreated event. Do not look it up from the customer service at projection time; that creates coupling, and projections fail whenever the customer service is down.
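As a concrete sketch, an OrderCreated payload that carries the denormalized context the projector needs might look like the following. The field names match the projector above; the envelope shape and example values are assumptions, not a prescribed format:

```python
# Hypothetical OrderCreated payload. It carries customer_name denormalized at
# write time, so the projector never calls the customer service to build the row.
order_created = {
    "event_type": "OrderCreated",
    "order_id": "ord_123",
    "customer_id": "cus_456",
    "customer_name": "Ada Lovelace",  # denormalized at write time
    "total_amount": 4500,             # in cents
    "items": [
        {"sku": "SKU-1", "quantity": 2},
        {"sku": "SKU-2", "quantity": 1},
    ],
    "occurred_at": "2024-01-15T10:00:00Z",
}
```

Everything the read model needs is in the event itself; the projection can be rebuilt from the event log alone.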
Sync vs Async Projections
Synchronous (same transaction)
```python
def handle_create_order(cmd: CreateOrderCommand, db, read_db):
    with db.transaction():
        order = Order.create(cmd)
        db.save(order)
        # Update the read model in the same transaction (if same DB)
        read_db.upsert("order_summary", build_summary(order))
```

Same-transaction projection update: zero eventual consistency, zero lag. Correct for: small teams, same database, simple projections. Trade-off: write throughput is limited by the cost of updating the read model on every write.
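The build_summary helper in the handler is left undefined in this article. A minimal version, assuming the order aggregate exposes the fields the projection needs (attribute names here are illustrative), might be:

```python
def build_summary(order) -> dict:
    # Flatten the order aggregate into the shape of the order_summary row.
    # Attribute names are assumptions; adapt them to your Order model.
    return {
        "order_id": order.id,
        "customer_id": order.customer_id,
        "customer_name": order.customer_name,
        "status": order.status,
        "total_amount": order.total_amount,  # in cents
        "item_count": len(order.items),
        "last_updated_at": order.updated_at,
    }
```

Keeping this mapping in one pure function makes it trivial to reuse from both the synchronous handler and an async projection worker.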
Asynchronous (event-driven)
```python
def handle_create_order(cmd: CreateOrderCommand, db, event_bus):
    with db.transaction():
        order = Order.create(cmd)
        db.save(order)
        event_bus.publish("OrderCreated", order.to_event())
    # A projection worker handles OrderCreated asynchronously
```

Async projection: eventual consistency (typically < 1s lag in healthy systems, minutes if the worker falls behind). Correct for: high write throughput, expensive projection computation, multiple independent read models built from the same events.
Handle async projection failures explicitly. A projection worker that crashes and restarts will replay events from its last committed offset. Projections must be idempotent:
```python
def on_order_created(self, event: dict):
    # Upsert, not insert: safe to apply multiple times
    self.store.upsert(
        "order_summary",
        key=event["order_id"],
        data=build_summary(event),
        idempotency_key=f"order_created:{event['order_id']}",
    )
```

The Complexity You Are Buying
When you adopt CQRS with async projections, you are buying:
- Eventual consistency — clients that write then immediately read may not see their own write. Implement read-your-own-writes with a version token or by querying the write model for the creator's own data.
- Projection rebuild burden — every time you change a projection's shape, you must replay all historical events. On large event stores this takes hours. Test your rebuild procedure in production before you need it.
- Multiple failure surfaces — the event bus, the projection worker, the read store, and the write store can each fail independently. Add monitoring for all four.
- Debugging complexity — a bug in a projection can corrupt the read model silently. Add checksums or audit queries to detect read/write divergence.
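The read-your-own-writes mitigation from the first bullet can be sketched with a version token: the write path returns the version it committed, and the read path falls back to the write model until the projection has caught up. The class and method names here are illustrative, not from any specific library:

```python
class OrderReader:
    """Serves reads from the projection, falling back to the write model
    when the caller holds a version the projection has not reached yet."""

    def __init__(self, read_store, write_store):
        self.read_store = read_store
        self.write_store = write_store

    def get_order(self, order_id: str, min_version: int = 0) -> dict:
        summary = self.read_store.get(order_id)
        if summary is not None and summary["version"] >= min_version:
            return summary  # projection has caught up; serve the cheap path
        # Stale or missing: serve the writer's own data from the write model
        return self.write_store.get(order_id)
```

Anonymous readers pass no token and always hit the fast path; only the client that just wrote pays for the fallback query.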
```python
# Divergence detector: run hourly
def check_projection_consistency(write_db, read_db, sample_size=1000):
    orders = write_db.query(
        "SELECT id, status, total_amount FROM orders ORDER BY RANDOM() LIMIT %s",
        (sample_size,),
    )
    for order in orders:
        summary = read_db.get("order_summary", order["id"])
        if summary is None:
            alert(f"Order {order['id']} missing from read model")
        elif summary["status"] != order["status"]:
            alert(f"Order {order['id']} status divergence: "
                  f"write={order['status']} read={summary['status']}")
```

Projection Rebuild Strategy
When you need to rebuild a projection:
```python
class ProjectionRebuilder:
    def __init__(self, store):
        self.store = store

    def rebuild(self, projector, event_store, projection_name: str):
        # 1. Write to a shadow table
        shadow_name = f"{projection_name}_shadow_{timestamp()}"
        self.store.create_table(shadow_name)
        # 2. Replay all events into the shadow
        for batch in event_store.replay_all(batch_size=1000):
            for event in batch:
                projector.apply_to(shadow_name, event)
        # 3. Swap (wrap both renames in one transaction where the
        #    database supports transactional DDL, e.g. PostgreSQL)
        self.store.rename_table(projection_name, f"{projection_name}_old")
        self.store.rename_table(shadow_name, projection_name)
        # 4. Drop the old table after validation
```

Shadow-table rebuilds let you rebuild without downtime. The live read model stays available while the shadow catches up.
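The rebuilder assumes an event store that can replay its full history, in commit order, in batches. A minimal in-memory stand-in for that contract (illustrative only; a real store reads from a log or an events table) looks like:

```python
class InMemoryEventStore:
    """Stand-in for the event-store interface the rebuilder relies on."""

    def __init__(self):
        self.events = []  # append-only, in commit order

    def append(self, event: dict):
        self.events.append(event)

    def replay_all(self, batch_size: int = 1000):
        # Yield the full history in order, in fixed-size batches, so the
        # rebuilder never holds the whole log in memory at once.
        for i in range(0, len(self.events), batch_size):
            yield self.events[i:i + batch_size]
```

Batched, ordered replay is the whole contract; any store that honors it can back the shadow-table rebuild above.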
Key Takeaways
- CQRS is justified when read shape is fundamentally different from write shape, or when reads and writes need to scale independently — not by default.
- A normalized SQL schema with correct indexes serves most applications up to tens of millions of rows without a separate read model.
- Include denormalized context in event payloads so projections do not need to call other services to build the read model.
- Synchronous in-transaction projections eliminate eventual consistency at the cost of write throughput; choose based on your actual throughput numbers.
- Async projections must be idempotent — projection workers will replay events on restart.
- Build a divergence detector from day one; silent projection bugs corrupt your read model without raising any application errors.