The Cost of Consistency
← Part 9
Indexes, End to End
Consistency is not binary. Engineers talk about "strongly consistent" and "eventually consistent" databases as if those are the only two options, but the consistency spectrum has many gradations—and each gradation trades something real. Moving from eventual consistency to strict serializable consistency can multiply your write latency by 5–50x depending on deployment topology. That is not a reason to avoid strong consistency; it is a reason to be deliberate about where you need it and where you do not.
The Consistency Spectrum
From weakest to strongest, the main consistency levels you encounter in practice:
| Level | Guarantee | Example Systems |
|---|---|---|
| Eventual | All replicas converge eventually; individual reads may be stale | DynamoDB (default), Cassandra ONE, DNS |
| Monotonic read | You never read an older value than one you already read | Session-pinned reads |
| Read-your-writes | You always see your own writes | Many SQL DBs with session affinity |
| Causal | Causally related operations appear in order to all nodes | MongoDB causal sessions, Cosmos DB |
| Sequential | All operations appear in some global order (not necessarily real-time) | Rarely offered directly |
| Linearizable | Operations appear instantaneous at some point in their real-time interval | etcd, Spanner, CockroachDB, ZooKeeper |
| Strict serializable | Linearizability + full transaction serializability | Spanner, CockroachDB with SSI |
Most production systems need different levels for different data. Your product recommendation cache can be eventually consistent; your payment balance cannot.
The Latency Math
Eventual Consistency
A write succeeds as soon as the local replica acknowledges it. Reads return whatever value the nearest replica has.
```
Write latency    = local disk write    ≈ 1–5ms
Read latency     = local replica read  ≈ 1–5ms
Staleness window = replication lag     ≈ 10ms to seconds
```

This is the minimum achievable latency. Any stronger guarantee costs more.
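The staleness window can be made concrete with a small sketch. This is a toy model, assuming a single primary that replicates asynchronously to one follower with a fixed lag; the class and its parameters are illustrative, not any real database's API.

```python
import time

# Toy model (an assumption for illustration): the primary acknowledges
# writes immediately and replicates to a follower asynchronously, so a
# read from the follower inside the lag window can return a stale value.
class EventuallyConsistentPair:
    def __init__(self, replication_lag_s=0.05):
        self.primary = {}
        self.follower = {}
        self.lag = replication_lag_s
        self.pending = []  # (apply_at, key, value) replication events

    def write(self, key, value):
        # Acknowledge as soon as the local (primary) copy is written.
        self.primary[key] = value
        self.pending.append((time.monotonic() + self.lag, key, value))

    def read_nearest(self, key):
        # Apply only the replication events whose lag has elapsed.
        now = time.monotonic()
        still_pending = []
        for apply_at, k, v in self.pending:
            if apply_at <= now:
                self.follower[k] = v
            else:
                still_pending.append((apply_at, k, v))
        self.pending = still_pending
        return self.follower.get(key)

pair = EventuallyConsistentPair(replication_lag_s=0.05)
pair.write("balance", 100)
stale = pair.read_nearest("balance")   # inside the lag window: stale (None)
time.sleep(0.06)
fresh = pair.read_nearest("balance")   # after convergence: 100
```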
Quorum Consistency
In a system with N replicas, a write requires W acknowledgments and a read requires R acknowledgments, with W + R > N ensuring overlap.
```python
# For N=3, W=2, R=2:
# Write waits for 2 of 3 replicas → 1 network round trip to the nearer replicas
# Read contacts 2 of 3 replicas → takes the latest version returned

write_latency = max_rtt_to_quorum_member + local_disk_write
# e.g., 5ms network + 5ms disk ≈ 10ms

read_latency = max_rtt_to_quorum_member_for_read
# e.g., ~5ms network
```

Dynamo-style quorum reads are consistent under normal operation as long as W + R > N (equivalently, W + R ≥ N + 1). During network partitions, however, sloppy quorums and hinted handoff can still surface stale reads. Requiring more acknowledgments (W=3) eliminates staleness but adds latency and makes writes unavailable if any replica is down.
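The overlap rule and the latency arithmetic can be sketched as two small helpers. The function names and the RTT figures are assumptions for illustration, not measurements.

```python
def quorum_overlaps(n, w, r):
    """W + R > N guarantees every read quorum intersects every write quorum."""
    return w + r > n

def write_latency_ms(w, replica_rtts_ms, disk_ms):
    """A write waits for the W fastest acknowledgments; the slowest of
    those W round trips, plus the local disk write, sets the latency."""
    fastest = sorted(replica_rtts_ms)[:w]
    return max(fastest) + disk_ms

assert quorum_overlaps(3, 2, 2)        # N=3, W=2, R=2: overlap guaranteed
assert not quorum_overlaps(3, 1, 2)    # W + R = N: quorums can miss each other

# Replicas at 0ms (local), 1ms, and 5ms RTT, with a 5ms disk write:
print(write_latency_ms(2, [0.0, 1.0, 5.0], 5.0))  # → 6.0
```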
Linearizable Reads
A linearizable read must contact the leader (in Raft) or perform a quorum read with fencing to ensure it sees the latest committed value. In a multi-region setup:
```
Client in us-east-1, leader in us-west-2 (70ms RTT):

Linearizable write = 70ms (send to leader) + commit time
Linearizable read  = 70ms (contact leader) OR read from local replica with lease
```

Leader leases (used in CockroachDB, etcd) allow reads from the local leader without contacting quorum, reducing linearizable read latency to local disk speed; but leases expire and must be renewed, adding a background cost.
The Availability Tradeoff
The CAP theorem states that a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition tolerance. During a network partition, you must choose:
- CP (Consistency + Partition tolerance): Refuse to serve reads/writes when quorum is unreachable. No stale data; reduced availability. → Raft-based databases.
- AP (Availability + Partition tolerance): Continue serving reads/writes from available replicas. Possible stale reads; always available. → Cassandra, DynamoDB.
PACELC extends CAP for the non-partition case: even when the network is healthy, there is a tradeoff between latency (L) and consistency (C). A system that requires quorum acknowledgment has higher latency than one that acknowledges locally, network partition or not.
Consistency Levels in Practice: Cassandra
Cassandra exposes tunable consistency per operation. This lets you choose the tradeoff at query time:
```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Assumes a reachable local cluster and a keyspace with an accounts table.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("bank")

# Eventual: fastest, may read stale
stmt = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)

# Quorum: consistent across a majority of replicas
stmt = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)

# All: strongest, slowest, unavailable if any replica is down
stmt = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.ALL,
)

row = session.execute(stmt, [42]).one()
```

The right consistency level depends on the data. Use ONE for user preference reads. Use QUORUM for anything financial. Use LOCAL_QUORUM in multi-region deployments to avoid cross-region latency on every read while still getting consistency within a region.
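One way to keep these choices uniform across a codebase is to centralize them in a policy map. The data classes and helper below are hypothetical conventions, not a Cassandra driver API; the levels are given as the driver's constant names (`cassandra.ConsistencyLevel.<NAME>`).

```python
# Hypothetical policy map: each class of data gets the weakest consistency
# level its correctness requirements permit.
CONSISTENCY_POLICY = {
    "user_preferences": "ONE",           # stale reads acceptable
    "analytics":        "ONE",           # approximate values are fine
    "financial":        "QUORUM",        # a majority must agree
    "multi_region":     "LOCAL_QUORUM",  # quorum within the local datacenter
}

def consistency_for(data_class):
    """Look up the policy, defaulting to QUORUM for unknown data classes:
    the safe failure mode is stronger consistency, not weaker."""
    return CONSISTENCY_POLICY.get(data_class, "QUORUM")

assert consistency_for("financial") == "QUORUM"
assert consistency_for("unclassified") == "QUORUM"  # unknown → safe default
```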
Where to Apply Each Level
Strict serializable:
Financial balances, inventory counts, seat reservations
→ Must never show stale values; double-spend / oversell is unacceptable
Linearizable:
Leader election, distributed locks, configuration state
→ Single-object latest-value guarantee is sufficient
Read-your-writes / causal:
Social feed, user profile reads after an update
→ User should see their own changes; other users' lag is acceptable
Eventual:
View counts, recommendation scores, analytics aggregates
→ Approximate values are fine; minimize latency and cost

Latency Comparison Table
Assume a 3-node cluster, single region, 1ms intra-region RTT, 5ms disk write:
| Consistency Level | Write p50 | Read p50 | Notes |
|---|---|---|---|
| Eventual (ONE) | 5ms | 2ms | Local only |
| Quorum (QUORUM) | 6ms | 3ms | +1 RTT for acknowledgment |
| Linearizable | 7ms | 3–7ms | Leader required for reads unless leased |
| Strict serializable | 8–15ms | 3–7ms | SSI conflict detection overhead |
In a 3-region global deployment, replace "1ms" with the cross-region RTT (50–150ms). The ratios hold; the absolute numbers scale by 50–150x.
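One plausible decomposition of the table's write column treats each level as a local disk write plus some number of extra round trips. This model is an assumption consistent with the numbers above, not a benchmark.

```python
def write_p50_ms(level, rtt_ms=1.0, disk_ms=5.0):
    """Model write p50 as a local disk write plus per-level round trips:
    eventual = 0 RTTs, quorum = 1, linearizable = 2 (leader + replication)."""
    extra_rtts = {"eventual": 0, "quorum": 1, "linearizable": 2}[level]
    return extra_rtts * rtt_ms + disk_ms

# Single region (1ms RTT, 5ms disk): matches the table above.
assert write_p50_ms("eventual") == 5.0
assert write_p50_ms("quorum") == 6.0
assert write_p50_ms("linearizable") == 7.0

# Multi-region deployment (70ms RTT): the ratios hold, the numbers scale.
assert write_p50_ms("quorum", rtt_ms=70.0) == 75.0
```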
Key Takeaways
- Consistency is a spectrum from eventual (no guarantees on staleness) to strict serializable (fully ordered, latest-value transactions); each step up costs measurable latency.
- The latency cost of strong consistency in a single region is small (milliseconds); in a multi-region deployment it scales with cross-region RTT, making strict global consistency expensive.
- CAP theorem forces a choice between consistency and availability during network partitions; PACELC extends this: even without partitions, consistency costs latency.
- Use the weakest consistency level that your correctness requirements permit—eventual for analytics and recommendations, linearizable or serializable for financial and inventory operations.
- Cassandra-style tunable consistency lets you set the level per query; prefer `LOCAL_QUORUM` over `QUORUM` in multi-region deployments to avoid unnecessary cross-region latency.
- Leader leases reduce linearizable read latency to near-local speed by allowing the leader to serve reads without a quorum round trip during the lease period, at the cost of lease renewal overhead.