Skip to main content
Database Internals

MVCC and Snapshot Isolation

Ravinder··5 min read
DatabaseInternalsArchitectureMVCCTransactionsPostgreSQL
Share:
MVCC and Snapshot Isolation

If you have ever opened two database sessions, updated a row in one, and observed that the other session still sees the old value even after the commit—that is MVCC in action. Multi-Version Concurrency Control is the mechanism that lets PostgreSQL, MySQL InnoDB, Oracle, and most modern relational databases serve readers and writers simultaneously without locks. But MVCC is not magic. It keeps old row versions alive on disk, and if nothing reclaims them, your tables bloat, your indexes degrade, and your queries slow down in ways that are infamously hard to diagnose.

How MVCC Creates Multiple Versions

Every row in PostgreSQL carries two hidden system columns: xmin (the transaction ID that created this version) and xmax (the transaction ID that deleted or superseded it, or 0 if still live).

-- Enable visibility of system columns
SELECT xmin, xmax, id, name FROM users WHERE id = 1;
--  xmin  | xmax | id | name
-- -------+------+----+-------
--  10042 |    0 |  1 | Alice

When you UPDATE users SET name = 'Alicia' WHERE id = 1, PostgreSQL does not modify the existing row. It:

  1. Marks the old row's xmax with the current transaction ID.
  2. Inserts a new row version with xmin = current transaction ID and xmax = 0.
Heap page after UPDATE (transaction 10055):
[ xmin=10042, xmax=10055, id=1, name="Alice"  ]  ← dead version
[ xmin=10055, xmax=0,     id=1, name="Alicia" ]  ← live version

Both versions coexist on the same heap page (or an overflow page). The old version is not gone—it persists until VACUUM reclaims it.

Snapshot Isolation and Visibility Rules

When a transaction begins (or in READ COMMITTED mode, when each statement begins), PostgreSQL captures a snapshot: the set of transaction IDs that are currently in-progress or not yet committed. A row version is visible to your snapshot if:

  • xmin is committed and was committed before your snapshot.
  • xmax is either 0, aborted, or was committed after your snapshot.
sequenceDiagram participant T1 as Transaction 1\n(xid=100) participant T2 as Transaction 2\n(xid=101) participant DB T1->>DB: BEGIN (snapshot: active=[101]) T2->>DB: BEGIN T2->>DB: UPDATE users SET name='Alicia' WHERE id=1 T2->>DB: COMMIT T1->>DB: SELECT name FROM users WHERE id=1 DB-->>T1: "Alice" (T2's commit invisible — xid 101 was active at snapshot time) T1->>DB: COMMIT

This is snapshot isolation: T1 sees a consistent point-in-time view of the database regardless of concurrent commits. It is stronger than READ COMMITTED (which would re-snapshot per statement and see T2's update) but weaker than SERIALIZABLE (which detects conflicts between concurrent transactions' read/write sets).

The Vacuum Problem

Dead row versions accumulate. Every UPDATE creates a dead version; every DELETE leaves a dead tombstone. Without cleanup, tables grow unboundedly and index entries point to dead heap rows that must be traversed to determine they are invisible.

AUTOVACUUM is PostgreSQL's background daemon that reclaims dead versions. It scans tables, marks dead versions as free space, and removes index entries pointing to them.

-- Check for tables with high dead tuple counts
SELECT relname,
       n_dead_tup,
       n_live_tup,
       round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

A dead ratio above 10–15% on a hot table is a signal that autovacuum is not keeping up. Common causes:

  • Autovacuum cost thresholds (autovacuum_vacuum_cost_limit) are too conservative for your I/O capacity.
  • A long-running transaction holds a snapshot older than the oldest dead version, preventing vacuum from reclaiming anything.

Transaction ID Wraparound: The Silent Bomb

PostgreSQL transaction IDs are 32-bit integers. After ~2.1 billion transactions, they wrap around. A row whose xmin is "in the future" relative to the wrapped-around counter appears invisible. This is table corruption.

Autovacuum prevents wraparound by periodically running VACUUM FREEZE, which marks row versions with a special frozen transaction ID that is always considered in the past. If you neglect autovacuum and approach the 2.1 billion limit, PostgreSQL will force a full-table freeze at a threshold (~40 million transactions before wraparound) and refuse writes if the database gets critically close.

-- Check how far tables are from wraparound (lower age = safer)
SELECT relname,
       age(relfrozenxid) AS xid_age,
       pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
-- If xid_age approaches 1.5 billion, intervene immediately

Common Surprises in Production

Bloat from high-frequency updates: A table receiving millions of updates per day generates millions of dead versions. Even with autovacuum running, heap bloat accumulates if vacuum cannot keep pace. Monitor table and index bloat with pgstattuple.

Long-running transactions block vacuum: A transaction open for hours prevents vacuum from advancing its cleanup horizon. A single idle-in-transaction connection can cause unbounded bloat across the entire database.

-- Find long-running transactions (run this regularly in monitoring)
SELECT pid,
       now() - pg_stat_activity.query_start AS duration,
       query,
       state
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - pg_stat_activity.query_start > interval '5 minutes'
ORDER BY duration DESC;

Serializable vs Snapshot Isolation: PostgreSQL's REPEATABLE READ provides snapshot isolation, not true serializability. The SERIALIZABLE isolation level adds Serializable Snapshot Isolation (SSI), which detects read/write anti-dependencies between concurrent transactions. The overhead is low in practice (10–15% in typical OLTP), but the behavior change can surprise applications that assumed REPEATABLE READ was sufficient.

Key Takeaways

  • MVCC creates new row versions on every UPDATE rather than modifying in place; old versions accumulate on heap pages until reclaimed.
  • Snapshot isolation gives each transaction a consistent point-in-time view by consulting xmin/xmax visibility rules against the transaction's snapshot.
  • AUTOVACUUM is not optional—neglecting it leads to table bloat, index degradation, and eventually transaction ID wraparound, which causes data invisibility.
  • A single long-running transaction can prevent vacuum from advancing, causing bloat to accumulate across the entire database.
  • VACUUM FREEZE is the mechanism that prevents 32-bit transaction ID wraparound; monitor age(relfrozenxid) on large tables.
  • PostgreSQL's REPEATABLE READ is snapshot isolation, not serializable; use the SERIALIZABLE level when you need true serializability with SSI conflict detection.
Share: