
Replication Topologies


Replication in Postgres means different things depending on what problem you're solving. High availability and read scaling use physical replication—a byte-for-byte copy of the primary's data. Cross-version migration, selective table sync, and change capture pipelines use logical replication—a decoded stream of row-level changes. Choosing the wrong topology produces either operational complexity you didn't need or capability you can't have.

Physical Replication

Physical (streaming) replication ships WAL segments from the primary to standbys in near-real-time. The standby applies WAL records and stays in a continuous recovery state. The result is a byte-identical copy of the primary's entire cluster—same tablespaces, same databases, same system catalog.

flowchart LR
  Primary[(Primary\nread/write)] -->|WAL stream| S1[(Standby 1\nread-only)]
  Primary -->|WAL stream| S2[(Standby 2\nread-only)]
  S1 -->|WAL stream| S3[(Standby 3\ncascading)]
  style Primary fill:#fef3c7,stroke:#d97706
  style S3 fill:#f0fdf4,stroke:#16a34a

Key properties:

  • Standby is read-only (queries are allowed but no writes)
  • Replicates the entire cluster; no per-table granularity
  • Failover promotes a standby to primary and is fast (seconds with a proper HA agent); see the promotion sketch below
  • The standby must run the same major version as the primary; physical replication allows no version or schema divergence
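
Promotion itself is a one-line operation once you decide to fail over; in practice an HA agent such as Patroni or repmgr drives it. A minimal sketch of a manual promotion:

-- Run on the standby you want to promote (Postgres 12+)
SELECT pg_promote();

-- Or from the shell (data directory path is illustrative):
-- pg_ctl promote -D /var/lib/postgresql/data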

Configuration

-- On primary: postgresql.conf
-- wal_level = replica (minimum for physical replication)
-- max_wal_senders = 10
-- wal_keep_size = '1GB'  (Postgres 13+; wal_keep_segments on older versions)
 
-- Create a replication user
CREATE ROLE replicator REPLICATION LOGIN PASSWORD 'secret';
 
-- pg_hba.conf on primary:
-- host  replication  replicator  10.0.0.0/8  scram-sha-256
 
-- On standby: recovery.conf (Postgres 11 and earlier) or postgresql.conf plus a standby.signal file (Postgres 12+)
-- primary_conninfo = 'host=10.0.0.1 user=replicator password=secret'
-- restore_command = 'cp /wal_archive/%f %p'
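
Before streaming can begin, the standby needs a physical copy of the primary's data directory. A minimal sketch using pg_basebackup, run on the standby host (host address and data directory are illustrative):

-- pg_basebackup -h 10.0.0.1 -U replicator -D /var/lib/postgresql/data \
--     --wal-method=stream --write-recovery-conf

With --write-recovery-conf (Postgres 12+), pg_basebackup creates standby.signal and writes primary_conninfo for you, so the standby starts streaming on first boot.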

Synchronous vs Asynchronous

-- Synchronous: primary waits for standby to confirm WAL receipt before commit
-- Zero data loss on failover, adds latency to every write
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1)';
 
-- Quorum commit: at least 2 of 3 standbys must confirm
ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (s1, s2, s3)';
 
-- Async (default): primary does not wait, potential data loss on failover
-- synchronous_standby_names = '' (empty)

Use synchronous replication when RPO = 0 is a hard requirement. Accept the write latency cost explicitly.
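
You can verify how the primary currently classifies each standby: pg_stat_replication reports a sync_state of sync, quorum, potential, or async per connection.

-- On primary: confirm the effective synchronous configuration
SELECT application_name, sync_state, sync_priority
FROM pg_stat_replication;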

Logical Replication

Logical replication decodes WAL into a stream of SQL-level row changes and ships that stream to a subscriber. The subscriber applies the changes to its own tables. Because changes are decoded at the row level, you can:

  • Replicate a subset of tables
  • Replicate to a different Postgres major version
  • Replicate to a different database on the same cluster
  • Apply transforms or filters before inserting into the target
-- On publisher (source)
-- wal_level = logical  (must be set; requires restart)
 
CREATE PUBLICATION my_pub
  FOR TABLE orders, order_items, products;
 
-- On subscriber (target) - can be different major version
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=source-db user=replicator dbname=app'
  PUBLICATION my_pub;
 
-- Check replication lag
SELECT
  slot_name,
  confirmed_flush_lsn,
  pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
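
From Postgres 15 onward, publications can also filter rows and restrict columns at the source, narrowing the stream before it leaves the publisher. A sketch, assuming hypothetical status, customer_id, and total columns on orders:

-- Postgres 15+: row filter
CREATE PUBLICATION active_orders_pub
  FOR TABLE orders WHERE (status <> 'archived');

-- Postgres 15+: column list (only these columns are published)
-- CREATE PUBLICATION orders_slim FOR TABLE orders (id, customer_id, total);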

Logical Replication Limitations

Logical replication does not replicate DDL. Schema changes must be applied manually to the subscriber before the corresponding data changes arrive—otherwise the subscriber will error and replication will stall.

-- Safe DDL workflow with logical replication:
-- 1. Apply DDL to subscriber
-- 2. Apply DDL to publisher
-- 3. Replication continues without stalling
 
-- What does NOT replicate:
-- - DDL: CREATE/ALTER/DROP TABLE and other schema changes
-- - Sequence values (current positions never advance on the subscriber)
-- - Large objects
-- - TRUNCATE on Postgres 10 (replicated on 11+, where 'truncate' is in the default publish list)
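
The sequence gap bites hardest at cutover: if the subscriber becomes the new primary, its sequences still hold their initial values. Advance them manually before accepting writes. A sketch, assuming orders has a serial id backed by orders_id_seq (hypothetical names):

-- Run on the subscriber before cutover
SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));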

Cascading Standbys

A cascading standby receives WAL from another standby rather than the primary, reducing the primary's replication fanout.

flowchart LR
  PRI[(Primary)] -->|WAL| S1[(Standby 1\nregion: us-east)]
  S1 -->|WAL| S2[(Standby 2\nregion: eu-west\ncascading)]
  S1 -->|WAL| S3[(Standby 3\nregion: ap-southeast\ncascading)]
  style PRI fill:#fef3c7,stroke:#d97706

Cascading is useful for geo-distributed deployments where the primary is in one region and you want standbys in multiple other regions without multiplying the primary's network I/O. The trade-off is additional replication lag—the cascading standby lags relative to the intermediate standby, not just the primary.
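
Configuration is the same as for any standby; the only difference is that primary_conninfo on the cascading node points at the intermediate standby, which must itself have max_wal_senders headroom for its downstream nodes. A sketch with illustrative hostnames:

-- On Standby 2 (eu-west): postgresql.conf
-- primary_conninfo = 'host=standby1.us-east.internal user=replicator password=secret'

-- On Standby 1: reserve sender slots for its downstream standbys
-- max_wal_senders = 10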

Replication Slots

A replication slot tracks the consumer's progress and ensures the primary retains WAL until the consumer has confirmed receipt. Without slots, WAL can be recycled before the standby consumes it, leaving the standby irrecoverably behind and forcing a rebuild from a fresh base backup.

-- Physical slot (streaming replication)
SELECT pg_create_physical_replication_slot('standby1_slot');
 
-- Logical slot
SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');
 
-- Monitor slot lag - CRITICAL: unacknowledged slots cause WAL accumulation
-- (confirmed_flush_lsn is NULL for physical slots, so fall back to restart_lsn)
SELECT
  slot_name,
  slot_type,
  active,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(),
                    coalesce(confirmed_flush_lsn, restart_lsn))
  ) AS lag
FROM pg_replication_slots;

Unused or stalled slots accumulate WAL on the primary indefinitely. Set max_slot_wal_keep_size (Postgres 13+) to cap WAL retention. A disconnected logical replication subscriber whose slot is left in place is one of the most common causes of disk exhaustion on a primary.

-- Set a WAL retention limit for slots (postgresql.conf)
-- max_slot_wal_keep_size = '10GB'
 
-- Drop a stale slot that's accumulating WAL
SELECT pg_drop_replication_slot('stale_slot');
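
On Postgres 13+, pg_replication_slots also exposes wal_status and safe_wal_size, which report directly whether a slot is at risk of invalidation under max_slot_wal_keep_size:

-- Slots in 'extended' or 'unreserved' state are eating into the retention budget;
-- 'lost' means the slot has already been invalidated
SELECT slot_name, wal_status, safe_wal_size
FROM pg_replication_slots
WHERE wal_status <> 'reserved';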

Decision Matrix

flowchart TD
  Q1{Need full cluster copy\nfor HA / failover?}
  Q1 -->|Yes| PHYS[Physical streaming\nreplication]
  Q1 -->|No| Q2{Cross-version,\nselective tables,\nor CDC?}
  Q2 -->|Yes| LOG[Logical replication]
  Q2 -->|No| Q3{Read replicas only,\nsame version?}
  Q3 -->|Yes| PHYS
  Q3 -->|No| LOG
  PHYS --> Q4{Reduce primary\nWAL fanout?}
  Q4 -->|Yes| CAS[Cascading standbys]
  Q4 -->|No| PHYS

Monitoring Replication Health

-- Primary: view all connected standbys
SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
 
-- Standby: check lag from primary's perspective
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

Alert on replay_lag_bytes exceeding a threshold and on pg_last_xact_replay_timestamp() going stale. These two metrics together distinguish a lagging standby from a disconnected one.
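
One caveat: pg_last_xact_replay_timestamp() also goes stale when the primary is simply idle. Comparing the received and replayed LSNs on the standby separates a healthy-but-quiet system from a stalled one:

-- Standby: if received equals replayed, everything received has been applied
SELECT
  pg_last_wal_receive_lsn() AS received_lsn,
  pg_last_wal_replay_lsn()  AS replayed_lsn,
  now() - pg_last_xact_replay_timestamp() AS apparent_lag;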

Key Takeaways

  • Physical replication copies the entire cluster byte-for-byte and is the right foundation for HA failover; it cannot replicate across major versions or select individual tables.
  • Logical replication operates at the row level, supports cross-version sync and selective table replication, but does not propagate DDL—schema changes require manual coordination.
  • Synchronous replication (RPO = 0) adds commit latency proportional to the round-trip to the nearest synchronous standby; measure and budget for this before enabling it.
  • Replication slots prevent WAL recycling before a consumer catches up—unused or stalled slots cause unbounded WAL accumulation on the primary; monitor slot lag aggressively.
  • Cascading standbys reduce primary WAL fanout for geo-distributed deployments at the cost of additional lag for the cascading node.
  • Always alert on both replay_lag_bytes and pg_last_xact_replay_timestamp() staleness; the combination distinguishes slow consumers from disconnected ones.