Postgres on Kubernetes
← Part 9: Upgrades, Majors and Minors
Running Postgres on Kubernetes is entirely feasible and increasingly common—but it requires a fundamentally different mental model than running stateless services. Kubernetes is built for pods that die and respawn without consequences. Postgres is a process that owns durable state, cares deeply about which node it runs on, and has specific requirements around storage I/O, network identity, and shutdown sequencing. Bridging those two worlds is what a Postgres operator does, and understanding what operators handle—and what they don't—is prerequisite knowledge before deploying Postgres into a cluster.
Why Databases Are Different in Kubernetes
A stateless application pod failing and rescheduling onto another node is a normal, handled event. A Postgres pod doing the same thing requires:
- The PVC (PersistentVolumeClaim) to follow the pod to the new node—or the new node to have access to the same volume
- The replication topology to recognize the restart and resume WAL streaming
- The primary election process to ensure only one pod is writable at a time
- The connection routing layer to update after a failover
A raw StatefulSet with a PVC can run Postgres, but it handles none of the above. You'd be implementing HA, failover, backup, and monitoring yourself. That's what operators are for.
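To make that concrete, here is a minimal sketch of such a raw StatefulSet (names, image tag, and sizes are illustrative, not a recommendation); note that nothing in it knows about primaries, standbys, WAL archiving, or failover:

```yaml
# A raw StatefulSet: Postgres starts and the data persists across pod
# restarts, but there is no notion of primary vs. standby, no failover,
# and no backups.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg-naive
spec:
  serviceName: pg-naive
  replicas: 1            # >1 would just give you N independent primaries
  selector:
    matchLabels: {app: pg-naive}
  template:
    metadata:
      labels: {app: pg-naive}
    spec:
      containers:
        - name: postgres
          image: postgres:16          # illustrative tag
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef: {name: pg-naive-creds, key: password}
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```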
Operators: What They Provide
The mature options in 2025:
CloudNativePG (CNPG) is the most actively maintained open-source operator. It is a CNCF sandbox project, manages a single Postgres cluster per CR, handles streaming replication natively, integrates with Barman for backups, and supports rolling minor-version upgrades via switchover.
Zalando Postgres Operator (postgres-operator) is battle-tested at Zalando's scale. It uses Patroni for HA and Spilo as the container image, providing a familiar operational model for teams already using Patroni outside Kubernetes.
Percona Operator for PostgreSQL wraps Patroni and pgBackRest, adds pgBouncer integration, and focuses on enterprise operational requirements.
# CloudNativePG cluster definition
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
  namespace: production
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.3
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "512MB"
      effective_cache_size: "2GB"
      random_page_cost: "1.1"
      wal_level: "logical"
  storage:
    size: 100Gi
    storageClass: premium-rwo  # ReadWriteOnce, local SSD preferred
  backup:
    barmanObjectStore:
      destinationPath: "s3://my-bucket/postgres-backups"
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
    retentionPolicy: "30d"
  monitoring:
    enablePodMonitor: true  # Prometheus PodMonitor
Storage: The Most Important Decision
Storage class choice has more operational impact than any other configuration decision.
Key storage requirements:
- ReadWriteOnce only—Postgres cannot safely share a volume between concurrent writers
- fsync must work correctly—network filesystems that lie about fsync (some NFS configurations) cause data corruption
- I/O latency matters—network block storage with >1ms average latency will hurt write performance; measure with fio before committing
# Benchmark your storage class before deploying Postgres
fio --name=randwrite --ioengine=libaio --iodepth=32 \
  --rw=randwrite --bs=4k --direct=1 --size=1G \
  --numjobs=4 --runtime=60 --filename=/data/fio_test
# What you want: >10k IOPS, <1ms avg latency for production OLTP
# Check: avg lat (usec)
Networking and Service Routing
CNPG creates three services automatically:
- app-db-rw: always points to the current primary; use it for writes
- app-db-ro: round-robins across standbys; use it for reads
- app-db-r: points to all instances, including the primary
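On the application side, routing by intent usually means keeping one DSN per service. A deliberately naive sketch for a cluster named app-db (the prefix-based router and the dsn helper are illustrative, not part of any driver):

```python
# Route obvious reads to the -ro service and everything else to -rw.
# Hostnames follow CNPG's <cluster>-rw / <cluster>-ro naming.
READ_ONLY_PREFIXES = ("select", "show", "explain")

def dsn(role: str, user: str = "app", db: str = "app") -> str:
    """Build a libpq-style DSN for the -rw or -ro service."""
    assert role in ("rw", "ro")
    return f"postgresql://{user}@app-db-{role}:5432/{db}"

def route(query: str) -> str:
    """Pick a DSN by looking at the query's first keyword."""
    words = query.lstrip().split(None, 1)
    first_word = words[0].lower() if words else ""
    return dsn("ro") if first_word in READ_ONLY_PREFIXES else dsn("rw")
```

Prefix matching like this misroutes writes hidden inside CTEs and `SELECT ... FOR UPDATE`; real applications are better served by explicit read/write connection pools than by query inspection.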
-- Application configuration (example: connection strings)
-- Write path:
--   postgresql://user:pass@app-db-rw:5432/app
-- Read path (analytics, reporting):
--   postgresql://user:pass@app-db-ro:5432/app
-- Services update automatically on failover - no DNS TTL issues,
-- because Kubernetes endpoint slices update within seconds
PgBouncer integration is available as a separate CR in CNPG:
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: app-db-pooler-rw
spec:
  cluster:
    name: app-db
  instances: 2
  type: rw  # or 'ro' for a read-replica pooler
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
Failover Mechanics
When the primary pod dies (node failure, OOMKill, eviction), the operator:
- Detects the primary is gone (via leader election, usually etcd or Kubernetes leases)
- Elects the most advanced standby based on WAL position
- Promotes that standby to primary
- Updates the -rw service endpoints to point to the new primary
- Reconfigures the remaining standbys to follow the new primary
This process typically takes 30–60 seconds end-to-end. Applications must handle transient connection failures (retry logic) to survive a failover transparently.
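A minimal sketch of that retry logic, using only the standard library; in a real service, retry_on would be the driver's transient error class (e.g. psycopg's OperationalError) rather than the stand-in ConnectionError:

```python
import random
import time

def with_retries(fn, attempts=5, base_delay=0.2, retry_on=(ConnectionError,)):
    """Call fn(), retrying with jittered exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            # back off 0.5x-1x of the exponential step to avoid a thundering herd
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
```

Only wrap operations that are safe to re-run (reads, idempotent writes, whole transactions), and size attempts × delay so the total wait covers the 30–60 second failover window.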
When Not to Use Kubernetes for Postgres
The honest answer: if you're not already committed to Kubernetes as your infrastructure platform and your team doesn't have deep k8s operational experience, managed Postgres (RDS, Cloud SQL, Azure Database for PostgreSQL) is the correct answer. The operational surface area of a Postgres operator on Kubernetes includes:
- Kubernetes version upgrades affecting the operator
- PVC resize operations (requires storage class support)
- Debugging operator reconciliation loops
- Managing backup retention and WAL archiving
- Handling split-brain scenarios during network partitions
The incremental control you gain over a managed service needs to outweigh that operational cost. Common reasons it does: regulatory requirements to run in your own VPC, need for specific Postgres extensions not available in managed services, cost at scale, or multi-cluster/multi-region topologies that managed services don't support cleanly.
-- Extensions that often drive the Kubernetes decision:
-- PostGIS (geospatial) - available in RDS/Cloud SQL but with version lag
-- pg_cron (scheduled jobs) - not available in all managed services
-- timescaledb - available as managed (Timescale Cloud) or self-hosted
-- pgvector - now available in most managed services
-- Custom C extensions - not available in any managed service
Key Takeaways
- Kubernetes operators (CloudNativePG, Zalando, Percona) handle the replication topology, failover, backup, and service routing that a raw StatefulSet cannot; do not run production Postgres on Kubernetes without an operator.
- Storage class is the highest-impact configuration decision: use ReadWriteOnce local SSD when I/O performance is the priority, network block storage when node portability matters more; never use ReadWriteMany filesystems for Postgres data.
- Failover takes 30–60 seconds end-to-end under a well-configured operator; applications must implement connection retry logic to survive failover transparently.
- The -rw and -ro services in CNPG update automatically on failover—route writes to -rw and reads to -ro to get automatic read/write splitting and failover routing.
- If your team lacks deep Kubernetes operational experience, managed Postgres (RDS, Cloud SQL) reduces operational surface area at the cost of less control; that trade-off is often correct.
- The legitimate reasons to self-host Postgres on Kubernetes: specific extensions unavailable in managed services, regulatory VPC requirements, cost at scale, or multi-cluster topologies managed services don't support.