Postgres for Backend Engineers

Postgres on Kubernetes

Ravinder · 6 min read
Postgres · Database · SQL · Kubernetes · Operations

Running Postgres on Kubernetes is entirely feasible and increasingly common—but it requires a fundamentally different mental model than running stateless services. Kubernetes is built for pods that die and respawn without consequences. Postgres is a process that owns durable state, cares deeply about which node it runs on, and has specific requirements around storage I/O, network identity, and shutdown sequencing. Bridging those two worlds is what a Postgres operator does, and understanding what operators handle—and what they don't—is prerequisite knowledge for deploying Postgres into a cluster.

Why Databases Are Different in Kubernetes

A stateless application pod failing and rescheduling onto another node is a normal, handled event. A Postgres pod doing the same thing requires:

  • The PVC (PersistentVolumeClaim) to follow the pod to the new node—or the new node to have access to the same volume
  • The replication topology to recognize the restart and resume WAL streaming
  • The primary election process to ensure only one pod is writable at a time
  • The connection routing layer to update after a failover

flowchart TD
    OPS[Operator\nCRD controller] -->|manages| STS[StatefulSet\npod-0 · pod-1 · pod-2]
    STS -->|pod-0 primary| PVC0[(PVC 0\ndata)]
    STS -->|pod-1 standby| PVC1[(PVC 1\ndata)]
    STS -->|pod-2 standby| PVC2[(PVC 2\ndata)]
    OPS -->|configures| SVC[Services\nread-write · read-only]
    SVC --> PGBOUNCER[PgBouncer\nor connection proxy]
    PGBOUNCER --> APP[Application]
    style OPS fill:#eff6ff,stroke:#3b82f6
    style PVC0 fill:#fef3c7,stroke:#d97706

A raw StatefulSet with a PVC can run Postgres, but it handles none of the above. You'd be implementing HA, failover, backup, and monitoring yourself. That's what operators are for.
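
For contrast, the raw baseline looks something like this; a sketch, not a recommendation, with illustrative names and demo-only password handling. Kubernetes will restart the pod and reattach its volume, and nothing more:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg
spec:
  serviceName: pg
  replicas: 1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD  # demo only; use a Secret in practice
              value: change-me
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi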

Operators: What They Provide

The mature options in 2025:

CloudNativePG (CNPG) is the most actively maintained open-source operator. A CNCF sandbox project, it manages a single Postgres cluster per custom resource, handles streaming replication natively, integrates with Barman for backups, and performs rolling updates by upgrading standbys first and then switching over to an upgraded standby.

Zalando Postgres Operator (postgres-operator) is battle-tested at Zalando's scale. It uses Patroni for HA and Spilo as the container image, providing a familiar operational model for teams already using Patroni outside Kubernetes.

Percona Operator for PostgreSQL wraps Patroni and pgBackRest, adds pgBouncer integration, and focuses on enterprise operational requirements.

# CloudNativePG cluster definition
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
  namespace: production
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.3
 
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "512MB"
      effective_cache_size: "2GB"
      random_page_cost: "1.1"
      wal_level: "logical"
 
  storage:
    size: 100Gi
    storageClass: premium-rwo  # ReadWriteOnce, local SSD preferred
 
  backup:
    barmanObjectStore:
      destinationPath: "s3://my-bucket/postgres-backups"
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
    retentionPolicy: "30d"
 
  monitoring:
    enablePodMonitor: true  # Prometheus PodMonitor
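
Interacting with the cluster is plain kubectl; a minimal sketch, assuming the manifest above is saved as app-db.yaml (the cnpg subcommand requires the CNPG kubectl plugin):

kubectl apply -f app-db.yaml
kubectl get clusters.postgresql.cnpg.io -n production   # CR status, current primary
kubectl cnpg status app-db -n production                # detailed report via the plugin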

Storage: The Most Important Decision

Storage class choice has more operational impact than any other configuration decision.

flowchart TD
    SC{Storage class type}
    SC -->|Local NVMe\nReadWriteOnce| LOCAL[Best I/O\nNode affinity pinned\nNode failure = manual recovery]
    SC -->|Network block\nReadWriteOnce| NET[Good I/O\nVolume follows pod\nSlower failover]
    SC -->|Shared filesystem\nReadWriteMany| NFS[Poor I/O\nDo not use for Postgres data]
    style LOCAL fill:#f0fdf4,stroke:#16a34a
    style NFS fill:#fef2f2,stroke:#ef4444

Key storage requirements:

  • ReadWriteOnce only—Postgres cannot safely share a volume between concurrent writers
  • fsync must work correctly—network filesystems that lie about fsync (some NFS configurations) cause data corruption
  • I/O latency matters—network block storage with >1ms average latency will hurt write performance; measure with fio before committing

# Benchmark your storage class before deploying Postgres
fio --name=randwrite --ioengine=libaio --iodepth=32 \
    --rw=randwrite --bs=4k --direct=1 --size=1G \
    --numjobs=4 --runtime=60 --filename=/data/fio_test
 
# What you want: >10k IOPS, <1ms avg latency for production OLTP
# Check: avg lat (usec)
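
The numbers only mean something if fio runs against a volume provisioned from the storage class you will actually use, not a node's local disk. A throwaway pod with its own PVC is the honest way to measure; a sketch, with illustrative resource names and the premium-rwo class from the cluster spec above:

# PVC from the storage class under test
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: premium-rwo
  resources:
    requests:
      storage: 10Gi
---
# Throwaway pod that mounts the PVC and runs the same fio command
apiVersion: v1
kind: Pod
metadata:
  name: fio-bench
spec:
  restartPolicy: Never
  containers:
    - name: fio
      image: ubuntu:24.04
      command:
        - bash
        - -c
        - |
          apt-get update && apt-get install -y fio
          fio --name=randwrite --ioengine=libaio --iodepth=32 \
              --rw=randwrite --bs=4k --direct=1 --size=1G \
              --numjobs=4 --runtime=60 --filename=/data/fio_test
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: fio-test
# Read the results once the pod completes: kubectl logs fio-bench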

Networking and Service Routing

CNPG creates three services automatically:

  • app-db-rw: always points to the current primary; use for writes
  • app-db-ro: round-robins across standbys; use for reads
  • app-db-r: points to all instances including primary

-- Application configuration (example: connection strings)
-- Write path
-- postgresql://user:pass@app-db-rw:5432/app
 
-- Read path (analytics, reporting)
-- postgresql://user:pass@app-db-ro:5432/app
 
-- Services update automatically on failover - no DNS TTL issues
-- because Kubernetes endpoint slices update within seconds
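
A quick sanity check that routing behaves as expected: pg_is_in_recovery() returns false on the primary and true on a standby (credentials are placeholders, as above):

-- psql "postgresql://user:pass@app-db-rw:5432/app" -c "SELECT pg_is_in_recovery();"  -- f (primary)
-- psql "postgresql://user:pass@app-db-ro:5432/app" -c "SELECT pg_is_in_recovery();"  -- t (standby)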

PgBouncer integration is available as a separate CR in CNPG:

apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: app-db-pooler-rw
spec:
  cluster:
    name: app-db
  instances: 2
  type: rw  # or 'ro' for read replica pooler
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
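
Applications then point at the pooler rather than the cluster service directly; CNPG exposes the pooler through a service named after the Pooler resource (credentials are placeholders):

-- postgresql://user:pass@app-db-pooler-rw:5432/app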

Failover Mechanics

When the primary pod dies (node failure, OOMKill, eviction), the operator:

  1. Detects the primary is gone (via health checks or an expired leader lease, backed by etcd or the Kubernetes API depending on the operator)
  2. Elects the most advanced standby based on WAL position
  3. Promotes that standby to primary
  4. Updates the -rw service endpoints to point to the new primary
  5. Reconfigures remaining standbys to follow the new primary

This process typically takes 30–60 seconds end-to-end. Applications must handle transient connection failures (retry logic) to survive a failover transparently.
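
Retry loops belong in application code, but libpq-based drivers also accept connection-string options that soften the window; a sketch, with illustrative values:

-- connect_timeout bounds how long a dead endpoint can stall each connection attempt;
-- target_session_attrs=read-write rejects a connection that lands on a standby
-- while the -rw endpoints are still being updated
-- postgresql://user:pass@app-db-rw:5432/app?connect_timeout=5&target_session_attrs=read-write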

When Not to Use Kubernetes for Postgres

The honest answer: if you're not already committed to Kubernetes as your infrastructure platform and your team doesn't have deep k8s operational experience, managed Postgres (RDS, Cloud SQL, Azure Database for PostgreSQL) is the correct answer. The operational surface area of a Postgres operator on Kubernetes includes:

  • Kubernetes version upgrades affecting the operator
  • PVC resize operations (requires storage class support)
  • Debugging operator reconciliation loops
  • Managing backup retention and WAL archiving
  • Handling split-brain scenarios during network partitions

The incremental control you gain over a managed service needs to outweigh that operational cost. Common reasons it does: regulatory requirements to run in your own VPC, need for specific Postgres extensions not available in managed services, cost at scale, or multi-cluster/multi-region topologies that managed services don't support cleanly.

-- Extensions that often drive the Kubernetes decision:
-- PostGIS (geospatial) - available in RDS/Cloud SQL but with version lag
-- pg_cron (scheduled jobs) - not available in all managed services
-- timescaledb - available as managed (Timescale Cloud) or self-hosted
-- pgvector - now available in most managed services
-- Custom C extensions - not available in any managed service

Key Takeaways

  • Kubernetes operators (CloudNativePG, Zalando, Percona) handle the replication topology, failover, backup, and service routing that a raw StatefulSet cannot; do not run production Postgres on Kubernetes without an operator.
  • Storage class is the highest-impact configuration decision: use ReadWriteOnce local SSD when I/O performance is the priority, network block storage when node portability matters more; never use ReadWriteMany filesystems for Postgres data.
  • Failover takes 30–60 seconds end-to-end under a well-configured operator; applications must implement connection retry logic to survive failover transparently.
  • The -rw and -ro services in CNPG update automatically on failover—route writes to -rw and reads to -ro to get automatic read/write splitting and failover routing.
  • If your team lacks deep Kubernetes operational experience, managed Postgres (RDS, Cloud SQL) reduces operational surface area at the cost of less control; that trade-off is often correct.
  • The legitimate reasons to self-host Postgres on Kubernetes: specific extensions unavailable in managed services, regulatory VPC requirements, cost at scale, or multi-cluster topologies managed services don't support.