Partitioning for Real Workloads
Partitioning is one of those features that looks straightforward in the documentation and accumulates sharp edges in production. The core promise is real: splitting a logically single table into physical child tables lets Postgres skip entire chunks of data during scans, parallelize operations, and make data lifecycle management (dropping old months, for example) a metadata operation rather than a slow DELETE. The gotchas are real too—foreign keys, indexes, and sequences behave differently across a partitioned hierarchy, and choosing the wrong partition key turns pruning into a myth.
Declarative Partitioning Architecture
Postgres 10+ declarative partitioning has the parent table hold no rows. Every row belongs to a child partition. The parent acts as a routing layer for INSERT and a merge layer for SELECT.
Three strategies are available:
- RANGE: rows routed by a continuous range on the partition key (dates, IDs)
- LIST: rows routed by explicit enumeration of key values (region, status)
- HASH: rows distributed by hash of the key—useful when no natural range or list exists and you want even distribution
Range Partitioning: The Common Case
CREATE TABLE events (
id bigserial NOT NULL,
occurred_at timestamptz NOT NULL,
event_type text NOT NULL,
user_id bigint,
payload jsonb
) PARTITION BY RANGE (occurred_at);
-- Create monthly partitions
CREATE TABLE events_2025_06
PARTITION OF events
FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');
CREATE TABLE events_2025_07
PARTITION OF events
FOR VALUES FROM ('2025-07-01') TO ('2025-08-01');
-- Default partition catches any row outside defined ranges
CREATE TABLE events_default
PARTITION OF events DEFAULT;Partition bounds are [FROM, TO) — inclusive lower, exclusive upper. Always create a DEFAULT partition so rows that miss all bounds don't error; you can attach a proper partition later and move those rows.
Partition Pruning
Pruning is the mechanism that makes partitioning useful. When the planner can determine at planning time (static pruning) or at execution time (dynamic pruning) that a WHERE clause excludes certain partitions, it skips them entirely.
-- This query touches only events_2025_06 - pruning works
EXPLAIN SELECT * FROM events
WHERE occurred_at >= '2025-06-01' AND occurred_at < '2025-07-01';
-- enable_partition_pruning must be on (default: on)
SHOW enable_partition_pruning;
-- Dynamic pruning: parameter binding still prunes at execution
PREPARE q(timestamptz, timestamptz) AS
SELECT * FROM events WHERE occurred_at >= $1 AND occurred_at < $2;Pruning fails silently when:
- The partition key column is wrapped in a function:
WHERE date_trunc('month', occurred_at) = '2025-06-01'— Postgres cannot map this back to partition bounds. Filter directly on the column. - You cast away the partition key type:
WHERE occurred_at::date = '2025-06-01'may or may not prune depending on the cast being immutable. - The query has no WHERE clause on the partition key at all — full scan of all partitions.
-- Bad: function wrapping defeats pruning
SELECT * FROM events WHERE date_trunc('month', occurred_at) = '2025-06-01';
-- Good: direct range on partition key
SELECT * FROM events
WHERE occurred_at >= '2025-06-01' AND occurred_at < '2025-07-01';Indexes on Partitioned Tables
Creating an index on the parent table automatically creates matching indexes on all existing and future partitions.
-- Creates the index on parent + all partitions
CREATE INDEX idx_events_user_id ON events (user_id);
-- Partition-local unique constraint must include the partition key
-- Global uniqueness across partitions requires application-layer enforcement
-- or a dedicated deduplication table
CREATE UNIQUE INDEX idx_events_id_occurred
ON events (id, occurred_at); -- occurred_at is the partition keyA plain UNIQUE (id) on a partitioned table is not supported unless id is also the partition key. This is the most common surprise: if you need globally unique IDs across partitions, use a sequence at the parent level and include the partition key in the unique index, or enforce uniqueness externally.
Foreign Keys and Partitions
As of Postgres 16, foreign keys from a non-partitioned table into a partitioned table are supported. Foreign keys from a partitioned table referencing another table have been supported since Postgres 12, but with restrictions.
-- Referencing a partitioned table (Postgres 16+)
CREATE TABLE sessions (
id bigserial PRIMARY KEY,
user_id bigint REFERENCES users(id), -- users is non-partitioned
event_id bigint -- no FK into partitioned events
);
-- Workaround for FK into partitioned table (pre-16):
-- Enforce referential integrity via triggers or application logicFor older Postgres versions, the practical approach is to enforce referential integrity via application code or a deferred trigger rather than a database-level FK into the partitioned table.
List and Hash Partitioning
-- List partitioning: route by explicit value set
CREATE TABLE orders (
id bigserial NOT NULL,
region text NOT NULL,
amount numeric NOT NULL
) PARTITION BY LIST (region);
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('us-east', 'us-west');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('eu-west', 'eu-central');
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
-- Hash partitioning: even distribution, no natural range
CREATE TABLE sessions (
id bigserial NOT NULL,
user_id bigint NOT NULL
) PARTITION BY HASH (user_id);
CREATE TABLE sessions_0 PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_2 PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_3 PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 3);Data Lifecycle: Drop vs Detach
The most compelling operational benefit of partitioning is fast data expiry:
-- DROP is a metadata operation - instant regardless of row count
DROP TABLE events_2025_01;
-- DETACH preserves data, removes from parent routing
-- Use CONCURRENTLY to avoid locking the parent (Postgres 14+)
ALTER TABLE events DETACH PARTITION events_2025_01 CONCURRENTLY;
-- After detaching, archive or transform independently
COPY events_2025_01 TO '/archive/events_2025_01.csv';
DROP TABLE events_2025_01;Contrast this with DELETE FROM events WHERE occurred_at < '2025-02-01', which acquires row locks, generates WAL, bloats the table, and requires a subsequent VACUUM—often taking hours on a billion-row table.
Automating Partition Creation
New partitions must exist before rows arrive. A common pattern is a scheduled job (pg_cron, an external cron, or an application startup task) that creates the next N partitions:
-- Create next month's partition if it doesn't exist
DO $$
DECLARE
next_month_start date := date_trunc('month', now() + interval '1 month');
next_month_end date := next_month_start + interval '1 month';
partition_name text := 'events_' || to_char(next_month_start, 'YYYY_MM');
BEGIN
IF NOT EXISTS (
SELECT 1 FROM pg_class WHERE relname = partition_name
) THEN
EXECUTE format(
'CREATE TABLE %I PARTITION OF events FOR VALUES FROM (%L) TO (%L)',
partition_name, next_month_start, next_month_end
);
END IF;
END;
$$;Key Takeaways
- Partition pruning only works when the WHERE clause filters directly on the partition key column—wrapping the key in a function defeats pruning silently.
- Always create a DEFAULT partition; rows that miss all bounds fail with an error in its absence, and missing partitions are a production incident waiting to happen.
- Unique constraints on partitioned tables must include the partition key; global uniqueness across partitions requires external enforcement or application-layer logic.
- Foreign keys into partitioned tables require Postgres 16+; on earlier versions, enforce referential integrity at the application layer.
- DROP TABLE on a partition is a metadata operation regardless of size—this is the primary operational win over time-based DELETE for retention policies.
- Partition creation must be automated and proactive; rows arriving for an undefined partition range error immediately, so provision partitions ahead of the data.