Postgres for Backend Engineers

Query Planning Intuition

Ravinder · 7 min read

Postgres does not accept query hints. There is no USE INDEX, no FORCE ORDER, no NOLOCK. The planner is a cost-based optimizer that generates a plan using its own cost model—and your only legitimate levers are the statistics the planner reads, the configuration parameters that calibrate its cost model, and the schema decisions you make. Engineers who fight the planner with workarounds lose. Engineers who understand what the planner is actually computing can consistently guide it toward good plans.

How the Planner Works

The planner's job is to enumerate possible execution strategies for a query and select the one with the lowest estimated total cost. Cost is measured in abstract units that represent I/O and CPU work, calibrated by seq_page_cost, random_page_cost, cpu_tuple_cost, and related parameters.
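As a rough sketch of the arithmetic (the real formulas live in the planner's cost code, so treat these numbers as illustrative, not exact):

```sql
-- Illustrative: a sequential scan costs roughly
--   pages × seq_page_cost + rows × cpu_tuple_cost
-- For a hypothetical table with 10,000 pages and 1,000,000 rows
-- at the defaults (seq_page_cost = 1.0, cpu_tuple_cost = 0.01):
--   10000 × 1.0 + 1000000 × 0.01 = 20000 cost units
EXPLAIN SELECT * FROM orders;
-- Seq Scan on orders  (cost=0.00..20000.00 rows=1000000 width=...)
```

The units are dimensionless: a plan costing 20000 is not "20000 ms", it is simply expected to be cheaper than a plan costing 30000 under the same calibration.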

flowchart TD
  SQL[SQL Query] --> Parse[Parser\ntokenize + AST]
  Parse --> Rewrite[Rewriter\nview expansion · rules]
  Rewrite --> Plan[Planner/Optimizer]
  Plan --> Stats[pg_statistic\nrow counts · histograms · MCV]
  Plan --> Cost[Cost model\nseq_page_cost · random_page_cost]
  Plan --> BestPlan[Best Plan]
  BestPlan --> Exec[Executor]
  style Stats fill:#eff6ff,stroke:#3b82f6
  style Cost fill:#fef3c7,stroke:#d97706

The planner consults pg_statistic (populated by ANALYZE) for:

  • n_distinct: estimated number of distinct values
  • MCV (Most Common Values): top-N values and their frequencies
  • histogram bounds: approximate distribution of remaining values
  • correlation: physical ordering correlation with heap order

When statistics are stale or missing, the planner falls back to hardcoded defaults—and those defaults often produce wildly wrong row estimates.
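You can watch this happen. A temp table is never touched by autovacuum, so it has no statistics until you ANALYZE it by hand (a sketch; the exact estimates vary by version):

```sql
-- Sketch: a fresh temp table has no entries in pg_statistic,
-- so the planner falls back to default selectivities.
CREATE TEMP TABLE demo (id int, flag boolean);
INSERT INTO demo SELECT g, g % 100 = 0 FROM generate_series(1, 100000) g;

EXPLAIN SELECT * FROM demo WHERE flag;  -- row estimate from hardcoded defaults
ANALYZE demo;
EXPLAIN SELECT * FROM demo WHERE flag;  -- row estimate from real statistics (~1%)
```

Before ANALYZE the planner has no idea that `flag` is true for only 1% of rows; after it, the estimate snaps close to 1,000.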

Reading EXPLAIN ANALYZE

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, count(o.id) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.email;

The critical numbers in each node:

-> Hash Join  (cost=1234.56..5678.90 rows=842 width=48)
              (actual time=12.345..34.567 rows=1089 loops=1)
   Buffers: shared hit=234 read=56
  • cost=startup..total: planner's estimate in abstract cost units (not milliseconds). startup is the cost to produce the first row; total is the cost to produce all rows.
  • actual time=startup..total: wall-clock milliseconds, averaged per loop. Multiply by loops to get the node's total time.
  • rows: planner estimate vs actual. A large discrepancy (>10×) usually indicates stale or missing statistics.
  • Buffers: shared hit=N read=M: hit = found in shared_buffers; read = fetched from outside it (OS cache or disk). High read with ample RAM suggests cache thrashing or a cold buffer pool.

The most important diagnostic: estimated rows vs actual rows. When these diverge badly, the planner chose a plan optimized for the wrong cardinality.

Join Strategies

Postgres has three join algorithms. The planner picks based on estimated cardinality:

-- Nested Loop: good for small outer relation, indexed inner
-- Sequential outer scan, index seek per outer row
-- Cost: O(outer_rows × index_lookup_cost)
 
-- Hash Join: good for large, roughly equal relations
-- Build a hash table from the smaller side; probe with the larger
-- Cost: O(N + M) with memory overhead for the hash table
 
-- Merge Join: good when both sides are already sorted on the join key
-- Cost: O(N log N + M log M) with sort, O(N + M) if pre-sorted
-- Force a specific join type for testing (never for production)
SET enable_hashjoin = off;
SET enable_mergejoin = off;
-- Now only nested loop is available - compare plans
 
-- Reset
RESET enable_hashjoin;
RESET enable_mergejoin;

Disabling join types is a diagnostic tool, not a tuning strategy. If the planner consistently picks the wrong join, the problem is statistics, not the planner's logic.
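One workable diagnostic pattern, using the query from the EXPLAIN example above (a sketch; SET LOCAL keeps the changes scoped to the transaction, so nothing leaks into the session):

```sql
BEGIN;
-- Baseline: whatever the planner picks
EXPLAIN SELECT u.id, u.email, count(o.id)
FROM users u JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.email;

-- Price the alternative: forbid the chosen join type and re-plan
SET LOCAL enable_hashjoin = off;
EXPLAIN SELECT u.id, u.email, count(o.id)
FROM users u JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.email;

ROLLBACK;  -- SET LOCAL settings vanish with the transaction
```

Comparing the two total costs tells you how close the race was: if the alternative costs 50× more, the planner's choice was robust; if it is within a few percent, small statistics errors can flip the plan.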

Statistics and ANALYZE

-- Check when a table was last analyzed
SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';
 
-- Manual analyze for a specific table (runs quickly)
ANALYZE orders;
 
-- Increase statistics target for a column with high cardinality
-- Default is 100 histogram buckets; raise for skewed distributions
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
 
-- Check what statistics Postgres has collected
SELECT
  attname,
  n_distinct,
  correlation,
  array_length(most_common_vals, 1) AS mcv_count
FROM pg_stats
WHERE tablename = 'orders';

If a column has a skewed distribution (most values are one thing, with a long tail), the planner's default statistics may underestimate the tail. Raising the statistics target gives the planner more histogram buckets and more MCV entries.
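You can inspect the skew directly; pg_stats is a readable view over pg_statistic, and this mirrors the orders.status column used above:

```sql
-- How skewed is status? MCV entries and their frequencies
-- should cover most of the table for a heavily skewed column.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
```

If the frequencies sum to well under 1.0 on a skewed column, the long tail is being modeled only by the histogram, and a higher statistics target is likely to help.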

Calibrating the Cost Model

The default cost parameters assume spinning disks with 4:1 random-to-sequential penalty. SSDs and NVMe storage change this ratio dramatically.

-- For NVMe/SSD: random reads are much cheaper
-- In postgresql.conf or per-session:
SET random_page_cost = 1.1;   -- default is 4.0; SSDs are 1.1-2.0
SET seq_page_cost = 1.0;      -- usually leave at 1.0
 
-- effective_cache_size tells the planner how much OS buffer cache
-- is likely available. Higher value biases toward index scans.
SET effective_cache_size = '24GB';  -- set to ~75% of available RAM
 
-- parallel_tuple_cost / parallel_setup_cost affect parallel query decisions
SET parallel_tuple_cost = 0.1;
SET parallel_setup_cost = 1000;

The most impactful single change for modern infrastructure: set random_page_cost = 1.1 on SSD/NVMe instances. This alone often eliminates spurious sequential scans on tables that would benefit from index scans.
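To make the change stick beyond the session (a sketch; the tablespace name is hypothetical):

```sql
-- Persist cluster-wide: writes to postgresql.auto.conf
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- Or scope it to a single tablespace — useful when only some
-- storage is fast (tablespace name is illustrative)
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);
```

The per-tablespace form is handy in mixed deployments: tables on NVMe get the low random-read cost while archives on spinning disks keep the default.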

Extended Statistics

When two columns are correlated, the planner assumes independence by default and simply multiplies their individual selectivities, which misestimates multi-column filters:

-- Without extended statistics: planner assumes status and region are independent
-- Suppose region strongly predicts status, so WHERE status = 'shipped' AND region = 'us'
-- actually matches ~28% of rows
-- Planner estimate: 40% × 30% = 12% — far too low, so downstream nodes are sized for too few rows
 
-- Create extended statistics to capture correlation
CREATE STATISTICS orders_status_region (dependencies)
  ON status, region
  FROM orders;
 
ANALYZE orders;
 
-- Also useful: ndistinct for correlated distinct counts
CREATE STATISTICS orders_user_status (ndistinct, dependencies)
  ON user_id, status
  FROM orders;

Extended statistics are available from Postgres 10+ and are one of the most underused features for fixing bad row estimates on multi-column filters.
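After the ANALYZE, it is worth confirming the statistics were actually built (a sketch; the pg_stats_ext view is available from Postgres 12):

```sql
-- Verify the extended statistics objects exist and were populated
SELECT statistics_name, n_distinct, dependencies
FROM pg_stats_ext
WHERE tablename = 'orders';
```

A NULL in the dependencies column means ANALYZE has not run since the statistics object was created.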

Common Plan Problems and Fixes

-- Problem: seq scan on large table despite index existing
-- Diagnosis: check if the index is actually being considered
EXPLAIN SELECT * FROM events WHERE user_id = 42;
 
-- If planner estimates too many rows matching, it prefers seq scan
-- Check: actual selectivity
SELECT count(*) FROM events WHERE user_id = 42;
SELECT count(*) FROM events;
-- If ratio is low and planner still seq scans: stale statistics
ANALYZE events;
 
-- Problem: index not used because of a type mismatch
-- A bare string literal is harmless: '42' is an untyped literal,
-- coerced to bigint at parse time, so this still uses the index:
SELECT * FROM events WHERE user_id = '42';
-- The real hazard is a typed bind parameter from a driver, e.g. a
-- numeric parameter against a bigint column: user_id gets cast to
-- numeric and the btree index on user_id cannot be used.
-- Fix: cast the parameter, or fix its declared type in the driver:
-- SELECT * FROM events WHERE user_id = $1::bigint;
 
-- Problem: function in WHERE defeats index use
-- Bad: index on created_at not used
SELECT * FROM events WHERE date_trunc('day', created_at) = '2025-07-20';
-- Good: direct range scan, uses index
SELECT * FROM events
WHERE created_at >= '2025-07-20' AND created_at < '2025-07-21';
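When the query shape cannot change (an ORM emits it, say), an expression index is an alternative. A sketch, assuming created_at is a plain timestamp; on timestamptz, date_trunc depends on the session time zone and cannot be indexed directly:

```sql
-- Index the expression itself so the original predicate becomes indexable
CREATE INDEX events_created_day_idx
  ON events (date_trunc('day', created_at));

-- This form can now use the index
SELECT * FROM events WHERE date_trunc('day', created_at) = '2025-07-20';
```

Prefer the range rewrite when you control the SQL: it reuses the existing created_at index and supports arbitrary intervals, whereas the expression index only serves that one expression.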

Key Takeaways

  • The planner is a cost-based optimizer, not a rule engine; accurate statistics and correctly calibrated cost parameters produce good plans—workarounds do not.
  • The most diagnostic signal in EXPLAIN ANALYZE is the gap between estimated rows and actual rows; a 10× discrepancy always warrants investigating statistics freshness.
  • Set random_page_cost = 1.1 on SSD/NVMe instances and effective_cache_size to ~75% of RAM; these two changes have the broadest positive impact on plan quality.
  • Raise STATISTICS targets on high-cardinality or skewed columns; create extended statistics for correlated column pairs to fix multi-column filter estimates.
  • Functions in WHERE clauses defeat index use—rewrite range conditions directly on the column rather than wrapping the column in a function.
  • Disabling join methods (enable_hashjoin, enable_mergejoin) is a diagnostic tool to understand the planner's alternatives, not a production tuning strategy.