Postgres for Backend Engineers

Indexing Past B-tree

Ravinder·June 1, 2025·7 min read

PostgresDatabaseSQLIndexingPerformance

Series

Postgres for Backend Engineers

Part 1 of 10

Start of series

Part 2 →

Locking and Contention

Most engineers reach for a B-tree index reflexively—and for good reason. B-trees handle equality and range queries over orderable types efficiently, cover the majority of OLTP access patterns, and require no configuration beyond the column name. The problem surfaces when your workload drifts outside that comfort zone: full-text search, geometric proximity, sparse high-cardinality columns, monotonically growing timestamps across billions of rows. At that point the B-tree either produces a bloated index that slows writes more than it helps reads, or simply cannot express the query semantics you need.

This post walks through the index types Postgres provides beyond B-tree, with enough mechanical grounding to make them first-class tools rather than last-resort curiosities.

Why Index Internals Matter

Before diving into specific types, it helps to frame what an index actually is in Postgres: a separate on-disk structure that stores a partial projection of table data in a form optimized for a specific class of lookup. Each index type defines its own:

Access method — how Postgres traverses the structure during scans
Operator classes — which operators (=, <@, @@, &&) the index can satisfy
Write amplification — how much extra I/O each INSERT or UPDATE produces

flowchart LR Query --> Planner Planner -->|Chooses access path| IndexScan Planner -->|Fallback| SeqScan IndexScan --> AM[Access Method] AM --> BTree[B-tree\nequality · range] AM --> GIN[GIN\narrays · JSONB · FTS] AM --> GiST[GiST\ngeometry · ranges · FTS] AM --> BRIN[BRIN\ncorrelated ranges] AM --> Hash[Hash\nequality only]

Operator class mismatch is the silent killer: if the operators in your WHERE clause don't match the index's operator class, Postgres will skip the index entirely with no warning.

GIN — Generalized Inverted Index

GIN is the right choice when a single column contains multiple indexable items and queries need to find rows where the column contains one or more of those items. The canonical examples are:

text[] or integer[] array columns queried with @>, &&
tsvector columns for full-text search with @@
jsonb columns queried with @> or ?

GIN stores a posting list per element—every distinct element in the column maps to the set of CTIDs (row pointers) containing it. Lookup is an inverted-index scan: find the element, fetch the posting list, return matching rows.

-- Full-text search: GIN over tsvector
CREATE INDEX idx_articles_fts
  ON articles
  USING GIN (to_tsvector('english', title || ' ' || body));
 
-- Query that uses it
SELECT id, title
FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'postgres & indexing');
 
-- Array containment: tag filtering
CREATE INDEX idx_posts_tags
  ON posts
  USING GIN (tags);
 
SELECT id FROM posts WHERE tags @> ARRAY['postgres', 'performance'];

Write trade-off. GIN maintains a pending list (a small heap) of new entries and merges them lazily. This keeps write amplification manageable for bulk loads, but vacuum is required to finalize merges. Set gin_pending_list_limit conservatively if vacuum is infrequent.

Covering vs. non-covering. GIN cannot be a covering index—it cannot include extra columns for an index-only scan. If you need to project additional columns, the query must still hit the heap.

GiST — Generalized Search Tree

GiST is a framework, not a single data structure. It exposes hooks (consistent, union, compress, decompress, penalty, picksplit) that extension authors implement to produce balanced tree access methods for arbitrary data types. This is how PostGIS, intarray, and the built-in range types all gain index support.

For backend engineers, the practical GiST cases are:

Geometric types (point, circle, polygon) with &&, @>, <-> (distance)
Range types (tsrange, daterange, int4range) with && and @>
Full-text search (though GIN is usually faster for FTS)

-- Schedule conflict detection with a GiST range index
CREATE TABLE bookings (
  id          bigserial PRIMARY KEY,
  resource_id int NOT NULL,
  during      tsrange NOT NULL,
  EXCLUDE USING GiST (resource_id WITH =, during WITH &&)
);
 
-- The exclusion constraint implicitly creates a GiST index.
-- This query uses it:
SELECT *
FROM bookings
WHERE resource_id = 42
  AND during && '[2025-06-10 09:00, 2025-06-10 17:00)'::tsrange;

GiST is lossy in some configurations: the index may return false positives that the executor filters with a recheck. This is expected behavior, not a bug.

BRIN — Block Range INdex

BRIN stores a summary (min, max, and optionally other statistics) per block range—by default 128 pages. It is orders of magnitude smaller than a B-tree (often 1/10,000th the size) and has nearly zero write overhead. The catch: it is only useful when the physical order of rows in the heap correlates with the indexed column's sort order.

The ideal BRIN candidate is an append-only table with a monotonically increasing column: event logs, time-series ingestion tables, audit trails.

-- Timeseries events table - created_at correlates strongly with heap order
CREATE TABLE events (
  id         bigserial PRIMARY KEY,
  created_at timestamptz NOT NULL DEFAULT now(),
  event_type text NOT NULL,
  payload    jsonb
);
 
CREATE INDEX idx_events_created_brin
  ON events
  USING BRIN (created_at)
  WITH (pages_per_range = 64);
 
-- Range scan: Postgres reads only the block ranges whose min/max overlap
SELECT * FROM events
WHERE created_at >= '2025-06-01' AND created_at < '2025-06-02';

If rows are inserted out of order—say, backfilled historical data—correlation drops and BRIN degrades to a near-full sequential scan. Check correlation with SELECT correlation FROM pg_stats WHERE tablename = 'events' AND attname = 'created_at'. Values above 0.9 are BRIN-friendly.

Partial Indexes

A partial index indexes only the rows satisfying a WHERE predicate. This yields a smaller, faster index for any workload where the query target is a subset of the table.

-- Only index unprocessed jobs - the hot working set
CREATE INDEX idx_jobs_unprocessed
  ON jobs (created_at)
  WHERE status = 'pending';
 
-- Query must include the same predicate for Postgres to use the index
SELECT * FROM jobs
WHERE status = 'pending'
  AND created_at < now() - interval '5 minutes';
 
-- Unique constraint on active records only
CREATE UNIQUE INDEX idx_users_active_email
  ON users (email)
  WHERE deleted_at IS NULL;

The planner will use a partial index only when the query's WHERE clause implies (is logically entailed by) the index predicate. The predicate must be identical or more restrictive—Postgres does not perform semantic simplification.

Expression Indexes

An expression index indexes the result of an expression rather than raw column values. This makes function results indexable without materializing them into a column.

-- Case-insensitive email lookup
CREATE INDEX idx_users_email_lower
  ON users (lower(email));
 
SELECT * FROM users WHERE lower(email) = lower('User@Example.com');
 
-- Computed JSON field - avoids redundant column
CREATE INDEX idx_orders_customer_id
  ON orders ((payload->>'customer_id'));
 
SELECT * FROM orders WHERE payload->>'customer_id' = '12345';
 
-- Date truncation for day-level bucketing
CREATE INDEX idx_events_day
  ON events (date_trunc('day', created_at));
 
SELECT count(*) FROM events
WHERE date_trunc('day', created_at) = '2025-06-01';

Expression indexes must exactly match the expression in the query. lower(email) and LOWER(email) are identical after parsing, but date_trunc('day', created_at) and created_at::date are not—they produce different values for non-midnight timestamps.

Choosing the Right Index Type

flowchart TD Q{What are you indexing?} Q -->|Scalar equality/range| BTree[B-tree] Q -->|Arrays, JSONB, FTS tokens| GIN Q -->|Geometry, ranges, exclusion| GiST Q -->|Ordered append-only column| BRIN Q -->|Subset of rows| Partial[Partial B-tree] Q -->|Function result| Expr[Expression index]

Before creating any index: run EXPLAIN (ANALYZE, BUFFERS) on the query, check pg_stat_user_indexes for existing unused indexes, and measure write amplification on a staging system with realistic data volumes.

Key Takeaways

B-tree covers equality and range on orderable scalars; every other index type fills a distinct gap—not a better B-tree.
GIN is the correct choice for multi-valued columns (arrays, JSONB, tsvector); expect larger index size and lazy merge overhead.
GiST enables geometric and range overlaps and powers exclusion constraints; it can be lossy and requires a recheck pass.
BRIN is tiny and fast for append-only, naturally ordered tables; correlation must be high or it degrades to a sequential scan.
Partial indexes shrink the index to the working set—pair them with matching WHERE predicates in queries.
Expression indexes must exactly mirror the expression in the query, including function argument literals.

Series

Postgres for Backend Engineers

Part 1 of 10

Start of series

Part 2 →

Locking and Contention