
Postgres as a Vector Database, Honestly

Ravinder · 7 min read

Every new AI-adjacent project eventually hits the same question: do we need a dedicated vector database, or can Postgres handle it? The honest answer is: Postgres can handle it — until it can't. Knowing exactly where that line sits will save you a painful migration later.

This post is not a tutorial on installing pgvector. It's a frank assessment of what pgvector does well, where it degrades, and what signals should push you toward a purpose-built store.

What pgvector Actually Gives You

pgvector adds a vector column type and two index strategies to Postgres. You get:

  • Exact nearest-neighbor search (no index, brute-force)
  • Approximate nearest-neighbor via IVFFlat or HNSW
  • Standard Postgres ACID guarantees, joins, and filtering
  • One fewer system to operate

The value proposition is real. If you have fewer than a few million vectors and your embedding dimension is ≤1536, pgvector will work fine and you won't need to babysit another cluster.
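
For concreteness, here is a minimal sketch of the baseline setup, assuming the pgvector extension is available; the table and column names are illustrative and the query vector is a placeholder:

-- Enable the extension and store embeddings next to your relational data
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(1536)
);

-- With no index, this is an exact, brute-force nearest-neighbor scan
SELECT id, content
FROM items
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;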

Index Types: IVFFlat vs HNSW

These two indexes make very different tradeoffs.

-- IVFFlat: partition vectors into lists, search a subset
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
 
-- HNSW: hierarchical graph, better recall at the cost of build time and RAM
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

IVFFlat splits your vectors into lists (Voronoi cells) at build time. At query time you search probes of those cells (default 1). More probes means better recall but more latency. The index build is fast, but recall degrades as your dataset grows unless you rebuild periodically (the centroids become stale).
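
As a sketch of that knob (assuming the ivfflat index from the previous block), probes is set per session:

-- Higher probes = more cells scanned = better recall, higher latency
SET ivfflat.probes = 10;

SELECT id
FROM items
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;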

HNSW builds a layered proximity graph. It has better recall than IVFFlat at equivalent latency, but the build is slower and the graph lives entirely in RAM. m controls graph connectivity (edges per node); ef_construction controls build-time search depth. Higher values improve recall but inflate memory.

graph TD
  A[Query Vector] --> B{Index Type}
  B --> C[IVFFlat]
  B --> D[HNSW]
  C --> E[Find nearest centroid]
  E --> F[Scan N lists]
  F --> G[Return top-k]
  D --> H[Enter graph at top layer]
  H --> I[Greedy descent through layers]
  I --> J[Beam search at bottom layer]
  J --> G
  G --> K[Post-filter by metadata]

A rough mental model for choosing:

Situation                    | Pick
-----------------------------|-------------------------
Dataset changes frequently   | IVFFlat (fast rebuilds)
Dataset is relatively static | HNSW (better recall)
RAM is tight                 | IVFFlat
You need >95% recall         | HNSW with high ef

Recall vs Latency: The Uncomfortable Curve

Neither index gives you 100% recall by default. With IVFFlat at probes=1 on a 1M-vector dataset, you might see 70–80% recall. Bump probes to 10 and you recover to ~95%, but latency climbs proportionally.

HNSW is more forgiving but still requires tuning ef_search at query time:

-- Set per-session for HNSW query-time search width
SET hnsw.ef_search = 100;
 
SELECT id, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

The problem is that ef_search is a session-level setting. Behind a connection pool, you have to set it on every connection or use SET LOCAL inside a transaction. This is a friction point that purpose-built databases handle transparently.
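
A sketch of the transaction-scoped variant, which plays more nicely with poolers running in transaction mode:

-- SET LOCAL only lasts until COMMIT/ROLLBACK, so it is safe with pooled connections
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
COMMIT;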

Where Postgres Starts to Buckle

Be honest with yourself when you hit these signals:

1. Index build time becomes a deployment blocker

HNSW index builds on 10M+ vectors can take hours. Building concurrently requires careful coordination, and while new rows are inserted into the graph, its quality degrades under heavy churn until you rebuild.
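
A sketch of the usual mitigations, assuming pgvector 0.6+ (which can parallelize HNSW builds) and enough headroom on the box; the index name is illustrative:

-- Give the build more memory and parallel workers, and avoid blocking writes
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 7;

CREATE INDEX CONCURRENTLY items_embedding_hnsw
ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);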

2. RAM pressure

The HNSW graph for 1M vectors at dimension 1536 can consume 6–10 GB of RAM depending on m. This competes directly with Postgres's shared_buffers and your regular relational workload. You're now tuning memory for two different access patterns on one machine.
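
It is worth watching this directly. A sketch using standard catalog functions (the index name is illustrative):

-- Compare the on-disk size of the ANN index against your memory budget
SELECT pg_size_pretty(pg_relation_size('items_embedding_hnsw'));
SHOW shared_buffers;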

3. Filtered vector search kills index selectivity

This is the one that bites teams hardest. You want vectors for user_id = 42 that are similar to a query. Postgres will use the vector index and then filter, or the B-tree index and then scan — it rarely picks both optimally.

-- Postgres struggles to use both indexes efficiently
SELECT id, embedding <=> $1 AS dist
FROM documents
WHERE tenant_id = 42
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> $1
LIMIT 10;

Purpose-built vector DBs (Qdrant, Weaviate, Pinecone) solve this with native pre-filtering — they store metadata alongside vectors and filter before the ANN search. pgvector has no equivalent.
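
One limited mitigation in plain Postgres, if the hot filter values are few and known in advance, is a partial ANN index per value. This is a sketch, not a general fix, and the table and index names are illustrative:

-- Only queries filtering on tenant_id = 42 can use this index
CREATE INDEX documents_tenant_42_embedding_idx
ON documents USING hnsw (embedding vector_cosine_ops)
WHERE tenant_id = 42;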

4. Dimension ceiling

pgvector's ANN indexes cap the vector type at 2000 dimensions. OpenAI's text-embedding-3-large is 3072 dimensions. You'd have to truncate, which costs recall.

5. Multi-tenancy at scale

Partitioning a vector table by tenant in Postgres works but adds operational weight. Each partition needs its own index. Queries across partitions don't benefit from ANN indexes. This is a known gap.
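
A sketch of what that looks like with declarative list partitioning (names and dimension are illustrative):

-- Each tenant partition carries its own HNSW index to build, tune, and monitor
CREATE TABLE documents (
    id        bigint NOT NULL,
    tenant_id int    NOT NULL,
    embedding vector(1536),
    PRIMARY KEY (tenant_id, id)
) PARTITION BY LIST (tenant_id);

CREATE TABLE documents_tenant_42 PARTITION OF documents FOR VALUES IN (42);
CREATE INDEX ON documents_tenant_42 USING hnsw (embedding vector_cosine_ops);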

The Scale Ceiling in Practice

Based on real workloads, here's a rough ceiling chart:

Vectors    | Dimension | Workload          | pgvector verdict
-----------|-----------|-------------------|------------------
< 500K     | ≤ 1536    | Low QPS (<50/s)   | Excellent
500K–5M    | ≤ 1536    | Medium QPS        | Workable with tuning
5M–20M     | ≤ 1536    | High QPS (>200/s) | Struggling
> 20M      | Any       | Any               | Graduate
Any        | > 2000    | Any               | Not supported

These aren't hard limits from a spec sheet — they're where teams actually start paging their on-call.

When to Graduate

Leave pgvector when any of these are true:

  • You need filtered ANN with low latency (<50ms P99) on a large dataset
  • Your embedding dimension exceeds the 2000-dimension index cap and you're losing recall from truncation
  • HNSW index size is crowding out your working set
  • You need real-time index updates without rebuild windows
  • You're hitting multi-tenant isolation requirements that partition overhead can't meet

The migration path is not that painful. Your embeddings are just floats — export them, load them into Qdrant or Weaviate, update your query layer. Keep Postgres for the relational truth.
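
A sketch of the export half, using psql's client-side \copy (the file name is illustrative); the import half goes through whichever bulk API your vector store exposes:

-- Dump ids and raw embeddings for bulk loading into the new store
\copy (SELECT id, embedding FROM documents) TO 'embeddings.csv' WITH (FORMAT csv)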

A Pragmatic Architecture

For most teams in 2026, this hybrid works well:

flowchart LR
  APP[Application] --> PG[(Postgres\nRelational + Metadata)]
  APP --> VDB[(Qdrant / Weaviate\nVector Index)]
  PG -- "entity IDs" --> APP
  VDB -- "top-k IDs + scores" --> APP
  APP -- "JOIN on IDs" --> PG

Run the ANN search in the vector store, get back IDs and scores, then JOIN against Postgres for the full record. This keeps your transactional data in Postgres (where it belongs) and offloads the ANN workload to a system designed for it.
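
The re-hydration step on the Postgres side is ordinary SQL. A sketch, with the top-k IDs from the vector store shown inline and the column names illustrative:

-- Preserve the vector store's ranking while pulling the full rows
SELECT d.id, d.content, d.created_at
FROM documents d
JOIN unnest(ARRAY[17, 42, 108]::bigint[]) WITH ORDINALITY AS hits(id, rank)
  ON d.id = hits.id
ORDER BY hits.rank;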

Key Takeaways

  • pgvector is production-ready for datasets under ~5M vectors with moderate QPS — don't over-engineer from day one.
  • HNSW beats IVFFlat on recall for static datasets; IVFFlat wins when you rebuild frequently or need fast index builds.
  • Filtered vector search is pgvector's Achilles' heel — metadata pre-filtering is where purpose-built stores earn their keep.
  • The 2000-dimension cap is a hard blocker for modern large embedding models; plan for it.
  • RAM pressure from HNSW graphs competes with your relational workload on the same machine — monitor this actively.
  • Graduate to a dedicated vector store when P99 latency, dimension limits, or filtered search requirements exceed what Postgres can deliver cleanly.