# Pagination Patterns That Survive at Scale

## Offset Pagination Feels Right Until It Is Not
`LIMIT 20 OFFSET 1000` looks harmless. It reads naturally. It maps cleanly to "page 51 of results." Clients can jump to arbitrary pages. Product managers love it. The SQL is a single short clause.
It is also, past a certain scale, quietly catastrophic. The database scans and discards the first 1,000 rows on every request for page 51. As your dataset grows, deep pages get slower. As concurrent users hit different pages, your database does redundant work on every request. And if a row is inserted or deleted between page fetches, your pagination is inconsistent — items appear twice or disappear silently.
None of this is obvious from the API contract. The API looks fine. The client looks fine. The symptoms show up weeks later as slow queries that nobody can explain, and phantom data gaps that support tickets describe as "sometimes results just seem to skip things."
This post covers cursor pagination, keyset pagination, their tradeoffs, and the specific patterns for deep pagination and total-count problems that come up in every production system.
## Why Offset Pagination Breaks at Scale

The mechanics are simple. `OFFSET N` in SQL does not skip N rows — it reads N rows and discards them. The database engine must scan the index from the beginning, count to row N, then return rows N+1 through N+page_size.
```sql
-- What you write
SELECT id, name, created_at
FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- What the database actually does:
-- 1. Walk the index from the beginning
-- 2. Read 10,020 rows
-- 3. Discard the first 10,000
-- 4. Return the last 20
-- Cost: O(OFFSET + LIMIT), not O(LIMIT)
```

At page 1 the database scans 20 rows; at page 500, about 10,000; at page 5,000, about 100,000. The cost grows linearly with page depth. On a 10-million-row table, page 250,000 scans half the table on every request.
The consistency problem is equally serious: offset identifies a page by position, and positions shift whenever rows are inserted or deleted between fetches.
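A minimal sketch of the failure mode, in the same knex style as the server implementation below (`db` is assumed to be a configured knex instance):

```typescript
// Page 1: the client reads rows at positions 1-20.
const page1 = await db("orders")
  .orderBy("created_at", "desc")
  .limit(20)
  .offset(0);

// Meanwhile, another session inserts a newer order. It sorts to the
// top and shifts every existing row down by one position.

// Page 2: the client reads rows at positions 21-40.
const page2 = await db("orders")
  .orderBy("created_at", "desc")
  .limit(20)
  .offset(20);

// The last row of page1 now sits at position 21, so it appears again
// as the first row of page2: a silent duplicate. A delete has the
// opposite effect: a row slides up into the already-skipped range and
// is never returned at all.
```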
## Cursor Pagination: Opaque Tokens, Stable Pages
Cursor pagination solves the consistency problem by anchoring page fetches to a specific position in the dataset rather than a numeric offset. The cursor encodes where the previous page ended. The next page starts after that point.
```typescript
// Cursor pagination — server implementation
// Assumes `db` is a configured knex instance.
import { Buffer } from "buffer";

interface Order {
  id: string;
  createdAt: string;
}

interface PaginationCursor {
  id: string;
  createdAt: string;
}

function encodeCursor(cursor: PaginationCursor): string {
  return Buffer.from(JSON.stringify(cursor)).toString("base64url");
}

// NOTE: throws on malformed tokens — wrap in a try/catch at the API
// boundary and reject bad cursors with a 400.
function decodeCursor(token: string): PaginationCursor {
  return JSON.parse(Buffer.from(token, "base64url").toString("utf8"));
}

async function listOrders(
  after?: string,
  limit: number = 20
): Promise<{ data: Order[]; nextCursor: string | null }> {
  // Stable sort: created_at, with id as a tiebreaker for duplicate timestamps.
  let query = db("orders").orderBy("created_at", "desc").orderBy("id", "desc");

  if (after) {
    const cursor = decodeCursor(after);
    // Use a row-level comparison to skip past the cursor position
    query = query.where(function () {
      this.where("created_at", "<", cursor.createdAt).orWhere(function () {
        this.where("created_at", "=", cursor.createdAt).andWhere(
          "id",
          "<",
          cursor.id
        );
      });
    });
  }

  const rows = await query.limit(limit + 1); // fetch one extra to detect hasMore
  const hasMore = rows.length > limit;
  const data = hasMore ? rows.slice(0, limit) : rows;
  const lastRow = data[data.length - 1];
  const nextCursor =
    hasMore && lastRow
      ? encodeCursor({ id: lastRow.id, createdAt: lastRow.createdAt })
      : null;

  return { data, nextCursor };
}
```

The API contract looks like this:
```json
// GET /orders?limit=20
{
  "data": [...],
  "pagination": {
    "nextCursor": "eyJpZCI6IjEyMyIsImNyZWF0ZWRBdCI6IjIwMjUtMDktMTUifQ",
    "hasMore": true
  }
}

// GET /orders?after=eyJpZCI6IjEyMyIsImNyZWF0ZWRBdCI6IjIwMjUtMDktMTUifQ&limit=20
```

Key properties of this design:
- The cursor is opaque to clients — they cannot construct or manipulate it
- Inserting or deleting rows does not cause duplicates or skips
- Cost is O(LIMIT) regardless of position — the database uses the index efficiently
- Clients cannot jump to an arbitrary page (this is a feature, not a bug)
## Keyset Pagination: Explicit, Index-Friendly, and Debuggable
Keyset pagination is a variant of cursor pagination where the position is expressed as explicit field values rather than an opaque token. It is more debuggable (you can construct the query yourself) and slightly more flexible for APIs where clients need to understand the ordering.
```sql
-- Keyset pagination: cost independent of page depth
-- First page
SELECT id, title, created_at
FROM posts
WHERE status = 'published'
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page (keyset from last row of previous page:
-- created_at = '2025-09-14T12:00:00Z', id = 'abc123')
SELECT id, title, created_at
FROM posts
WHERE status = 'published'
  AND (created_at, id) < ('2025-09-14T12:00:00Z', 'abc123')
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

The `(created_at, id) < (...)` syntax is a row value comparator. It works in PostgreSQL, MySQL 8+, and most modern databases. It requires a composite index on the same fields in the same order:
```sql
-- Required index for efficient keyset pagination
CREATE INDEX idx_posts_pagination
  ON posts (status, created_at DESC, id DESC);
-- This index covers the WHERE clause and ORDER BY — no sort needed
```

The tradeoff: keyset pagination requires a stable sort order. If your sort field is not unique, you need a tiebreaker (usually the primary key). If clients need to sort by a non-indexed field, keyset pagination forces an index-creation conversation — which is actually a good thing.
## The Deep Pagination Problem: When Clients Actually Need It
Sometimes deep pagination is genuinely necessary. Bulk exports, data migrations, admin interfaces, and reporting jobs all need to traverse large datasets. The question is how to do it without destroying database performance.
For known sequential traversal: use keyset, full stop. Write a job that follows cursors until exhaustion. This is O(n) in data size, not O(n²) like offset.
```python
# Bulk export using keyset pagination
import httpx
from typing import Iterator


def export_all_orders(api_base: str, token: str) -> Iterator[dict]:
    cursor = None
    while True:
        params = {"limit": 500}
        if cursor:
            params["after"] = cursor
        response = httpx.get(
            f"{api_base}/orders",
            params=params,
            headers={"Authorization": f"Bearer {token}"},
        )
        response.raise_for_status()
        body = response.json()
        yield from body["data"]
        cursor = body["pagination"].get("nextCursor")
        if not cursor:
            break


# Usage
for order in export_all_orders("https://api.example.com", "my-token"):
    process(order)
```

For user-facing "jump to page N": reconsider the requirement. In almost every case, users do not actually jump to page 500. They filter, search, or sort until they find what they need. If the UI requires arbitrary page jumps, that is a symptom of missing search functionality — add search and remove offset pagination.
If you genuinely need total count + offset (e.g., a legacy UI): decouple the count from the page query and cap its cost.
```sql
-- Approximate count for large tables — fast
SELECT reltuples::bigint AS estimated_count
FROM pg_class
WHERE relname = 'orders';

-- Exact count with a timeout fallback
BEGIN;
SET LOCAL statement_timeout = '500ms';
SELECT COUNT(*) FROM orders WHERE status = 'active';
COMMIT;
-- If it times out, return the approximate count instead
```

## Total-Count Problem: Stop Returning It by Default
Clients ask for total counts because they want to show "Showing 1-20 of 47,832 results." This is a UX pattern borrowed from desktop software. It costs a full table scan on every page request.
The alternatives are almost always acceptable:
| Pattern | UX copy | Query cost |
|---|---|---|
| Exact count | "1–20 of 47,832" | `COUNT(*)` — full scan |
| Estimated count | "~48,000 results" | `pg_class` lookup — O(1) |
| Has-more flag | "Load more" | Zero — detected via the limit+1 fetch |
| No total returned | "Next page" | Zero |
Most product teams, when shown the query cost, immediately agree that "~48,000 results" is acceptable. The ones who insist on exact counts usually have a reporting requirement that should be a separate, asynchronous materialized view anyway.
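The timeout-then-estimate fallback is a few lines of application code. A sketch, assuming Postgres and knex (the function name, the `'500ms'` budget, and the error handling are illustrative choices, not a prescribed API):

```typescript
import type { Knex } from "knex";

// Try an exact count within a strict time budget; if Postgres cancels
// the query at the timeout, fall back to the planner's estimate.
async function countOrders(db: Knex): Promise<{ value: number; exact: boolean }> {
  try {
    const exact = await db.transaction(async (trx) => {
      // SET LOCAL scopes the timeout to this transaction only.
      await trx.raw("SET LOCAL statement_timeout = '500ms'");
      const [row] = await trx("orders").where("status", "active").count({ n: "*" });
      return Number(row.n);
    });
    return { value: exact, exact: true };
  } catch {
    // Timed out (or failed), so use the estimate from pg_class instead.
    const { rows } = await db.raw(
      "SELECT reltuples::bigint AS n FROM pg_class WHERE relname = 'orders'"
    );
    return { value: Number(rows[0].n), exact: false };
  }
}
```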
```typescript
// API response shapes — choose based on context
interface ListResponse<T> {
  data: T[];
  pagination:
    | { type: "cursor"; nextCursor: string | null; hasMore: boolean }
    | { type: "offset"; total: number; page: number; pageSize: number }
    | { type: "keyset"; after: string | null; hasMore: boolean };
}
// Never include both cursor and total — pick your strategy
```

## Search vs List Pagination
Search results and list results have different pagination requirements, and conflating them is a common mistake.
List pagination (browsing a collection in a stable order) — use keyset or cursor. The dataset is stable enough that a cursor position is meaningful.
Search pagination (full-text search, faceted search, relevance-ranked results) — use offset or Elasticsearch's search_after. Search results are inherently unstable (relevance scores shift as data changes), so cursor consistency is a false promise anyway. Elasticsearch's search_after is the keyset equivalent for search:
```json
// Elasticsearch search_after — equivalent to keyset for search
{
  "query": { "match": { "title": "microservices" } },
  "sort": [
    { "_score": "desc" },
    { "id": "asc" }
  ],
  "size": 20,
  "search_after": [0.94821, "post-789"]
}
```

The `search_after` values are the sort values of the last hit on the previous page; it is the same keyset idea applied to a relevance-sorted result set.

## Migration Path: Moving from Offset to Cursor
If you have an existing API using offset pagination and want to migrate, do it in three steps without breaking clients.
Step 1: Add cursor-based endpoints alongside offset ones. Version them if needed. Document that offset endpoints are deprecated for large pages.
Step 2: Add client-detectable warnings. Return a response header when clients request pages beyond a threshold.
```python
# Deprecation signal for deep offset pagination
def list_orders_offset(page: int, page_size: int = 20):
    response = build_response(...)
    if page > 100:  # > 2,000 rows deep
        response.headers["Deprecation"] = "true"
        response.headers["Sunset"] = "2026-03-01"
        response.headers["Link"] = '</orders/cursor>; rel="successor-version"'
    return response
```

Step 3: Set a hard limit on OFFSET (e.g., max page 200). Clients hitting that limit must migrate to cursor. Return a 400 with a link to the cursor endpoint, as in the sketch below.
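A sketch of that guard in the TypeScript server style used earlier (the Express handler shape, the `/orders/cursor` route, and the `MAX_PAGE` constant are illustrative assumptions, not part of any existing API):

```typescript
import express from "express";

const app = express();
const MAX_PAGE = 200; // hard ceiling on offset depth

app.get("/orders", (req, res) => {
  const page = Number(req.query.page ?? 1);
  if (page > MAX_PAGE) {
    // Past the ceiling: refuse the request and point at the successor API.
    res.status(400).json({
      error: "offset_pagination_limit_exceeded",
      message: `Pages beyond ${MAX_PAGE} are not served. Use cursor pagination.`,
      link: "/orders/cursor",
    });
    return;
  }
  // ... existing offset-based handler ...
});
```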
## Key Takeaways
- Offset pagination scans and discards N rows for every page N request — cost grows linearly with depth, making it O(n²) for full dataset traversal and quietly slow past page 100–500.
- Offset pagination is also inconsistent: insertions and deletions between page fetches produce duplicate or missing rows with no error signal to the client.
- Cursor pagination anchors fetches to a position in the dataset rather than a numeric offset — O(limit) cost at any depth, with stable results even as data changes.
- Keyset pagination is the SQL-native form of cursor pagination — it uses row value comparators and composite indexes to keep the cost of each page independent of dataset depth.
- The total-count problem is almost never worth a full `COUNT(*)` on every page request; estimated counts, has-more flags, or deferred async counts are almost always acceptable to product.
- Search pagination (relevance-ranked) and list pagination (stable order) have different requirements — do not apply cursor consistency guarantees to search results, where relevance shifts make them meaningless.