JSONB Without Regret
JSONB gets reached for too early and too often. It shows up in schemas as a catch-all metadata column that starts with three keys and ends with forty, half of which are never queried, none of which have types enforced. It also gets avoided too reflexively—engineers who remember RDBMS orthodoxy see any JSON column as a schema smell and normalize data that genuinely has variable structure. The truth is narrower: JSONB is the right tool in a specific set of situations, and using it well requires understanding both the storage model and the operator set.
JSON vs JSONB: The Practical Difference
Postgres ships two JSON types. json stores the input text verbatim and re-parses it on every access. jsonb decomposes the document into a binary representation at insert time, enabling operator support, indexing, and significantly faster reads. The only reason to use json over jsonb today is if preserving key order or duplicate keys matters to you—which is almost never a legitimate requirement.
-- jsonb: binary storage, indexable, operator-rich
CREATE TABLE events (
id bigserial PRIMARY KEY,
occurred_at timestamptz NOT NULL DEFAULT now(),
payload jsonb NOT NULL
);
-- Insertion is slightly slower (parsing + decomposition)
-- Reads are faster; operators and indexes work
INSERT INTO events (payload) VALUES
('{"type": "page_view", "url": "/pricing", "user_id": 42}');When JSONB Is the Right Choice
The legitimate JSONB use cases share a common characteristic: the structure of the data is genuinely variable or controlled externally.
Good fits: audit log payloads, webhook event bodies, user-defined metadata fields, feature flag overrides per tenant, EAV-style attributes where the attribute set varies per entity type.
Bad fits: columns where every row has the same three fields (just add three columns), relationships that need joins (extract to a proper table), data that needs range queries across keys (JSONB range queries are expensive and not indexable with GIN).
Core Operators
-- Field access: -> returns jsonb, ->> returns text
SELECT payload -> 'user_id' AS user_id_jsonb,
payload ->> 'user_id' AS user_id_text
FROM events;
-- Nested access
SELECT payload -> 'address' -> 'city' AS city,
payload #>> '{address,zip}' AS zip
FROM users;
-- Containment: does this jsonb contain this sub-document?
SELECT * FROM events
WHERE payload @> '{"type": "page_view"}';
-- Key existence
SELECT * FROM events WHERE payload ? 'error_code';
-- Any of these keys
SELECT * FROM events WHERE payload ?| ARRAY['error_code', 'warning_code'];
-- All of these keys
SELECT * FROM events WHERE payload ?& ARRAY['type', 'user_id'];The @> (containment) operator is the critical one to understand: it checks whether the left jsonb document contains all key-value pairs of the right document. This is the operator that GIN indexes can accelerate.
Indexing JSONB
GIN on the Full Column
The standard approach when you query many different keys unpredictably:
CREATE INDEX idx_events_payload_gin
ON events
USING GIN (payload);
-- This query uses the GIN index
SELECT * FROM events WHERE payload @> '{"type": "purchase", "user_id": 42}';
-- Key existence also uses GIN
SELECT * FROM events WHERE payload ? 'error_code';GIN on the full column indexes every key-value pair in the document. The trade-off is index size—a wide, varied JSONB document produces a large GIN index. Use jsonb_path_ops operator class for a smaller index that only supports @>:
CREATE INDEX idx_events_payload_gin_path
ON events
USING GIN (payload jsonb_path_ops);
-- Smaller index, only supports @> containmentExpression Index on a Specific Key
When one key is queried heavily with equality:
CREATE INDEX idx_events_user_id
ON events ((payload ->> 'user_id'));
-- This query uses the expression index
SELECT * FROM events WHERE payload ->> 'user_id' = '42';
-- Typed cast for proper comparison
CREATE INDEX idx_events_user_id_int
ON events (((payload ->> 'user_id')::bigint));
SELECT * FROM events WHERE (payload ->> 'user_id')::bigint = 42;Expression indexes are dramatically smaller than a full-column GIN index and support range queries. Prefer them when you have a small number of heavily queried keys.
Mutation Operators
-- Merge/upsert a sub-document (|| is the concatenation operator)
UPDATE users
SET preferences = preferences || '{"theme": "dark", "notifications": true}'::jsonb
WHERE id = 42;
-- Remove a key
UPDATE users
SET preferences = preferences - 'legacy_setting'
WHERE id = 42;
-- Set a nested path
UPDATE users
SET preferences = jsonb_set(preferences, '{notifications,email}', 'false')
WHERE id = 42;
-- Remove nested path
UPDATE users
SET preferences = preferences #- '{address,legacy_field}'
WHERE id = 42;Every JSONB update rewrites the entire column value—Postgres has no partial JSONB update. For frequently mutated large documents, consider whether the write amplification warrants breaking the structure into typed columns or a separate child table.
Schema Discipline
JSONB flexibility is exactly what makes it dangerous at scale. Enforce discipline with check constraints:
-- Require specific keys to exist
ALTER TABLE events
ADD CONSTRAINT events_payload_has_type
CHECK (payload ? 'type');
-- Validate the type field is a known value
ALTER TABLE events
ADD CONSTRAINT events_payload_valid_type
CHECK (payload ->> 'type' IN ('page_view', 'click', 'purchase', 'error'));
-- Enforce numeric type for a key
ALTER TABLE events
ADD CONSTRAINT events_user_id_numeric
CHECK (
NOT (payload ? 'user_id')
OR (payload -> 'user_id') IS NOT NULL
AND jsonb_typeof(payload -> 'user_id') = 'number'
);Document the expected shape in a schema registry or a README beside the migration that introduced the column. Six months after the column was created, no one remembers which keys are meaningful.
Aggregation and Transformation
-- Aggregate JSONB arrays
SELECT jsonb_agg(payload ORDER BY occurred_at) AS events
FROM events
WHERE payload ->> 'user_id' = '42';
-- Expand JSONB to rows
SELECT key, value
FROM events,
jsonb_each(payload)
WHERE id = 100;
-- Expand to typed columns (jsonb_to_record)
SELECT *
FROM events,
jsonb_to_record(payload) AS r(type text, user_id bigint, url text)
LIMIT 10;Key Takeaways
- Use
jsonb(notjson) in all new tables—binary storage, richer operators, and indexable;jsonis a legacy type. - JSONB is appropriate for genuinely variable schemas, external payloads, and user-defined fields; fixed known fields belong in typed columns.
- GIN on the full column supports
@>and?across any key; usejsonb_path_opsfor a smaller index when only containment queries are needed. - Expression indexes on specific keys beat full GIN for equality and range queries on a small set of known hot keys.
- Every JSONB mutation rewrites the whole column value—heavy write workloads on large documents can amplify I/O; measure before assuming JSONB is write-cheap.
- Enforce structure with CHECK constraints and document the expected shape; without discipline, a JSONB column becomes an unqueryable blob within a year.