Skip to main content
Postgres for Backend Engineers

JSONB Without Regret

Ravinder··6 min read
PostgresDatabaseSQLJSONBSchema Design
Share:
JSONB Without Regret

JSONB gets reached for too early and too often. It shows up in schemas as a catch-all metadata column that starts with three keys and ends with forty, half of which are never queried, none of which have types enforced. It also gets avoided too reflexively—engineers who remember RDBMS orthodoxy see any JSON column as a schema smell and normalize data that genuinely has variable structure. The truth is narrower: JSONB is the right tool in a specific set of situations, and using it well requires understanding both the storage model and the operator set.

JSON vs JSONB: The Practical Difference

Postgres ships two JSON types. json stores the input text verbatim and re-parses it on every access. jsonb decomposes the document into a binary representation at insert time, enabling operator support, indexing, and significantly faster reads. The only reason to use json over jsonb today is if preserving key order or duplicate keys matters to you—which is almost never a legitimate requirement.

-- jsonb: binary storage, indexable, operator-rich
CREATE TABLE events (
  id         bigserial PRIMARY KEY,
  occurred_at timestamptz NOT NULL DEFAULT now(),
  payload    jsonb NOT NULL
);
 
-- Insertion is slightly slower (parsing + decomposition)
-- Reads are faster; operators and indexes work
INSERT INTO events (payload) VALUES
  ('{"type": "page_view", "url": "/pricing", "user_id": 42}');

When JSONB Is the Right Choice

The legitimate JSONB use cases share a common characteristic: the structure of the data is genuinely variable or controlled externally.

flowchart TD Q{Is the column structure variable?} Q -->|Yes - user-defined fields, plugin data| JSONB[Use JSONB] Q -->|Yes - external API payload storage| JSONB Q -->|No - known fields, all rows| COLS[Use typed columns] JSONB --> IDX{Do you query specific keys?} IDX -->|Yes - equality/containment| GIN[GIN index on column] IDX -->|Yes - one key heavily| EXPR[Expression index on key] IDX -->|No - store only| NONE[No index needed] COLS --> NORM[Normalize and constrain]

Good fits: audit log payloads, webhook event bodies, user-defined metadata fields, feature flag overrides per tenant, EAV-style attributes where the attribute set varies per entity type.

Bad fits: columns where every row has the same three fields (just add three columns), relationships that need joins (extract to a proper table), data that needs range queries across keys (JSONB range queries are expensive and not indexable with GIN).

Core Operators

-- Field access: -> returns jsonb, ->> returns text
SELECT payload -> 'user_id' AS user_id_jsonb,
       payload ->> 'user_id' AS user_id_text
FROM events;
 
-- Nested access
SELECT payload -> 'address' -> 'city' AS city,
       payload #>> '{address,zip}' AS zip
FROM users;
 
-- Containment: does this jsonb contain this sub-document?
SELECT * FROM events
WHERE payload @> '{"type": "page_view"}';
 
-- Key existence
SELECT * FROM events WHERE payload ? 'error_code';
 
-- Any of these keys
SELECT * FROM events WHERE payload ?| ARRAY['error_code', 'warning_code'];
 
-- All of these keys
SELECT * FROM events WHERE payload ?& ARRAY['type', 'user_id'];

The @> (containment) operator is the critical one to understand: it checks whether the left jsonb document contains all key-value pairs of the right document. This is the operator that GIN indexes can accelerate.

Indexing JSONB

GIN on the Full Column

The standard approach when you query many different keys unpredictably:

CREATE INDEX idx_events_payload_gin
  ON events
  USING GIN (payload);
 
-- This query uses the GIN index
SELECT * FROM events WHERE payload @> '{"type": "purchase", "user_id": 42}';
 
-- Key existence also uses GIN
SELECT * FROM events WHERE payload ? 'error_code';

GIN on the full column indexes every key-value pair in the document. The trade-off is index size—a wide, varied JSONB document produces a large GIN index. Use jsonb_path_ops operator class for a smaller index that only supports @>:

CREATE INDEX idx_events_payload_gin_path
  ON events
  USING GIN (payload jsonb_path_ops);
-- Smaller index, only supports @> containment

Expression Index on a Specific Key

When one key is queried heavily with equality:

CREATE INDEX idx_events_user_id
  ON events ((payload ->> 'user_id'));
 
-- This query uses the expression index
SELECT * FROM events WHERE payload ->> 'user_id' = '42';
 
-- Typed cast for proper comparison
CREATE INDEX idx_events_user_id_int
  ON events (((payload ->> 'user_id')::bigint));
 
SELECT * FROM events WHERE (payload ->> 'user_id')::bigint = 42;

Expression indexes are dramatically smaller than a full-column GIN index and support range queries. Prefer them when you have a small number of heavily queried keys.

Mutation Operators

-- Merge/upsert a sub-document (|| is the concatenation operator)
UPDATE users
SET preferences = preferences || '{"theme": "dark", "notifications": true}'::jsonb
WHERE id = 42;
 
-- Remove a key
UPDATE users
SET preferences = preferences - 'legacy_setting'
WHERE id = 42;
 
-- Set a nested path
UPDATE users
SET preferences = jsonb_set(preferences, '{notifications,email}', 'false')
WHERE id = 42;
 
-- Remove nested path
UPDATE users
SET preferences = preferences #- '{address,legacy_field}'
WHERE id = 42;

Every JSONB update rewrites the entire column value—Postgres has no partial JSONB update. For frequently mutated large documents, consider whether the write amplification warrants breaking the structure into typed columns or a separate child table.

Schema Discipline

JSONB flexibility is exactly what makes it dangerous at scale. Enforce discipline with check constraints:

-- Require specific keys to exist
ALTER TABLE events
  ADD CONSTRAINT events_payload_has_type
  CHECK (payload ? 'type');
 
-- Validate the type field is a known value
ALTER TABLE events
  ADD CONSTRAINT events_payload_valid_type
  CHECK (payload ->> 'type' IN ('page_view', 'click', 'purchase', 'error'));
 
-- Enforce numeric type for a key
ALTER TABLE events
  ADD CONSTRAINT events_user_id_numeric
  CHECK (
    NOT (payload ? 'user_id')
    OR (payload -> 'user_id') IS NOT NULL
    AND jsonb_typeof(payload -> 'user_id') = 'number'
  );

Document the expected shape in a schema registry or a README beside the migration that introduced the column. Six months after the column was created, no one remembers which keys are meaningful.

Aggregation and Transformation

-- Aggregate JSONB arrays
SELECT jsonb_agg(payload ORDER BY occurred_at) AS events
FROM events
WHERE payload ->> 'user_id' = '42';
 
-- Expand JSONB to rows
SELECT key, value
FROM events,
     jsonb_each(payload)
WHERE id = 100;
 
-- Expand to typed columns (jsonb_to_record)
SELECT *
FROM events,
     jsonb_to_record(payload) AS r(type text, user_id bigint, url text)
LIMIT 10;

Key Takeaways

  • Use jsonb (not json) in all new tables—binary storage, richer operators, and indexable; json is a legacy type.
  • JSONB is appropriate for genuinely variable schemas, external payloads, and user-defined fields; fixed known fields belong in typed columns.
  • GIN on the full column supports @> and ? across any key; use jsonb_path_ops for a smaller index when only containment queries are needed.
  • Expression indexes on specific keys beat full GIN for equality and range queries on a small set of known hot keys.
  • Every JSONB mutation rewrites the whole column value—heavy write workloads on large documents can amplify I/O; measure before assuming JSONB is write-cheap.
  • Enforce structure with CHECK constraints and document the expected shape; without discipline, a JSONB column becomes an unqueryable blob within a year.
Share: