Why I wrote this: Building client-side JSON tools on aijsons.com means I live in parsed objects — but production data still lands in PostgreSQL JSONB columns on the backends I integrate with. A webhook ingestion table storing raw event payloads as JSONB looked fine at 200k rows. At 4.2 million rows, a filter on payload->>'event_type' jumped from 40ms to 8.4 seconds. The fix was not "add an index" generically — it was choosing between GIN, expression B-tree, and partial indexes based on how we actually queried the column. This guide is the decision tree I use before touching CREATE INDEX.

JSON vs JSONB: Indexing Starts at Column Type

PostgreSQL offers json (text storage, preserves whitespace) and jsonb (binary, decomposed, faster operators). You cannot effectively index json columns for containment queries. Every project I have seen store API payloads uses jsonb. If you inherit a json column, migrate first:

ALTER TABLE events
  ALTER COLUMN payload TYPE jsonb USING payload::jsonb;

Migration on a large table requires a lock and rewrite — schedule it, do not run casually in production Friday afternoon.

Three Query Patterns, Three Index Strategies

JSONB indexing is not one-size-fits-all. Match the index to the operator in your WHERE clause:

Query patternExampleIndex type
Key existencepayload ? 'user_id'GIN (default ops)
Containmentpayload @> '{"status":"active"}'GIN (jsonb_ops or jsonb_path_ops)
Scalar equality on one fieldpayload->>'event_type' = 'purchase'B-tree on expression
Nested pathpayload #>> '{metadata,region}' = 'eu-west'B-tree on expression (or generated column)

The 8.4-second query was pattern #3. A default GIN index on the whole column did not help because PostgreSQL cannot use GIN for arbitrary ->> text comparisons the way it uses B-tree for indexed expressions.

Pattern 1: GIN Index for Containment

When you filter with @>, ?&, or ?|, create a GIN index on the column:

CREATE INDEX idx_events_payload_gin
  ON events USING GIN (payload);

Query:

SELECT id, created_at
FROM events
WHERE payload @> '{"event_type": "purchase", "status": "completed"}'
  AND created_at > now() - interval '7 days';

Benchmark (4.2M rows, PostgreSQL 16, SSD):

  • Seq Scan: 6.1s
  • GIN Index Scan: 12ms

For read-heavy workloads with broad containment filters, GIN is the right default. Trade-off: GIN indexes are large (often 2–3× the JSONB column size) and slow to update on high-write tables.

jsonb_path_ops: Smaller Index, Narrower Use

CREATE INDEX idx_events_payload_path
  ON events USING GIN (payload jsonb_path_ops);

jsonb_path_ops supports only @> (not ? key exists). The index is ~30% smaller in my tests. Use it when every query is containment-based and you never check key existence alone.

Pattern 2: Expression B-tree for Scalar Filters

This fixed our production incident. We filtered constantly on event_type extracted as text:

-- Before: sequential scan on 4.2M rows
EXPLAIN ANALYZE
SELECT count(*)
FROM events
WHERE payload->>'event_type' = 'purchase';

-- Seq Scan on events  (cost=0..892341 rows=4200000)
--   Execution Time: 8432 ms

Expression index:

CREATE INDEX idx_events_event_type
  ON events ((payload->>'event_type'));

Same query after index:

-- Index Scan using idx_events_event_type  (cost=0..124 rows=89000)
--   Execution Time: 38 ms

220× faster — not because JSONB got smarter, because the planner could seek instead of scan.

Partial Index When Cardinality Is Skewed

90% of our queries filtered event_type = 'purchase'. A partial index saved space:

CREATE INDEX idx_events_purchase_only
  ON events ((payload->>'event_type'), created_at DESC)
  WHERE payload->>'event_type' = 'purchase';

Index size dropped from 890MB to 94MB. Queries for other event types still seq-scan — acceptable because they run rarely.

Pattern 3: Generated Columns (PostgreSQL 12+)

Expression indexes work but obscure intent in ORMs. Generated columns expose extracted fields as real columns:

ALTER TABLE events
  ADD COLUMN event_type text
  GENERATED ALWAYS AS (payload->>'event_type') STORED;

CREATE INDEX idx_events_event_type_col ON events (event_type);

Benefits: ORMs map event_type like any string column; composite indexes with created_at are straightforward. Cost: extra storage per row and write overhead on INSERT/UPDATE when payload changes.

Validating Payload Shape Before It Hits the Database

Indexing cannot fix malformed JSON. We validate webhook bodies against a JSON Schema in the ingestion service before INSERT. During development I paste payloads into the JSON Validator and JSON Schema Generator on aijsons.com — same checks, browser-only, no data leaves the machine. Catching event_type as a number instead of string at the edge prevents silent query mismatches where = 'purchase' returns zero rows.

EXPLAIN Workflow I Use Every Time

  1. Run EXPLAIN (ANALYZE, BUFFERS) on the slow query in a read replica or off-peak window
  2. Identify the scan type — Seq Scan on a million-row table is the signal
  3. Match operator to index type using the table above
  4. CREATE INDEX CONCURRENTLY in production to avoid write locks:
CREATE INDEX CONCURRENTLY idx_events_event_type
  ON events ((payload->>'event_type'));
  1. Re-run EXPLAIN — confirm Index Scan, compare Execution Time
  2. UPDATE statistics if the planner still ignores the index:
ANALYZE events;

Common Mistakes

1. Indexing the Wrong Extraction Operator

payload->'event_type' returns JSONB; payload->>'event_type' returns text. The index expression must match the query exactly — including quotes and cast:

-- Query uses text comparison
WHERE (payload->>'event_type') = 'purchase'

-- Index MUST use ->>, not ->
CREATE INDEX ... ON events ((payload->>'event_type'));

2. GIN When You Need B-tree

Adding GIN because "JSONB needs GIN" without checking operators wastes disk and write throughput. I have removed orphan GIN indexes that never appeared in pg_stat_user_indexes.

3. Forgetting Composite Queries

If every query filters event_type AND created_at, a composite index beats two single-column indexes:

CREATE INDEX idx_events_type_created
  ON events ((payload->>'event_type'), created_at DESC);

4. jsonb_path_ops for ? Queries

jsonb_path_ops does not support the ? operator. Queries like WHERE payload ? 'user_id' need default jsonb_ops.

Monitoring Index Health

SELECT indexrelname, idx_scan, idx_tup_read, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE schemaname = 'public' AND relname = 'events'
ORDER BY idx_scan DESC;

Indexes with idx_scan = 0 after 30 days are candidates for removal. JSONB GIN indexes that are never scanned are expensive dead weight on write-heavy tables.

When NOT to Index JSONB

  • Cold archive tables — monthly partitions queried once for compliance exports do not need indexes; seq scan is fine.
  • Write-heavy, rarely read — GIN maintenance cost exceeds query savings.
  • Full-document retrieval only — if you always SELECT the whole payload by primary key, JSONB is just a blob store; index the PK, not the JSON.
  • Consider normalization — if you always filter on five fields, a proper relational schema may outperform any JSONB index strategy.

Key Takeaways

  • Use jsonb, not json, when you need indexes.
  • GIN for containment (@>); B-tree expression indexes for scalar ->> filters.
  • Partial indexes when one value dominates query traffic.
  • Always validate with EXPLAIN ANALYZE before and after — numbers beat assumptions.
  • Validate payload shape at ingestion; use client-side schema tools during development.
  • Monitor pg_stat_user_indexes — JSONB GIN indexes you do not use still cost writes.

References