Why I wrote this: Building client-side JSON tools on aijsons.com means I live in parsed objects — but production data still lands in PostgreSQL JSONB columns on the backends I integrate with. A webhook ingestion table storing raw event payloads as JSONB looked fine at 200k rows. At 4.2 million rows, a filter on payload->>'event_type' jumped from 40ms to 8.4 seconds. The fix was not "add an index" generically — it was choosing between GIN, expression B-tree, and partial indexes based on how we actually queried the column. This guide is the decision tree I use before touching CREATE INDEX.
JSON vs JSONB: Indexing Starts at Column Type
PostgreSQL offers json (text storage, preserves whitespace) and jsonb (binary, decomposed, faster operators). You cannot effectively index json columns for containment queries. Every project I have seen store API payloads uses jsonb. If you inherit a json column, migrate first:
ALTER TABLE events
ALTER COLUMN payload TYPE jsonb USING payload::jsonb;
Migration on a large table requires a lock and rewrite — schedule it, do not run casually in production Friday afternoon.
Three Query Patterns, Three Index Strategies
JSONB indexing is not one-size-fits-all. Match the index to the operator in your WHERE clause:
| Query pattern | Example | Index type |
|---|---|---|
| Key existence | payload ? 'user_id' | GIN (default ops) |
| Containment | payload @> '{"status":"active"}' | GIN (jsonb_ops or jsonb_path_ops) |
| Scalar equality on one field | payload->>'event_type' = 'purchase' | B-tree on expression |
| Nested path | payload #>> '{metadata,region}' = 'eu-west' | B-tree on expression (or generated column) |
The 8.4-second query was pattern #3. A default GIN index on the whole column did not help because PostgreSQL cannot use GIN for arbitrary ->> text comparisons the way it uses B-tree for indexed expressions.
Pattern 1: GIN Index for Containment
When you filter with @>, ?&, or ?|, create a GIN index on the column:
CREATE INDEX idx_events_payload_gin
ON events USING GIN (payload);
Query:
SELECT id, created_at
FROM events
WHERE payload @> '{"event_type": "purchase", "status": "completed"}'
AND created_at > now() - interval '7 days';
Benchmark (4.2M rows, PostgreSQL 16, SSD):
- Seq Scan: 6.1s
- GIN Index Scan: 12ms
For read-heavy workloads with broad containment filters, GIN is the right default. Trade-off: GIN indexes are large (often 2–3× the JSONB column size) and slow to update on high-write tables.
jsonb_path_ops: Smaller Index, Narrower Use
CREATE INDEX idx_events_payload_path
ON events USING GIN (payload jsonb_path_ops);
jsonb_path_ops supports only @> (not ? key exists). The index is ~30% smaller in my tests. Use it when every query is containment-based and you never check key existence alone.
Pattern 2: Expression B-tree for Scalar Filters
This fixed our production incident. We filtered constantly on event_type extracted as text:
-- Before: sequential scan on 4.2M rows
EXPLAIN ANALYZE
SELECT count(*)
FROM events
WHERE payload->>'event_type' = 'purchase';
-- Seq Scan on events (cost=0..892341 rows=4200000)
-- Execution Time: 8432 ms
Expression index:
CREATE INDEX idx_events_event_type
ON events ((payload->>'event_type'));
Same query after index:
-- Index Scan using idx_events_event_type (cost=0..124 rows=89000)
-- Execution Time: 38 ms
220× faster — not because JSONB got smarter, because the planner could seek instead of scan.
Partial Index When Cardinality Is Skewed
90% of our queries filtered event_type = 'purchase'. A partial index saved space:
CREATE INDEX idx_events_purchase_only
ON events ((payload->>'event_type'), created_at DESC)
WHERE payload->>'event_type' = 'purchase';
Index size dropped from 890MB to 94MB. Queries for other event types still seq-scan — acceptable because they run rarely.
Pattern 3: Generated Columns (PostgreSQL 12+)
Expression indexes work but obscure intent in ORMs. Generated columns expose extracted fields as real columns:
ALTER TABLE events
ADD COLUMN event_type text
GENERATED ALWAYS AS (payload->>'event_type') STORED;
CREATE INDEX idx_events_event_type_col ON events (event_type);
Benefits: ORMs map event_type like any string column; composite indexes with created_at are straightforward. Cost: extra storage per row and write overhead on INSERT/UPDATE when payload changes.
Validating Payload Shape Before It Hits the Database
Indexing cannot fix malformed JSON. We validate webhook bodies against a JSON Schema in the ingestion service before INSERT. During development I paste payloads into the JSON Validator and JSON Schema Generator on aijsons.com — same checks, browser-only, no data leaves the machine. Catching event_type as a number instead of string at the edge prevents silent query mismatches where = 'purchase' returns zero rows.
EXPLAIN Workflow I Use Every Time
- Run EXPLAIN (ANALYZE, BUFFERS) on the slow query in a read replica or off-peak window
- Identify the scan type — Seq Scan on a million-row table is the signal
- Match operator to index type using the table above
- CREATE INDEX CONCURRENTLY in production to avoid write locks:
CREATE INDEX CONCURRENTLY idx_events_event_type
ON events ((payload->>'event_type'));
- Re-run EXPLAIN — confirm Index Scan, compare Execution Time
- UPDATE statistics if the planner still ignores the index:
ANALYZE events;
Common Mistakes
1. Indexing the Wrong Extraction Operator
payload->'event_type' returns JSONB; payload->>'event_type' returns text. The index expression must match the query exactly — including quotes and cast:
-- Query uses text comparison
WHERE (payload->>'event_type') = 'purchase'
-- Index MUST use ->>, not ->
CREATE INDEX ... ON events ((payload->>'event_type'));
2. GIN When You Need B-tree
Adding GIN because "JSONB needs GIN" without checking operators wastes disk and write throughput. I have removed orphan GIN indexes that never appeared in pg_stat_user_indexes.
3. Forgetting Composite Queries
If every query filters event_type AND created_at, a composite index beats two single-column indexes:
CREATE INDEX idx_events_type_created
ON events ((payload->>'event_type'), created_at DESC);
4. jsonb_path_ops for ? Queries
jsonb_path_ops does not support the ? operator. Queries like WHERE payload ? 'user_id' need default jsonb_ops.
Monitoring Index Health
SELECT indexrelname, idx_scan, idx_tup_read, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE schemaname = 'public' AND relname = 'events'
ORDER BY idx_scan DESC;
Indexes with idx_scan = 0 after 30 days are candidates for removal. JSONB GIN indexes that are never scanned are expensive dead weight on write-heavy tables.
When NOT to Index JSONB
- Cold archive tables — monthly partitions queried once for compliance exports do not need indexes; seq scan is fine.
- Write-heavy, rarely read — GIN maintenance cost exceeds query savings.
- Full-document retrieval only — if you always SELECT the whole payload by primary key, JSONB is just a blob store; index the PK, not the JSON.
- Consider normalization — if you always filter on five fields, a proper relational schema may outperform any JSONB index strategy.
Key Takeaways
- Use
jsonb, notjson, when you need indexes. - GIN for containment (
@>); B-tree expression indexes for scalar->>filters. - Partial indexes when one value dominates query traffic.
- Always validate with EXPLAIN ANALYZE before and after — numbers beat assumptions.
- Validate payload shape at ingestion; use client-side schema tools during development.
- Monitor
pg_stat_user_indexes— JSONB GIN indexes you do not use still cost writes.