SQL Medium Asked at AmazonAsked at MetaAsked at StripeAsked at Spotify

How do you use EXPLAIN / EXPLAIN ANALYZE to diagnose a slow query?

For Data Engineer Data Analyst Data Scientist

The short answer

EXPLAIN shows the optimizer's chosen plan with estimated rows and costs before execution. EXPLAIN ANALYZE runs the query and overlays actual row counts and timing, letting you spot where estimates diverge from reality — bad statistics, missing indexes, or the wrong join algorithm — and target the correct fix.

How to think about it

The interviewer is really asking: “Do you fix slow queries by guessing, or do you read evidence?” EXPLAIN is the evidence. The plan is a tree — every node represents one operation. You read strategy top-down, and you find bottlenecks bottom-up (the innermost node runs first).

The two-command workflow

Start with EXPLAIN alone to see the optimizer’s plan without paying the runtime cost. Then switch to EXPLAIN ANALYZE to overlay real numbers — on a replica, or wrapped in a transaction you immediately roll back for mutating statements.

-- Safe way to EXPLAIN ANALYZE a DELETE without actually deleting:
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
DELETE FROM events WHERE created_at < now() - interval '90 days';
ROLLBACK;

For a SELECT, just run it directly:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at > now() - interval '7 days';

Reading the output

Hash Join  (cost=312..8900 rows=1200 width=36)
           (actual rows=1187 time=14..28 ms)
  Hash Cond: (o.customer_id = c.id)
  ->  Index Scan using idx_orders_status_ts on orders o
        Index Cond: (status = 'pending' AND created_at > ...)
        (actual rows=1210 time=1..12 ms)
  ->  Hash  (actual rows=50000 time=8..8 ms)
        ->  Seq Scan on customers c

How to read this tree

Hash Join            ← outer node; runs last
  Index Scan         ← left input; runs first (fast — uses index)
  Hash → Seq Scan    ← right input; builds hash table (50 k rows)

Work bottom-up to find where time goes. The Seq Scan on customers reads 50,000 rows into memory every time — is that unavoidable, or is a filter missing?

The five fields that matter

Field	What to look for
`rows` estimate vs actual	Diverge by 10x+? Stale stats — run `ANALYZE tablename`
Node type	`Seq Scan` on a large table with a selective filter = missing index
Join type	`Nested Loop` with a large outer set = bad estimate drove the wrong join
`Buffers: hit` vs `read`	High `read` = data cold in cache or larger than `shared_buffers`
`actual time` on each node	The node with the largest time is your bottleneck

Matching findings to fixes

-- Finding: rows estimate 10, actual 500 000 → stale stats
ANALYZE orders;

-- Finding: Seq Scan on orders with selective WHERE → missing index
CREATE INDEX ON orders (status, created_at);

-- Finding: skewed column estimates still wrong after ANALYZE
-- → extended statistics (PostgreSQL 10+)
CREATE STATISTICS orders_status_x_created ON status, created_at FROM orders;
ANALYZE orders;

Five-step reading checklist

Locate the node with the highest actual time.
Compare its estimated rows vs actual rows.
If estimates are badly off → ANALYZE the table, then re-check.
If the node is Seq Scan on a large table with a selective WHERE → add or tune the index.
If the join type is Nested Loop on two large sets → the planner was fooled by a bad estimate; fixing stats usually changes it to Hash Join automatically.