datarekha
SQL Medium Asked at AmazonAsked at MetaAsked at StripeAsked at Spotify

How do you use EXPLAIN / EXPLAIN ANALYZE to diagnose a slow query?

The short answer

EXPLAIN shows the optimizer's chosen plan with estimated rows and costs before execution. EXPLAIN ANALYZE runs the query and overlays actual row counts and timing, letting you spot where estimates diverge from reality — bad statistics, missing indexes, or the wrong join algorithm — and target the correct fix.

How to think about it

The interviewer is really asking: “Do you fix slow queries by guessing, or do you read evidence?” EXPLAIN is the evidence. The plan is a tree — every node represents one operation. You read strategy top-down, and you find bottlenecks bottom-up (the innermost node runs first).

The two-command workflow

Start with EXPLAIN alone to see the optimizer’s plan without paying the runtime cost. Then switch to EXPLAIN ANALYZE to overlay real numbers — on a replica, or wrapped in a transaction you immediately roll back for mutating statements.

-- Safe way to EXPLAIN ANALYZE a DELETE without actually deleting:
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
DELETE FROM events WHERE created_at < now() - interval '90 days';
ROLLBACK;

For a SELECT, just run it directly:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at > now() - interval '7 days';

Reading the output

Hash Join  (cost=312..8900 rows=1200 width=36)
           (actual rows=1187 time=14..28 ms)
  Hash Cond: (o.customer_id = c.id)
  ->  Index Scan using idx_orders_status_ts on orders o
        Index Cond: (status = 'pending' AND created_at > ...)
        (actual rows=1210 time=1..12 ms)
  ->  Hash  (actual rows=50000 time=8..8 ms)
        ->  Seq Scan on customers c

The five fields that matter

FieldWhat to look for
rows estimate vs actualDiverge by 10x+? Stale stats — run ANALYZE tablename
Node typeSeq Scan on a large table with a selective filter = missing index
Join typeNested Loop with a large outer set = bad estimate drove the wrong join
Buffers: hit vs readHigh read = data cold in cache or larger than shared_buffers
actual time on each nodeThe node with the largest time is your bottleneck

Matching findings to fixes

-- Finding: rows estimate 10, actual 500 000 → stale stats
ANALYZE orders;

-- Finding: Seq Scan on orders with selective WHERE → missing index
CREATE INDEX ON orders (status, created_at);

-- Finding: skewed column estimates still wrong after ANALYZE
-- → extended statistics (PostgreSQL 10+)
CREATE STATISTICS orders_status_x_created ON status, created_at FROM orders;
ANALYZE orders;

Five-step reading checklist

  1. Locate the node with the highest actual time.
  2. Compare its estimated rows vs actual rows.
  3. If estimates are badly off → ANALYZE the table, then re-check.
  4. If the node is Seq Scan on a large table with a selective WHERE → add or tune the index.
  5. If the join type is Nested Loop on two large sets → the planner was fooled by a bad estimate; fixing stats usually changes it to Hash Join automatically.

Keep practising

All SQL questions

Explore further

Skip to content