SQL | Medium | Asked at Amazon, Meta, Stripe

When should you use EXISTS, IN, or a JOIN for a semi-join, and what are the NULL-safety differences?

For Data Analyst, Data Scientist, Data Engineer

The short answer

EXISTS can stop as soon as one match is found and is NULL-safe. IN returns the same rows as EXISTS for a positive filter, but its negation, NOT IN, returns no rows at all when the subquery result contains a NULL. A JOIN can multiply rows if the right side has duplicate keys. For large datasets, EXISTS or a deduplicated JOIN is generally safest.

How to think about it

All three answer the same question — “does a matching row exist?” — but they part ways on NULLs and duplicates, and the interviewer is almost certainly going to push on the NULL edge. Make sure you reach it.

Take a concrete task: find all orders placed by VIP customers. Conceptually, IN evaluates the subquery into a set and checks membership, while EXISTS runs the correlated subquery per outer row and stops at the first match; it never counts how many. In practice many optimizers plan both as the same semi-join, so treat this as a mental model rather than a guaranteed execution strategy. A JOIN brings columns from both sides but can multiply rows if the right side has duplicate keys, so it usually needs a DISTINCT or GROUP BY.
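The row-multiplication risk is easy to reproduce. The sketch below is a minimal illustration using Python's built-in sqlite3 module and an in-memory database; the table contents are hypothetical (three orders, and customer 101 deliberately listed twice in vip_customers) and are not the data used elsewhere in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO orders VALUES (1, 101), (2, 102), (3, 103);
    -- Hypothetical data: customer 101 is listed twice on the right side.
    CREATE TABLE vip_customers (customer_id INTEGER);
    INSERT INTO vip_customers VALUES (101), (101), (103);
""")


def order_ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Plain JOIN: one output row per matching pair, so order 1 is duplicated.
join_ids = order_ids("""
    SELECT o.order_id
    FROM orders o
    JOIN vip_customers v ON v.customer_id = o.customer_id
    ORDER BY o.order_id
""")

# JOIN with DISTINCT: duplicates collapsed after the join.
distinct_ids = order_ids("""
    SELECT DISTINCT o.order_id
    FROM orders o
    JOIN vip_customers v ON v.customer_id = o.customer_id
    ORDER BY o.order_id
""")

# EXISTS: a pure existence test, so each order appears at most once.
exists_ids = order_ids("""
    SELECT o.order_id
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM vip_customers v
                  WHERE v.customer_id = o.customer_id)
    ORDER BY o.order_id
""")

print(join_ids)      # [1, 1, 3]
print(distinct_ids)  # [1, 3]
print(exists_ids)    # [1, 3]
```

The plain JOIN repeats order 1 because it matches two VIP rows, while DISTINCT and EXISTS both return each order once. EXISTS gets there without a deduplication step, which is one reason it is often the cleaner choice when no columns from the right side are needed.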

A worked example — IN

-- orders for customers 101, 102, 103, 101, 104   |   VIPs: 101, 103
SELECT order_id, customer_id, amount
FROM orders
WHERE customer_id IN (SELECT customer_id FROM vip_customers);

order_id	customer_id	amount
1	101	500
3	103	800
4	101	200

The three orders belonging to VIPs (101 and 103) come back; 102 and 104 are dropped. EXISTS would return the exact same rows here — the difference is how: IN materialises {101, 103} once, EXISTS probes per order and stops at the first VIP match.
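To check that IN and EXISTS agree on this data, here is a small sketch using Python's built-in sqlite3 module. It loads the five orders and two VIPs from the example; the amount column is left out because the example only shows amounts for the returned orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Orders for customers 101, 102, 103, 101, 104 and VIPs 101, 103,
    -- as in the worked example (amounts omitted).
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO orders VALUES (1, 101), (2, 102), (3, 103), (4, 101), (5, 104);
    CREATE TABLE vip_customers (customer_id INTEGER);
    INSERT INTO vip_customers VALUES (101), (103);
""")

# Semi-join written with IN.
in_rows = conn.execute("""
    SELECT order_id, customer_id
    FROM orders
    WHERE customer_id IN (SELECT customer_id FROM vip_customers)
    ORDER BY order_id
""").fetchall()

# The same semi-join written with a correlated EXISTS.
exists_rows = conn.execute("""
    SELECT o.order_id, o.customer_id
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM vip_customers v
                  WHERE v.customer_id = o.customer_id)
    ORDER BY o.order_id
""").fetchall()

print(in_rows)                 # [(1, 101), (3, 103), (4, 101)]
print(in_rows == exists_rows)  # True
```

Both forms return orders 1, 3 and 4. Any difference between them on a positive filter like this one is about how the engine executes the query, not about which rows come back.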

The NULL trap — NOT IN

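This is where people stumble. Put one NULL in the subquery and NOT IN returns zero rows, silently. SQL expands x NOT IN (101, NULL) into x != 101 AND x != NULL, and x != NULL is always UNKNOWN. The best case is therefore TRUE AND UNKNOWN = UNKNOWN (and when x is 101 the result is FALSE). WHERE keeps only rows that evaluate to TRUE, so no row passes: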

-- vip_customers contains 101 and a NULL
SELECT order_id FROM orders
WHERE customer_id NOT IN (SELECT customer_id FROM vip_customers);

(0 rows)

-- NOT EXISTS is NULL-safe — the correct anti-join
SELECT order_id FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM vip_customers v WHERE v.customer_id = o.customer_id);

order_id
2
3
5

Against the same five orders as before, orders 2, 3 and 5 (customers 102, 103 and 104, none of them a VIP) are the right answer. NOT EXISTS returns them, while NOT IN collapses to nothing over that single NULL.
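The trap can be reproduced end to end with Python's built-in sqlite3 module. This sketch assumes the same five orders as the IN example (customers 101, 102, 103, 101, 104) and a vip_customers table holding 101 and a NULL. It first evaluates the NOT IN predicate as a scalar to show the three-valued logic, then runs both anti-join forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed data: the five orders from the IN example.
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO orders VALUES (1, 101), (2, 102), (3, 103), (4, 101), (5, 104);
    -- vip_customers contains 101 and a NULL.
    CREATE TABLE vip_customers (customer_id INTEGER);
    INSERT INTO vip_customers VALUES (101), (NULL);
""")

# Scalar view of the predicate: UNKNOWN comes back to Python as None.
unknown_case = conn.execute("SELECT 102 NOT IN (101, NULL)").fetchone()[0]
false_case = conn.execute("SELECT 101 NOT IN (101, NULL)").fetchone()[0]

# NOT IN against the nullable column: no row evaluates to TRUE.
not_in_ids = [row[0] for row in conn.execute("""
    SELECT order_id FROM orders
    WHERE customer_id NOT IN (SELECT customer_id FROM vip_customers)
    ORDER BY order_id
""")]

# NOT EXISTS: the NULL row never satisfies the equality, so it is ignored.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT o.order_id FROM orders o
    WHERE NOT EXISTS (SELECT 1 FROM vip_customers v
                      WHERE v.customer_id = o.customer_id)
    ORDER BY o.order_id
""")]

print(unknown_case)    # None (UNKNOWN)
print(false_case)      # 0 (FALSE)
print(not_in_ids)      # []
print(not_exists_ids)  # [2, 3, 5]
```

The predicate is never TRUE once a NULL is in the list: it is UNKNOWN for non-matching values and FALSE for matching ones. NOT EXISTS avoids the problem because the NULL row simply fails the correlated equality check and contributes nothing.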

Choosing between them

Small, stable, NULL-free subquery — IN is fine and reads cleanly.
Large subquery, or early termination matters — prefer EXISTS.
You need columns from both sides — JOIN, but add DISTINCT/GROUP BY if the right side isn’t unique on the key.
Any negative filter — always NOT EXISTS; never NOT IN against a nullable column.
