What is an anti-join, and when does NOT IN behave differently from NOT EXISTS?
An anti-join returns rows from the left table that have no match in the right table. NOT IN and NOT EXISTS both express this, but NOT IN returns zero rows — not an error, just silently empty — when the subquery contains even a single NULL, because SQL cannot determine whether the outer value equals a NULL.
How to think about it
This is a Hard question because the interviewer wants to catch something very specific: do you know that NOT IN silently returns zero rows when NULLs are present? Most candidates know what an anti-join is. The ones who pass this question can explain why NOT IN breaks and demonstrate they reach for NOT EXISTS by default.
The three anti-join patterns
There are three ways to write an anti-join, and they are not equivalent when NULLs are involved:
-- Pattern 1: NOT IN (dangerous with NULLs)
SELECT id FROM orders
WHERE customer_id NOT IN (SELECT id FROM vip_customers);
-- Pattern 2: NOT EXISTS (NULL-safe, recommended)
SELECT o.id FROM orders o
WHERE NOT EXISTS (
SELECT 1 FROM vip_customers v WHERE v.id = o.customer_id
);
-- Pattern 3: LEFT JOIN ... IS NULL (NULL-safe, often fastest)
SELECT o.id
FROM orders o
LEFT JOIN vip_customers v ON v.id = o.customer_id
WHERE v.id IS NULL;
Why NOT IN breaks on NULLs
SQL evaluates x NOT IN (1, 2, NULL) as x != 1 AND x != 2 AND x != NULL. The last comparison evaluates to UNKNOWN — not FALSE — because any comparison to NULL is unknown in SQL’s three-valued logic. UNKNOWN in a WHERE clause suppresses the row. If any row in the subquery has a NULL, the entire NOT IN returns empty.
This is the playground to make that concrete. The vip_customers table has a NULL id on the last row. Watch what happens:
The key insight: three-valued logic
-- This returns ZERO rows even though 5 is clearly not in the list
SELECT 1 WHERE 5 NOT IN (1, 2, NULL);
SQL doesn’t know whether 5 equals NULL — it returns UNKNOWN, and UNKNOWN rows are excluded. The moment one NULL creeps into the subquery, all of NOT IN’s results are suppressed.
When to use each
| Pattern | Use when |
|---|---|
NOT IN | Subquery column is declared NOT NULL and list is small/static |
NOT EXISTS | Subquery column is nullable; complex correlated condition |
LEFT JOIN ... IS NULL | You also need columns from the right table, or prefer set-based style |