Why does NOT IN (subquery) return zero rows when the subquery contains a NULL, and how do you fix it?
SQL uses three-valued logic: comparing any value to NULL yields UNKNOWN, not FALSE. NOT IN evaluates as NOT (a = v1 OR a = v2 OR ...), so a single NULL in the list makes the entire predicate UNKNOWN for every row, suppressing all results. Use NOT EXISTS or a LEFT JOIN anti-pattern instead.
How to think about it
What the question is really testing
This is one of the most dangerous silent bugs in SQL — the query runs without errors, looks plausible, and returns zero rows when it should return many. Interviewers ask it to check whether you understand SQL’s three-valued logic and whether you reach for safe patterns reflexively.
Why it breaks — step by step
Say you want customers who did NOT place an order this month:
SELECT customer_id
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id FROM orders WHERE order_month = 6
);
This looks correct. But if even one row in the orders subquery has a NULL customer_id, the engine expands the predicate like this:
NOT (customer_id = 1 OR customer_id = 2 OR customer_id = NULL)
The expression customer_id = NULL evaluates to UNKNOWN — not TRUE, not FALSE. Then:
x OR UNKNOWN = UNKNOWN(unless x is TRUE)NOT UNKNOWN = UNKNOWN
SQL only passes rows where the WHERE predicate is TRUE. Since every row evaluates to UNKNOWN, all rows are filtered out — silently. No error, no warning, zero rows.
See the trap live
The first playground shows NOT IN with a NULL in the subquery — watch it return nothing.
Now fix it with NOT EXISTS — this returns customers 3 and 4 as expected:
The two safe alternatives
NOT EXISTS (preferred):
SELECT c.customer_id
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_month = 6
);
LEFT JOIN anti-join:
SELECT c.customer_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
AND o.order_month = 6
WHERE o.customer_id IS NULL;
Both are NULL-safe. NOT EXISTS short-circuits as soon as a match is found, making it often faster. The LEFT JOIN form is sometimes more familiar to data analysts coming from pandas.
A third option — adding AND customer_id IS NOT NULL to the subquery — works but is fragile. You have to remember to add it every time, and a future schema change (making the column nullable) will silently break it again.