datarekha
SQL Medium Asked at GoogleAsked at StripeAsked at Airbnb

Why does NOT IN (subquery) return zero rows when the subquery contains a NULL, and how do you fix it?

The short answer

SQL uses three-valued logic: comparing any value to NULL yields UNKNOWN, not FALSE. NOT IN evaluates as NOT (a = v1 OR a = v2 OR ...), so a single NULL in the list makes the entire predicate UNKNOWN for every row, suppressing all results. Use NOT EXISTS or a LEFT JOIN anti-pattern instead.

How to think about it

What the question is really testing

This is one of the most dangerous silent bugs in SQL — the query runs without errors, looks plausible, and returns zero rows when it should return many. Interviewers ask it to check whether you understand SQL’s three-valued logic and whether you reach for safe patterns reflexively.

Why it breaks — step by step

Say you want customers who did NOT place an order this month:

SELECT customer_id
FROM customers
WHERE customer_id NOT IN (
    SELECT customer_id FROM orders WHERE order_month = 6
);

This looks correct. But if even one row in the orders subquery has a NULL customer_id, the engine expands the predicate like this:

NOT (customer_id = 1 OR customer_id = 2 OR customer_id = NULL)

The expression customer_id = NULL evaluates to UNKNOWN — not TRUE, not FALSE. Then:

  • x OR UNKNOWN = UNKNOWN (unless x is TRUE)
  • NOT UNKNOWN = UNKNOWN

SQL only passes rows where the WHERE predicate is TRUE. Since every row evaluates to UNKNOWN, all rows are filtered out — silently. No error, no warning, zero rows.

See the trap live

The first playground shows NOT IN with a NULL in the subquery — watch it return nothing.

Now fix it with NOT EXISTS — this returns customers 3 and 4 as expected:

The two safe alternatives

NOT EXISTS (preferred):

SELECT c.customer_id
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_month = 6
);

LEFT JOIN anti-join:

SELECT c.customer_id
FROM customers c
LEFT JOIN orders o
    ON c.customer_id = o.customer_id
   AND o.order_month = 6
WHERE o.customer_id IS NULL;

Both are NULL-safe. NOT EXISTS short-circuits as soon as a match is found, making it often faster. The LEFT JOIN form is sometimes more familiar to data analysts coming from pandas.

A third option — adding AND customer_id IS NOT NULL to the subquery — works but is fragile. You have to remember to add it every time, and a future schema change (making the column nullable) will silently break it again.

Learn it properly Anti-joins

Keep practising

All SQL questions

Explore further

Skip to content