How do you write a non-equi join (range join), and what are the performance implications?
A non-equi join uses inequality operators (BETWEEN, >=, <, !=) in the ON clause instead of or in addition to equality. They are correct and valid SQL, but they prevent hash-join and merge-join plans, often forcing a nested-loop join that scales quadratically — so they require careful indexing or pre-filtering at scale.
How to think about it
What the question is really testing
Non-equi joins are rare enough that many engineers panic when they see one in the wild. The question tests two things: can you write the syntax correctly, and do you understand the performance trap that comes with it? Senior-level answers always address the performance implication unprompted.
The syntax is simple — a range condition in ON
Instead of ON a.id = b.id, you use inequality operators: BETWEEN, >=, <, or !=.
Use case 1: slowly changing dimension (SCD Type 2 price lookup)
-- Find the product price that was active at the time of each sale
SELECT s.sale_id, s.product_id, s.sale_date, p.price
FROM sales s
JOIN price_history p
ON s.product_id = p.product_id -- equality narrows first
AND s.sale_date BETWEEN p.valid_from AND p.valid_to; -- range picks the slice
Each sale row finds the one price record whose validity window contains the sale date.
Use case 2: overlapping interval detection
-- Find meeting pairs that overlap in the same room
SELECT a.meeting_id, b.meeting_id AS overlaps_with
FROM meetings a
JOIN meetings b
ON a.room_id = b.room_id
AND a.meeting_id < b.meeting_id -- avoid self-pair and duplicates
AND a.start_ts < b.end_ts -- a starts before b ends
AND a.end_ts > b.start_ts; -- a ends after b starts
The predicate a_start < b_end AND a_end > b_start is the standard interval-overlap test.
Try it live — price lookup at sale time
The performance story — what separates junior from senior answers
A regular equi-join (ON a.id = b.id) lets the engine build a hash table on one side and look up the other side in O(1). Range conditions break that — the engine cannot build a useful hash on a range predicate.
The result: a nested-loop join — O(M × N). For a 10M-row table, that is 100 trillion comparisons.
The mitigation: always lead with an equality condition on a selective column. In the price-history example, ON s.product_id = p.product_id lets the engine hash-join on product_id first, then apply the range as a cheap filter on the much smaller matching set.
-- Good: equality first → hash join on product_id, range filter on result
ON s.product_id = p.product_id
AND s.sale_date BETWEEN p.valid_from AND p.valid_to
-- Dangerous: range only → full nested-loop on both tables
ON s.sale_date BETWEEN p.valid_from AND p.valid_to
What to say in an interview
State the use case first — SCD Type 2 lookups, interval overlap detection, bucketing. Then proactively address the performance question: mention the nested-loop risk and the equality-prefix mitigation. Interviewers who ask this want to know whether you will break production, not just whether you can write the syntax.