datarekha
SQL Hard Asked at AmazonAsked at Airbnb

How do you write a non-equi join (range join), and what are the performance implications?

The short answer

A non-equi join uses inequality operators (BETWEEN, >=, <, !=) in the ON clause instead of or in addition to equality. They are correct and valid SQL, but they prevent hash-join and merge-join plans, often forcing a nested-loop join that scales quadratically — so they require careful indexing or pre-filtering at scale.

How to think about it

What the question is really testing

Non-equi joins are rare enough that many engineers panic when they see one in the wild. The question tests two things: can you write the syntax correctly, and do you understand the performance trap that comes with it? Senior-level answers always address the performance implication unprompted.

The syntax is simple — a range condition in ON

Instead of ON a.id = b.id, you use inequality operators: BETWEEN, >=, <, or !=.

Use case 1: slowly changing dimension (SCD Type 2 price lookup)

-- Find the product price that was active at the time of each sale
SELECT s.sale_id, s.product_id, s.sale_date, p.price
FROM sales s
JOIN price_history p
  ON s.product_id = p.product_id          -- equality narrows first
 AND s.sale_date BETWEEN p.valid_from AND p.valid_to;  -- range picks the slice

Each sale row finds the one price record whose validity window contains the sale date.

Use case 2: overlapping interval detection

-- Find meeting pairs that overlap in the same room
SELECT a.meeting_id, b.meeting_id AS overlaps_with
FROM meetings a
JOIN meetings b
  ON a.room_id    = b.room_id
 AND a.meeting_id < b.meeting_id     -- avoid self-pair and duplicates
 AND a.start_ts   < b.end_ts         -- a starts before b ends
 AND a.end_ts     > b.start_ts;      -- a ends after b starts

The predicate a_start < b_end AND a_end > b_start is the standard interval-overlap test.

Try it live — price lookup at sale time

The performance story — what separates junior from senior answers

A regular equi-join (ON a.id = b.id) lets the engine build a hash table on one side and look up the other side in O(1). Range conditions break that — the engine cannot build a useful hash on a range predicate.

The result: a nested-loop join — O(M × N). For a 10M-row table, that is 100 trillion comparisons.

The mitigation: always lead with an equality condition on a selective column. In the price-history example, ON s.product_id = p.product_id lets the engine hash-join on product_id first, then apply the range as a cheap filter on the much smaller matching set.

-- Good: equality first → hash join on product_id, range filter on result
ON s.product_id = p.product_id
AND s.sale_date BETWEEN p.valid_from AND p.valid_to

-- Dangerous: range only → full nested-loop on both tables
ON s.sale_date BETWEEN p.valid_from AND p.valid_to

What to say in an interview

State the use case first — SCD Type 2 lookups, interval overlap detection, bucketing. Then proactively address the performance question: mention the nested-loop risk and the equality-prefix mitigation. Interviewers who ask this want to know whether you will break production, not just whether you can write the syntax.

Learn it properly INNER JOIN

Keep practising

All SQL questions

Explore further

Skip to content