SQL | Hard | Asked at Amazon | Asked at Airbnb

How do you write a non-equi join (range join), and what are the performance implications?

The short answer

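A non-equi join uses comparison operators other than equality (BETWEEN, >=, <, !=) in the ON clause, either instead of or alongside an equality condition. It is valid SQL, but a join with no equality condition cannot use a hash join, and most engines cannot use a merge join for it either, so the planner often falls back to a nested-loop join whose cost grows with the product of the two table sizes. At scale it needs an accompanying equality condition, careful indexing, or pre-filtering.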

How to think about it

Non-equi joins are rare enough that engineers freeze when one shows up in the wild. The question checks two things: can you write the syntax, and do you understand the performance trap that rides along with it? Senior answers raise the performance point unprompted.

The syntax is just a comparison in ON where an equi-join would have an equals sign, most often a range condition such as BETWEEN or a pair of >= and < tests. The classic use case is a slowly-changing-dimension price lookup: match each sale to the price record whose validity window contains the sale date.

A worked example — price active at sale time

SELECT s.sale_id, s.product_id, s.sale_date, p.price, s.qty,
       ROUND(p.price * s.qty, 2) AS line_total
FROM sales s
JOIN price_history p
  ON s.product_id = p.product_id                       -- equality narrows first
 AND s.sale_date BETWEEN p.valid_from AND p.valid_to    -- range picks the slice
ORDER BY s.sale_id;

sale_id	product_id	sale_date	price	qty	line_total
1	10	2024-01-15	9.99	3	29.97
2	10	2024-04-20	12.49	2	24.98
3	20	2024-02-10	4.5	5	22.5
4	20	2024-06-01	5.25	1	5.25
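To try the query above, one option is an in-memory SQLite database driven from Python’s built-in sqlite3 module. This is a sketch under assumptions: the source shows only the result set, so the price_history windows below are invented to be consistent with it, and dates are stored as ISO-8601 text, which SQLite compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER,
    sale_date  TEXT,      -- ISO-8601 dates sort correctly as text
    qty        INTEGER
);
CREATE TABLE price_history (
    product_id INTEGER,
    price      REAL,
    valid_from TEXT,
    valid_to   TEXT
);

INSERT INTO sales VALUES
    (1, 10, '2024-01-15', 3),
    (2, 10, '2024-04-20', 2),
    (3, 20, '2024-02-10', 5),
    (4, 20, '2024-06-01', 1);

-- Assumed windows (not given in the source); they do not overlap.
INSERT INTO price_history VALUES
    (10,  9.99, '2024-01-01', '2024-03-31'),
    (10, 12.49, '2024-04-01', '2024-12-31'),
    (20,  4.50, '2024-01-01', '2024-04-30'),
    (20,  5.25, '2024-05-01', '2024-12-31');
""")

QUERY = """
SELECT s.sale_id, s.product_id, s.sale_date, p.price, s.qty,
       ROUND(p.price * s.qty, 2) AS line_total
FROM sales s
JOIN price_history p
  ON s.product_id = p.product_id
 AND s.sale_date BETWEEN p.valid_from AND p.valid_to
ORDER BY s.sale_id;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With these assumed windows, each sale matches one price row and the line totals agree with the table above.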

Each sale lands in exactly one price bracket, provided a product’s validity windows never overlap. BETWEEN is inclusive at both ends, so two windows that share a boundary date would match the same sale twice. Sale 1 (Jan 15) falls in product 10’s first window at 9.99; sale 2 (Apr 20) crosses into the 12.49 window after the April 1 price change. The BETWEEN does the temporal slicing that a plain equi-join can’t express. (The same shape solves interval-overlap detection: a.start < b.end AND a.end > b.start is the standard overlap test.)
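The overlap test mentioned above can be sketched as a self-join. The bookings table, its columns, and its rows are hypothetical, chosen only to show the pattern, and the sketch again uses Python’s sqlite3 module. The extra a.booking_id < b.booking_id condition is itself a non-equi predicate: it stops a row from matching itself and reports each pair once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per room booking.
CREATE TABLE bookings (
    booking_id INTEGER PRIMARY KEY,
    room_id    INTEGER,
    start_date TEXT,
    end_date   TEXT
);
INSERT INTO bookings VALUES
    (1, 101, '2024-03-01', '2024-03-05'),
    (2, 101, '2024-03-04', '2024-03-08'),  -- overlaps booking 1
    (3, 101, '2024-03-08', '2024-03-10'),  -- starts when booking 2 ends
    (4, 202, '2024-03-02', '2024-03-06');  -- different room
""")

OVERLAP_QUERY = """
SELECT a.booking_id, b.booking_id
FROM bookings a
JOIN bookings b
  ON a.room_id = b.room_id            -- equality narrows to one room
 AND a.booking_id < b.booking_id      -- each pair once, no self-match
 AND a.start_date < b.end_date        -- standard overlap test
 AND a.end_date   > b.start_date
ORDER BY a.booking_id, b.booking_id;
"""

overlaps = conn.execute(OVERLAP_QUERY).fetchall()
print(overlaps)  # [(1, 2)]
```

Because both inequalities are strict, bookings that merely touch (3 starts on the day 2 ends) are not reported as overlapping.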

The performance story — what separates junior from senior

An equi-join lets the engine build a hash table on one side and probe it in O(1) expected time per row. A range predicate breaks that, because you can’t hash a BETWEEN, so without an equality condition or a usable index the planner typically falls back to a nested-loop join, O(M × N). Joining two 10-million-row tables that way means 100 trillion comparisons.

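The mitigation is to include an equality on a selective column wherever the data allows. The textual order of the conditions inside ON does not matter to the optimizer; what matters is that an equality is present. In the query above, s.product_id = p.product_id lets the engine hash-join on product_id, then apply the date range as a cheap filter on the small matching set: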

-- Good: equality present -> hash join on product_id, range as a post-filter
ON s.product_id = p.product_id
AND s.sale_date BETWEEN p.valid_from AND p.valid_to

-- Dangerous: range only -> full nested loop over both tables
ON s.sale_date BETWEEN p.valid_from AND p.valid_to
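The gap between the two plans can be made concrete with a toy model in plain Python. This is not how any particular database implements its joins; it is a sketch that counts row comparisons, using hypothetical data (100 products with four 90-day price windows each, 1,000 sales, and days as integers 0 to 359).

```python
from collections import defaultdict

# Hypothetical data: (product_id, price, valid_from, valid_to) per window.
prices = [
    (product_id, 10.0 + quarter, quarter * 90, quarter * 90 + 89)
    for product_id in range(100)
    for quarter in range(4)
]
# (sale_id, product_id, day), spread deterministically over products and days.
sales = [(sale_id, sale_id % 100, (sale_id * 7) % 360) for sale_id in range(1000)]


def nested_loop_join(sales, prices):
    """Test the full join condition for every (sale, price) pair."""
    comparisons, matches = 0, []
    for sale_id, product_id, day in sales:
        for p_product_id, price, valid_from, valid_to in prices:
            comparisons += 1
            if product_id == p_product_id and valid_from <= day <= valid_to:
                matches.append((sale_id, price))
    return matches, comparisons


def hash_then_range_join(sales, prices):
    """Bucket price rows by product_id, then range-filter inside the bucket."""
    buckets = defaultdict(list)
    for p_product_id, price, valid_from, valid_to in prices:
        buckets[p_product_id].append((price, valid_from, valid_to))

    comparisons, matches = 0, []
    for sale_id, product_id, day in sales:
        for price, valid_from, valid_to in buckets.get(product_id, []):
            comparisons += 1
            if valid_from <= day <= valid_to:
                matches.append((sale_id, price))
    return matches, comparisons


nested_matches, nested_cmp = nested_loop_join(sales, prices)
hashed_matches, hashed_cmp = hash_then_range_join(sales, prices)

print("nested loop:", nested_cmp, "comparisons")    # 1000 * 400 = 400000
print("hash + range:", hashed_cmp, "comparisons")   # 1000 * 4 = 4000
print("same result:", sorted(nested_matches) == sorted(hashed_matches))
```

Both functions return the same matches; the equality key cuts the work from every pair of rows to only the handful of price rows that share the sale’s product_id.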
