How do you join tables on multiple keys, and why is the key order in a composite index important?
You combine conditions in the ON clause with AND to join on multiple columns, which is necessary when no single column is a unique identifier across both tables. For index performance, the most selective column — or the column used in equality predicates — should come first in a composite index.
How to think about it
Multi-key joins are common in partitioned tables, slowly changing dimensions, and any schema where no single column uniquely identifies a row. The interviewer is checking whether you know when a composite key is necessary — and how index column order affects performance.
The key principle: all conditions in the ON clause must match simultaneously. There is no partial credit — one mismatched column means no join for that row pair.
Basic syntax
SELECT s.revenue, b.budget
FROM sales s
JOIN budgets b
ON s.region = b.region
AND s.product_id = b.product_id
AND s.year = b.year;
All three conditions must match simultaneously. One mismatched column means no join — there is no partial matching in an AND chain.
Mixing equality and range conditions
You can mix equality and inequality predicates, but only equality conditions are used for an index seek on most engines; range conditions narrow the scan from that point.
-- match on customer + status; restrict to events in a time window
SELECT a.session_id, b.event_id
FROM sessions a
JOIN events b
ON a.customer_id = b.customer_id
AND a.status = b.status
AND b.event_ts BETWEEN a.start_ts AND a.end_ts;
Composite index column order
An index on (region, product_id, year) can satisfy:
- Lookup on
regionalone - Lookup on
region + product_id - Lookup on all three
It cannot efficiently satisfy a lookup on product_id alone — the leading column must be present for the index to be useful.
-- This index is well-ordered for the join above:
CREATE INDEX idx_budgets_composite ON budgets (region, product_id, year);
NULL behaviour on multi-key joins
NULL never equals NULL in SQL. If both sides have NULL in a join column, those rows will not match — even though the NULLs “represent the same unknown value.” For surrogate-key joins this is rarely a problem, but for natural keys sourced from upstream data it is worth explicitly filtering out NULLs or using IS NOT DISTINCT FROM (PostgreSQL/BigQuery) if you need NULL-equal semantics.
Try it yourself
Run the multi-key join below. Notice that row 3 in sales (region=‘West’, product=2, year=2023) has no budget counterpart — it is silently dropped. Change to LEFT JOIN to see it appear with NULLs.
What to mention in an interview
- State the cardinality of each column and explain why the composite key is necessary.
- Mention that OR in an ON clause is rarely what you want and often destroys index usage.
- Offer to show
EXPLAINoutput if asked about performance. - Composite index column order matters for performance — the leading column must appear in the predicate for the index to be useful.