datarekha
SQL Medium Asked at AmazonAsked at Google

How do you join tables on multiple keys, and why is the key order in a composite index important?

The short answer

You combine conditions in the ON clause with AND to join on multiple columns, which is necessary when no single column is a unique identifier across both tables. For index performance, the most selective column — or the column used in equality predicates — should come first in a composite index.

How to think about it

Multi-key joins are common in partitioned tables, slowly changing dimensions, and any schema where no single column uniquely identifies a row. The interviewer is checking whether you know when a composite key is necessary — and how index column order affects performance.

The key principle: all conditions in the ON clause must match simultaneously. There is no partial credit — one mismatched column means no join for that row pair.

Basic syntax

SELECT s.revenue, b.budget
FROM sales s
JOIN budgets b
  ON s.region     = b.region
 AND s.product_id = b.product_id
 AND s.year       = b.year;

All three conditions must match simultaneously. One mismatched column means no join — there is no partial matching in an AND chain.

Mixing equality and range conditions

You can mix equality and inequality predicates, but only equality conditions are used for an index seek on most engines; range conditions narrow the scan from that point.

-- match on customer + status; restrict to events in a time window
SELECT a.session_id, b.event_id
FROM sessions a
JOIN events b
  ON a.customer_id = b.customer_id
 AND a.status      = b.status
 AND b.event_ts BETWEEN a.start_ts AND a.end_ts;

Composite index column order

An index on (region, product_id, year) can satisfy:

  • Lookup on region alone
  • Lookup on region + product_id
  • Lookup on all three

It cannot efficiently satisfy a lookup on product_id alone — the leading column must be present for the index to be useful.

-- This index is well-ordered for the join above:
CREATE INDEX idx_budgets_composite ON budgets (region, product_id, year);

NULL behaviour on multi-key joins

NULL never equals NULL in SQL. If both sides have NULL in a join column, those rows will not match — even though the NULLs “represent the same unknown value.” For surrogate-key joins this is rarely a problem, but for natural keys sourced from upstream data it is worth explicitly filtering out NULLs or using IS NOT DISTINCT FROM (PostgreSQL/BigQuery) if you need NULL-equal semantics.

Try it yourself

Run the multi-key join below. Notice that row 3 in sales (region=‘West’, product=2, year=2023) has no budget counterpart — it is silently dropped. Change to LEFT JOIN to see it appear with NULLs.

What to mention in an interview

  • State the cardinality of each column and explain why the composite key is necessary.
  • Mention that OR in an ON clause is rarely what you want and often destroys index usage.
  • Offer to show EXPLAIN output if asked about performance.
  • Composite index column order matters for performance — the leading column must appear in the predicate for the index to be useful.
Learn it properly INNER JOIN

Keep practising

All SQL questions

Explore further

Skip to content