datarekha
SQL Medium Asked at AmazonAsked at StripeAsked at Shopify

Why does column order matter in a composite index, and what is the left-prefix rule?

The short answer

A composite index on (a, b, c) is sorted first by a, then b within each a value, then c. The optimizer can use the index for queries that filter on a, on a + b, or on a + b + c, but not on b alone or c alone, because those values are not contiguous in the index structure.

How to think about it

This question tests whether you understand the B-tree data structure underneath an index — not just the rule, but why the rule exists. The interviewer wants to hear you reason from the structure, not recite a mnemonic.

The phone book analogy

Think of a composite index like a phone book sorted by last name, then first name. You can look up everyone named “Smith” instantly. You can look up “Smith, John” instantly. But you cannot look up everyone named “John” without scanning every page — first names are scattered across every last-name group.

A composite index on (customer_id, status, created_at) is identical: values are sorted first by customer_id, then by status within each customer, then by created_at within each status. If you filter only on status, those rows are not physically contiguous in the index — the database has to scan everything.

Left-prefix rule in practice

CREATE INDEX idx_orders ON orders (customer_id, status, created_at);

-- Uses index: leading column present
SELECT * FROM orders WHERE customer_id = 42;

-- Uses index: first two columns present
SELECT * FROM orders WHERE customer_id = 42 AND status = 'shipped';

-- Uses index: all three columns
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped'
  AND created_at > '2024-01-01';

-- Does NOT use this index: leading column missing
SELECT * FROM orders WHERE status = 'shipped';

-- Partial use: stops at the range gap (status is skipped)
SELECT * FROM orders
WHERE customer_id = 42 AND created_at > '2024-01-01';
-- Engine uses the index only for customer_id, then filters created_at in memory

Choosing column order

  1. Equality columns first — they eliminate the most rows and keep the scan contiguous.
  2. Highest-cardinality equality column first — narrows the key space fastest.
  3. Range column last — once you hit a range predicate, the index cannot skip further; put the range at the end so equality columns do maximum filtering first.
  4. ORDER BY / GROUP BY columns match — if you sort by the same columns in the same order, the database returns rows pre-sorted and skips an extra sort step.
-- Query: filter by customer, sort by created_at
-- Ideal index: (customer_id, created_at) — serves both filter and ORDER BY
SELECT * FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC;

Quick interview checklist

  • State the left-prefix rule in one sentence — most interviewers stop here.
  • Know that EXPLAIN shows Index Cond (index used) vs. Filter (heap-level filter after index scan) — the latter means the predicate didn’t qualify as an index condition.
  • Remember that ORDER BY can also benefit from a composite index when the sort columns form a left prefix and the sort directions match.

Keep practising

All SQL questions

Explore further

Skip to content