datarekha
SQL Medium Asked at AmazonAsked at Meta

When would you use a self-join, and how do you write one?

The short answer

A self-join joins a table to itself, typically to compare rows within the same dataset — classic use cases are finding employee-manager relationships in a single table, detecting duplicate rows, or comparing a row to the previous/next row when window functions are unavailable.

How to think about it

A self-join is not a special join type — it is an ordinary join where both sides reference the same table via different aliases. The whole trick is understanding when it is the natural tool.

What is really being tested

The question probes whether you can model intra-table relationships. The classic sign you need a self-join is when rows in the same table relate to each other — a hierarchy, a pair, or a sequence. Once you spot that pattern, the implementation is straightforward.

Classic use case: employee–manager hierarchy

The most common interview scenario. One table holds every person, and a column (manager_id) points back to another row in the same table.

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Use LEFT JOIN — not INNER JOIN — so the top of the hierarchy (the CEO with no manager) still appears in the result rather than being silently dropped.

Finding nearby rows: orders within 7 days

A self-join also works when you need to compare every row to every other row that meets some condition — like finding repeat purchases by the same customer within a week.

SELECT a.order_id AS order_a,
       b.order_id AS order_b,
       a.customer_id,
       a.order_date AS date_a,
       b.order_date AS date_b
FROM orders a
JOIN orders b
  ON a.customer_id = b.customer_id
 AND b.order_date BETWEEN a.order_date AND a.order_date + 7
 AND b.order_id > a.order_id;

The b.order_id > a.order_id guard prevents seeing the same pair twice (once as A,B and once as B,A) and removes the trivial self-match (a row paired with itself). Forgetting this is the most common self-join mistake.

Detecting exact duplicates

SELECT a.id, b.id AS duplicate_id
FROM products a
JOIN products b
  ON a.sku = b.sku
 AND a.id < b.id;

When a window function beats a self-join

If you need a value from the previous row — say, yesterday’s revenue for a day-over-day comparison — LAG() is both cleaner and faster than a self-join on date. Reach for self-join when the relationship is structural (a hierarchy, an adjacency) rather than purely sequential.

Learn it properly INNER JOIN

Keep practising

All SQL questions

Explore further

Skip to content