SQL | Medium | Asked at Amazon | Asked at Microsoft

How do you delete duplicate rows from a table using ROW_NUMBER, keeping only one copy per duplicate group?

The short answer

Assign ROW_NUMBER() partitioned by the columns that define a duplicate and ordered by a tiebreaker (e.g., primary key or created_at). Any row whose row number exceeds 1 is a duplicate; delete those rows via a CTE or subquery that references a unique row identifier such as the primary key.

How to think about it

The approach has two steps: first label the duplicates, then delete them. Always run the labelling step on its own first and confirm you’re marking the right rows before anything is deleted, because a DELETE you got wrong is expensive to undo.

PARTITION BY defines what makes two rows “the same record”; the window’s ORDER BY decides which copy to keep. Assuming ids are assigned in insertion order, ORDER BY id ASC keeps the earliest row and ORDER BY id DESC keeps the most recent.
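To illustrate how the two clauses divide the work, here is a sketch that treats emails differing only by letter case as duplicates and keeps the newest row. The LOWER(email) rule and the created_at ordering are assumptions chosen for the example, not part of the technique itself; id is added as a final tiebreaker so the result stays deterministic when two rows share a timestamp.

```sql
SELECT id, email, created_at,
       ROW_NUMBER() OVER (
         PARTITION BY LOWER(email)           -- assumption: case differences are not meaningful
         ORDER BY created_at DESC, id DESC   -- keep the newest row; id breaks timestamp ties
       ) AS rn
FROM users;
```

Without a unique tiebreaker at the end of the ORDER BY, the database is free to number tied rows in any order, so the "keeper" could change between runs.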

Step 1 — label every row

SELECT id, email, created_at,
       ROW_NUMBER() OVER (
         PARTITION BY email      -- "same" means same email
         ORDER BY id ASC         -- keep the earliest row
       ) AS rn
FROM users
ORDER BY email, id;

id	email	created_at	rn
1	alice@example.com	2024-01-01	1
3	alice@example.com	2024-01-05	2
6	alice@example.com	2024-01-08	3
2	bob@example.com	2024-01-02	1
5	bob@example.com	2024-01-07	2
4	carol@example.com	2024-01-06	1

Each email’s window restarts the count: rn = 1 is the keeper (the earliest id), and every rn > 1 is a duplicate to remove — here ids 3, 6, and 5. Flip the window to ORDER BY id DESC and you’d keep the latest instead. The labelling makes the deletion target unambiguous before you touch the data.
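Before moving on, it can help to turn the labelled rows into a single summary line so the size of the deletion is known in advance. The sketch below uses only the users table and the window from Step 1; the column aliases are illustrative.

```sql
SELECT COUNT(*)                                AS total_rows,
       SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) AS rows_to_keep,
       SUM(CASE WHEN rn > 1 THEN 1 ELSE 0 END) AS rows_to_delete
FROM (
  SELECT ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS rn
  FROM users
) labelled;
```

For the six sample rows above this reports 3 rows to keep and 3 to delete, matching ids 3, 6, and 5.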

Step 2 — delete the marked rows

In PostgreSQL you can label the rows in a CTE and delete by id in the same statement:

WITH dupes AS (
  SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS rn
  FROM users
)
DELETE FROM users WHERE id IN (SELECT id FROM dupes WHERE rn > 1);
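Because a wrong DELETE is expensive to undo, one cautious way to run the statement above in PostgreSQL is inside an explicit transaction, with RETURNING to show what was removed and a follow-up check before committing. This is a sketch of that workflow, not the only way to do it.

```sql
BEGIN;

WITH dupes AS (
  SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM dupes WHERE rn > 1)
RETURNING id, email;   -- lists exactly the rows that were removed

-- Sanity check: returns no rows if every email is down to one copy.
SELECT email, COUNT(*) AS copies
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

COMMIT;   -- or ROLLBACK; if either result looks wrong
```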

MySQL (8.0 or later, where window functions were introduced) does not let a subquery read directly from the table being deleted from, so wrap it in a derived table to force materialisation:

DELETE FROM users
WHERE id IN (
  SELECT id FROM (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS rn
    FROM users
  ) t
  WHERE t.rn > 1
);

For a multi-million-row table, prefer copying the keepers into a fresh table and swapping, rather than a giant in-place DELETE: CREATE TABLE users_clean AS SELECT ... WHERE rn = 1, verify the counts, then rename.
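A possible shape for the copy-and-swap approach, written for PostgreSQL, is sketched below. The names users_clean and users_old are placeholders, and the column list assumes the three columns shown in Step 1. Note that rn has to be filtered in an outer query, because a window function's result cannot be referenced in the WHERE clause of the same SELECT.

```sql
-- 1. Copy only the keepers.
CREATE TABLE users_clean AS
SELECT id, email, created_at
FROM (
  SELECT id, email, created_at,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS rn
  FROM users
) labelled
WHERE rn = 1;

-- 2. Verify: the two counts should be equal (one kept row per email group).
SELECT (SELECT COUNT(*) FROM users_clean)                          AS kept_rows,
       (SELECT COUNT(*) FROM (SELECT 1 FROM users GROUP BY email) g) AS email_groups;

-- 3. Swap the tables in one transaction.
BEGIN;
ALTER TABLE users RENAME TO users_old;     -- keep the original until you are sure
ALTER TABLE users_clean RENAME TO users;
COMMIT;
```

CREATE TABLE ... AS copies data only, so the primary key, indexes, constraints, and defaults would need to be recreated on the new table before the swap.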
