How do you delete duplicate rows from a table using ROW_NUMBER, keeping only one copy per duplicate group?
Assign ROW_NUMBER() partitioned by the columns that define a duplicate and ordered by a tiebreaker (e.g., primary key or created_at). Any row where the row number exceeds 1 is a duplicate — delete those rows via a CTE or subquery referencing the physical row identifier.
How to think about it
The approach has two steps that mirror each other: first identify duplicates (see which rows get rn > 1), then act on them. Always do step one first — confirm you’re marking the right rows before you delete anything.
Step 1 — label every row
PARTITION BY defines what makes two rows “the same record.” ORDER BY inside the window decides which copy to keep (lowest id = earliest insert, highest = most recent).
SELECT
id,
email,
ROW_NUMBER() OVER (
PARTITION BY email -- "same" means same email
ORDER BY id ASC -- keep the earliest row
) AS rn
FROM users;
Any row with rn = 1 is the keeper. Any row with rn > 1 is a duplicate.
Try it — see which rows would be deleted
Step 2 — delete in PostgreSQL (CTE delete)
WITH dupes AS (
SELECT id,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY id ASC
) AS rn
FROM users
)
DELETE FROM users
WHERE id IN (
SELECT id FROM dupes WHERE rn > 1
);
Step 2 — delete in MySQL (subquery workaround)
MySQL prevents referencing the target table directly in a subquery, so you need an extra wrapping level:
DELETE FROM users
WHERE id IN (
SELECT id FROM (
SELECT id,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY id ASC
) AS rn
FROM users
) t
WHERE t.rn > 1
);
Safer alternative for large tables — create a clean copy
Rather than deleting in-place on millions of rows, copy the keepers to a new table and swap:
CREATE TABLE users_clean AS
SELECT id, email, created_at
FROM (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY id ASC
) AS rn
FROM users
) t
WHERE rn = 1;
-- Then verify row counts, rename tables, drop the old one