datarekha
SQL Medium Asked at AmazonAsked at Microsoft

How do you delete duplicate rows from a table using ROW_NUMBER, keeping only one copy per duplicate group?

The short answer

Assign ROW_NUMBER() partitioned by the columns that define a duplicate and ordered by a tiebreaker (e.g., primary key or created_at). Any row where the row number exceeds 1 is a duplicate — delete those rows via a CTE or subquery referencing the physical row identifier.

How to think about it

The approach has two steps that mirror each other: first identify duplicates (see which rows get rn > 1), then act on them. Always do step one first — confirm you’re marking the right rows before you delete anything.

Step 1 — label every row

PARTITION BY defines what makes two rows “the same record.” ORDER BY inside the window decides which copy to keep (lowest id = earliest insert, highest = most recent).

SELECT
    id,
    email,
    ROW_NUMBER() OVER (
        PARTITION BY email       -- "same" means same email
        ORDER BY id ASC          -- keep the earliest row
    ) AS rn
FROM users;

Any row with rn = 1 is the keeper. Any row with rn > 1 is a duplicate.

Try it — see which rows would be deleted

Step 2 — delete in PostgreSQL (CTE delete)

WITH dupes AS (
    SELECT id,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY id ASC
        ) AS rn
    FROM users
)
DELETE FROM users
WHERE id IN (
    SELECT id FROM dupes WHERE rn > 1
);

Step 2 — delete in MySQL (subquery workaround)

MySQL prevents referencing the target table directly in a subquery, so you need an extra wrapping level:

DELETE FROM users
WHERE id IN (
    SELECT id FROM (
        SELECT id,
            ROW_NUMBER() OVER (
                PARTITION BY email
                ORDER BY id ASC
            ) AS rn
        FROM users
    ) t
    WHERE t.rn > 1
);

Safer alternative for large tables — create a clean copy

Rather than deleting in-place on millions of rows, copy the keepers to a new table and swap:

CREATE TABLE users_clean AS
SELECT id, email, created_at
FROM (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY id ASC
        ) AS rn
    FROM users
) t
WHERE rn = 1;
-- Then verify row counts, rename tables, drop the old one
Learn it properly Deduplication

Keep practising

All SQL questions

Explore further

Skip to content