SQL | Easy | Asked at Amazon | Asked at Microsoft

What is the difference between DISTINCT and GROUP BY for deduplication?

For: Data Analyst, Data Engineer, Data Scientist

The short answer

Both eliminate duplicate rows, but GROUP BY is the right choice when you also want aggregate values per group. DISTINCT is cleaner when you only need unique rows with no aggregation. In practice most optimizers produce identical plans for simple cases, but semantics and intent differ.

How to think about it

This question checks whether you see that DISTINCT and GROUP BY solve slightly different problems: one purely deduplicates, the other groups rows in order to aggregate them. Knowing which to reach for shows you understand SQL’s logical execution model.

When they’re interchangeable

For plain deduplication with no calculation, they return the same rows — and DISTINCT is the cleaner choice because it states the intent at a glance:

SELECT DISTINCT dept FROM employees;
-- identical result to:  SELECT dept FROM employees GROUP BY dept;

dept
Eng
Sales
HR

Three departments, deduplicated. Either form gets here; DISTINCT dept just reads as “give me the unique departments,” which is exactly what you meant. Neither query has an ORDER BY, so the order of the rows is not guaranteed.
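
To check the equivalence yourself, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and the employee names are assumptions made for illustration; only the three department names come from the example above.

```python
import sqlite3

# Hypothetical sample data: the employee names are placeholders;
# only the department names appear in the text above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Ann", "Eng"), ("Bob", "Eng"), ("Cat", "Eng"),
        ("Dan", "Sales"), ("Eve", "Sales"),
        ("Fay", "HR"),
    ],
)

distinct_rows = conn.execute("SELECT DISTINCT dept FROM employees").fetchall()
grouped_rows = conn.execute("SELECT dept FROM employees GROUP BY dept").fetchall()

# Compare as sets: without ORDER BY, the row order is not guaranteed.
same_result = set(distinct_rows) == set(grouped_rows)
print(same_result)
```

Comparing as sets rather than lists is deliberate: the two queries promise the same rows, not the same row order.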

A worked example — when only GROUP BY works

The moment you want an aggregate alongside the unique values — a count, a sum, an average — only GROUP BY has the syntax for it:

SELECT dept,
       COUNT(*)               AS headcount,
       ROUND(AVG(salary), 2)  AS avg_pay
FROM employees
GROUP BY dept
ORDER BY headcount DESC;

dept	headcount	avg_pay
Eng	3	111666.67
Sales	2	76000.0
HR	1	60000.0

DISTINCT could give you the three department names, but it has no way to attach COUNT(*) or AVG(salary) to them. The instant the question becomes “…and how many / how much per group?”, you’re in GROUP BY territory.
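
The worked example can be reproduced with a small sketch, again assuming SQLite via Python's sqlite3 module. The individual salaries and employee names are hypothetical, chosen only so that the per-department headcounts and averages match the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")

# Hypothetical rows: the salaries are assumptions picked so the averages
# come out to the figures in the table above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Eng", 120000), ("Bob", "Eng", 110000), ("Cat", "Eng", 105000),
        ("Dan", "Sales", 80000), ("Eve", "Sales", 72000),
        ("Fay", "HR", 60000),
    ],
)

query = """
    SELECT dept,
           COUNT(*)              AS headcount,
           ROUND(AVG(salary), 2) AS avg_pay
    FROM employees
    GROUP BY dept
    ORDER BY headcount DESC
"""
result = conn.execute(query).fetchall()

# One row per department: (dept, headcount, avg_pay)
for row in result:
    print(row)
```

Each output row carries the department plus its aggregates, which is the part a plain SELECT DISTINCT dept cannot express.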

Performance

On mainstream engines such as PostgreSQL, MySQL 8+, SQL Server, and Snowflake, SELECT DISTINCT col and GROUP BY col generally produce the same or an equivalent plan for the simple case. So don’t pick on a performance hunch; pick on intent and readability: DISTINCT for “unique rows,” GROUP BY for “grouped, to aggregate.” If performance really matters for a specific query, check the actual plan on your engine.
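
A plan comparison is easy to run rather than assume. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. SQLite is a stand-in here, not one of the engines named above, and its plan wording may differ between the two queries even when the work done is equivalent. The helper name plan is hypothetical. On PostgreSQL or MySQL, the analogous starting point is the EXPLAIN statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")


def plan(sql):
    """Return the detail text of each step in SQLite's query plan (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]


distinct_plan = plan("SELECT DISTINCT dept FROM employees")
grouped_plan = plan("SELECT dept FROM employees GROUP BY dept")

print("DISTINCT:", distinct_plan)
print("GROUP BY:", grouped_plan)
```

Read the two plans side by side: what matters is whether they scan and deduplicate in the same way, not whether the text matches character for character.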
