What is the difference between DISTINCT and GROUP BY for deduplication?
Both eliminate duplicate rows, but GROUP BY is the right choice when you also want aggregate values per group. DISTINCT is cleaner when you only need unique rows with no aggregation. In practice most optimizers produce identical plans for simple cases, but semantics and intent differ.
How to think about it
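This question checks whether you understand that DISTINCT and GROUP BY solve slightly different problems. One is purely for deduplication; the other groups rows so you can aggregate over them. Knowing when to reach for each shows you understand SQL's logical execution model, in which GROUP BY (together with its aggregates) is evaluated before the SELECT list, while DISTINCT is applied afterwards to the rows the SELECT list produces.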
When they produce identical results
For plain deduplication with no aggregates, SELECT DISTINCT col FROM t and SELECT col FROM t GROUP BY col return the same set of rows. In that case, DISTINCT is the cleaner choice because it makes the intent obvious at a glance.
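A minimal sketch of the equivalence, using Python's built-in sqlite3 module with an in-memory database. The customers table and its rows are hypothetical sample data, and SQLite simply stands in for whichever engine you use. ORDER BY is added to both queries because neither form guarantees row order on its own.

```python
import sqlite3

# Hypothetical sample table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, city TEXT);
    INSERT INTO customers VALUES
        ('Ana', 'Paris'), ('Ben', 'Paris'), ('Chloe', 'Lyon'), ('Eli', 'London');
""")

# Pure deduplication: the intent is obvious at a glance.
distinct_rows = conn.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city"
).fetchall()

# Same rows via GROUP BY, but it reads as if an aggregate were coming.
grouped_rows = conn.execute(
    "SELECT city FROM customers GROUP BY city ORDER BY city"
).fetchall()

print(distinct_rows)  # [('London',), ('Lyon',), ('Paris',)]
print(grouped_rows)   # [('London',), ('Lyon',), ('Paris',)]
```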
When only GROUP BY works
The moment you want an aggregate alongside the unique values (counts, sums, averages), you need GROUP BY. DISTINCT only removes duplicate output rows; it has no way to compute a value per group.
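To make the difference concrete, here is a small sketch, again with sqlite3 and a hypothetical orders table. GROUP BY returns one row per customer with a count and total; DISTINCT can return the unique customers but has nowhere to put the aggregates.

```python
import sqlite3

# Hypothetical orders table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('Ana', 10.0), ('Ana', 15.0), ('Ben', 7.5), ('Chloe', 20.0), ('Chloe', 5.0);
""")

# One row per customer, with aggregates computed over that customer's orders.
per_customer = conn.execute("""
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
""").fetchall()

# DISTINCT gives only the unique customers; counts and totals are lost.
unique_customers = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()

print(per_customer)      # [('Ana', 2, 25.0), ('Ben', 1, 7.5), ('Chloe', 2, 25.0)]
print(unique_customers)  # [('Ana',), ('Ben',), ('Chloe',)]
```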
Multi-column deduplication
Both operate on the full column list you give them. SELECT DISTINCT city, country returns unique (city, country) pairs, not unique cities, and GROUP BY city, country groups on the same pairs.
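A short sqlite3 sketch of multi-column deduplication, with hypothetical rows chosen so that one city name appears in two countries. Deduplicating on the pair keeps both Paris rows; deduplicating on city alone collapses them.

```python
import sqlite3

# Hypothetical customers table where 'Paris' exists in two countries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, city TEXT, country TEXT);
    INSERT INTO customers VALUES
        ('Ana', 'Paris',  'France'),
        ('Ben', 'Paris',  'France'),
        ('Dev', 'Paris',  'USA'),
        ('Eli', 'London', 'UK');
""")

# Deduplicates on the (city, country) pair, so Paris appears twice.
pairs = conn.execute(
    "SELECT DISTINCT city, country FROM customers ORDER BY city, country"
).fetchall()

# GROUP BY on the same two columns yields the same pairs.
grouped_pairs = conn.execute(
    "SELECT city, country FROM customers GROUP BY city, country ORDER BY city, country"
).fetchall()

# Deduplicating on city alone collapses the two Paris pairs into one row.
cities = conn.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city"
).fetchall()

print(pairs)   # [('London', 'UK'), ('Paris', 'France'), ('Paris', 'USA')]
print(cities)  # [('London',), ('Paris',)]
```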
Performance
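On modern optimizers (PostgreSQL, MySQL 8+, SQL Server, Snowflake), the execution plan for SELECT DISTINCT col FROM t is typically identical to the plan for SELECT col FROM t GROUP BY col. Do not make this choice based on a performance hunch. If performance matters, compare the actual plans (EXPLAIN or your engine's equivalent); otherwise, choose based on intent and readability.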
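A sketch of the "check, don't guess" habit, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. SQLite is not one of the engines listed above, and its plan text may label the temporary structure differently for DISTINCT and for GROUP BY, so this shows how to inspect plans rather than proving the claim for any particular engine. The table t and its rows are hypothetical.

```python
import sqlite3

# Hypothetical single-column table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col TEXT);
    INSERT INTO t VALUES ('a'), ('b'), ('a');
""")

queries = {
    "distinct": "SELECT DISTINCT col FROM t",
    "group_by": "SELECT col FROM t GROUP BY col",
}

plans = {}
for label, sql in queries.items():
    # The last column of each EXPLAIN QUERY PLAN row is the readable step description.
    plans[label] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(label, plans[label])

# Whatever the plans say, both queries return the same set of rows.
results = {label: sorted(conn.execute(sql).fetchall()) for label, sql in queries.items()}
```

On PostgreSQL or MySQL the equivalent step is to prefix each query with EXPLAIN and compare the output side by side.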