Why does SQL require every non-aggregated SELECT column to appear in GROUP BY?
Because after grouping, multiple source rows collapse into one output row. Any column not in the GROUP BY key could have different values across those collapsed rows, making a single deterministic output value impossible without an aggregate function.
How to think about it
This is a “why does it work this way” question. The rule isn’t arbitrary — it follows directly from what GROUP BY does to rows. Once you see the concrete example, it clicks immediately.
The core problem: one group, many values
Imagine an orders table:
| order_id | customer_id | city | amount |
|---|---|---|---|
| 1 | 42 | London | 100 |
| 2 | 42 | Paris | 200 |
If you write GROUP BY customer_id, both rows collapse into one output row for customer 42. But city has two different values — London and Paris — for that single group. There is no deterministic answer to “what city should appear in the output?”, so the SQL standard forbids it.
See the rule in action
What different engines do
PostgreSQL, BigQuery, Snowflake, SQL Server — raise an error: column "city" must appear in the GROUP BY clause or be used in an aggregate function. This is the correct, standards-compliant behavior.
MySQL in non-strict mode — silently picks an arbitrary row value for city. This looks like it works but produces wrong data. MySQL’s ONLY_FULL_GROUP_BY mode (the default since MySQL 5.7) disables this behavior and makes MySQL behave correctly.
The functional dependency exception (PostgreSQL only)
PostgreSQL has one smart relaxation: if the GROUP BY column is a primary key of a table, PostgreSQL allows other columns from that same table to appear in SELECT without being in GROUP BY. The PK functionally determines every other column, so there is no ambiguity.
-- Allowed in PostgreSQL when customer_id is PK of the customers table:
SELECT c.customer_id, c.name, SUM(o.amount)
FROM customers c
JOIN orders o USING (customer_id)
GROUP BY c.customer_id; -- c.name is safe because PK determines it