Why does SQL require every non-aggregated SELECT column to appear in GROUP BY?

For Data Analysts, Data Scientists, and Data Engineers

The short answer

Because after grouping, multiple source rows collapse into one output row. Any column outside the GROUP BY key can hold different values across those collapsed rows, so there is no single value to output unless an aggregate function reduces them to one.

How to think about it

This is a “why does it work this way?” question, and the interviewer is checking whether you actually picture what GROUP BY does to rows — or whether you have only memorised the error message. Once you see a single concrete group, the rule stops feeling like a rule at all.

One group, many values

Take a tiny orders table:

order_id	customer_id	city	amount
1	42	London	100
2	42	Paris	200

Now write GROUP BY customer_id. Both rows belong to customer 42, so they collapse into a single output row. But which city should that one row show — London or Paris? There is no honest answer: the two source rows disagree, and the database refuses to invent one for you. That is the whole reason every non-aggregated column you SELECT must also sit in the GROUP BY — it has to help define what makes a group unique, or it has no single value to report.

Put city into the grouping key and the ambiguity disappears, because now each distinct (customer_id, city) pair is its own group:

CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, city TEXT, amount INTEGER);
INSERT INTO orders VALUES
  (1, 42, 'London', 100), (2, 42, 'Paris', 200),
  (3, 55, 'Tokyo', 300),  (4, 55, 'Tokyo', 150),
  (5, 99, 'NYC',   500);

SELECT customer_id, city, SUM(amount) AS total
FROM orders
GROUP BY customer_id, city
ORDER BY customer_id, city;   -- without ORDER BY, row order is not guaranteed

customer_id	city	total
42	London	100
42	Paris	200
55	Tokyo	450
99	NYC	500

Customer 55’s two Tokyo orders share a group and sum to 450; customer 42’s two cities stay split, because city now helps define the group.
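
If one row per customer is what you actually want, the other way out is to aggregate city instead of grouping by it. The following is a minimal sketch, assuming the orders table created above; the column aliases are illustrative, and MIN(city) is just one of several aggregates that would satisfy the rule.

```sql
-- Assumes the five-row orders table created above.
-- city is wrapped in aggregates, so each customer yields exactly one value.
SELECT customer_id,
       MIN(city)            AS first_city_alphabetically,
       COUNT(DISTINCT city) AS distinct_cities,
       SUM(amount)          AS total
FROM orders
GROUP BY customer_id
ORDER BY customer_id;

-- Result:
-- 42 | London | 2 | 300
-- 55 | Tokyo  | 1 | 450
-- 99 | NYC    | 1 | 500
```

The aggregate makes the choice explicit: you, not the engine, decide which of the competing city values represents the group.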

What different engines do

Strict engines such as PostgreSQL, BigQuery, Snowflake, and SQL Server reject the ambiguous query (selecting city while grouping only by customer_id) outright. PostgreSQL's wording, for example, is: column "orders.city" must appear in the GROUP BY clause or be used in an aggregate function.
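
For reference, this is the kind of query those engines refuse, sketched against the orders table created earlier. The error text in the comment is PostgreSQL's; other engines and versions word it differently.

```sql
-- Assumes the orders table created earlier.
-- city is selected but is neither in GROUP BY nor inside an aggregate.
SELECT customer_id, city, SUM(amount) AS total
FROM orders
GROUP BY customer_id;

-- PostgreSQL responds with an error along these lines:
-- ERROR:  column "orders.city" must appear in the GROUP BY clause
--         or be used in an aggregate function
```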

MySQL, in its old non-strict mode, did something far more dangerous: it silently picked an arbitrary city from the group and returned it as if it meant something. The query “worked” and the data was quietly wrong. Since version 5.7.5, MySQL enables ONLY_FULL_GROUP_BY by default, so the same query now fails with an error instead of returning an arbitrary value.
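
On MySQL you can check which behaviour a server has, and opt in to "any value will do" explicitly when that really is the intent. This is a MySQL-specific sketch, assuming MySQL 5.7 or later and the orders table from earlier; the alias some_city is illustrative.

```sql
-- MySQL only. Assumes the orders table created earlier.

-- Inspect the active SQL mode; look for ONLY_FULL_GROUP_BY in the list.
SELECT @@sql_mode;

-- ANY_VALUE() tells MySQL that an arbitrary value from the group is acceptable,
-- so the query passes even with ONLY_FULL_GROUP_BY enabled.
-- Which city comes back for customer 42 is not defined.
SELECT customer_id,
       ANY_VALUE(city) AS some_city,
       SUM(amount)     AS total
FROM orders
GROUP BY customer_id;
```

Unlike the old non-strict behaviour, the arbitrariness is now visible in the query text, so a reviewer can see it was a deliberate choice.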

The one principled exception

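PostgreSQL (since version 9.1) allows a relaxation worth knowing, because it reveals the real rule underneath. If you GROUP BY a table’s primary key, you may select that table’s other columns without listing them. MySQL with ONLY_FULL_GROUP_BY enabled applies a similar functional-dependency check.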

SELECT c.customer_id, c.name, SUM(o.amount) AS total
FROM customers c
JOIN orders o USING (customer_id)
GROUP BY c.customer_id;        -- c.name is safe: the PK determines it

There is no ambiguity here. A primary key functionally determines every other column in its row, so name can have only one value per group. The rule was never really “list every column” — it was always “every selected column must have exactly one value per group.” A primary key guarantees that on its own.
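
Engines that do not detect this dependency, SQL Server among them, still need the determined column listed. A portable sketch follows, assuming the orders table from earlier and a hypothetical customers table whose rows and names are invented for illustration.

```sql
-- Hypothetical customers table; the names are made up for this example.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO customers VALUES (42, 'Ada'), (55, 'Grace'), (99, 'Linus');

-- Portable form: list name in GROUP BY as well.
-- Because customer_id already determines name, adding it does not split any group.
SELECT c.customer_id, c.name, SUM(o.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY c.customer_id;

-- Result:
-- 42 | Ada   | 300
-- 55 | Grace | 450
-- 99 | Linus | 500
```

Both forms return the same rows; the longer GROUP BY only spells out what the primary key already guarantees.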
