SQL · Medium · Asked at Amazon, Uber, DoorDash

Given a query that filters on both a raw column and an aggregate result, how do you structure it for correctness and performance?

For: Data Analyst, Data Scientist, Data Engineer

The short answer

Raw-column filters belong in WHERE, so rows that fail them are discarded before grouping. Aggregate filters must go in HAVING, because aggregates only exist after GROUP BY. Putting a raw-column filter in HAVING when it could have gone in WHERE forces the engine to group and aggregate rows it will only throw away afterwards.

How to think about it

This is testing whether you know SQL’s logical execution order and can use it to write queries that are both correct and fast. The whole idea: filter as early as you can, and use HAVING only for what WHERE literally cannot do.

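Logically, SQL evaluates a query's clauses in a fixed order (the optimizer may reorder the physical work, but the result must be the same as if it followed this order):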

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

WHERE runs on individual rows before grouping; HAVING runs on groups after aggregation. So:

raw-column conditions (country = 'US', status = 'active') belong in WHERE — they shrink the row count before the expensive GROUP BY;
aggregate conditions (COUNT(*) > 10) must go in HAVING — the aggregate doesn’t even exist yet at WHERE time.
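A minimal sketch of the two rules above, using Python's standard sqlite3 module and a tiny hypothetical orders table (the column names follow the worked example; the rows are made up). The aggregate in WHERE is rejected when the statement is prepared, while the same condition in HAVING is evaluated per group.

```python
import sqlite3

# Hypothetical orders table: just enough rows to show the rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, country TEXT, status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "US", "active", 100), (1, "US", "active", 200), (2, "UK", "active", 50)],
)

# WHERE sees one row at a time, so COUNT(*) has nothing to count yet: SQLite rejects it.
try:
    conn.execute("SELECT customer_id FROM orders WHERE COUNT(*) > 1 GROUP BY customer_id").fetchall()
    aggregate_in_where_failed = False
except sqlite3.OperationalError as exc:
    aggregate_in_where_failed = True
    print("WHERE rejected the aggregate:", exc)

# The same condition in HAVING runs after grouping, once per customer.
having_rows = conn.execute(
    "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id HAVING COUNT(*) > 1"
).fetchall()
print(having_rows)  # [(1, 2)]
```

The exact error text differs between engines and versions, which is why the sketch only checks that an error is raised.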

A worked example

The task: among active US orders, find customers with more than 10 orders and total spend over $500. Raw filters in WHERE, aggregate filters in HAVING:

SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_spend
FROM orders
WHERE country = 'US' AND status = 'active'   -- per-row filters, applied first
GROUP BY customer_id
HAVING COUNT(*) > 10 AND SUM(amount) > 500;   -- per-group filters, applied after

customer_id	order_count	total_spend
1	11	2820

Only customer 1 survives: 11 active US orders totalling 2,820. Customer 2 (UK) and customer 3 (inactive) were filtered out by WHERE before grouping; customer 4 made it into the grouping but failed HAVING COUNT(*) > 10. The division of labour is the point — WHERE thinned the rows, HAVING judged the groups.
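To check the walkthrough end to end, here is a minimal sketch that loads hypothetical rows into an in-memory SQLite database (Python's standard sqlite3 module) and runs the query above unchanged. The rows are invented to fit the description: customer 1's eleven orders are assumed to be ten of 250 and one of 320, and the other customers' counts and amounts are illustrative only.

```python
import sqlite3

# Hypothetical rows chosen to match the walkthrough:
#   customer 1: 11 active US orders totalling 2,820
#   customer 2: active UK orders        -> removed by WHERE (country)
#   customer 3: inactive US orders      -> removed by WHERE (status)
#   customer 4: 3 active US orders      -> grouped, then removed by HAVING COUNT(*) > 10
rows = (
    [(1, "US", "active", 250)] * 10 + [(1, "US", "active", 320)]
    + [(2, "UK", "active", 100)] * 12
    + [(3, "US", "inactive", 100)] * 12
    + [(4, "US", "active", 300)] * 3
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, country TEXT, status TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

QUERY = """
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_spend
FROM orders
WHERE country = 'US' AND status = 'active'   -- per-row filters, applied first
GROUP BY customer_id
HAVING COUNT(*) > 10 AND SUM(amount) > 500   -- per-group filters, applied after
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 11, 2820)]
```

Customer 4 is the instructive case: its 900 in spend would pass SUM(amount) > 500, but with three orders it fails COUNT(*) > 10, so both HAVING conditions have to hold.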

Why the placement matters

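Move country = 'US' into HAVING and standard SQL (and most engines) rejects the query outright: country is neither grouped nor aggregated, so it has no single value per group. To make it legal you would have to add country to GROUP BY (GROUP BY customer_id, country). The result is then the same, but the engine now groups and aggregates orders from every country and discards the non-US groups after the fact. On a table with millions of rows, that means aggregating far more data than the slice you care about; a WHERE filter can also use an index or partition on country to skip the other rows entirely. Some optimizers push a non-aggregate HAVING predicate down into WHERE on their own, but writing it in WHERE states the intent directly instead of relying on that.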
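A minimal sketch of the comparison, assuming the same hypothetical rows as the worked example and Python's standard sqlite3 module. The HAVING version adds country to GROUP BY, which is the form a strict engine such as PostgreSQL requires. The row counts are a proxy for how much data each version logically feeds into GROUP BY, not a timing benchmark.

```python
import sqlite3

# Hypothetical rows (customers 1-4), as in the worked example.
rows = (
    [(1, "US", "active", 250)] * 10 + [(1, "US", "active", 320)]
    + [(2, "UK", "active", 100)] * 12
    + [(3, "US", "inactive", 100)] * 12
    + [(4, "US", "active", 300)] * 3
)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, country TEXT, status TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# country filtered per row, before grouping.
WHERE_VERSION = """
SELECT customer_id, COUNT(*), SUM(amount) FROM orders
WHERE country = 'US' AND status = 'active'
GROUP BY customer_id
HAVING COUNT(*) > 10 AND SUM(amount) > 500
"""

# country filtered per group; it must be a grouping column to be legal in HAVING.
HAVING_VERSION = """
SELECT customer_id, COUNT(*), SUM(amount) FROM orders
WHERE status = 'active'
GROUP BY customer_id, country
HAVING country = 'US' AND COUNT(*) > 10 AND SUM(amount) > 500
"""

same_result = conn.execute(WHERE_VERSION).fetchall() == conn.execute(HAVING_VERSION).fetchall()

# Rows each version logically passes into GROUP BY.
where_rows_grouped = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE country = 'US' AND status = 'active'"
).fetchone()[0]
having_rows_grouped = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'active'"
).fetchone()[0]
print(same_result, where_rows_grouped, having_rows_grouped)  # True 14 26
```

On this toy table the gap is only 14 rows versus 26; it grows with the share of non-US data. Because a real optimizer may push the HAVING predicate down anyway, check the engine's query plan (for example with EXPLAIN) rather than assuming either way.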
