SQL · Medium · Asked at Amazon, Snowflake, and Databricks

What are the risks of placing a correlated subquery in the SELECT list, and what is the preferred rewrite?

For Data Analyst, Data Scientist, Data Engineer

The short answer

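A correlated subquery in the SELECT list is evaluated, at least logically, once per output row, turning what looks like a simple projection into a nested loop: each of the n outer rows triggers its own scan or index lookup of the inner table, which can approach O(n × m) work unless the optimizer manages to decorrelate it. The preferred rewrites are a window function or a pre-aggregating JOIN, both of which compute each group's aggregate once rather than once per row.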

How to think about it

A correlated subquery in the SELECT list is tempting because it reads so naturally: “for each employee, give me their department’s average salary.” The problem is the phrase for each: taken literally, that is what the database does. Unless the optimizer can decorrelate the subquery, it re-runs the inner query once per output row.

A subquery is correlated when it references a column from the outer query, so the engine can’t pre-compute it once — it must re-evaluate row by row:

SELECT e.id, e.name, e.salary,
       (SELECT AVG(salary) FROM employees e2 WHERE e2.dept_id = e.dept_id) AS dept_avg,
       (SELECT MAX(salary) FROM employees e2 WHERE e2.dept_id = e.dept_id) AS dept_max
FROM employees e;

With 100,000 employees across 50 departments, each subquery runs its aggregation 100,000 times instead of the 50 actually needed, a 2,000× redundancy. The query above pays that cost twice, once for AVG and once for MAX.
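To try the queries in this answer, a small sample table is enough. The sketch below is one possible setup: the table and column names come from the query above, and the six rows match the result table in the worked example, but the column types are assumptions. The multi-row VALUES form works in PostgreSQL, MySQL, SQLite, and SQL Server; other engines may need one INSERT per row.

```sql
-- Hypothetical schema: column types are assumed, not given in the source.
CREATE TABLE employees (
    id      INTEGER PRIMARY KEY,
    name    VARCHAR(50) NOT NULL,
    salary  INTEGER     NOT NULL,
    dept_id INTEGER
);

-- Two departments of three employees each.
INSERT INTO employees (id, name, salary, dept_id) VALUES
    (1, 'Aarav', 120000, 1),
    (2, 'Bea',   110000, 1),
    (3, 'Chen',   95000, 1),
    (4, 'Dara',   80000, 2),
    (5, 'Eli',    72000, 2),
    (6, 'Farah',  60000, 2);
```

Six rows are far too few to show any performance difference. The table is only useful for checking that each rewrite returns the same values.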

A worked example — the window-function rewrite

A window function computes the aggregate across each partition in a single pass and attaches it to every row — same answer, no nested loop:

SELECT id, name, salary,
       ROUND(AVG(salary) OVER (PARTITION BY dept_id)) AS dept_avg,
       MAX(salary)       OVER (PARTITION BY dept_id)  AS dept_max
FROM employees
ORDER BY id;

id	name	salary	dept_avg	dept_max
1	Aarav	120000	108333.0	120000
2	Bea	110000	108333.0	120000
3	Chen	95000	108333.0	120000
4	Dara	80000	70667.0	80000
5	Eli	72000	70667.0	80000
6	Farah	60000	70667.0	80000
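The two OVER clauses in the query above repeat the same PARTITION BY. Where the engine supports a named WINDOW clause (PostgreSQL, MySQL 8, and recent SQLite do; support elsewhere varies, so treat this as an assumption to check), the partition can be defined once. The window name w is arbitrary.

```sql
-- Same result as the query above; the partition is declared once as w.
SELECT id, name, salary,
       ROUND(AVG(salary) OVER w) AS dept_avg,
       MAX(salary)       OVER w  AS dept_max
FROM employees
WINDOW w AS (PARTITION BY dept_id)  -- named window, reused by both aggregates
ORDER BY id;
```

This is a readability change only. It also makes it harder for the two aggregates to drift onto different partitions when the query is edited later.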

Each department’s dept_avg and dept_max repeat down its rows (108333/120000 for dept 1, 70667/80000 for dept 2), computed once per partition rather than once per row. When you also need the aggregates for filtering, a pre-aggregated CTE joined back is the other clean rewrite:

WITH dept_stats AS (
  SELECT dept_id, AVG(salary) AS dept_avg, MAX(salary) AS dept_max
  FROM employees GROUP BY dept_id
)
SELECT e.id, e.name, e.salary, ds.dept_avg, ds.dept_max
FROM employees e
JOIN dept_stats ds ON e.dept_id = ds.dept_id;

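The CTE aggregates once per department, and the join back is typically a single hash join. Because this is an inner join, employees with a NULL dept_id drop out of the result; use a LEFT JOIN if they should be kept.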
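The filtering case is where the joined CTE earns its place: a window function cannot appear directly in a WHERE clause, while a joined column can. A minimal sketch, reusing the same dept_stats CTE to keep only employees paid above their department's average:

```sql
WITH dept_stats AS (
  SELECT dept_id, AVG(salary) AS dept_avg
  FROM employees
  GROUP BY dept_id
)
SELECT e.id, e.name, e.salary, ds.dept_avg
FROM employees e
JOIN dept_stats ds ON e.dept_id = ds.dept_id
WHERE e.salary > ds.dept_avg   -- filter on the pre-computed aggregate
ORDER BY e.id;
```

Against the six rows in the result table above, this keeps Aarav and Bea from dept 1 and Dara and Eli from dept 2. The window-function version can do the same filtering, but only after wrapping it in a subquery or CTE so that the computed column is available to WHERE.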
