What are the risks of placing a correlated subquery in the SELECT list, and what is the preferred rewrite?
A correlated subquery in the SELECT list executes once per output row, turning what looks like a simple projection into an O(n) nested loop. The preferred rewrites are a window function or a pre-aggregating JOIN, both of which the optimizer can execute in a single pass.
How to think about it
Placing a correlated subquery in the SELECT list is tempting because it reads naturally — “for each employee, give me their department’s average salary.” The problem is the word for each: that is exactly what the database does. It re-runs the inner query once per output row.
What makes a subquery “correlated”
A subquery is correlated when it references a column from the outer query. The database cannot pre-compute it once; it must re-evaluate it row by row.
-- Correlated subquery in SELECT: runs once per employee row
SELECT
e.employee_id,
e.name,
e.salary,
(SELECT AVG(salary)
FROM employees e2
WHERE e2.department_id = e.department_id) AS dept_avg,
(SELECT MAX(salary)
FROM employees e2
WHERE e2.department_id = e.department_id) AS dept_max
FROM employees e;
With 100 000 employees across 50 departments the database executes the aggregation 100 000 times instead of the 50 times actually needed — a 2 000x redundancy.
Rewrite 1: window function (the clean path)
Window functions compute the aggregate across the partition in a single pass and attach the result to each row automatically — no loop, no redundancy.
Rewrite 2: JOIN to a pre-aggregated CTE
Useful when you need the aggregates for filtering too, or when the CTE result is reused elsewhere in the query.
WITH dept_stats AS (
SELECT
department_id,
AVG(salary) AS dept_avg,
MAX(salary) AS dept_max
FROM employees
GROUP BY department_id
)
SELECT
e.employee_id,
e.name,
e.salary,
ds.dept_avg,
ds.dept_max
FROM employees e
JOIN dept_stats ds ON e.department_id = ds.department_id;
The CTE aggregates once per department; the join is a single hash-join pass. Much better than the nested-loop version.
When a correlated SELECT-list subquery is acceptable
A correlated scalar subquery is fine when:
- The inner table is small and likely cached.
- There is no clean window or join equivalent (e.g. “the most recent status event for each order” from a separate events table that has no foreign key relationship the optimizer can exploit).
Even then, verify with EXPLAIN that the engine is not executing it as a nested loop.