datarekha
SQL Medium Asked at AmazonAsked at SnowflakeAsked at Databricks

What are the risks of placing a correlated subquery in the SELECT list, and what is the preferred rewrite?

The short answer

A correlated subquery in the SELECT list executes once per output row, turning what looks like a simple projection into an O(n) nested loop. The preferred rewrites are a window function or a pre-aggregating JOIN, both of which the optimizer can execute in a single pass.

How to think about it

Placing a correlated subquery in the SELECT list is tempting because it reads naturally — “for each employee, give me their department’s average salary.” The problem is the word for each: that is exactly what the database does. It re-runs the inner query once per output row.

What makes a subquery “correlated”

A subquery is correlated when it references a column from the outer query. The database cannot pre-compute it once; it must re-evaluate it row by row.

-- Correlated subquery in SELECT: runs once per employee row
SELECT
    e.employee_id,
    e.name,
    e.salary,
    (SELECT AVG(salary)
     FROM employees e2
     WHERE e2.department_id = e.department_id) AS dept_avg,
    (SELECT MAX(salary)
     FROM employees e2
     WHERE e2.department_id = e.department_id) AS dept_max
FROM employees e;

With 100 000 employees across 50 departments the database executes the aggregation 100 000 times instead of the 50 times actually needed — a 2 000x redundancy.

Rewrite 1: window function (the clean path)

Window functions compute the aggregate across the partition in a single pass and attach the result to each row automatically — no loop, no redundancy.

Rewrite 2: JOIN to a pre-aggregated CTE

Useful when you need the aggregates for filtering too, or when the CTE result is reused elsewhere in the query.

WITH dept_stats AS (
    SELECT
        department_id,
        AVG(salary) AS dept_avg,
        MAX(salary) AS dept_max
    FROM employees
    GROUP BY department_id
)
SELECT
    e.employee_id,
    e.name,
    e.salary,
    ds.dept_avg,
    ds.dept_max
FROM employees e
JOIN dept_stats ds ON e.department_id = ds.department_id;

The CTE aggregates once per department; the join is a single hash-join pass. Much better than the nested-loop version.

When a correlated SELECT-list subquery is acceptable

A correlated scalar subquery is fine when:

  • The inner table is small and likely cached.
  • There is no clean window or join equivalent (e.g. “the most recent status event for each order” from a separate events table that has no foreign key relationship the optimizer can exploit).

Even then, verify with EXPLAIN that the engine is not executing it as a nested loop.

Learn it properly Subqueries

Keep practising

All SQL questions

Explore further

Skip to content