datarekha
Patterns June 2, 2026

CTEs turned my unreadable query into something I can follow

A common table expression is less a feature than a cognitive tool — it lets you read a complex query from top to bottom, in the order you actually think, instead of inside-out from the deepest parenthesis.

8 min read · by datarekha · sqlctesreadabilitydata-engineeringanalytics

There is a query I wrote in 2021 that I am mildly ashamed of. It was 47 lines. It produced the right answer. I know because I tested it carefully and documented it in a comment at the top. I also know that six months later, when a colleague asked me to add a filter to it, I had to spend twenty minutes re-reading it before I trusted myself to touch it. The query was correct. The query was also nearly unreadable — not because the logic was hard, but because I had written it in the wrong order.

SQL is a declarative language. You describe what you want, and the engine figures out how to get it. But “declarative” does not mean “sequential.” Subqueries nest inside each other. The innermost subquery executes conceptually first, but syntactically it sits deepest in the parentheses, buried in the middle of the file. To read a deeply nested query, you hunt for the innermost SELECT, understand it, hold it in working memory, climb one level, understand the next SELECT in terms of the one you are holding, and repeat. You read inside-out. You think left-to-right. The mismatch is the problem.

A CTE — common table expression, the WITH name AS (...) syntax — is a solution to that mismatch. It lets you name an intermediate result and reference it by name later. That is the entire feature. The insight is in what it unlocks: you can now write the query in the same order you would explain it to someone.

The nested version, honestly

Imagine you want to find, for each sales region, the three products whose revenue growth from Q4 2024 to Q1 2025 exceeded the region’s own average growth. That is a three-step computation: (1) compute per-product, per-region revenue for each quarter, (2) compute per-product growth rates, (3) filter products whose growth beats their region’s average. Straightforward when stated as three steps. Here is what it looks like when written as nested subqueries:

SELECT
    region,
    product_id,
    growth_pct
FROM (
    SELECT
        region,
        product_id,
        growth_pct,
        AVG(growth_pct) OVER (PARTITION BY region) AS region_avg_growth
    FROM (
        SELECT
            region,
            product_id,
            (q1_revenue - q4_revenue) / NULLIF(q4_revenue, 0) * 100 AS growth_pct
        FROM (
            SELECT
                region,
                product_id,
                SUM(CASE WHEN quarter = 'Q4-2024' THEN revenue ELSE 0 END) AS q4_revenue,
                SUM(CASE WHEN quarter = 'Q1-2025' THEN revenue ELSE 0 END) AS q1_revenue
            FROM sales
            GROUP BY region, product_id
        ) AS quarterly_totals
    ) AS growth_rates
) AS annotated
WHERE growth_pct > region_avg_growth
ORDER BY region, growth_pct DESC;

The logic is correct. The layers are: quarterly_totals at the bottom, growth_rates in the middle, annotated outside that, and the final filter at the top. To understand this query, you must find quarterly_totals, read its eight lines, hold its output schema in your head, scroll outward to growth_rates, understand it in terms of quarterly_totals, scroll outward again, understand annotated, and finally arrive at the WHERE clause. Four context-switches. Four opportunities to lose the thread.

Nested subqueriesCTEsSELECT … WHERE growth_pct≥ region_avg_growthannotated — AVG OVER(PARTITION BY region)growth_rates — compute(q1 − q4) / q4 × 100quarterly_totalsSUM + GROUP BYRead inside-out →quarterly_totalsSUM + GROUP BYgrowth_rates(q1 − q4) / q4 × 100annotatedAVG OVER (PARTITION BY region)final filterWHERE growth_pct ≥ region_avgRead top-to-bottom →
Left: nested subqueries force inside-out reading. Right: the same logic as three sequential CTEs reads in the order you reason through it.

The same logic, flipped right-side-up

WITH quarterly_totals AS (
    SELECT
        region,
        product_id,
        SUM(CASE WHEN quarter = 'Q4-2024' THEN revenue ELSE 0 END) AS q4_revenue,
        SUM(CASE WHEN quarter = 'Q1-2025' THEN revenue ELSE 0 END) AS q1_revenue
    FROM sales
    GROUP BY region, product_id
),

growth_rates AS (
    SELECT
        region,
        product_id,
        (q1_revenue - q4_revenue) / NULLIF(q4_revenue, 0) * 100 AS growth_pct
    FROM quarterly_totals
),

annotated AS (
    SELECT
        region,
        product_id,
        growth_pct,
        AVG(growth_pct) OVER (PARTITION BY region) AS region_avg_growth
    FROM growth_rates
)

SELECT region, product_id, growth_pct
FROM annotated
WHERE growth_pct > region_avg_growth
ORDER BY region, growth_pct DESC;

The identical computation. The identical result. But now you read it in the order you would narrate it to yourself: first we aggregate revenue by quarter, then we compute growth rates, then we annotate each product with its region’s average, then we filter and display. Each CTE is a paragraph with a topic sentence. The name carries semantic weight — growth_rates tells you what the block produces before you read a single line of it.

This is not about making things simpler. The logic is the same complexity. What changes is the direction of reading.

Why naming beats nesting

There is a principle in software design sometimes called “make the implicit explicit.” Nested subqueries have implicit intermediate results — they are computed but anonymous. You cannot refer to a nested subquery by name anywhere else in the query because it has no name. A CTE makes that intermediate result explicit and reachable.

Reachability is underrated. Suppose quarterly_totals feeds both growth_rates and a second branch of analysis — say, an absolute revenue ranking computed in parallel. In the nested version, you would have to either duplicate the quarterly_totals logic or push it into a temporary table. With CTEs you write it once, name it once, and reference it from as many subsequent CTEs as you need.

WITH quarterly_totals AS (
    -- ...same as before...
),

growth_rates AS (
    SELECT ... FROM quarterly_totals
),

revenue_ranks AS (
    SELECT
        region,
        product_id,
        q1_revenue,
        RANK() OVER (PARTITION BY region ORDER BY q1_revenue DESC) AS revenue_rank
    FROM quarterly_totals
)

SELECT g.region, g.product_id, g.growth_pct, r.revenue_rank
FROM growth_rates g
JOIN revenue_ranks r
    ON g.region = r.region AND g.product_id = r.product_id
WHERE g.growth_pct > 10;

quarterly_totals does its work once and is referenced twice without duplication. In the nested world this is a copy-paste trap waiting to happen.

The performance question you should ask

The honest answer is: it depends on the engine, and for most practical queries it does not matter.

Every major SQL engine — PostgreSQL, BigQuery, Snowflake, DuckDB, SQL Server — treats a non-recursive CTE as syntactic sugar. At parse time, the planner sees a named result and decides whether to materialize it (compute it once and store it temporarily) or inline it (substitute the CTE body wherever the name is referenced, as if you had written the subquery there). Modern planners default to inlining, which means the execution plan for the CTE version and the nested version is often identical.

PostgreSQL prior to version 12 was the famous exception: CTEs were always materialized, acting as optimization fences that could prevent certain query-level optimizations. Post version 12, the planner inlines by default unless you explicitly ask for materialization with MATERIALIZATION. Snowflake and BigQuery inline aggressively. DuckDB’s optimizer is aware of CTE structure. If you are on a very old PostgreSQL or a proprietary engine with unusual behavior, check the EXPLAIN plan. But in 2026, treating CTEs as a performance concern is the wrong default assumption.

The one case where materialization genuinely helps: when the same CTE is referenced many times and the underlying computation is expensive. In that case, explicit WITH MATERIALIZED AS (...) (PostgreSQL syntax) or a temporary table can prevent redundant computation. That is an optimization you apply deliberately, not one you stumble into.

The recursive cousin

CTEs have a second form — the recursive CTE — that solves a completely different class of problem. A recursive CTE, declared with WITH RECURSIVE, can reference itself, which lets you traverse graphs and trees inside pure SQL: org charts, bill-of-materials hierarchies, friend-of-friend networks. It is a powerful feature with its own intuition. But it is a separate topic. Conflating the two is why some developers assume CTEs are “advanced” — the non-recursive form is not. It is a naming facility. The recursive form is graph traversal. Do not let the label confuse the concept.

What the WITH clause is really doing to your brain

Cognitive scientists who study programming — there are a few, mostly ignored — have found that the bottleneck in reading code is working memory: the mental scratchpad you use to track open context while reading. Nested expressions multiply the demand on working memory because each unresolved layer occupies a slot. You cannot finish understanding the outer SELECT until you have finished understanding the inner one, but you have to hold the outer SELECT open while you do.

CTEs serialize that process. Each CTE is a closed unit. You read it, understand it, close the mental file, and open the next one. By the time you reach the final SELECT, every name it references is already resolved and stored somewhere outside your working memory — in the query text itself, readable by scrolling up. The cognitive load is lower not because the logic is simpler but because the structure externalizes intermediate state.

This is the same reason engineers decompose long functions into named helper functions: not because the total number of instructions decreases, but because named decomposition turns a single large working-memory problem into a sequence of small ones.

WORKING MEMORY DEMANDNested subqueries — all layers open simultaneouslyouter SELECT (open)annotated (open)growth_rates (open)quarterly_totals — read firstCTEs — one resolved unit at a timequarterly_totals ✓growth_rates ✓annotated ✓final SELECT (current)Each checked block is resolved and stored in the query text — not in your head.
Nested subqueries force you to hold every unresolved layer open in working memory at once. CTEs let you close each layer before opening the next.

The habit worth building

Reach for a CTE the moment a query has more than one logical step. Not because nesting is wrong, not because CTEs are faster, but because a query you can read under pressure is a query you can trust. Six months from now, a different engineer — or a tired version of you at 4 p.m. — will modify that query. The version that names its intermediate results will survive the edit. The nested version will accumulate fear.

One practical signal: if you have ever written a SQL comment that says “this subquery computes X,” that is a CTE asking to be named. The comment is a workaround for the absence of a name. Replace the comment with the thing it describes.

The pattern is almost embarrassingly simple. You already know how to write the subquery. WITH quarterly_totals AS ( goes at the top, your subquery goes in the parentheses, and you never write the subquery inline again. The logic is unchanged. The query becomes something you can follow.

That 47-line query I mentioned at the start? I rewrote it with CTEs. It became 52 lines. Five lines longer, because the WITH keywords and CTE names take space. My colleague added the filter in under two minutes. Correctness had not changed. Maintainability had.

Lines of SQL are not the cost. Understanding is the cost.

Skip to content