datarekha
SQL Medium Asked at AirbnbAsked at StripeAsked at Lyft

How can aggregating after a JOIN produce inflated (double-counted) totals, and how do you fix it?

The short answer

A JOIN that fans out rows — one-to-many or many-to-many — causes the same source row to appear multiple times in the joined result set. Aggregating on that inflated set multiplies values, giving totals larger than the true sum.

How to think about it

The interviewer is checking whether you understand why a query that looks correct can silently return wrong numbers. This is one of the most common bugs in analytics SQL and it’s invisible — no error, no warning, just a sum that’s 3x too large.

The trap: fanning out rows before you aggregate

Say each order can carry multiple tags (promotions, campaigns, categories). You want total revenue per customer plus a tag count. The intuitive query joins first, then aggregates:

-- orders: order_id, customer_id, revenue
-- order_tags: order_id, tag

SELECT
    o.customer_id,
    SUM(o.revenue)  AS total_revenue,  -- WRONG if an order has 3 tags: revenue counted 3x
    COUNT(t.tag)    AS tag_count
FROM orders o
JOIN order_tags t ON o.order_id = t.order_id
GROUP BY o.customer_id;

If order #1 has revenue 100 and 3 tags, the JOIN produces 3 rows for it. SUM(revenue) adds 100 three times = 300. The query returned a number, no error — that’s what makes this dangerous.

The fix: aggregate before joining

Collapse the many-side table down to one row per join key before the join happens:

Why the naïve fix (DISTINCT) is fragile

SUM(DISTINCT o.revenue)   -- dangerous: collapses equal amounts, not equal orders
COUNT(DISTINCT o.order_id) -- safe for counts only

SUM(DISTINCT ...) deduplicates by value, so two legitimate orders of $100 each collapse to $100 total. It solves the symptom by breaking the semantics. Always prefer pre-aggregating upstream.

The key insight

The moment a JOIN multiplies rows on the driving table’s side, any aggregate on that side becomes wrong. The safe rule: check row counts before you aggregate. If the joined result has more rows than your driving table, you have a fan-out and need to aggregate the many-side first.

Learn it properly Aggregates & GROUP BY

Keep practising

All SQL questions

Explore further

Skip to content