datarekha

SQL: Computing Results by Hand

Read a SQL query and predict its row count — the single skill GATE drills with the SELECT/JOIN/GROUP BY toolkit.

9 min read Intermediate GATE DA Lesson 68 of 122

What you'll learn

  • SELECT/FROM/WHERE/JOIN/GROUP BY/HAVING — the six clauses GATE drills
  • Cross product vs join: FROM A, B before WHERE makes |A|·|B| rows
  • Aggregates (COUNT/SUM/AVG/MIN/MAX) collapse rows per group
  • Computing the exact row count of a small query by hand

Before you start

“How many rows does this query return?” That is the GATE DA question, almost word for word. Not “write a query,” not “design a schema” — just stare at a small SQL snippet and a tiny table and tell us the count.

The trick is to read SQL like a tiny pipeline: tables in, filter, group, project, rows out. Six clauses do almost all the work, and once you know the order in which they fire, counting becomes mechanical.

Reading SQL as a pipeline

A query is six steps. They run in this order — not the order you write them:

FROMpick tablesWHEREfilter rowsGROUP BYbucket rowsHAVINGfilter groupsSELECTproject colsORDER BYsort output
Logical order: tables → rows → groups → columns. SELECT names appear last but you write them first.
  • SELECT picks which columns come out (and runs aggregates).
  • FROM picks the tables. Multiple tables separated by commas form a cross product — every row of A paired with every row of B.
  • WHERE filters individual rows.
  • JOIN … ON combines two tables on a matching condition — same as cross product plus a WHERE, but cleaner.
  • GROUP BY buckets rows that share a value, and aggregates collapse each bucket to one row.
  • HAVING filters those grouped buckets (use it instead of WHERE when the test involves an aggregate).
  • DISTINCT removes duplicate output rows at the end.

The cross-product mental model

The most common GATE pattern uses the old “comma join” — FROM Raider, Team. That’s a cross product: if Raider has 5 rows and Team has 5 rows, the intermediate result is 25 rows. The WHERE clause then prunes it.

A (3 rows)a₁a₂a₃B (3 rows)b₁b₂b₃A × B (9 rows)a₁,b₁a₁,b₂a₁,b₃a₂,b₁a₂,b₂a₂,b₃a₃,b₁a₃,b₂a₃,b₃WHERE A.id = B.id → keep diagonal (3 rows)+ further filters → fewer rows still
FROM A, B fires first, exploding rows. WHERE then prunes — most rows usually die there.

Worked example — GATE DA 2024 Q21

This is the actual exam question, stated nearly verbatim:

Schema: Raider(ID, City, ...) and Team(ID, RaidPoints, ...). How many rows does the following SQL query return?

SELECT *
FROM   Raider, Team
WHERE  Raider.ID = Team.ID
   AND City = 'Jaipur'
   AND RaidPoints > 200;

The answer is 3. Let’s see why with a concrete tiny dataset. We have six raiders and the same six teams (one per raider, joined on ID):

Raider.IDCityTeam.IDRaidPoints
1Jaipur1250
2Jaipur2210
3Mumbai3300
4Jaipur4180
5Jaipur5220
6Patna6400

Walk through the WHERE filter row by row after the ID match collapses 36 cross-product rows down to 6:

  • ID 1: City = Jaipur, RaidPoints = 250 > 200. Keep.
  • ID 2: Jaipur, 210 > 200. Keep.
  • ID 3: Mumbai — wrong city. Drop.
  • ID 4: Jaipur, 180 — not above 200. Drop.
  • ID 5: Jaipur, 220 > 200. Keep.
  • ID 6: Patna — wrong city. Drop.

Three rows survive. Run it yourself — edit the data, change the threshold, watch the count move:

Try removing AND RaidPoints > 200 — you get 4 rows (every Jaipur raider). Drop the ID-equality condition and you get 24 (4 Jaipur raiders cross 6 teams). The WHERE clause is doing all the work.

Aggregates and GROUP BY

COUNT, SUM, AVG, MIN, MAX collapse many rows into one. Add GROUP BY and you get one collapsed row per group instead of one for the whole table.

SELECT   City, COUNT(*) AS raiders
FROM     Raider
GROUP BY City;

Output rows = number of distinct cities = 3 (Jaipur, Mumbai, Patna) — not 6.

HAVING filters those groups after aggregation:

SELECT   City, COUNT(*) AS raiders
FROM     Raider
GROUP BY City
HAVING   COUNT(*) >= 2;

Now only Jaipur (4 raiders) survives — 1 row.

How GATE asks this

A NAT: a tiny schema (two or three tables, a handful of rows each), a query that uses some mix of cross-product join, WHERE, GROUP BY, HAVING, and maybe DISTINCT — and you must type the exact row count. GATE DA 2024 Q21 was exactly this pattern with two tables. The way to beat the question every time: walk the six-clause pipeline above with a pencil, one clause at a time, and write down the row count after each step.

Quick check

Quick check

0/6
Q1Given Raider(ID, City) with rows (1,'Jaipur'),(2,'Jaipur'),(3,'Mumbai'),(4,'Jaipur'),(5,'Jaipur'),(6,'Patna') and Team(ID, RaidPoints) with rows (1,250),(2,210),(3,300),(4,180),(5,220),(6,400), how many rows does `SELECT * FROM Raider, Team WHERE Raider.ID = Team.ID AND City = 'Jaipur' AND RaidPoints > 200` return?numerical answer — type a number
Q2Same Raider/Team tables. How many rows does `SELECT * FROM Raider, Team WHERE City = 'Jaipur'` return? (Note: no join condition.)numerical answer — type a number
Q3Same Raider table. How many rows does `SELECT City, COUNT(*) FROM Raider GROUP BY City HAVING COUNT(*) >= 2` return?numerical answer — type a number
Q4Which statements about SQL evaluation are TRUE? (select all that apply)select all that apply
Q5Table T(x) has rows {1, 1, 2, 2, 3, NULL}. How many rows does `SELECT DISTINCT x FROM T` return? (SQL treats NULL as a distinct group of its own.)numerical answer — type a number
Q6Given the Raider/Team tables, how many rows does `SELECT DISTINCT City FROM Raider, Team WHERE Raider.ID = Team.ID AND RaidPoints > 200` return?numerical answer — type a number

Practice this in an interview

All questions
How do you use EXPLAIN / EXPLAIN ANALYZE to diagnose a slow query?

EXPLAIN shows the optimizer's chosen plan with estimated rows and costs before execution. EXPLAIN ANALYZE runs the query and overlays actual row counts and timing, letting you spot where estimates diverge from reality — bad statistics, missing indexes, or the wrong join algorithm — and target the correct fix.

Walk me through how you would systematically diagnose a slow SQL query in production.

Start by capturing the query plan with EXPLAIN ANALYZE to find the most expensive node, then check whether estimates match actuals to spot stale statistics, look for missing indexes on filter and join columns, verify the predicates are sargable, and finally check for resource contention — locks, buffer eviction, or an overwhelmed disk.

How can aggregating after a JOIN produce inflated (double-counted) totals, and how do you fix it?

A JOIN that fans out rows — one-to-many or many-to-many — causes the same source row to appear multiple times in the joined result set. Aggregating on that inflated set multiplies values, giving totals larger than the true sum.

How do common SQL operations map to pandas, and when should you use SQL instead of pandas?

Every core SQL clause — SELECT, WHERE, GROUP BY, HAVING, JOIN, ORDER BY, LIMIT — has a direct pandas equivalent, but SQL executes inside a database engine with optimized query planning and disk-backed storage, while pandas requires all data to fit in RAM. Use SQL for large persistent datasets and pandas for in-memory transformation, feature engineering, and integration with the Python ML ecosystem.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content