SQL: Computing Results by Hand
Read a SQL query and predict its row count — the single skill GATE drills with the SELECT/JOIN/GROUP BY toolkit.
What you'll learn
- SELECT/FROM/WHERE/JOIN/GROUP BY/HAVING — the six clauses GATE drills
- Cross product vs join: FROM A, B before WHERE makes |A|·|B| rows
- Aggregates (COUNT/SUM/AVG/MIN/MAX) collapse rows per group
- Computing the exact row count of a small query by hand
Before you start
“How many rows does this query return?” That is the GATE DA question, almost word for word. Not “write a query,” not “design a schema” — just stare at a small SQL snippet and a tiny table and tell us the count.
The trick is to read SQL like a tiny pipeline: tables in, filter, group, project, rows out. Six clauses do almost all the work, and once you know the order in which they fire, counting becomes mechanical.
Reading SQL as a pipeline
A query is six steps. They run in this order — not the order you write them:
- SELECT picks which columns come out (and runs aggregates).
- FROM picks the tables. Multiple tables separated by commas form a cross product — every row of A paired with every row of B.
- WHERE filters individual rows.
- JOIN … ON combines two tables on a matching condition — same as cross product plus a WHERE, but cleaner.
- GROUP BY buckets rows that share a value, and aggregates collapse each bucket to one row.
- HAVING filters those grouped buckets (use it instead of WHERE when the test involves an aggregate).
- DISTINCT removes duplicate output rows at the end.
The cross-product mental model
The most common GATE pattern uses the old “comma join” — FROM Raider, Team. That’s a cross product: if Raider has 5 rows and Team has 5 rows, the intermediate result is 25 rows. The WHERE clause then prunes it.
Worked example — GATE DA 2024 Q21
This is the actual exam question, stated nearly verbatim:
Schema:
Raider(ID, City, ...)andTeam(ID, RaidPoints, ...). How many rows does the following SQL query return?SELECT * FROM Raider, Team WHERE Raider.ID = Team.ID AND City = 'Jaipur' AND RaidPoints > 200;
The answer is 3. Let’s see why with a concrete tiny dataset. We have six raiders and the same six teams (one per raider, joined on ID):
| Raider.ID | City | Team.ID | RaidPoints | |
|---|---|---|---|---|
| 1 | Jaipur | 1 | 250 | |
| 2 | Jaipur | 2 | 210 | |
| 3 | Mumbai | 3 | 300 | |
| 4 | Jaipur | 4 | 180 | |
| 5 | Jaipur | 5 | 220 | |
| 6 | Patna | 6 | 400 |
Walk through the WHERE filter row by row after the ID match collapses 36 cross-product rows down to 6:
- ID 1: City = Jaipur, RaidPoints = 250 > 200. Keep.
- ID 2: Jaipur, 210 > 200. Keep.
- ID 3: Mumbai — wrong city. Drop.
- ID 4: Jaipur, 180 — not above 200. Drop.
- ID 5: Jaipur, 220 > 200. Keep.
- ID 6: Patna — wrong city. Drop.
Three rows survive. Run it yourself — edit the data, change the threshold, watch the count move:
Try removing AND RaidPoints > 200 — you get 4 rows (every Jaipur raider). Drop the ID-equality condition and you get 24 (4 Jaipur raiders cross 6 teams). The WHERE clause is doing all the work.
Aggregates and GROUP BY
COUNT, SUM, AVG, MIN, MAX collapse many rows into one. Add GROUP BY and you get one collapsed row per group instead of one for the whole table.
SELECT City, COUNT(*) AS raiders
FROM Raider
GROUP BY City;
Output rows = number of distinct cities = 3 (Jaipur, Mumbai, Patna) — not 6.
HAVING filters those groups after aggregation:
SELECT City, COUNT(*) AS raiders
FROM Raider
GROUP BY City
HAVING COUNT(*) >= 2;
Now only Jaipur (4 raiders) survives — 1 row.
How GATE asks this
A NAT: a tiny schema (two or three tables, a handful of rows each), a query that uses some mix of cross-product join, WHERE, GROUP BY, HAVING, and maybe DISTINCT — and you must type the exact row count. GATE DA 2024 Q21 was exactly this pattern with two tables. The way to beat the question every time: walk the six-clause pipeline above with a pencil, one clause at a time, and write down the row count after each step.
Quick check
Quick check
Practice this in an interview
All questionsEXPLAIN shows the optimizer's chosen plan with estimated rows and costs before execution. EXPLAIN ANALYZE runs the query and overlays actual row counts and timing, letting you spot where estimates diverge from reality — bad statistics, missing indexes, or the wrong join algorithm — and target the correct fix.
Start by capturing the query plan with EXPLAIN ANALYZE to find the most expensive node, then check whether estimates match actuals to spot stale statistics, look for missing indexes on filter and join columns, verify the predicates are sargable, and finally check for resource contention — locks, buffer eviction, or an overwhelmed disk.
A JOIN that fans out rows — one-to-many or many-to-many — causes the same source row to appear multiple times in the joined result set. Aggregating on that inflated set multiplies values, giving totals larger than the true sum.
Every core SQL clause — SELECT, WHERE, GROUP BY, HAVING, JOIN, ORDER BY, LIMIT — has a direct pandas equivalent, but SQL executes inside a database engine with optimized query planning and disk-backed storage, while pandas requires all data to fit in RAM. Use SQL for large persistent datasets and pandas for in-memory transformation, feature engineering, and integration with the Python ML ecosystem.