How do common SQL operations map to pandas, and when should you use SQL instead of pandas?

Every core SQL clause (SELECT, WHERE, GROUP BY, HAVING, JOIN, ORDER BY, LIMIT) has a direct pandas equivalent. The difference is where the work happens: SQL executes inside a database engine with optimized query planning and disk-backed storage, while pandas holds each DataFrame entirely in RAM. Use SQL for large persistent datasets, and use pandas for in-memory transformation, feature engineering, and integration with the Python ML ecosystem.
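Here is a minimal sketch of the clause-by-clause mapping. It assumes a hypothetical `orders` table and uses an in-memory SQLite database (Python's built-in `sqlite3`) to stand in for a database engine, so the SQL and pandas results can be compared directly. JOIN, which maps to `DataFrame.merge`, is not shown.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data: one row per order.
orders = pd.DataFrame(
    {
        "customer": ["ann", "bob", "ann", "cat", "bob", "ann"],
        "amount": [120, 40, 75, 300, 60, 10],
    }
)

# SQL version, executed by SQLite for comparison.
SQL = """
SELECT customer, SUM(amount) AS total
FROM orders
WHERE amount >= 50          -- row filter
GROUP BY customer
HAVING SUM(amount) > 100    -- group filter
ORDER BY total DESC
LIMIT 2
"""
con = sqlite3.connect(":memory:")
orders.to_sql("orders", con, index=False)
sql_result = pd.read_sql_query(SQL, con)

# pandas version, one method per SQL clause.
pandas_result = (
    orders[orders["amount"] >= 50]                           # WHERE
    .groupby("customer", as_index=False)["amount"].sum()     # GROUP BY + SUM
    .rename(columns={"amount": "total"})                     # AS total
    .query("total > 100")                                    # HAVING
    .sort_values("total", ascending=False)                   # ORDER BY ... DESC
    .head(2)                                                 # LIMIT 2
    .reset_index(drop=True)
)
print(pandas_result)
```

Both versions keep cat (300) and ann (195). bob's only qualifying order sums to 60, so the HAVING step drops him.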

What is the difference between OLTP and OLAP workloads, and how does that drive database design choices?

OLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.
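A toy sketch of why the two workloads favour different layouts. It assumes Python dicts and lists stand in for disk pages, a dict keyed by `order_id` stands in for an index, and the sales records are hypothetical. Real engines add B-tree indexes, compression, and vectorized scans, all of which this ignores.

```python
# Hypothetical sales records: (order_id, region, amount).
records = [
    (1, "north", 250),
    (2, "south", 100),
    (3, "north", 400),
    (4, "east", 50),
]

# Row-oriented (OLTP-style): whole records kept together, keyed by order_id
# the way an index would be, so a point lookup touches exactly one record.
row_store = {rec[0]: rec for rec in records}

def lookup(order_id):
    return row_store[order_id]

# Column-oriented (OLAP-style): each attribute kept together, so an
# aggregate over one attribute scans only that attribute's values.
column_store = {
    "order_id": [rec[0] for rec in records],
    "region": [rec[1] for rec in records],
    "amount": [rec[2] for rec in records],
}

def total_amount():
    return sum(column_store["amount"])  # never reads order_id or region

print(lookup(3))       # (3, 'north', 400)
print(total_amount())  # 800
```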

Can you GROUP BY a derived expression or a SELECT alias, and how does this differ across databases?

You can always GROUP BY a derived expression written out in full. Whether you can reference a SELECT alias in GROUP BY depends on the database. MySQL, PostgreSQL, and BigQuery allow it as an extension; PostgreSQL resolves an ambiguous name to the input column rather than the alias. SQL Server does not allow it, because in the standard logical order SELECT aliases are not defined until after GROUP BY has run.
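A small sketch using SQLite through Python's `sqlite3` module, on a hypothetical `sales` table. SQLite happens to accept the alias form. The repeated-expression form is the portable habit, although function names such as `substr` still vary by dialect.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sold_on TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2023-03-01", 10), ("2023-07-15", 20), ("2024-01-09", 5)],
)

# Portable GROUP BY style: repeat the derived expression itself.
by_expression = con.execute(
    "SELECT substr(sold_on, 1, 4) AS yr, SUM(amount) "
    "FROM sales GROUP BY substr(sold_on, 1, 4) ORDER BY yr"
).fetchall()

# Alias style: SQLite accepts GROUP BY yr; SQL Server would reject it.
by_alias = con.execute(
    "SELECT substr(sold_on, 1, 4) AS yr, SUM(amount) "
    "FROM sales GROUP BY yr ORDER BY yr"
).fetchall()

print(by_expression)  # [('2023', 30), ('2024', 5)]
```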

Given a query that filters on both a raw column and an aggregate result, how do you structure it for correctness and performance?

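Raw-column filters belong in WHERE, so the engine discards non-matching rows before grouping. Aggregate filters must go in HAVING, because the aggregate value does not exist until the groups are formed. Putting a filter in HAVING that could have been in WHERE can force the engine to aggregate more rows than necessary, unless its optimizer pushes the predicate down on its own.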
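A minimal sketch with SQLite and a hypothetical `orders` table. It runs the correctly structured query alongside a variant that forgets the raw-column filter. Leaving out WHERE does more than slow the query down: here it changes the answer, because a refunded order leaks into ann's total.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "paid", 70), ("ann", "paid", 50), ("ann", "refunded", 500),
        ("bob", "paid", 90), ("cat", "paid", 200),
    ],
)

structured = con.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'        -- raw-column filter, applied before grouping
    GROUP BY customer
    HAVING SUM(amount) > 100     -- aggregate filter, needs the finished groups
    ORDER BY customer
""").fetchall()

no_where = con.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY customer
""").fetchall()

print(structured)  # [('ann', 120), ('cat', 200)]
print(no_where)    # [('ann', 620), ('cat', 200)]
```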

Relational Algebra I — GATE DA

What you'll learn

Selection σ keeps rows that match a predicate; projection π keeps columns and drops duplicates

Cross product × pairs every row with every row; union, intersection, difference need union-compatible relations

Rename ρ renames a relation or its attributes — the only operator that does no real work on data

The A − B = ∅ idiom: how relational algebra encodes a 'for-all' subset check

The last lesson promised a language for questioning a relation, not just filling it. Here it is. You want every student whose marks are above 80, then only their names, with no IDs or roll numbers cluttering the answer. SQL spells that SELECT name FROM Student WHERE marks > 80. Underneath, the database engine does something simpler: it runs two tiny operators back to back. That underneath layer is relational algebra, six little symbols that combine to build the core of every query you will write. It is no museum piece, either. Query planners from Postgres to Spark SQL rewrite your SQL into a tree of these operators (plus extensions such as joins, aggregation, and sorting) before deciding how to run it.

Each operator takes one or two relations (think tables) and returns another relation. So you can chain them freely, like Lego.

The six basic operators

We will meet them with one table: Student(id, name, marks) with rows (1, Asha, 85), (2, Ravi, 70), (3, Maya, 92).

Selection σ picks rows that satisfy a predicate (a true/false test on a row, like marks ≥ 80). Projection π picks columns. Two operators, two halves of every basic SQL query.

σ filters along the row axis; π filters along the column axis.
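A minimal sketch of σ and π, assuming a relation is modelled as a tuple of attribute names plus a Python set of row tuples. The set is what gives pure RA its no-duplicates behaviour. The function names `select` and `project` are illustrative, not a library API.

```python
# Student(id, name, marks) from the running example.
STUDENT_SCHEMA = ("id", "name", "marks")
student = {(1, "Asha", 85), (2, "Ravi", 70), (3, "Maya", 92)}

def select(schema, rows, predicate):
    """σ: keep rows where predicate(row_as_dict) is true; schema unchanged."""
    return schema, {r for r in rows if predicate(dict(zip(schema, r)))}

def project(schema, rows, attrs):
    """π: keep only attrs; building a set collapses duplicate results."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(r[i] for i in idx) for r in rows}

# SELECT name FROM Student WHERE marks > 80  ->  π_name(σ_marks>80(Student))
# (matches SQL here because no two high scorers share a name)
schema, rows = select(STUDENT_SCHEMA, student, lambda t: t["marks"] > 80)
names_schema, names = project(schema, rows, ["name"])
print(sorted(names))  # [('Asha',), ('Maya',)]
```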

σ_predicate(R) — keep the rows of R where the predicate is true. The predicate can use =, comparisons, AND, OR, NOT on attributes.
π_attrs(R) — keep only the listed columns. Set semantics: if dropping columns creates duplicate rows, they collapse into one.
R × S — Cartesian product. Every row of R paired with every row of S. If R has m rows and S has n, the result has m·n rows.
R ∪ S, R ∩ S, R − S — set operations. R and S must be union-compatible: same number of columns, same domains in the same order.
ρ_S(R) or ρ_S(a,b,c)(R) — rename the relation (and optionally its attributes). Useful before joining a table to itself.
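The remaining operators can be sketched under the same assumption: a relation is a pair (attribute names, set of tuples). In this model relation names are ordinary Python variables, so ρ_S(R) is plain assignment and the `rename` function only covers the attribute-renaming form. The union-compatibility check is deliberately weaker than the definition, since it compares arity only, not domains. The relations A, B, and C are hypothetical.

```python
def product(r, s):
    """R × S: every row of R concatenated with every row of S."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    return r_attrs + s_attrs, {a + b for a in r_rows for b in s_rows}

def _require_union_compatible(r, s):
    # Full union-compatibility also needs matching domains position by
    # position; this sketch checks only that the arities agree.
    if len(r[0]) != len(s[0]):
        raise ValueError("not union-compatible: arities differ")

def union(r, s):
    _require_union_compatible(r, s)
    return r[0], r[1] | s[1]

def difference(r, s):
    _require_union_compatible(r, s)
    return r[0], r[1] - s[1]

def intersection(r, s):
    # Expressible with difference alone: R ∩ S = R − (R − S).
    return difference(r, difference(r, s))

def rename(r, new_attrs):
    """ρ_(a,b,...)(R): identical tuples under new attribute names."""
    if len(new_attrs) != len(r[0]):
        raise ValueError("need one new name per attribute")
    return tuple(new_attrs), r[1]

A = (("x",), {(1,), (2,)})
B = (("x",), {(2,), (3,)})
C = (("c",), {("p",), ("q",), ("r",)})

print(len(product(A, C)[1]))           # 6, i.e. 2 · 3
print(sorted(union(A, B)[1]))          # [(1,), (2,), (3,)]
print(sorted(difference(A, B)[1]))     # [(1,)]
print(sorted(intersection(A, B)[1]))   # [(2,)]
# Self-product: renaming first keeps the two copies' attributes distinguishable.
print(product(A, rename(A, ["x2"]))[0])  # ('x', 'x2')
```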

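That is it: six basic operators, namely σ, π, ×, ∪, −, and ρ. Intersection is listed alongside them for convenience, but it is derived rather than basic, since R ∩ S = R − (R − S). SQL is mostly sugar over them.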

The subset idiom — encoding “for-all”

SQL has no FOR ALL quantifier, so “every row of A appears in B” has to be phrased indirectly, typically with NOT EXISTS or EXCEPT. In relational algebra, set difference makes it tidy:

A − B = ∅   ⇔   A ⊆ B   ⇔   every tuple of A is also in B

If A minus B is empty, nothing in A was missing from B — so A is a subset. That tiny trick is how RA encodes “for-all” questions before you reach for division.
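To see that the idiom is an exact equivalence and not just a heuristic, here is a brute-force check in Python. It models relations as plain sets (so set semantics hold automatically) and compares `A − B == ∅` with Python's built-in subset test over every pair of subsets of a small, hypothetical universe of names.

```python
from itertools import combinations

def is_subset_via_difference(a, b):
    """RA idiom: A ⊆ B exactly when A − B is empty."""
    return not (a - b)

# Every subset of a three-name universe: 2**3 = 8 of them.
universe = ["Asha", "Ravi", "Maya"]
all_subsets = [
    set(combo)
    for k in range(len(universe) + 1)
    for combo in combinations(universe, k)
]

# Compare the idiom with Python's own subset operator on all 64 pairs.
agree = all(
    is_subset_via_difference(a, b) == (a <= b)
    for a in all_subsets
    for b in all_subsets
)
print(agree)  # True
```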

How GATE asks this

A favourite MCQ pattern gives you four candidate expressions and asks which one checks a “for-all” condition. The answer hinges on whether you can read the A − B = ∅ shape. GATE DA 2024 Q16 did exactly that; see the worked example.

Worked example — GATE DA 2024 Q16

Relations are Team(name), Defender(name), Forward(name). Which expression evaluates to true if and only if every player in Team is BOTH a defender AND a forward?

The condition is “every name in Team is in π_name(Defender) ∩ π_name(Forward)” — a subset check. Apply the idiom:

π_name(Team) − ( π_name(Defender) ∩ π_name(Forward) )  =  ∅

Reading it left to right: take all the names in Team; subtract those who are in both Defender and Forward; if nothing is left over, every Team member is both — which was the question. That is the 2024 answer (option C).

Watch the direction. The expression ( π_name(Defender) ∩ π_name(Forward) ) − π_name(Team) = ∅ says the OPPOSITE — every dual-role player is in Team — a different claim entirely.
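A quick way to convince yourself the two directions differ is to evaluate both on small instances. This sketch assumes each relation has the single attribute name, so π_name of a relation is just its set of names. The player names are hypothetical and do not come from the exam question.

```python
def every_team_member_is_both(team, defender, forward):
    # π_name(Team) − (π_name(Defender) ∩ π_name(Forward)) = ∅
    return not (team - (defender & forward))

def every_dual_role_player_is_in_team(team, defender, forward):
    # (π_name(Defender) ∩ π_name(Forward)) − π_name(Team) = ∅   (reversed)
    return not ((defender & forward) - team)

# Instance 1: both Team members play both roles, but Tara (dual-role) is not in Team.
t1, d1, f1 = {"Asha", "Ravi"}, {"Asha", "Ravi", "Tara"}, {"Asha", "Ravi", "Tara"}
# Instance 2: the only dual-role player is in Team, but Maya plays neither role.
t2, d2, f2 = {"Asha", "Maya"}, {"Asha"}, {"Asha"}

print(every_team_member_is_both(t1, d1, f1), every_dual_role_player_is_in_team(t1, d1, f1))
# True False
print(every_team_member_is_both(t2, d2, f2), every_dual_role_player_is_in_team(t2, d2, f2))
# False True
```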

A second sanity rule: π drops duplicates (set semantics). So if two students who both score at least 80 share a name, π_name(σ_marks ≥ 80(Student)) returns that name once, not twice. SQL uses bag semantics by default; pure RA uses set semantics. GATE expects set semantics unless told otherwise.
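The bag-versus-set difference is easy to observe in SQLite through Python's `sqlite3` module. This sketch adds a hypothetical fourth row, a second Asha who also scores at least 80, to the running Student table. Plain SELECT keeps both copies of the name, while DISTINCT matches what pure RA's π would return.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (id INTEGER, name TEXT, marks INTEGER)")
con.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    # Row 4 is hypothetical: a second Asha with marks >= 80.
    [(1, "Asha", 85), (2, "Ravi", 70), (3, "Maya", 92), (4, "Asha", 88)],
)

# Bag semantics (SQL default): duplicates survive the projection.
bag = con.execute("SELECT name FROM Student WHERE marks >= 80").fetchall()
# Set semantics (pure RA's π): ask for DISTINCT explicitly.
as_set = con.execute(
    "SELECT DISTINCT name FROM Student WHERE marks >= 80"
).fetchall()

print(len(bag), len(as_set))  # 3 2
```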

A question to carry forward

One operator above is quietly wasteful. The cross product R × S pairs every row of R with every row of S — m·n rows — but you almost never want all of them. Ask for “each student beside the courses they took” and the cross product also pairs Asha with courses she never enrolled in, then leaves you to filter the garbage out by hand. There has to be a cleaner way to combine two tables on a shared column. And the subset idiom hinted at something more too — a single operator for “for-all” questions. Here is the thread onward: how do you join two relations on matching values without the cross-product explosion, and what one operator answers “find the students who took every course”?
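The wasteful route the paragraph describes, a cross product followed by a hand-written selection, can be sketched with Python sets. The tables Student(id, name) and Enrolled(sid, course) and their rows are hypothetical. Note how most of the m·n pairs are built only to be thrown away.

```python
# Hypothetical tables: Student(id, name) and Enrolled(sid, course).
students = {(1, "Asha"), (2, "Ravi"), (3, "Maya")}
enrolled = {(1, "DBMS"), (1, "ML"), (3, "DBMS")}

# Student × Enrolled: every student paired with every enrolment row.
pairs = {s + e for s in students for e in enrolled}
print(len(pairs))  # 9, i.e. 3 · 3

# σ_{id = sid}(Student × Enrolled): filter the garbage out by hand.
matched = {p for p in pairs if p[0] == p[2]}
print(sorted(matched))
# [(1, 'Asha', 1, 'DBMS'), (1, 'Asha', 1, 'ML'), (3, 'Maya', 3, 'DBMS')]
```

Six of the nine pairs are discarded, and the ratio only gets worse as the tables grow.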

In one breath

Relational algebra = six operators on relations, chainable like Lego; the core of every SQL query compiles down to them.
σ_pred(R) keeps matching rows (schema unchanged); π_attrs(R) keeps columns (and drops duplicates — set semantics).
R × S = Cartesian product, |R|·|S| rows. ∪, ∩, − need union-compatible operands (same arity + domains). ρ renames (no data change).
Subset idiom: A − B = ∅ ⇔ A ⊆ B — RA’s way to say “every tuple of A is in B” (direction matters; B − A is the opposite claim).
Pure RA is set-semantics (no duplicates); SQL is bag-semantics unless you write DISTINCT.

Practice

Quick check

Q1. Recall: which statements about the basic operators are TRUE? (select all that apply)

Q2. Recall: relations A(x) and B(x) are union-compatible. Which expression evaluates to ∅ exactly when every tuple of A is also in B?

Q3. Trace: relation R has 5 rows, S has 4 rows. How many rows does R × S have? (integer)

Q4. Trace: Student(id, name, marks) has 6 rows. Two students are named 'Asha' and two 'Ravi'. How many rows does π_name(Student) return under pure relational algebra? (integer)

Q5. Apply: given Team(name) = {Asha, Ravi, Maya} and Defender(name) = {Asha, Ravi, Tara}, what does π_name(Team) − π_name(Defender) return?

Q6. Create: which expressions return 'names of students with marks ≥ 80' from Student(id, name, marks)? (select all that apply)
