Joins & Division
Joins stitch tables together on a shared key; division answers 'who has done every one of these?' — the two heavy-hitters of relational algebra.
What you'll learn
- Natural join ⋈ matches rows on shared column names and keeps one copy
- Theta join filters the cross product by any predicate; equi-join is the equality-only special case
- Division A ÷ B finds tuples in A that match EVERY tuple in B — the for-all operator
- How to read a nested join expression by walking inside-out
Before you start
You want every student paired with the courses they’re enrolled in. The
Student table has the name; the Enroll table has the course. They share a
column — id — so you’d like the database to find the matching rows and stitch
them into one. That stitching is a join, the workhorse operator of every
real query — and the single line of SQL most likely to blow up your runtime or
silently duplicate rows when you get the matching column wrong.
And once you have joins, you can finally ask the toughest shape of question: “who has done every one of these?” That’s what division is for.
Three flavours of join
Start with the picture. We’ll match Student(id, name) to Enroll(id, course)
on the shared column id.
- Natural join
R ⋈ S— find every common attribute name, keep rows where those columns are equal, and merge the columns so the result has each common attribute once. The simplest, prettiest join. - Theta join
R ⋈_θ S— like a cross product but only keep rows satisfying the predicate θ. The predicate can be ANYTHING (<,>,≠, AND, OR, …). - Equi-join — a theta join where θ is only equality, like
R.x = S.y. Common columns are NOT merged, so bothR.xandS.ystay.
Mental shortcut: theta join = σ_θ(R × S). Natural join is a special equi-join
where the predicate is “equality on all shared column names” and the duplicate
columns are folded into one.
Division — the “for-all” operator
Counting “students enrolled in some course” is easy. Counting “students enrolled
in EVERY course in a given list” is the hard shape. SQL doesn’t have a DIVIDE
keyword; RA does:
A ÷ B = { t : for every b in B, the row (t, b) is in A }
If A = Enroll(student, course) and B is the set of required courses, then
A ÷ B gives back the students enrolled in every required course. The B-side
acts like a checklist that every output tuple must satisfy. That checklist shape
is the “for-all” pattern.
How GATE asks this
Two MCQ shapes appear. The first gives a nested join expression and asks what it returns in plain English — you walk it inside-out. The second sets up the “for-all” story and asks which expression encodes it (division or the subset idiom from the previous lesson). NAT versions ask for the size of a join result.
Worked example — GATE DA 2025 Q7
Relations:
Own(owner, car),Car(car, color),Make(car, maker). Evaluateπ_owner( Own ⋈ σ_color = "red" ( Car ⋈ σ_maker = "ABC" Make ) )
Walk inside-out, the way the engine does:
σ_maker = "ABC" Make— rows of Make where the maker is “ABC”. Each row is(car, "ABC"). Call thisM_ABC.Car ⋈ M_ABC— natural join on the shared columncar. Result: rows(car, color, "ABC")for every car made by ABC.σ_color = "red" (…)— keep only the red ones. Result: rows(car, "red", "ABC")— exactly the red cars made by ABC. Call thisRedABC.Own ⋈ RedABC— natural join oncar. Match every owner row to one of these red ABC cars. The result is the owners of those cars (with extra columns we’ll drop next).π_owner(…)— keep only the owner column.
So the expression returns all owners of a red car made by ABC — option C of the original 2025 paper. The trick to reading any nested join is the same: start at the innermost σ, peel outward, and at each step describe the result in one English sentence before moving up.
Quick check
Quick check
Practice this in an interview
All questionsA self-join joins a table to itself, typically to compare rows within the same dataset — classic use cases are finding employee-manager relationships in a single table, detecting duplicate rows, or comparing a row to the previous/next row when window functions are unavailable.
Many-to-many joins produce a Cartesian product of each matching subset, multiplying row counts exponentially. The correct approach is to pre-aggregate at least one side to a unique grain before joining, or to use a bridge/junction table that resolves the relationship into two one-to-many joins.
A CROSS JOIN produces the Cartesian product of two tables — every row from the left paired with every row from the right — giving M x N output rows with no join condition. It is useful for generating date spines, creating all combinations of dimension values, or populating test data grids.
An anti-join returns rows from the left table that have no matching row in the right table — the inverse of a semi-join. The three implementations are NOT EXISTS, NOT IN, and a LEFT JOIN with a NULL filter; NOT EXISTS is the most reliable because it is NULL-safe and communicates intent clearly.