datarekha

Joins & Division

Joins stitch tables together on a shared key; division answers 'who has done every one of these?' — the two heavy-hitters of relational algebra.

8 min read Intermediate GATE DA Lesson 66 of 122

What you'll learn

  • Natural join ⋈ matches rows on shared column names and keeps one copy
  • Theta join filters the cross product by any predicate; equi-join is the equality-only special case
  • Division A ÷ B finds tuples in A that match EVERY tuple in B — the for-all operator
  • How to read a nested join expression by walking inside-out

Before you start

You want every student paired with the courses they’re enrolled in. The Student table has the name; the Enroll table has the course. They share a column — id — so you’d like the database to find the matching rows and stitch them into one. That stitching is a join, the workhorse operator of every real query — and the single line of SQL most likely to blow up your runtime or silently duplicate rows when you get the matching column wrong.

And once you have joins, you can finally ask the toughest shape of question: “who has done every one of these?” That’s what division is for.

Three flavours of join

Start with the picture. We’ll match Student(id, name) to Enroll(id, course) on the shared column id.

Studentidname1Asha2Ravi3Mayaon idEnrollidcourse1CS1011MA2012CS101=id name course1 Asha CS1011 Asha MA2012 Ravi CS101Maya: no match.
Natural join matches on shared column names and keeps one copy. Unmatched rows (Maya) are dropped.
  • Natural join R ⋈ S — find every common attribute name, keep rows where those columns are equal, and merge the columns so the result has each common attribute once. The simplest, prettiest join.
  • Theta join R ⋈_θ S — like a cross product but only keep rows satisfying the predicate θ. The predicate can be ANYTHING (<, >, , AND, OR, …).
  • Equi-join — a theta join where θ is only equality, like R.x = S.y. Common columns are NOT merged, so both R.x and S.y stay.

Mental shortcut: theta join = σ_θ(R × S). Natural join is a special equi-join where the predicate is “equality on all shared column names” and the duplicate columns are folded into one.

Division — the “for-all” operator

Counting “students enrolled in some course” is easy. Counting “students enrolled in EVERY course in a given list” is the hard shape. SQL doesn’t have a DIVIDE keyword; RA does:

A ÷ B   =   { t  :  for every b in B, the row (t, b) is in A }

If A = Enroll(student, course) and B is the set of required courses, then A ÷ B gives back the students enrolled in every required course. The B-side acts like a checklist that every output tuple must satisfy. That checklist shape is the “for-all” pattern.

How GATE asks this

Two MCQ shapes appear. The first gives a nested join expression and asks what it returns in plain English — you walk it inside-out. The second sets up the “for-all” story and asks which expression encodes it (division or the subset idiom from the previous lesson). NAT versions ask for the size of a join result.

Worked example — GATE DA 2025 Q7

Relations: Own(owner, car), Car(car, color), Make(car, maker). Evaluate

π_owner( Own ⋈ σ_color = "red" ( Car ⋈ σ_maker = "ABC" Make ) )

Walk inside-out, the way the engine does:

  1. σ_maker = "ABC" Make — rows of Make where the maker is “ABC”. Each row is (car, "ABC"). Call this M_ABC.
  2. Car ⋈ M_ABC — natural join on the shared column car. Result: rows (car, color, "ABC") for every car made by ABC.
  3. σ_color = "red" (…) — keep only the red ones. Result: rows (car, "red", "ABC") — exactly the red cars made by ABC. Call this RedABC.
  4. Own ⋈ RedABC — natural join on car. Match every owner row to one of these red ABC cars. The result is the owners of those cars (with extra columns we’ll drop next).
  5. π_owner(…) — keep only the owner column.

So the expression returns all owners of a red car made by ABC — option C of the original 2025 paper. The trick to reading any nested join is the same: start at the innermost σ, peel outward, and at each step describe the result in one English sentence before moving up.

Quick check

Quick check

0/6
Q1Student(id, name) has 4 rows. Enroll(id, course) has 6 rows. After the natural join Student ⋈ Enroll, the maximum possible number of rows is:numerical answer — type a number
Q2Relations R(a, b) and S(c, d) share NO column name. What is the result of R ⋈ S?
Q3Which expression returns 'names of students enrolled in CS101'? Schemas: Student(id, name), Enroll(id, course).
Q4Which statements about joins and division are TRUE? (select all that apply)select all that apply
Q5Enroll(student, course) and Required(course) is the list of required courses. Which expression returns the names of students enrolled in EVERY required course?
Q6R(a, b) has 5 rows, S(b, c) has 4 rows. Every row of R shares its b-value with exactly 2 rows of S. How many rows does the natural join R ⋈ S have?numerical answer — type a number

Practice this in an interview

All questions

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content