When would you use a self-join, and how do you write one?

A self-join joins a table to itself, typically to compare rows within the same dataset. Classic use cases are finding employee-manager relationships in a single table, detecting duplicate rows, or comparing a row to the previous/next row when window functions are unavailable. To write one, reference the table twice under different aliases and state the relationship between the two aliases in the join condition.
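As a minimal sketch of the employee-manager case, the snippet below runs a self-join through Python's built-in sqlite3 module. The employee table, its columns, and the sample names are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employee VALUES
        (1, 'Asha', NULL),   -- top of the hierarchy, no manager
        (2, 'Ben',  1),
        (3, 'Chen', 1),
        (4, 'Dia',  2);
""")

# The same table appears twice under two aliases:
# e is the employee row, m is that employee's manager row.
rows = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

print(rows)
```

Asha does not appear in the output because her manager_id is NULL and an inner join drops unmatched rows; a LEFT JOIN would keep her with a NULL manager.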

How do you safely join two tables in a many-to-many relationship without creating a row explosion?

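A many-to-many join produces, for each join-key value, the Cartesian product of the matching rows on both sides: a key with m rows on the left and n rows on the right yields m × n output rows. Row counts therefore grow multiplicatively (not exponentially), and any aggregate computed over the joined rows is double-counted. The correct approach is to pre-aggregate at least one side to a unique grain before joining, or to use a bridge/junction table that resolves the relationship into two one-to-many joins.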
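The effect is easy to reproduce on a tiny dataset. The sketch below uses hypothetical orders and payments tables that both repeat customer_id, and compares a naive join with a join of pre-aggregated sides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (customer_id INTEGER, amount INTEGER);
    CREATE TABLE payments (customer_id INTEGER, paid   INTEGER);
    INSERT INTO orders   VALUES (1, 100), (1, 50);
    INSERT INTO payments VALUES (1, 60), (1, 40), (1, 50);
""")

# Naive join: 2 order rows x 3 payment rows = 6 rows for customer 1,
# so every order amount is counted 3 times and every payment twice.
naive = conn.execute("""
    SELECT COUNT(*), SUM(o.amount), SUM(p.paid)
    FROM orders o
    JOIN payments p ON o.customer_id = p.customer_id
""").fetchone()

# Pre-aggregate each side to one row per customer, then join.
safe = conn.execute("""
    SELECT o.customer_id, o.total_amount, p.total_paid
    FROM (SELECT customer_id, SUM(amount) AS total_amount
          FROM orders GROUP BY customer_id) o
    JOIN (SELECT customer_id, SUM(paid) AS total_paid
          FROM payments GROUP BY customer_id) p
      ON o.customer_id = p.customer_id
""").fetchall()

print(naive)  # inflated totals
print(safe)   # one row per customer with the true totals
```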

What does a CROSS JOIN do, and when is it actually useful?

A CROSS JOIN produces the Cartesian product of two tables — every row from the left paired with every row from the right — giving M x N output rows with no join condition. It is useful for generating date spines, creating all combinations of dimension values, or populating test data grids.
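A small sketch of the "all combinations of dimension values" use case, again via sqlite3. The sizes and colors tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sizes  (size  TEXT);
    CREATE TABLE colors (color TEXT);
    INSERT INTO sizes  VALUES ('S'), ('M'), ('L');
    INSERT INTO colors VALUES ('red'), ('blue');
""")

# No join condition: every size is paired with every color.
combos = conn.execute("""
    SELECT size, color
    FROM sizes CROSS JOIN colors
    ORDER BY size, color
""").fetchall()

print(len(combos))  # 3 sizes x 2 colors
print(combos)
```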

What is an anti-join, how do you implement one in SQL, and which implementation is most reliable?

An anti-join returns rows from the left table that have no matching row in the right table (the inverse of a semi-join). The three implementations are NOT EXISTS, NOT IN, and a LEFT JOIN with a NULL filter. NOT EXISTS is the most reliable because it is NULL-safe and communicates intent clearly, whereas NOT IN returns no rows at all as soon as the subquery produces a single NULL.
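The NULL behaviour can be checked directly. The sketch below runs all three forms against hypothetical customer and orders tables, where one order row has a NULL customer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER, name TEXT);
    CREATE TABLE orders   (customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders   VALUES (1), (NULL);  -- one order has no customer recorded
""")

# Customers with no order, three ways.
not_exists = conn.execute("""
    SELECT c.name FROM customer c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

left_join = conn.execute("""
    SELECT c.name FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.customer_id IS NULL
    ORDER BY c.id
""").fetchall()

# id NOT IN (1, NULL) is never TRUE: it is FALSE for id 1 and UNKNOWN otherwise.
not_in = conn.execute("""
    SELECT c.name FROM customer c
    WHERE c.id NOT IN (SELECT customer_id FROM orders)
""").fetchall()

print(not_exists, left_join, not_in)
```

NOT EXISTS and the LEFT JOIN form agree, while NOT IN silently returns an empty result because of the NULL in the subquery.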

Joins & Division — GATE DA

What you'll learn

Natural join ⋈ matches rows on shared column names and keeps one copy

Theta join filters the cross product by any predicate; equi-join is the equality-only special case

Division A ÷ B finds tuples in A that match EVERY tuple in B — the for-all operator

How to read a nested join expression by walking inside-out

The last lesson ended on the cross product’s wastefulness — m·n rows when you only wanted the matches — and hinted at a tidier “for-all” operator. Both wishes are granted here. You want every student paired with the courses they are enrolled in. The Student table holds the name; the Enroll table holds the course; they share a column, id. So you want the database to find the matching rows and stitch them into one. That stitching is a join — the workhorse operator of every real query, and the single line of SQL most likely to blow up your runtime or silently duplicate rows when you pick the matching column wrong.

And once you have joins, you can finally ask the toughest shape of question: “who has done every one of these?” That is what division is for.

Three flavours of join

Start with the picture. We will match Student(id, name) to Enroll(id, course) on the shared column id.

Figure: Natural join matches on shared column names and keeps one copy. Unmatched rows (Maya) are dropped.

Natural join R ⋈ S — find every common attribute name, keep rows where those columns are equal, and merge the columns so the result has each common attribute once. The simplest, prettiest join.
Theta join R ⋈_θ S — like a cross product but only keep rows satisfying the predicate θ. The predicate can be ANYTHING (<, >, ≠, AND, OR, …).
Equi-join — a theta join where θ is only equality, like R.x = S.y. Common columns are NOT merged, so both R.x and S.y stay.

Mental shortcut: theta join = σ_θ(R × S). Natural join is a special equi-join where the predicate is “equality on all shared column names” and the duplicate columns are folded into one.
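To make the three flavours concrete, here is a rough sketch that models relations as lists of dicts. The helper names theta_join and natural_join and the sample rows are illustrative assumptions, not part of any library.

```python
from itertools import product

# Hypothetical sample data: Maya has no enrollment row.
student = [{"id": 1, "name": "Ravi"}, {"id": 2, "name": "Sara"}, {"id": 3, "name": "Maya"}]
enroll = [{"id": 1, "course": "CS101"}, {"id": 1, "course": "MA102"}, {"id": 2, "course": "CS101"}]

def theta_join(r, s, pred, r_name="R", s_name="S"):
    """sigma_theta(R x S): qualify every column so both copies survive."""
    out = []
    for a, b in product(r, s):
        row = {f"{r_name}.{k}": v for k, v in a.items()}
        row.update({f"{s_name}.{k}": v for k, v in b.items()})
        if pred(row):
            out.append(row)
    return out

def natural_join(r, s):
    """Match on all shared column names and keep one copy of each."""
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])
    return [{**a, **b} for a, b in product(r, s)
            if all(a[c] == b[c] for c in shared)]

# Equi-join: equality predicate, both id columns stay in the result.
equi = theta_join(student, enroll,
                  lambda t: t["Student.id"] == t["Enroll.id"], "Student", "Enroll")

# Theta join with a non-equality predicate.
less_than = theta_join(student, enroll,
                       lambda t: t["Student.id"] < t["Enroll.id"], "Student", "Enroll")

# Natural join: same matches as the equi-join, but id appears once.
nat = natural_join(student, enroll)

print(sorted(equi[0]))  # four qualified columns
print(sorted(nat[0]))   # three columns, id merged
```

Both equi and nat contain three rows and neither includes Maya; the only difference is whether the shared column is kept twice or once.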

Division — the “for-all” operator

Finding “students enrolled in some course” is easy. Finding “students enrolled in EVERY course in a given list” is the hard shape. SQL has no DIVIDE keyword; RA does:

A ÷ B   =   { t  :  for every b in B, the row (t, b) is in A }

If A = Enroll(student, course) and B is the set of required courses, then A ÷ B gives back the students enrolled in every required course. The B-side acts like a checklist that every output tuple must satisfy. That checklist shape is the “for-all” pattern the previous lesson was reaching for with the subset idiom.
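The definition translates almost word for word into set operations. The sketch below assumes A is a binary relation stored as a set of (student, course) pairs and B is a plain set of courses; the divide helper and the sample data are hypothetical. It also checks the result against the standard rewrite of division in basic operators, π_t(A) − π_t((π_t(A) × B) − A).

```python
# Hypothetical enrollments and checklist of required courses.
enroll = {
    ("Ravi", "CS101"), ("Ravi", "MA102"),
    ("Sara", "CS101"),
    ("Maya", "CS101"), ("Maya", "MA102"), ("Maya", "PH103"),
}
required = {"CS101", "MA102"}

def divide(a, b):
    """A / B for a binary relation A(t, b) and a unary relation B(b)."""
    candidates = {t for t, _ in a}
    # Keep t only if (t, x) is in A for EVERY x in B.
    return {t for t in candidates if all((t, x) in a for x in b)}

result = divide(enroll, required)

# Same answer from basic operators:
# pair every candidate with every required course, subtract A to find
# the missing pairs, then remove any student who has a missing pair.
candidates = {t for t, _ in enroll}
missing = {(t, c) for t in candidates for c in required} - enroll
via_basic_ops = candidates - {t for t, _ in missing}

print(sorted(result))
```

Sara is excluded because (Sara, MA102) is not in enroll. Extra courses such as Maya's PH103 do not matter, since the checklist only demands the required ones.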

How GATE asks this

Two MCQ shapes appear. The first gives a nested join expression and asks what it returns in plain English — you walk it inside-out. The second sets up the “for-all” story and asks which expression encodes it (division or the subset idiom from the previous lesson). NAT versions ask for the size of a join result.

Worked example — GATE DA 2025 Q7

Relations: Own(owner, car), Car(car, color), Make(car, maker). Evaluate
π_owner( Own ⋈ σ_{color = "red"}( Car ⋈ σ_{maker = "ABC"}(Make) ) )

Walk inside-out, following the nesting of the expression:

σ_{maker = "ABC"}(Make): rows of Make where the maker is “ABC”. Each row is (car, "ABC"). Call this M_ABC.
Car ⋈ M_ABC: natural join on the shared column car. Result: rows (car, color, "ABC") for every car made by ABC.
σ_{color = "red"}(…): keep only the red ones. Result: rows (car, "red", "ABC"), exactly the red cars made by ABC. Call this RedABC.
Own ⋈ RedABC: natural join on car. An Own row survives only if its car appears in RedABC, so the result pairs each owner with every red ABC car they own (with extra columns we drop next).
π_owner(…): keep only the owner column, with duplicates removed.

So the expression returns all owners of a red car made by ABC — option C of the original 2025 paper. The trick to reading any nested join is the same: start at the innermost σ, peel outward, and at each step describe the result in one English sentence before moving up.
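The same walk can be traced on sample data. The rows below are invented for illustration (the question itself supplies only the schemas), and each variable corresponds to one step of the inside-out reading.

```python
# Hypothetical instances of Own(owner, car), Car(car, color), Make(car, maker).
own = {("Anil", "c1"), ("Bina", "c2"), ("Charu", "c3"), ("Dev", "c4")}
car = {("c1", "red"), ("c2", "blue"), ("c3", "red"), ("c4", "red")}
make = {("c1", "ABC"), ("c2", "ABC"), ("c3", "XYZ"), ("c4", "ABC")}

# Step 1: select the rows of Make whose maker is "ABC".
m_abc = {(c, m) for c, m in make if m == "ABC"}

# Step 2: natural join Car with M_ABC on car.
car_abc = {(c, col, m) for c, col in car for c2, m in m_abc if c == c2}

# Step 3: keep only the red cars.
red_abc = {row for row in car_abc if row[1] == "red"}

# Step 4: natural join Own with RedABC on car.
own_red_abc = {(o, c, col, m) for o, c in own for c2, col, m in red_abc if c == c2}

# Step 5: project onto owner (a set, so duplicates disappear).
owners = {o for o, *_ in own_red_abc}

print(sorted(owners))
```

Bina drops out at step 3 (her ABC car is blue) and Charu at step 1 (her red car is made by XYZ), which matches the one-sentence reading "owners of a red car made by ABC".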

A question to carry forward

Notice something about every expression in these two lessons: you never said what you wanted, only how to compute it — apply σ, then ⋈, then π, in a fixed order. Relational algebra is a recipe, a procedure. But a recipe is a strange way to ask a question. What you really mean is simply “the rows t such that t is a student whose marks exceed 80” — a description, with the steps left to the machine. Here is the thread onward: is there a way to declare the rows you want with a logical formula instead of a sequence of operators — and is that declarative style, rather than the procedural one, the true spirit of the SQL you are about to learn?

In one breath

A join stitches two tables on matching values. Natural join R ⋈ S: match on shared column names, keep one copy of them, drop unmatched rows. Theta join = σ_θ(R × S) (any predicate). Equi-join = theta with equality only, both columns kept.
Trap: natural join with no shared column name silently becomes the cross product (|R|·|S| rows).
Division A ÷ B = the for-all operator: tuples t such that (t, b) ∈ A for every b ∈ B (students in every required course). Schema of A ÷ B = A’s schema minus B’s columns.
Read nested joins inside-out: innermost σ first, describe each step in one English sentence (GATE DA 2025 Q7).
Division ≠ set difference: A ÷ B is “for-all,” A − B is “in A but not B.”
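The cross-product trap in the summary can be seen in a few lines. This is a sketch with a hypothetical natural_join helper over lists of dicts; it assumes both relations are non-empty.

```python
from itertools import product

def natural_join(r, s):
    """Match on all shared column names; assumes r and s are non-empty."""
    shared = set(r[0]) & set(s[0])
    # With no shared columns, all() over an empty set is True for every pair.
    return [{**a, **b} for a, b in product(r, s)
            if all(a[c] == b[c] for c in shared)]

# R(a, b) and S(c, d) share no column name.
r = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
s = [{"c": 7, "d": 70}, {"c": 8, "d": 80}]

result = natural_join(r, s)
print(len(result))  # |R| * |S|
```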

Practice

Quick check


Q1. Recall: which statements about joins and division are TRUE? (select all that apply)

Q2. Recall: relations R(a, b) and S(c, d) share NO column name. What is the result of R ⋈ S?

Q3. Trace: R(a, b) has 5 rows, S(b, c) has 4 rows. Every row of R shares its b-value with exactly 2 rows of S. How many rows does the natural join R ⋈ S have? (integer)

Q4. Apply: Student(id, name) has 4 rows. Enroll(id, course) has 6 rows. After the natural join Student ⋈ Enroll, the maximum possible number of rows is: (integer)

Q5. Apply: which expression returns 'names of students enrolled in CS101'? Schemas: Student(id, name), Enroll(id, course).

Q6. Create: Enroll(student, course) and Required(course) is the list of required courses. Which expression returns the students enrolled in EVERY required course?
