What are 1NF, 2NF, and 3NF, and when would you intentionally denormalize?

1NF eliminates repeating groups and requires atomic column values. 2NF additionally removes partial dependencies, where a non-key column depends on only part of a composite key. 3NF removes transitive dependencies, where a non-key column depends on another non-key column: every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization accepts redundancy, and with it the risk of update anomalies, in exchange for read performance. It is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.

What is the gaps-and-islands problem, and how do you solve it with window functions?

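Gaps-and-islands is the problem of identifying contiguous ranges (islands) within ordered sequential data and the breaks (gaps) between them. The classic solution subtracts a dense row number, such as ROW_NUMBER() ordered by the sequence column, from the sequence column itself (assuming distinct values). Within a run of consecutive values both numbers rise in step, so the difference stays constant, and rows with equal differences belong to the same island. Grouping by that difference and taking MIN and MAX gives each island's bounds.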
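The row-number trick can be sketched outside SQL as well. The snippet below is a minimal illustration in Python, not a window-function query: `enumerate` stands in for ROW_NUMBER(), the function name `islands` is hypothetical, and the input is assumed to be integers (duplicates are dropped first).

```python
from itertools import groupby

def islands(values):
    """Group integers into contiguous (start, end) islands."""
    ordered = sorted(set(values))
    result = []
    # enumerate() supplies the dense row number 0, 1, 2, ...
    # value - row_number is constant within a run of consecutive values.
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        run_values = [value for _, value in run]
        result.append((run_values[0], run_values[-1]))
    return result

print(islands([1, 2, 3, 7, 8, 10]))  # [(1, 3), (7, 8), (10, 10)]
```

In SQL the same grouping key would be the ordering column minus the row number, with MIN and MAX taken per group.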

Explain joint, marginal, and conditional distributions and how to move between them.

The joint distribution P(X, Y) fully specifies two random variables together. Marginals P(X) and P(Y) are obtained by summing (or integrating) the joint over the other variable. Conditionals P(X|Y=y) are the joint sliced at a fixed y value, renormalized by the marginal P(Y=y).
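Written out for the discrete case, the three moves look like this (for continuous variables the sum becomes an integral). The third line is the product rule, which takes you from a conditional and a marginal back to the joint.

```latex
\begin{align*}
P(X = x) &= \sum_{y} P(X = x,\, Y = y) && \text{(marginalize the joint)} \\
P(X = x \mid Y = y) &= \frac{P(X = x,\, Y = y)}{P(Y = y)}, \quad P(Y = y) > 0 && \text{(condition)} \\
P(X = x,\, Y = y) &= P(X = x \mid Y = y)\, P(Y = y) && \text{(product rule, back to the joint)}
\end{align*}
```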

Why does SQL require every non-aggregated SELECT column to appear in GROUP BY?

Because after grouping, multiple source rows collapse into one output row. Any column not in the GROUP BY key could have different values across those collapsed rows, making a single deterministic output value impossible without an aggregate function.

Functional Dependencies & Closure — GATE DA

The last lesson ended on a rotting wide table — the instructor’s name copied into every enrolment row, waiting to contradict itself. The tool that names exactly what went wrong is the functional dependency. Picture an employee table where every employee has exactly one department. Hand me two rows that share the same emp_id, and I will bet my laptop they share the same dept too. That little promise — “agree on the left, must agree on the right” — is a functional dependency.

Once you see it that way, FDs stop feeling abstract. They are just rules the data has to obey. The whole game in DBMS is: given a handful of these rules, what else must be true? This is also the quiet engine behind every schema you will design on the job — the same closure you compute here is what decides whether a table is safe to split or doomed to update anomalies.

What a functional dependency actually says

We write X → Y and read it as “X determines Y.” It means: in any valid instance of the table, whenever two rows agree on every attribute in X, they also agree on every attribute in Y. Both X and Y can be a single attribute or a set.

Examples that are easy to feel:

emp_id → name — one ID, one name.
course_id, semester → instructor — that course in that semester has one instructor.
dept → dept_head — every department has one head.

An FD is a constraint on the schema, not a property of one particular table snapshot.
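One way to feel that distinction: a snapshot can refute an FD by exhibiting two rows that agree on the left and disagree on the right, but a snapshot with no such pair proves nothing about the schema. The sketch below searches for such a counterexample. The function name `violates_fd` and the sample rows are made up for illustration.

```python
def violates_fd(rows, lhs, rhs):
    """Return a pair of rows that agree on lhs but differ on rhs, or None."""
    seen = {}  # lhs values -> first row observed with those values
    for row in rows:
        key = tuple(row[a] for a in lhs)
        first = seen.setdefault(key, row)
        if any(first[a] != row[a] for a in rhs):
            return first, row
    return None

# Hypothetical snapshot of an employee table.
employees = [
    {"emp_id": 1, "name": "Asha", "dept": "ML"},
    {"emp_id": 2, "name": "Ravi", "dept": "ML"},
    {"emp_id": 3, "name": "Asha", "dept": "Data"},
]

print(violates_fd(employees, ["emp_id"], ["dept"]))            # None
print(violates_fd(employees, ["name"], ["dept"]) is not None)  # True
```

The first result only says this snapshot is consistent with emp_id → dept. The second is decisive: two employees named Asha sit in different departments, so name → dept cannot be a rule of the schema.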

Armstrong’s axioms — the three rules that generate the rest

From a small set of FDs you can derive many more. Armstrong gave us three rules that are sound (they derive only FDs that the given set logically implies) and complete (they can derive every FD that it implies).

Reflexivity — if Y is a subset of X, then X → Y. (Trivial: a set determines any of its parts.)
Augmentation — if X → Y, then XZ → YZ for any Z. (You can pad both sides with extra attributes.)
Transitivity — if X → Y and Y → Z, then X → Z. (Chain them.)

Three handy corollaries you will reuse:

Union: X → Y and X → Z give X → YZ.
Decomposition: X → YZ gives X → Y and X → Z.
Pseudo-transitivity: X → Y and WY → Z give WX → Z.
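None of the corollaries is a new assumption; each follows from the three axioms in a couple of steps. A sketch of the derivations, writing XX as X since attribute sets ignore repeats:

```latex
\begin{align*}
&\text{Union: given } X \to Y \text{ and } X \to Z \\
&\quad X \to XY && \text{augment } X \to Y \text{ with } X \\
&\quad XY \to YZ && \text{augment } X \to Z \text{ with } Y \\
&\quad X \to YZ && \text{transitivity} \\[4pt]
&\text{Decomposition: given } X \to YZ \\
&\quad YZ \to Y && \text{reflexivity} \\
&\quad X \to Y && \text{transitivity (symmetrically, } X \to Z\text{)} \\[4pt]
&\text{Pseudo-transitivity: given } X \to Y \text{ and } WY \to Z \\
&\quad WX \to WY && \text{augment } X \to Y \text{ with } W \\
&\quad WX \to Z && \text{transitivity}
\end{align*}
```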

Attribute closure X⁺ — the workhorse

Given a set of FDs, the attribute closure X⁺ is the largest set of attributes determined by X. It answers the only question that matters in practice: starting from X, what can we conclude?

The algorithm is mechanical. Start with closure = X. Walk through every FD; if its full LHS is already inside closure, add its RHS to closure. Repeat until nothing changes.

The closure grows monotonically; stop when a full pass over the FDs adds nothing.
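The loop described above fits in a few lines. This is a minimal sketch, assuming single-letter attribute names so that a string like "CD" can stand for the set {C, D}. The FDs in the example are hypothetical, chosen only to exercise the loop.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # keep making passes until one adds nothing
        changed = False
        for lhs, rhs in fds:
            # An FD fires only when its entire LHS is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical FDs for illustration: A -> B, B -> C, CD -> E.
fds = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", fds)))   # ['A', 'B', 'C']
print(sorted(closure("AD", fds)))  # ['A', 'B', 'C', 'D', 'E']
```

Starting from A alone, CD → E never fires because D is missing. Adding D to the starting set is what unlocks E.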

How GATE asks this

Two main flavours, both painless once you can compute a closure:

NAT — “What is the size of X⁺ under the given FDs?” Just run the algorithm and count.
MSQ — “Which of the following FDs are derivable from F?” For each candidate A → B, compute A⁺ and check whether B lies inside it. If yes, it is derivable.

Worked example — GATE DA 2024

R(U, V, W, X, Y, Z) with FDs F = {U → V, U → W, WX → Y, WX → Z, V → X}. Which of the following are derivable? (A) VW → Y (B) VW → YZ (C) VW → U (D) WX → YZ

Strategy: one closure, VW⁺, settles A, B, and C at once; then WX⁺ settles D.

Compute VW⁺. Start closure = {V, W}.

V → X: LHS V is in the closure, so add X → closure = {V, W, X}.
WX → Y: LHS W, X both present, add Y → closure = {V, W, X, Y}.
WX → Z: LHS W, X both present, add Z → closure = {V, W, X, Y, Z}.
Another pass adds nothing. So VW⁺ = VWXYZ.

That contains Y and YZ, so A and B are derivable. It does not contain U — and here’s the shortcut worth remembering: U never appears on the right-hand side of any FD, so no closure that doesn’t already start with U can ever acquire it. C is not derivable.

Compute WX⁺ for D: start {W, X}; WX → Y adds Y, WX → Z adds Z → WX⁺ = WXYZ, which contains YZ. D is derivable. The official 2024 answer: A, B, and D.
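The hand computation can be double-checked mechanically. The sketch below encodes F and the four options as strings of single-letter attributes. The helper names `closure` and `derivable` are illustrative, and the closure routine is a compact variant of the pass-until-stable loop.

```python
def closure(attrs, fds):
    """Attribute closure: keep adding RHS attributes until a pass adds none."""
    result = set(attrs)
    while True:
        added = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not added:
            return result
        result |= added

def derivable(lhs, rhs, fds):
    """lhs -> rhs follows from fds iff rhs lies inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

F = [("U", "V"), ("U", "W"), ("WX", "Y"), ("WX", "Z"), ("V", "X")]
options = {"A": ("VW", "Y"), "B": ("VW", "YZ"), "C": ("VW", "U"), "D": ("WX", "YZ")}

print("".join(sorted(closure("VW", F))))  # VWXYZ
print([label for label, (lhs, rhs) in options.items() if derivable(lhs, rhs, F)])  # ['A', 'B', 'D']
```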

A question to carry forward

The closure is the engine, and the same computation confirms a key: compute X⁺, and if it spans every attribute, X is a superkey. In the worked example, U⁺ = UVWXYZ, so U is a superkey of R. But confirming a key you already guessed is the easy direction. The hard one is the reverse: handed only a tangle of FDs, with no key in sight and perhaps several candidate keys hiding among the attributes, how do you find them all? Here is the thread onward: which attributes are forced to belong to every candidate key, which can never appear in one, and what systematic hunt turns a list of FDs into the complete set of a table’s keys?

In one breath

A functional dependency X → Y = “rows agreeing on X must agree on Y” — a schema rule, not a snapshot fact.
Armstrong’s axioms (sound + complete): reflexivity (Y ⊆ X ⇒ X → Y), augmentation (X → Y ⇒ XZ → YZ), transitivity (X → Y, Y → Z ⇒ X → Z). Corollaries: union, decomposition, pseudo-transitivity.
Attribute closure X⁺ = everything X determines. Algorithm: start with X, add an FD’s RHS whenever its full LHS is inside, repeat until nothing changes.
A → B is derivable iff B ⊆ A⁺. X is a superkey iff X⁺ = all attributes.
Trap: an FD fires only when its entire LHS is present (WX → Y needs both W and X); never stop the iteration early.

Practice

Quick check

Q1. Recall: which statements about Armstrong's axioms are TRUE? (select all that apply)

Q2. Trace: R(A,B,C,D,E) with F = {A → BC, CD → E, B → D, E → A}. What is the size of B⁺ (number of attributes)? (integer)

Q3. Apply: same R and F = {A → BC, CD → E, B → D, E → A}. What is the size of A⁺? (integer)

Q4. Apply: R(A,B,C,D) with F = {AB → C, C → D, D → A}. Is A → B derivable?

Q5. Apply: from F = {U → V, U → W, WX → Y, WX → Z, V → X}, which FDs are derivable? (select all that apply)

Q6. Create: a booking table has FDs booking_id → seat, seat → flight, flight → airline. A teammate claims 'booking_id determines the airline, so the airline column is redundant.' Are they right (does booking_id⁺ contain airline)?
