Functional Dependencies & Closure
If two rows share an X-value, must they share a Y-value? That's a functional dependency — the rule that decides keys, normal forms, and half of GATE DA's DBMS marks.
What you'll learn
- A functional dependency X to Y means rows that agree on X must agree on Y
- Armstrong's axioms: reflexivity, augmentation, transitivity — and what each one buys you
- How to compute an attribute closure X-plus step by step
- Reading a closure to decide which FDs are derivable
Before you start
Picture an employee table. Every employee has exactly one department. If I hand you two rows and they share the same emp_id, you’d bet your laptop they share the same dept too. That little promise — “agree on the left, must agree on the right” — is a functional dependency.
Once you see it that way, FDs stop feeling abstract. They’re just rules the data has to obey. The whole game in DBMS is: given a handful of these rules, what else must be true? This is also the quiet engine behind every schema you’ll design on the job — the same closure you compute here is what decides whether a table is safe to split or doomed to update anomalies.
What a functional dependency actually says
We write X → Y and read it as “X determines Y.” It means: in any valid instance of the table, whenever two rows agree on every attribute in X, they also agree on every attribute in Y. Note X and Y can each be a single attribute or a set.
Examples that are easy to feel:
emp_id → name— one ID, one name.course_id, semester → instructor— that course in that semester has one instructor.dept → dept_head— every department has one head.
An FD is a constraint on the schema, not a property of one particular table snapshot.
Armstrong’s axioms — the three rules that generate the rest
From a small set of FDs you can derive many more. Armstrong gave us three rules that are sound (only produce true FDs) and complete (produce every true FD).
- Reflexivity — if
Yis a subset ofX, thenX → Y. (Trivial: a set determines any of its parts.) - Augmentation — if
X → Y, thenXZ → YZfor anyZ. (You can pad both sides with extra attributes.) - Transitivity — if
X → YandY → Z, thenX → Z. (Chain them.)
Three handy corollaries you’ll reuse:
- Union:
X → YandX → ZgiveX → YZ. - Decomposition:
X → YZgivesX → YandX → Z. - Pseudo-transitivity:
X → YandWY → ZgiveWX → Z.
Attribute closure X⁺ — the workhorse
Given a set of FDs, the attribute closure X⁺ is the largest set of attributes determined by X. It answers the only question that matters in practice: starting from X, what can we conclude?
The algorithm is mechanical. Start with closure = X. Walk through every FD; if its full LHS is already inside closure, add its RHS to closure. Repeat until nothing changes.
How GATE asks this
Two main flavours, both painless once you can compute a closure:
- NAT — “What is the size of
X⁺under the given FDs?” Just run the algorithm and count. - MSQ — “Which of the following FDs are derivable from
F?” For each candidateA → B, computeA⁺and check whetherBlies inside it. If yes, it’s derivable.
Worked example — GATE DA 2024
R(U, V, W, X, Y, Z)with FDsF = {U → V, U → W, WX → Y, WX → Z, V → X}. Which of the following are derivable?(A) VW → Y(B) U → Y(C) UX → Z(D) VW → YZ
Strategy: compute VW⁺ (handles A and D), then U⁺ (handles B), then UX⁺ (handles C).
Compute VW⁺. Start closure = {V, W}.
V → X: LHSVis in the closure, so addX→closure = {V, W, X}.WX → Y: LHSW, Xboth present, addY→closure = {V, W, X, Y}.WX → Z: LHSW, Xboth present, addZ→closure = {V, W, X, Y, Z}.- Another pass adds nothing. So
VW⁺ = VWXYZ.
That contains Y and YZ, so both A and D are derivable. (The official 2024 answer was A and D.)
Quickly check the others. U⁺: start {U}; U → V adds V; U → W adds W; now V → X adds X; WX → Y, WX → Z add Y, Z → U⁺ = UVWXYZ. So U → Y (B) is also derivable. UX⁺ contains the same full set, so UX → Z (C) is derivable too. In the original paper, only the printed options A and D were the marked-true ones — but the takeaway is the procedure.
Quick check
Quick check
Practice this in an interview
All questions1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.
Gaps-and-islands is the problem of identifying contiguous ranges (islands) within ordered sequential data and the breaks (gaps) between them. The classic solution subtracts a dense sequential integer from the ordering column — equal differences belong to the same island.
The joint distribution P(X, Y) fully specifies two random variables together. Marginals P(X) and P(Y) are obtained by summing (or integrating) the joint over the other variable. Conditionals P(X|Y=y) are the joint sliced at a fixed y value, renormalized by the marginal P(Y=y).
Because after grouping, multiple source rows collapse into one output row. Any column not in the GROUP BY key could have different values across those collapsed rows, making a single deterministic output value impossible without an aggregate function.