Finding Candidate Keys from FDs
Given a relation and its FDs, which combinations of attributes can uniquely identify a row? A short procedure does it every time — and GATE asks it almost every year.
What you'll learn
- A candidate key is a minimal attribute set whose closure is the full relation
- Sort attributes into LHS-only, RHS-only, and both — the split that shortcuts the search
- The procedure to find every candidate key from a set of FDs
- Why LHS-only attributes are in every key and RHS-only ones are in none
Before you start
You have a table of students. Which columns, together, uniquely identify one student? roll_no alone, maybe. Or (email) on its own. Or (name, dob, city) if you really want a long key. Those minimal “uniquely-identifies-a-row” combinations are the candidate keys — and from a set of functional dependencies you can find them without guessing.
The trick is to look at where each attribute appears in the FDs, before computing any closures. (This is the same call you make every time you pick a primary key for a real table — the candidate keys are your honest list of options before you commit to one.)
The definition, plainly
A candidate key is a set of attributes K such that:
K⁺(its attribute closure) is the full set of attributes — it determines everything.- No proper subset of
Kalready determines everything — it’s minimal.
Condition 1 makes K a superkey. Condition 2 prunes it down to minimal superkey. There can be more than one candidate key, and a key with n attributes doesn’t imply some n-1-attribute key exists — minimality is checked subset-by-subset.
The LHS/RHS split — the shortcut
Look at every FD in F. Classify each attribute of R:
- LHS-only — appears on some left-hand side, never on any right-hand side. Must be in every candidate key. (Nothing in the FDs produces it, so the only way to “have” it in your closure is to start with it.)
- RHS-only — appears only on right-hand sides. Cannot be in any candidate key. (It’s always producible from something else, so including it can never be minimal.)
- Both / neither — these are the optional candidates. You may need some of them.
The procedure
- Classify every attribute as LHS-only, RHS-only, or both.
- Let
K =set of LHS-only attributes. ComputeK⁺. - If
K⁺= all attributes,Kis the unique candidate key. Stop. - Otherwise, try adding ONE attribute from the “both” bucket at a time. If any single addition makes the closure complete, that’s a candidate key. Repeat with pairs, triples, etc., always keeping minimality (skip any superset of an already-found key).
Most exam problems halt at step 3 or after a one-attribute extension.
How GATE asks this
- NAT — “How many candidate keys does
Rhave?” Run the procedure; count. - MCQ — “Which of the following is a candidate key of
R?” Check each: is it a superkey? Is it minimal (drop each attribute, recompute closure)? - MSQ — properties of candidate keys (LHS-only inclusion, RHS-only exclusion).
Worked example
R(A, B, C, D, E)withF = {A → B, BC → D, D → E}. Find all candidate keys.
Step 1 — classify. Scan each attribute against the FDs.
A: LHS ofA → B. Never on any RHS. → LHS-only.C: LHS ofBC → D. Never on any RHS. → LHS-only.B: LHS ofBC → D; RHS ofA → B. → Both.D: LHS ofD → E; RHS ofBC → D. → Both.E: only on RHS ofD → E. → RHS-only.
Step 2 — start with the must-include set. K = {A, C}.
Step 3 — compute K⁺.
- Start
closure = {A, C}. A → B:Ais in, addB→{A, B, C}.BC → D:B, Cboth in, addD→{A, B, C, D}.D → E:Dis in, addE→{A, B, C, D, E}. Done.
{A, C}⁺ covers everything, so {A, C} is a superkey. It contains both LHS-only attributes, which must be in every candidate key — so we can’t shrink it any further. Therefore {A, C} is the unique candidate key.
No need to try extensions: any other superkey would have to contain {A, C} (because LHS-only attributes are mandatory), and any such set is a superset of our found key — hence not minimal.
Quick check
Quick check
Practice this in an interview
All questionsUse backtracking with a running total. At each step, try adding a candidate to the current path. If the total equals the target, record the path. If it exceeds the target, prune. Passing the same start index (not i+1) back into the recursion allows unlimited reuse of the same element.
You combine conditions in the ON clause with AND to join on multiple columns, which is necessary when no single column is a unique identifier across both tables. For index performance, the most selective column — or the column used in equality predicates — should come first in a composite index.
1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.
A primary key uniquely identifies each row in a table and implicitly creates a unique index; it cannot be NULL. A foreign key in a child table references the primary key of a parent table and enforces referential integrity — the database rejects inserts or updates that reference a non-existent parent row, and rejects parent deletes that would orphan child rows unless a cascade rule is defined.