datarekha

Finding Candidate Keys from FDs

Given a relation and its FDs, which combinations of attributes can uniquely identify a row? A short procedure does it every time — and GATE asks it almost every year.

7 min read Intermediate GATE DA Lesson 70 of 122

What you'll learn

  • A candidate key is a minimal attribute set whose closure is the full relation
  • Sort attributes into LHS-only, RHS-only, and both — the split that shortcuts the search
  • The procedure to find every candidate key from a set of FDs
  • Why LHS-only attributes are in every key and RHS-only ones are in none

Before you start

You have a table of students. Which columns, together, uniquely identify one student? roll_no alone, maybe. Or (email) on its own. Or (name, dob, city) if you really want a long key. Those minimal “uniquely-identifies-a-row” combinations are the candidate keys — and from a set of functional dependencies you can find them without guessing.

The trick is to look at where each attribute appears in the FDs, before computing any closures. (This is the same call you make every time you pick a primary key for a real table — the candidate keys are your honest list of options before you commit to one.)

The definition, plainly

A candidate key is a set of attributes K such that:

  1. K⁺ (its attribute closure) is the full set of attributes — it determines everything.
  2. No proper subset of K already determines everything — it’s minimal.

Condition 1 makes K a superkey. Condition 2 prunes it down to minimal superkey. There can be more than one candidate key, and a key with n attributes doesn’t imply some n-1-attribute key exists — minimality is checked subset-by-subset.

The LHS/RHS split — the shortcut

Look at every FD in F. Classify each attribute of R:

  • LHS-only — appears on some left-hand side, never on any right-hand side. Must be in every candidate key. (Nothing in the FDs produces it, so the only way to “have” it in your closure is to start with it.)
  • RHS-only — appears only on right-hand sides. Cannot be in any candidate key. (It’s always producible from something else, so including it can never be minimal.)
  • Both / neither — these are the optional candidates. You may need some of them.
LHS-onlyA, CMUST appear inevery candidate keyBoth sidesBOptional — try only ifLHS-only alone is not enoughRHS-onlyD, ECANNOT appear inany candidate keyExample split for R(A,B,C,D,E) under F = {A → B, BC → D, D → E}.
Three buckets. Two of them decide themselves; only the middle one needs searching.

The procedure

  1. Classify every attribute as LHS-only, RHS-only, or both.
  2. Let K = set of LHS-only attributes. Compute K⁺.
  3. If K⁺ = all attributes, K is the unique candidate key. Stop.
  4. Otherwise, try adding ONE attribute from the “both” bucket at a time. If any single addition makes the closure complete, that’s a candidate key. Repeat with pairs, triples, etc., always keeping minimality (skip any superset of an already-found key).

Most exam problems halt at step 3 or after a one-attribute extension.

How GATE asks this

  • NAT — “How many candidate keys does R have?” Run the procedure; count.
  • MCQ — “Which of the following is a candidate key of R?” Check each: is it a superkey? Is it minimal (drop each attribute, recompute closure)?
  • MSQ — properties of candidate keys (LHS-only inclusion, RHS-only exclusion).

Worked example

R(A, B, C, D, E) with F = {A → B, BC → D, D → E}. Find all candidate keys.

Step 1 — classify. Scan each attribute against the FDs.

  • A: LHS of A → B. Never on any RHS. → LHS-only.
  • C: LHS of BC → D. Never on any RHS. → LHS-only.
  • B: LHS of BC → D; RHS of A → B. → Both.
  • D: LHS of D → E; RHS of BC → D. → Both.
  • E: only on RHS of D → E. → RHS-only.

Step 2 — start with the must-include set. K = {A, C}.

Step 3 — compute K⁺.

  1. Start closure = {A, C}.
  2. A → B: A is in, add B{A, B, C}.
  3. BC → D: B, C both in, add D{A, B, C, D}.
  4. D → E: D is in, add E{A, B, C, D, E}. Done.

{A, C}⁺ covers everything, so {A, C} is a superkey. It contains both LHS-only attributes, which must be in every candidate key — so we can’t shrink it any further. Therefore {A, C} is the unique candidate key.

No need to try extensions: any other superkey would have to contain {A, C} (because LHS-only attributes are mandatory), and any such set is a superset of our found key — hence not minimal.

Quick check

Quick check

0/6
Q1R(A,B,C,D) with F = {A → B, B → C, C → D}. How many candidate keys does R have?numerical answer — type a number
Q2R(A,B,C,D,E) with F = {AB → C, C → D, D → E, E → A}. How many candidate keys?numerical answer — type a number
Q3Which statements about candidate keys are TRUE? (select all that apply)select all that apply
Q4R(A,B,C) with F = {AB → C, C → A}. Which sets are candidate keys? (select all that apply)select all that apply
Q5R(A,B,C,D) with F = {A → BC, D → A}. Identify the LHS-only attributes, then state |candidate key|.numerical answer — type a number
Q6An enrolment table Enrol(student, dept, course, grade) has FDs: student → dept; {student, course} → grade; dept → course. Which attribute is LHS-only, and so must appear in every candidate key?

Practice this in an interview

All questions
Find all unique combinations of candidates that sum to a target, where each candidate may be used an unlimited number of times.

Use backtracking with a running total. At each step, try adding a candidate to the current path. If the total equals the target, record the path. If it exceeds the target, prune. Passing the same start index (not i+1) back into the recursion allows unlimited reuse of the same element.

How do you join tables on multiple keys, and why is the key order in a composite index important?

You combine conditions in the ON clause with AND to join on multiple columns, which is necessary when no single column is a unique identifier across both tables. For index performance, the most selective column — or the column used in equality predicates — should come first in a composite index.

What are 1NF, 2NF, and 3NF, and when would you intentionally denormalize?

1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.

What is the difference between a primary key and a foreign key, and what guarantees do they provide?

A primary key uniquely identifies each row in a table and implicitly creates a unique index; it cannot be NULL. A foreign key in a child table references the primary key of a parent table and enforces referential integrity — the database rejects inserts or updates that reference a non-existent parent row, and rejects parent deletes that would orphan child rows unless a cascade rule is defined.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content