Keys & Integrity Constraints
Keys are how a table guarantees 'this row, not that one' — and how tables link to each other without lying. Super, candidate, primary, foreign: pick them apart once and it sticks.
What you'll learn
- Superkey: any attribute set that uniquely identifies a tuple
- Candidate key: a minimal superkey (remove anything and uniqueness breaks)
- Primary key: one chosen candidate key; foreign key: references another table's PK
- Entity integrity: primary key cannot be NULL · Referential integrity: foreign key must match or be NULL
Before you start
How does a table guarantee “this row, not that one”? With a key — a small set of columns that no two rows ever share. And how do tables link without lying about each other? With a foreign key — a column that promises “whatever I hold, it really exists over there.” Keys are the spine of every relational schema; once you can sort super → candidate → primary → foreign in your head, the integrity rules drop out for free. (This is also why a botched key choice is the root cause behind half the “duplicate rows” and “orphaned record” bugs you’ll ever debug in a production data pipeline.)
The four key types
Imagine a Students(id, email, name, age) table where both id and email are
unique per student (no two students share either).
- Superkey — any attribute set whose values are unique across all rows.
{id},{email},{id, name},{id, email, age}are all superkeys. So is the whole row. There are typically many. - Candidate key — a minimal superkey. Remove any one attribute and it
stops being unique.
{id}is minimal — dropidand you have nothing.{email}is minimal too. But{id, name}is not a candidate: droppingnamestill gives uniqueness, so it isn’t minimal. - Primary key — one chosen candidate key, designated by the designer.
Conventionally
{id}. The other candidate keys become alternate keys. - Foreign key — an attribute (or set) in one table that references the primary key of another. Enrolls.student_id references Students.id.
Two integrity rules enforce all this:
- Entity integrity — the primary key cannot be NULL (and is unique by construction). Without a value you can’t identify the row.
- Referential integrity — a foreign key value must either match an existing primary key in the referenced table, or be NULL. (NULL just means “this row isn’t linked yet.”)
How GATE asks this
Two patterns. First, MCQ on key types — given a relation and a set of
functional dependencies (rules like A → B meaning “knowing A fixes B”), ask
which subsets are candidate keys, or how many candidate keys exist. Second, MSQ on integrity rules — pick which of four
statements about NULL, PK, and FK are correct. The questions reward precision:
“superkey” vs “candidate key” hinges entirely on the word minimal.
Worked example
Take Students(id, email, name, age) where both id and email are unique per
student.
- Superkeys —
{id},{email},{id, email},{id, name},{email, age}, …, all the way up to the whole row. Many. - Candidate keys — minimal superkeys.
{id}is minimal (drop it and the remaining empty set isn’t unique).{email}is minimal for the same reason.{id, name}is not a candidate — droppingnamestill gives the unique{id}, so it wasn’t minimal. So there are exactly two candidate keys:{id}and{email}. - Primary key — pick one. Convention says
{id}. Then{email}becomes an alternate key. - Foreign key — if
Enrolls(student_id, course_id, grade)exists,student_idis a foreign key referencingStudents.id. Inserting a row withstudent_id = 999when no student hasid = 999violates referential integrity.
Quick check
Quick check
Practice this in an interview
All questionsA primary key uniquely identifies each row in a table and implicitly creates a unique index; it cannot be NULL. A foreign key in a child table references the primary key of a parent table and enforces referential integrity — the database rejects inserts or updates that reference a non-existent parent row, and rejects parent deletes that would orphan child rows unless a cascade rule is defined.
You combine conditions in the ON clause with AND to join on multiple columns, which is necessary when no single column is a unique identifier across both tables. For index performance, the most selective column — or the column used in equality predicates — should come first in a composite index.
1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.
NULL never equals NULL in SQL — join conditions use equality, so rows where either key is NULL are silently excluded from INNER JOIN results and placed in the unmatched set for OUTER JOINs. If you need NULL-to-NULL matching, you must use IS NOT DISTINCT FROM or COALESCE the key to a sentinel value.