datarekha

Keys & Integrity Constraints

Keys are how a table guarantees 'this row, not that one' — and how tables link to each other without lying. Super, candidate, primary, foreign: pick them apart once and it sticks.

6 min read Beginner GATE DA Lesson 64 of 122

What you'll learn

  • Superkey: any attribute set that uniquely identifies a tuple
  • Candidate key: a minimal superkey (remove anything and uniqueness breaks)
  • Primary key: one chosen candidate key; foreign key: references another table's PK
  • Entity integrity: primary key cannot be NULL · Referential integrity: foreign key must match or be NULL

Before you start

How does a table guarantee “this row, not that one”? With a key — a small set of columns that no two rows ever share. And how do tables link without lying about each other? With a foreign key — a column that promises “whatever I hold, it really exists over there.” Keys are the spine of every relational schema; once you can sort super → candidate → primary → foreign in your head, the integrity rules drop out for free. (This is also why a botched key choice is the root cause behind half the “duplicate rows” and “orphaned record” bugs you’ll ever debug in a production data pipeline.)

The four key types

Imagine a Students(id, email, name, age) table where both id and email are unique per student (no two students share either).

  • Superkeyany attribute set whose values are unique across all rows. {id}, {email}, {id, name}, {id, email, age} are all superkeys. So is the whole row. There are typically many.
  • Candidate key — a minimal superkey. Remove any one attribute and it stops being unique. {id} is minimal — drop id and you have nothing. {email} is minimal too. But {id, name} is not a candidate: dropping name still gives uniqueness, so it isn’t minimal.
  • Primary key — one chosen candidate key, designated by the designer. Conventionally {id}. The other candidate keys become alternate keys.
  • Foreign key — an attribute (or set) in one table that references the primary key of another. Enrolls.student_id references Students.id.
StudentsidPKemailalt keynameageEnrollsstudent_idFKcourse_idFKgradeEnrolls.student_id (FK) → Students.id (PK)
A foreign key promises: every value I hold must already exist as a primary key in the referenced table.

Two integrity rules enforce all this:

  • Entity integrity — the primary key cannot be NULL (and is unique by construction). Without a value you can’t identify the row.
  • Referential integrity — a foreign key value must either match an existing primary key in the referenced table, or be NULL. (NULL just means “this row isn’t linked yet.”)

How GATE asks this

Two patterns. First, MCQ on key types — given a relation and a set of functional dependencies (rules like A → B meaning “knowing A fixes B”), ask which subsets are candidate keys, or how many candidate keys exist. Second, MSQ on integrity rules — pick which of four statements about NULL, PK, and FK are correct. The questions reward precision: “superkey” vs “candidate key” hinges entirely on the word minimal.

Worked example

Take Students(id, email, name, age) where both id and email are unique per student.

  • Superkeys{id}, {email}, {id, email}, {id, name}, {email, age}, …, all the way up to the whole row. Many.
  • Candidate keys — minimal superkeys. {id} is minimal (drop it and the remaining empty set isn’t unique). {email} is minimal for the same reason. {id, name} is not a candidate — dropping name still gives the unique {id}, so it wasn’t minimal. So there are exactly two candidate keys: {id} and {email}.
  • Primary key — pick one. Convention says {id}. Then {email} becomes an alternate key.
  • Foreign key — if Enrolls(student_id, course_id, grade) exists, student_id is a foreign key referencing Students.id. Inserting a row with student_id = 999 when no student has id = 999 violates referential integrity.

Quick check

Quick check

0/6
Q1Relation R(A, B, C, D, E) has FDs: A → BCDE and BC → ADE. How many candidate keys does R have?numerical answer — type a number
Q2A table Employees(emp_id, ssn, name, dept_id) where both emp_id and ssn are unique per row. How many candidate keys does this table have?numerical answer — type a number
Q3Which statements about keys are TRUE? (select all that apply)select all that apply
Q4Which of the following are TRUE about integrity constraints? (select all that apply)select all that apply
Q5In R(A, B, C, D) with FDs AB → CD and BC → AD, which is a candidate key?
Q6Consider Students(id PK) and Marks(student_id FK → Students.id, subject, score). The Students table currently has rows with id ∈ {1, 2, 3}. Which inserts into Marks would VIOLATE referential integrity? (select all that apply)select all that apply

Practice this in an interview

All questions
What is the difference between a primary key and a foreign key, and what guarantees do they provide?

A primary key uniquely identifies each row in a table and implicitly creates a unique index; it cannot be NULL. A foreign key in a child table references the primary key of a parent table and enforces referential integrity — the database rejects inserts or updates that reference a non-existent parent row, and rejects parent deletes that would orphan child rows unless a cascade rule is defined.

How do you join tables on multiple keys, and why is the key order in a composite index important?

You combine conditions in the ON clause with AND to join on multiple columns, which is necessary when no single column is a unique identifier across both tables. For index performance, the most selective column — or the column used in equality predicates — should come first in a composite index.

What are 1NF, 2NF, and 3NF, and when would you intentionally denormalize?

1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.

What happens when a join key contains NULLs? Do NULL values ever match in a JOIN?

NULL never equals NULL in SQL — join conditions use equality, so rows where either key is NULL are silently excluded from INNER JOIN results and placed in the unmatched set for OUTER JOINs. If you need NULL-to-NULL matching, you must use IS NOT DISTINCT FROM or COALESCE the key to a sentinel value.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content