What is the difference between a primary key and a foreign key, and what guarantees do they provide?

A primary key uniquely identifies each row in a table and cannot be NULL; most database engines back it with a unique index automatically. A foreign key in a child table references the primary key (or another unique key) of a parent table and enforces referential integrity: the database rejects inserts or updates that reference a non-existent parent row, and rejects parent deletes that would orphan child rows unless a rule such as ON DELETE CASCADE or ON DELETE SET NULL is defined.
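The delete-side guarantee can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommendation for any particular engine: the parent and child tables and the make_db and try_delete_parent helpers are hypothetical names, and SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on.

```python
import sqlite3

def make_db(on_delete: str) -> sqlite3.Connection:
    """Build a tiny parent/child schema; on_delete is 'RESTRICT' or 'CASCADE'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        f" parent_id INTEGER REFERENCES parent(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO parent (id) VALUES (1)")
    conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1)")
    return conn

def try_delete_parent(conn: sqlite3.Connection) -> bool:
    """Return True if the parent row could be deleted, False if the FK blocked it."""
    try:
        conn.execute("DELETE FROM parent WHERE id = 1")
        return True
    except sqlite3.IntegrityError:
        return False

# Without a cascade rule the delete is refused and the child row survives.
restrict_db = make_db("RESTRICT")
delete_allowed_restrict = try_delete_parent(restrict_db)
children_left_restrict = restrict_db.execute("SELECT COUNT(*) FROM child").fetchone()[0]

# With ON DELETE CASCADE the delete succeeds and takes the child row with it.
cascade_db = make_db("CASCADE")
delete_allowed_cascade = try_delete_parent(cascade_db)
children_left_cascade = cascade_db.execute("SELECT COUNT(*) FROM child").fetchone()[0]

print(delete_allowed_restrict, children_left_restrict)  # False 1
print(delete_allowed_cascade, children_left_cascade)    # True 0
```

Either way no orphaned child row is left behind; the two rules differ only in whether the delete is refused or propagated.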

How do you join tables on multiple keys, and why is the key order in a composite index important?

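You combine conditions in the ON clause with AND to join on multiple columns, which is necessary when no single column identifies a row across both tables. Order matters in a composite index because a B-tree index on (a, b) is sorted by a first and then by b within each a: it can serve predicates on a alone or on a and b together, but generally not on b alone. As a rule of thumb, put the columns used in equality predicates first, and among those prefer the more selective ones.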
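A small sketch of a two-column join, again using Python's sqlite3 module. The orders and shipments tables, their columns, and the index name are all hypothetical; the point is only that order_no repeats across regions, so the join needs both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, order_no INTEGER, item TEXT,
                         PRIMARY KEY (region, order_no));
    CREATE TABLE shipments (region TEXT, order_no INTEGER, carrier TEXT);
    -- Composite index whose column order matches the join key.
    CREATE INDEX idx_shipments_key ON shipments (region, order_no);

    INSERT INTO orders VALUES ('EU', 1, 'lamp'), ('US', 1, 'desk'), ('US', 2, 'chair');
    INSERT INTO shipments VALUES ('EU', 1, 'DHL'), ('US', 2, 'UPS');
""")

# Correct: both key columns appear in the ON clause, combined with AND.
rows = conn.execute("""
    SELECT o.region, o.order_no, o.item, s.carrier
    FROM orders AS o
    JOIN shipments AS s
      ON o.region = s.region
     AND o.order_no = s.order_no
    ORDER BY o.region, o.order_no
""").fetchall()

# Incomplete: joining on order_no alone also pairs the US desk with the EU shipment.
wrong_count = conn.execute("""
    SELECT COUNT(*)
    FROM orders AS o
    JOIN shipments AS s ON o.order_no = s.order_no
""").fetchone()[0]

print(rows)         # [('EU', 1, 'lamp', 'DHL'), ('US', 2, 'chair', 'UPS')]
print(wrong_count)  # 3
```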

What are 1NF, 2NF, and 3NF, and when would you intentionally denormalize?

1NF eliminates repeating groups and requires atomic column values. 2NF additionally removes partial dependencies, where a non-key column depends on only part of a composite key. 3NF additionally removes transitive dependencies, where a non-key column depends on another non-key column. The usual mnemonic is that every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization accepts redundancy, and with it the risk of update anomalies, in exchange for read performance; it is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.

What happens when a join key contains NULLs? Do NULL values ever match in a JOIN?

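In SQL, NULL = NULL evaluates to UNKNOWN rather than TRUE. Join conditions normally use equality, so rows whose key is NULL on either side never match: an INNER JOIN silently drops them, and an OUTER JOIN returns them from the preserved side only as unmatched rows padded with NULLs. If you need NULL-to-NULL matching, use a null-safe comparison such as IS NOT DISTINCT FROM where the engine supports it, or COALESCE the key to a sentinel value that cannot occur in real data.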
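The behaviour is easy to reproduce with Python's sqlite3 module. The tables left_t and right_t are hypothetical, and the null-safe join below uses SQLite's IS operator, which behaves like = except that two NULLs compare as equal; other engines spell the same idea differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t (k INTEGER, l_val TEXT);
    CREATE TABLE right_t (k INTEGER, r_val TEXT);
    INSERT INTO left_t VALUES (1, 'a'), (NULL, 'b');
    INSERT INTO right_t VALUES (1, 'x'), (NULL, 'y');
""")

# Plain equality: the NULL-keyed rows never match, so only k = 1 survives.
inner_eq = conn.execute("""
    SELECT l_val, r_val
    FROM left_t JOIN right_t ON left_t.k = right_t.k
""").fetchall()

# LEFT JOIN keeps the NULL-keyed left row, but as an unmatched row.
left_eq = conn.execute("""
    SELECT l_val, r_val
    FROM left_t LEFT JOIN right_t ON left_t.k = right_t.k
    ORDER BY l_val
""").fetchall()

# Null-safe comparison (SQLite's IS): NULL now pairs with NULL.
inner_null_safe = conn.execute("""
    SELECT l_val, r_val
    FROM left_t JOIN right_t ON left_t.k IS right_t.k
    ORDER BY l_val
""").fetchall()

print(inner_eq)         # [('a', 'x')]
print(left_eq)          # [('a', 'x'), ('b', None)]
print(inner_null_safe)  # [('a', 'x'), ('b', 'y')]
```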

Keys & Integrity Constraints — GATE DA

What you'll learn

Superkey: any attribute set that uniquely identifies a tuple

Candidate key: a minimal superkey (remove anything and uniqueness breaks)

Primary key: one chosen candidate key; foreign key: references another table's PK

Entity integrity: primary key cannot be NULL · Referential integrity: foreign key must match or be NULL

The last lesson kept leaning on keys without ever defining them — time to pay that debt. How does a table guarantee “this row, not that one”? With a key — a small set of columns that no two rows ever share. And how do tables link to one another without lying? With a foreign key — a column that promises “whatever value I hold really exists over there.” Keys are the spine of every relational schema; once you can sort super → candidate → primary → foreign in your head, the integrity rules drop out for free. This is also why a botched key choice is the root cause behind half the “duplicate rows” and “orphaned record” bugs you will ever debug in a production data pipeline.

The four key types

Imagine a Students(id, email, name, age) table where both id and email are unique per student (no two students share either).

Superkey — any attribute set whose values are unique across all rows. {id}, {email}, {id, name}, {id, email, age} are all superkeys. So is the whole row. There are typically many.
Candidate key — a minimal superkey. Remove any one attribute and it stops being unique. {id} is minimal — drop id and you have nothing. {email} is minimal too. But {id, name} is not a candidate: dropping name still gives uniqueness, so it was not minimal.
Primary key — one chosen candidate key, designated by the designer. Conventionally {id}. The other candidate keys become alternate keys.
Foreign key — an attribute (or set) in one table that references the primary key of another. Enrolls.student_id references Students.id.
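The superkey and candidate-key definitions can be checked mechanically against sample rows. The sketch below is in Python; the four student rows and the helper names is_unique_on and minimal_unique_sets are made up for illustration. One caveat: a key is a rule about every legal state of the table, so sample data can show that an attribute set is not a key, but passing the check only means the set is consistent with being one.

```python
from itertools import combinations

def is_unique_on(rows, attrs):
    """True if no two rows agree on every attribute in attrs (superkey test on this data)."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def minimal_unique_sets(rows, attributes):
    """Attribute sets that are unique on this data and contain no smaller unique set."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in found):
                continue  # contains a smaller unique set, so it is not minimal
            if is_unique_on(rows, combo):
                found.append(combo)
    return found

# Hypothetical sample rows: names and ages repeat, ids and emails do not.
students = [
    {"id": 1, "email": "asha@example.com", "name": "Asha", "age": 21},
    {"id": 2, "email": "ravi@example.com", "name": "Ravi", "age": 21},
    {"id": 3, "email": "asha2@example.com", "name": "Asha", "age": 19},
    {"id": 4, "email": "asha3@example.com", "name": "Asha", "age": 21},
]

id_name_is_superkey = is_unique_on(students, ("id", "name"))
name_age_is_superkey = is_unique_on(students, ("name", "age"))
candidates = minimal_unique_sets(students, ["id", "email", "name", "age"])

print(id_name_is_superkey)   # True: a superkey, but not minimal
print(name_age_is_superkey)  # False: two rows share ('Asha', 21)
print(candidates)            # [('id',), ('email',)]
```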

A foreign key promises: every non-NULL value I hold must already exist as a primary key value in the referenced table.

Two integrity rules enforce all this:

Entity integrity — the primary key cannot be NULL (and is unique by construction). Without a value you cannot identify the row.
Referential integrity — a foreign key value must either match an existing primary key in the referenced table, or be NULL. (NULL just means “this row is not linked yet.”)
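Both rules can be watched in action with Python's sqlite3 module. This is a sketch under a few assumptions: the Lockers table and the attempt helper are hypothetical, foreign keys must be switched on with a PRAGMA in SQLite, and Students is declared WITHOUT ROWID because SQLite otherwise does not reject a NULL in an INTEGER PRIMARY KEY column (in an ordinary table it auto-assigns a value instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.executescript("""
    -- WITHOUT ROWID makes SQLite enforce NOT NULL on the primary key.
    CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT) WITHOUT ROWID;
    -- Hypothetical table: a locker may exist before it is assigned to anyone.
    CREATE TABLE Lockers (
        locker_no INTEGER PRIMARY KEY,
        student_id INTEGER REFERENCES Students(id)
    );
""")

def attempt(sql):
    """Run one statement; True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

first_student = attempt("INSERT INTO Students VALUES (1, 'Asha')")
null_pk = attempt("INSERT INTO Students VALUES (NULL, 'Ravi')")   # entity integrity
duplicate_pk = attempt("INSERT INTO Students VALUES (1, 'Ravi')") # uniqueness
null_fk = attempt("INSERT INTO Lockers VALUES (100, NULL)")       # not linked yet
matching_fk = attempt("INSERT INTO Lockers VALUES (101, 1)")      # links to Asha

print(first_student, null_pk, duplicate_pk)  # True False False
print(null_fk, matching_fk)                  # True True
```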

How GATE asks this

Two patterns. First, MCQ on key types: given a relation and a set of functional dependencies (rules like A → B, meaning “knowing A fixes B”), the question asks which subsets are candidate keys, or how many candidate keys exist. Second, MSQ on integrity rules: pick which of four statements about NULL, PK, and FK are correct. The questions reward precision: “superkey” vs “candidate key” hinges entirely on the word minimal.

Worked example

Take Students(id, email, name, age) where both id and email are unique per student.

Superkeys — {id}, {email}, {id, email}, {id, name}, {email, age}, …, all the way up to the whole row. Many.
Candidate keys — minimal superkeys. {id} is minimal (drop it and the remaining empty set is not unique). {email} is minimal for the same reason. {id, name} is not a candidate — dropping name still gives the unique {id}, so it was not minimal. So there are exactly two candidate keys: {id} and {email}.
Primary key — pick one. Convention says {id}. Then {email} becomes an alternate key.
Foreign key — if Enrolls(student_id, course_id, grade) exists, student_id is a foreign key referencing Students.id. Inserting a row with student_id = 999 when no student has id = 999 violates referential integrity.
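The worked example translates into a schema roughly like the one below, run here through Python's sqlite3 module. Treat it as one possible sketch: the column types, the composite primary key on Enrolls, the course id DB101, and the sample student are assumptions that the example above does not specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE Students (
        id    INTEGER PRIMARY KEY,      -- chosen candidate key
        email TEXT NOT NULL UNIQUE,     -- alternate key
        name  TEXT,
        age   INTEGER
    );
    CREATE TABLE Enrolls (
        student_id INTEGER NOT NULL REFERENCES Students(id),
        course_id  TEXT NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)  -- assumed key for Enrolls
    );
    INSERT INTO Students VALUES (1, 'asha@example.com', 'Asha', 21);
""")

# student_id = 1 exists in Students, so this row is accepted.
conn.execute("INSERT INTO Enrolls VALUES (1, 'DB101', 'A')")

# No student has id = 999, so referential integrity rejects this row.
try:
    conn.execute("INSERT INTO Enrolls VALUES (999, 'DB101', 'B')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

enrolled_ids = conn.execute("SELECT student_id FROM Enrolls").fetchall()

print(rejected)      # True
print(enrolled_ids)  # [(1,)]
```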

A question to carry forward

You can now design tables, choose their keys, and guarantee — via entity and referential integrity — that the data inside never lies. But a database you can only fill and never question is a filing cabinet, not a tool. The whole point of storing rows is to be able to ask for exactly the ones you want: “the names of every student older than 20,” “which courses Asha is enrolled in.” Here is the thread onward: is there a precise, composable language for pulling chosen rows and columns out of a relation — one built from a handful of operators you can chain like arithmetic — that every SQL query is secretly translated into?

In one breath

Superkey = any attribute set unique across rows. Candidate key = a minimal superkey (drop anything and uniqueness breaks). Primary key = one chosen candidate key (others become alternate keys). Foreign key = references another table’s PK.
The whole hierarchy hinges on minimal: every candidate key is a superkey, but not vice versa.
Entity integrity: a primary key cannot be NULL. Referential integrity: a foreign key must match an existing PK or be NULL.
A NULL foreign key is legal (“not linked yet”); a foreign key pointing at a non-existent PK is the violation.
Find candidate keys from FDs: an attribute set is a candidate key if its closure contains every attribute of the relation and no proper subset of it has that property.
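The closure test is mechanical enough to sketch in a few lines of Python. The helper names closure and candidate_keys are illustrative, and the two functional dependencies are an assumed formalisation of “id and email are each unique per student” from the Students example. The search is brute force over all attribute subsets, which is fine for exam-sized relations but grows exponentially with the number of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(attributes)
    keys = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a proper subset is already a key, so this is not minimal
            if closure(combo, fds) == all_attrs:
                keys.append(combo)
    return keys

# Students(id, email, name, age) with the assumed FDs
# id -> email, name, age and email -> id, name, age.
attributes = ["id", "email", "name", "age"]
fds = [
    (("id",), ("email", "name", "age")),
    (("email",), ("id", "name", "age")),
]

name_age_closure = closure(("name", "age"), fds)
keys = candidate_keys(attributes, fds)

print(sorted(name_age_closure))  # ['age', 'name']
print(keys)                      # [('id',), ('email',)]
```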

Practice

Quick check

Q1. Recall: which statements about keys are TRUE? (select all that apply)

Q2. Recall: which of the following are TRUE about integrity constraints? (select all that apply)

Q3. Trace: a table Employees(emp_id, ssn, name, dept_id) where both emp_id and ssn are unique per row. How many candidate keys does this table have? (numerical answer, integer)

Q4. Apply: relation R(A, B, C, D, E) has FDs A → BCDE and BC → ADE. How many candidate keys does R have? (numerical answer, integer)

Q5. Apply: in R(A, B, C, D) with FDs AB → CD and BC → AD, which is a candidate key?

Q6. Create: consider Students(id PK) and Marks(student_id FK → Students.id, subject, score). Students currently has rows with id ∈ {1, 2, 3}. Which inserts into Marks would VIOLATE referential integrity? (select all that apply)
