Normal Forms: 1NF to BCNF
Why does the same data design feel "clean" or "messy"? Normal forms give you four named bars to clear — and GATE asks the highest a relation satisfies almost every year.
What you'll learn
- 1NF, 2NF, 3NF, BCNF — what each rules out and how to test it
- Prime vs non-prime attributes — and why 3NF lets one exception slide
- The procedure to find the highest normal form a relation satisfies
- The strictness ladder: BCNF is the strongest, 1NF the weakest
Before you start
Imagine an “Orders” table where each row lists all the items the customer bought in one cell. Or a “Students” table where every row repeats the department’s head of department. Both store data — but they’re going to misbehave the moment you try to update them. Normal forms are the polite warnings the theory gives you: here’s the bar you’ve cleared; here’s the one you’ve tripped on.
Four bars, increasing in strictness. GATE almost always asks the same question: which is the highest one a given relation clears? Out in real data work it’s the same judgment call — how far to normalize a schema before the update anomalies (insert/update/delete bugs) outweigh the extra joins.
The four bars, plainly
- 1NF — atomic values. Every cell holds a single, indivisible value. No lists, no sets, no JSON. Assumed by default for the GATE syllabus.
- 2NF — no partial dependency. No non-prime attribute depends on only part of a candidate key. (If every candidate key is a single attribute, 2NF holds automatically.)
- 3NF — no transitive dependency. For every non-trivial FD
X → A, at least one of the following must hold:Xis a superkey, orAis a prime attribute (part of some candidate key). - BCNF — strictly superkey on the left. For every non-trivial FD
X → A,Xmust be a superkey. No exception.
Two terms you’ll reuse constantly: an attribute is prime if it belongs to some candidate key, and non-prime otherwise. 3NF gives a free pass to FDs whose right-hand side is prime; BCNF does not.
How they stack
A trivial FD (one where the RHS is a subset of the LHS, e.g. AB → A) is always allowed; the rules only police non-trivial FDs.
How GATE asks this
The most common framing: a small relation R(...) and a set of FDs are given, then “the highest normal form R satisfies is — ?” MSQ variants ask which named NF a relation is in, or to match each FD’s role (the offender vs the well-behaved). The recipe:
- Find every candidate key of
R. Identify the prime / non-prime attributes. - Check BCNF: every non-trivial FD’s LHS must be a superkey. One bad FD ⇒ BCNF fails.
- If BCNF failed, check 3NF: for each non-trivial FD that fails BCNF, is its RHS prime? If yes, 3NF still holds for that FD. If any FD has both a non-superkey LHS and a non-prime RHS, 3NF fails too.
- If 3NF failed, check 2NF: is there a non-prime attribute that depends on a strict subset of some candidate key? If yes, 2NF fails; the highest is 1NF.
Worked example
R(A, B, C, D)withF = {A → B, A → C, BC → D}. What is the highest normal formRsatisfies?
Find candidate keys. Compute A⁺: start {A}; A → B adds B; A → C adds C; now BC → D fires (both in) and adds D → A⁺ = ABCD. So {A} is a superkey. It’s minimal (single attribute). Any other candidate key? Check B⁺ = {B}, C⁺ = {C}, D⁺ = {D} — none reaches the full set. So the unique candidate key is {A}. Prime = {A}; non-prime = {B, C, D}.
Check BCNF. For each FD, is the LHS a superkey?
A → B: LHSAis the key — superkey ✓.A → C: same — ✓.BC → D: LHSBCis not a superkey (its closure isBCD, missingA) — ✗.
BCNF fails because of BC → D.
Check 3NF. The failing FD is BC → D. For 3NF to hold we need either BC a superkey (no) or D prime (no — D is non-prime). Both fail, so 3NF fails too.
Check 2NF. The only candidate key is {A}, a single attribute, so there can be no “partial dependency on part of a candidate key” — 2NF holds trivially. (And 1NF is assumed.)
Highest NF satisfied: 2NF. The trouble-maker is BC → D — BC isn’t a key, and D isn’t part of any key.
Quick check
Quick check
Practice this in an interview
All questions1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.
Data warehouses favor denormalization — wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is negligible, so denormalized star schemas consistently outperform normalized models for analytical workloads.
A star schema has a central fact table joined directly to denormalized dimension tables, giving simple two-table joins and fast query performance at the cost of some data redundancy. A snowflake schema normalizes dimensions into sub-dimension tables, reducing storage and update anomalies but requiring more joins that can slow analytical queries.
A star schema has a central fact table joined directly to denormalized dimension tables — one join hop per dimension, simple queries, better query performance. A snowflake schema normalizes dimension tables into sub-dimensions, reducing storage redundancy but requiring more joins. Star schemas are preferred for analytics workloads; snowflake schemas are sometimes used when a dimension is very large and has many redundant attribute values.