What are 1NF, 2NF, and 3NF, and when would you intentionally denormalize?

1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies, where a non-key column depends on only part of a composite key. 3NF removes transitive dependencies: every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization accepts the risk of update anomalies in exchange for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.

Should you normalize or denormalize tables in a data warehouse, and why?

Data warehouses generally favor denormalization: wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is usually small, so denormalized star schemas typically outperform fully normalized models for analytical workloads.

What is the difference between a star schema and a snowflake schema, and which should you choose?

A star schema has a central fact table joined directly to denormalized dimension tables, giving one join per dimension and fast query performance at the cost of some data redundancy. A snowflake schema normalizes dimensions into sub-dimension tables, reducing storage and update anomalies but requiring more joins that can slow analytical queries. For most analytical workloads the star schema is the usual choice.

What is the difference between a star schema and a snowflake schema in dimensional modeling?

A star schema has a central fact table joined directly to denormalized dimension tables — one join hop per dimension, simple queries, better query performance. A snowflake schema normalizes dimension tables into sub-dimensions, reducing storage redundancy but requiring more joins. Star schemas are preferred for analytics workloads; snowflake schemas are sometimes used when a dimension is very large and has many redundant attribute values.
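To make the join-hop difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product_star, dim_product_snow, dim_category) and the sample rows are hypothetical, invented only for illustration. The same category total is computed twice: once against a star-style dimension and once against a snowflaked one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the product dimension carries the category name itself (redundant).
CREATE TABLE dim_product_star (product_id INTEGER PRIMARY KEY, name TEXT, category_name TEXT);
-- Snowflake: the category is split out into its own sub-dimension.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product_snow (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Mug', 'Kitchen');
INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Kitchen');
INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Notebook', 10), (3, 'Mug', 20);
INSERT INTO fact_sales VALUES (1, 1, 2.0), (2, 2, 5.0), (3, 3, 8.0), (4, 1, 2.0);
""")

# Star: one join hop from the fact table to the dimension.
star_sql = """
SELECT p.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_star p ON p.product_id = f.product_id
GROUP BY p.category_name ORDER BY p.category_name
"""

# Snowflake: a second hop is needed to reach the category name.
snow_sql = """
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_snow p ON p.product_id = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.category_name ORDER BY c.category_name
"""

star_rows = conn.execute(star_sql).fetchall()
snow_rows = conn.execute(snow_sql).fetchall()
print(star_rows)
print(snow_rows)
```

Both queries return the same totals. The star version repeats 'Stationery' on every stationery product row, while the snowflake version stores it once and pays for that with the extra join.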

Normal Forms: 1NF to BCNF — GATE DA

We left the last lesson holding a clean split: every attribute is now prime (it sits in some candidate key) or non-prime (it sits in none). The promise was that this split does real work. Here is the work — it is exactly the tool that tells a clean table design from a messy one.

So picture two tables. In the first, an “Orders” row lists all the items a customer bought, jammed into a single cell. In the second, a “Students” row repeats the department’s head of department on every line, once per student. Both store the data you asked for. Both also misbehave the moment you try to change anything — edit the head of department and you must hunt down a hundred rows; delete the last student in a department and the department’s details vanish with them.
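The second table's misbehaviour can be reproduced in a few lines. This is a small sketch with Python's built-in sqlite3 module; the students table, its columns, and every name in it are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student TEXT PRIMARY KEY, dept TEXT, hod TEXT);
INSERT INTO students VALUES
  ('Asha', 'CS', 'Dr. Rao'),
  ('Bala', 'CS', 'Dr. Rao'),
  ('Chen', 'CS', 'Dr. Rao'),
  ('Dina', 'EE', 'Dr. Iyer');
""")

# Update anomaly: one real-world change (a new CS head) touches every CS row.
rows_touched = conn.execute(
    "UPDATE students SET hod = 'Dr. Mehta' WHERE dept = 'CS'"
).rowcount

# Deletion anomaly: removing the last EE student also erases who heads EE.
conn.execute("DELETE FROM students WHERE student = 'Dina'")
ee_rows = conn.execute("SELECT hod FROM students WHERE dept = 'EE'").fetchall()

print(rows_touched)  # number of rows rewritten for a single fact
print(ee_rows)       # what the table still knows about EE
```

One changed fact rewrites three rows, and after the delete the table holds nothing at all about the EE department.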

Those misbehaviours have a name, update anomalies, and they are not bad luck. They are baked into the shape of the table. What we want is a way to read the shape and say, plainly, here is the bar this design has cleared, and here is the one it has tripped on. That is what a normal form is — a named bar. GATE almost always asks the same thing: which is the highest bar a given relation clears? Out in real data work it is the same judgement, only the cost is yours to weigh — how far to normalize before the extra joins hurt more than the anomalies did.

The four bars, plainly

Four bars, each stricter than the last.

1NF — atomic values. Every cell holds a single, indivisible value. No lists, no sets, no JSON blob standing in for three facts. The GATE syllabus assumes this by default.
2NF — no partial dependency. No non-prime attribute depends on only part of a candidate key. If every candidate key is a single attribute, there is no “part” to depend on, so 2NF holds for free.
3NF — no transitive dependency. For every non-trivial FD X → A, at least one of two things must hold: X is a superkey, or A is a prime attribute.
BCNF — superkey on the left, always. For every non-trivial FD X → A, X must be a superkey. No exception, no escape clause.

Notice where the prime/non-prime split from last lesson lands: it is the whole hinge of 3NF. An attribute is prime if it belongs to some candidate key and non-prime otherwise; 3NF waves through any FD whose right-hand side is prime, and BCNF refuses to. That one difference is the entire gap between the two strongest bars.

How they stack

Figure: nested boxes, with 1NF outermost and BCNF innermost. Each inner box is stricter; clearing BCNF means you have cleared the three outer ones too.

The bars nest. Clear the innermost and you have automatically cleared every one outside it, which is why we hunt from the inside out — test the strictest first, and step outward only when it fails. One more thing the rules quietly assume: a trivial FD, where the RHS is already a subset of the LHS (such as AB → A), is always allowed. The bars only ever police non-trivial FDs.

How GATE asks this

The usual framing is spare: a small relation R(...), a set of FDs, and the prompt “the highest normal form R satisfies is — ?”. MSQ variants ask which named NF a relation is in, or to sort each FD into its role — the offender versus the well-behaved. The recipe runs strictest-first:

Find every candidate key of R. Label each attribute prime or non-prime.
Check BCNF. Every non-trivial FD’s LHS must be a superkey. One bad FD and BCNF fails.
If BCNF failed, check 3NF. For each FD that broke BCNF, ask the escape question: is its RHS prime? If yes, 3NF survives that FD. If even one FD has both a non-superkey LHS and a non-prime RHS, 3NF fails too.
If 3NF failed, check 2NF. Is there a non-prime attribute depending on a strict subset of some candidate key? If yes, 2NF fails, and the highest the table clears is 1NF. If no, the highest it clears is 2NF.
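The recipe above can be sketched as a short Python program. This is one possible implementation, not a standard library routine: the function names (closure, candidate_keys, highest_nf) are invented here, attributes are assumed to be single letters so that a string like "AB" can stand for the set {A, B}, and the key search is brute force, which is only sensible for exam-sized relations. 1NF is assumed, as in the text.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Step 1: try subsets smallest-first and keep the minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = frozenset(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys

def highest_nf(schema, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF' for relation schema with FDs fds."""
    schema = frozenset(schema)
    fds = [(frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds]
    keys = candidate_keys(schema, fds)
    prime = frozenset().union(*keys)

    def is_superkey(attrs):
        return closure(attrs, fds) == schema

    # Split each FD into non-trivial pieces X -> A with a single attribute A.
    pieces = [(lhs, a) for lhs, rhs in fds for a in rhs - lhs]

    # Step 2: BCNF needs a superkey on every left-hand side.
    if all(is_superkey(x) for x, _ in pieces):
        return "BCNF"
    # Step 3: 3NF forgives an FD whose right-hand side is prime.
    if all(is_superkey(x) or a in prime for x, a in pieces):
        return "3NF"
    # Step 4: 2NF fails if a proper part of a key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) - prime:
                    return "1NF"
    return "2NF"

print(highest_nf("AB", [("A", "B")]))                   # BCNF
print(highest_nf("ABC", [("A", "B"), ("B", "C")]))      # 2NF
print(highest_nf("ABCD", [("AB", "C"), ("A", "D")]))    # 1NF
```

In the second call the only key is {A}, and B → C has a non-superkey on the left and a non-prime attribute on the right, so 3NF fails while 2NF holds for free. In the third call the key is {A, B} and D leans on A alone, which is a partial dependency.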

Worked example

R(A, B, C, D) with F = {A → B, A → C, BC → D}. What is the highest normal form R satisfies?

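Find the candidate keys. Compute A⁺: start with {A}; A → B adds B; A → C adds C; with B and C both present, BC → D fires and adds D. So A⁺ = ABCD, which makes {A} a superkey, and being a single attribute it is minimal. Any other key? A appears on the right-hand side of no FD, so no set that omits A can ever derive it: even the largest such set gives BCD⁺ = BCD. Every key must therefore contain A, and since {A} is already a key, any larger set containing it is not minimal. So the unique candidate key is {A}. Prime = {A}; non-prime = {B, C, D}.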
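The closure computation for this relation can be traced mechanically. Below is a minimal sketch; closure_trace is a hypothetical helper written for this example, and attributes are assumed to be single letters so a string such as "BC" stands for {B, C}.

```python
def closure_trace(start, fds):
    """Return (closure, steps); steps lists the FDs in the order they fire."""
    result = set(start)
    steps = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD fires when its whole LHS is present and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                steps.append(f"{lhs} -> {rhs}")
                changed = True
    return "".join(sorted(result)), steps

F = [("A", "B"), ("A", "C"), ("BC", "D")]

a_closure, a_steps = closure_trace("A", F)
print(a_closure, a_steps)   # ABCD ['A -> B', 'A -> C', 'BC -> D']

# The largest attribute set that leaves A out never gets A back.
bcd_closure, bcd_steps = closure_trace("BCD", F)
print(bcd_closure, bcd_steps)   # BCD []
```

The first call reproduces the hand trace for A⁺. The second shows that no FD fires from BCD, so no set without A can be a superkey.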

Check BCNF. For each FD, is the LHS a superkey?

A → B: LHS A is the key — superkey ✓.
A → C: same — ✓.
BC → D: LHS BC is not a superkey (its closure is BCD, missing A) — ✗.

BCNF fails, on account of BC → D.

Check 3NF. The failing FD is BC → D. For 3NF we need either BC a superkey (no) or D prime (no — D is non-prime). Both escape routes are shut, so 3NF fails too.

Check 2NF. The only candidate key is {A}, a single attribute. A partial dependency needs a non-prime attribute leaning on part of a key, and a one-attribute key has no proper part. So 2NF holds trivially, and 1NF is assumed.

Highest NF satisfied: 2NF. The trouble-maker is BC → D — BC is not a key, and D belongs to no key.

In one breath

Normal forms are four nested bars on how a table is shaped: 1NF wants atomic cells, 2NF forbids a non-prime attribute leaning on part of a key, 3NF forbids a non-superkey-to-non-prime dependency, and BCNF demands a superkey on the left of every non-trivial FD. Test strictest-first — find the keys, label prime and non-prime, then check BCNF, fall to 3NF on its prime-RHS escape clause, and fall to 2NF only if a partial dependency bites.

Practice

Quick check

Q1 (Recall). Which statements about normal forms are TRUE? (Select all that apply.)

Q2 (Trace). R(A,B,C,D) with F = {A → B, A → C, BC → D}. Enter the number of FDs that violate BCNF. (Numerical answer.)

Q3 (Trace). R(A,B,C,D) with F = {AB → C, C → D, D → A}. What is the highest normal form R satisfies?

Q4 (Apply). R(A,B,C) with F = {AB → C, C → B}. Find the candidate keys, then the highest NF.

Q5 (Apply). Which statements about prime and non-prime attributes are TRUE? (Select all that apply.)

Q6 (Create). A table Enroll(student_id, course_id, course_name) has FD course_id → course_name, and the only candidate key is {student_id, course_id}. What is the highest normal form it satisfies?

A question to carry forward

So now you can name the highest bar a table clears — and, just as often, watch it trip on a single bad FD like BC → D. Naming the failure is satisfying. But a database designer cannot stop at the diagnosis; the table still has to be fixed.

The obvious fix is to break the offending table into smaller ones, each clean. Yet splitting is dangerous: cut a table the wrong way and you can no longer rebuild the original by joining the pieces, or you quietly lose the ability to enforce one of the FDs. Here is the thread onward: when you decompose a relation to reach a higher normal form, what must the split preserve so that nothing is lost — and can you always have both at once?
