What are 1NF, 2NF, and 3NF, and when would you intentionally denormalize?

1NF eliminates repeating groups and requires atomic column values. 2NF additionally removes partial dependencies, where a non-key column depends on only part of a composite key. 3NF removes transitive dependencies, so that every non-key column depends on the key, the whole key, and nothing but the key. Denormalization accepts the risk of update anomalies in exchange for read performance; it is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.

How do you safely join two tables in a many-to-many relationship without creating a row explosion?

Many-to-many joins produce a Cartesian product within each group of matching keys, so row counts multiply rather than add: m matching rows on one side and n on the other yield m × n output rows for that key. The correct approach is to pre-aggregate at least one side to a unique grain before joining, or to use a bridge/junction table that resolves the relationship into two one-to-many joins.
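A minimal sketch of the row explosion and the pre-aggregation fix, using Python's built-in sqlite3 module. The orders and refunds tables, their columns, and the values are hypothetical, invented only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (customer_id INTEGER, amount INTEGER);
    CREATE TABLE refunds (customer_id INTEGER, amount INTEGER);
    INSERT INTO orders  VALUES (1, 10), (1, 20);          -- 2 rows for customer 1
    INSERT INTO refunds VALUES (1, 1), (1, 2), (1, 3);    -- 3 rows for customer 1
""")

# Naive join: customer_id is not unique on either side, so 2 x 3 = 6 rows
# come back and every order amount is counted three times.
naive_rows, naive_order_total = conn.execute("""
    SELECT COUNT(*), SUM(o.amount)
    FROM orders o
    JOIN refunds r ON r.customer_id = o.customer_id
""").fetchone()

# Safe join: collapse each side to one row per customer_id first.
safe_rows = conn.execute("""
    SELECT o.customer_id, o.order_total, r.refund_total
    FROM (SELECT customer_id, SUM(amount) AS order_total
          FROM orders GROUP BY customer_id) o
    JOIN (SELECT customer_id, SUM(amount) AS refund_total
          FROM refunds GROUP BY customer_id) r
      ON r.customer_id = o.customer_id
""").fetchall()

print(naive_rows, naive_order_total)  # 6 90
print(safe_rows)                      # [(1, 30, 6)]
```

The true order total is 30; the naive join reports 90 because each order row was repeated once per refund row.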

Should you normalize or denormalize tables in a data warehouse, and why?

Analytical data warehouses generally favor denormalization: wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is usually small, so denormalized star schemas typically outperform fully normalized models for analytical workloads.

What is an anti-join, how do you implement one in SQL, and which implementation is most reliable?

An anti-join returns rows from the left table that have no matching row in the right table; it is the inverse of a semi-join. The three implementations are NOT EXISTS, NOT IN, and a LEFT JOIN with a NULL filter. NOT EXISTS is the most reliable because it is NULL-safe and communicates intent clearly, whereas NOT IN returns no rows at all when the subquery produces even one NULL.
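The three forms can be compared side by side. This is a small sketch using Python's built-in sqlite3 module; the customers and orders tables are hypothetical, and one order row deliberately carries a NULL customer_id to expose the NOT IN trap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1), (NULL);   -- note the NULL
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

not_exists = ids("""
    SELECT c.id FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""")

left_join = ids("""
    SELECT c.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.customer_id IS NULL
""")

# "id NOT IN (1, NULL)" is never TRUE: it is FALSE for 1 and NULL
# (unknown) for everything else, so the filter rejects every row.
not_in = ids("""
    SELECT c.id FROM customers c
    WHERE c.id NOT IN (SELECT customer_id FROM orders)
""")

print(not_exists)  # [2, 3]
print(left_join)   # [2, 3]
print(not_in)      # []
```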

Lossless-Join vs Dependency-Preservation — GATE DA

What you'll learn

Lossless-join: R1 ⋈ R2 recovers R exactly when R1 ∩ R2 is a superkey of one piece

Dependency-preservation: every FD can be checked on a single piece, no join needed

BCNF is always lossless but may lose dependencies; 3NF synthesis guarantees both

The GATE-favourite consequence — JOIN runs more often after a non-dependency-preserving split

Last lesson ended with the fix and its danger in the same breath. When a table trips on a bad FD, you split it into smaller, cleaner tables — but a careless cut can leave you unable to rebuild the original by joining, or unable to enforce a rule you used to enforce for free. So a split has to preserve something. Two somethings, it turns out, and they do not always travel together.

Picture tearing one wide spreadsheet into two narrower ones, then taping them back later by matching on a column they share. If that shared column pins down rows on at least one side, the tape is clean: every original row comes back, and nothing extra. But if the shared column is vague — several rows carry the same value in it — the tape smears. Matching on it glues row 3 of one sheet to row 8 of the other and invents a pairing that was never in the original table.

That smear is the danger we have to rule out. When you split a table to clean it up, you want a guarantee that taping it back gives you exactly the original — no rows lost, and, more sneakily, no rows invented. You would also like to keep enforcing every rule on the pieces alone, without taping first on every single update. Two guarantees, two names. The names sound heavy; the ideas are light.

Property 1 — lossless-join

Split R into R1 and R2, whose attribute lists overlap on some shared set. The decomposition is lossless-join when the natural join R1 ⋈ R2 recovers R exactly — not one spurious row more.
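A tiny concrete case makes the spurious rows visible. The sketch below uses a made-up two-row relation R(A, B, C) in which the shared column B has the same value in both rows, so B identifies rows on neither side.

```python
# Hypothetical relation R(A, B, C); both rows share B = "x".
R = {(1, "x", "p"), (2, "x", "q")}

# Project onto R1(A, B) and R2(B, C).
r1 = {(a, b) for a, b, c in R}
r2 = {(b, c) for a, b, c in R}

# Natural join of the two pieces on the shared attribute B.
joined = {(a, b, c) for a, b in r1 for b2, c in r2 if b == b2}

spurious = joined - R   # rows the join invented
missing = R - joined    # rows the join lost (always empty for a natural join of projections)

print(sorted(spurious))  # [(1, 'x', 'q'), (2, 'x', 'p')]
print(missing)           # set()
```

The join returns four rows where the original had two. No original row goes missing; the damage is that the real rows can no longer be told apart from the invented ones.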

The GATE-level test fits on one line: the intersection R1 ∩ R2 must be a superkey of R1 or of R2, under the given FDs. If it is, the join is lossless; if it is not, the join leaks. Intuitively, the shared attributes are the tape, and they hold cleanly only when they uniquely identify rows on at least one side — so the join can never glue the wrong halves together.
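The one-line test can be sketched in a few lines of code. The function names closure and is_lossless, and the representation of an FD as a (left-side set, right-side set) pair, are illustrative choices, not part of any standard library.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: R1 ∩ R2 must be a superkey of R1 or of R2."""
    reach = closure(r1 & r2, fds)
    return r1 <= reach or r2 <= reach

# Hypothetical relation R(A, B, C) with the single FD A -> B.
F = [({"A"}, {"B"})]

clean_split = is_lossless({"A", "B"}, {"A", "C"}, F)  # shared A is a key of R1
leaky_split = is_lossless({"A", "B"}, {"B", "C"}, F)  # shared B determines nothing

print(clean_split, leaky_split)  # True False
```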

Property 2 — dependency-preservation

Now take every FD in F and ask one question of each: can I enforce this using only checks on R1 alone or on R2 alone, without ever joining them back? The decomposition is dependency-preserving when the answer is “yes” for every FD in F. Formally, the FDs that survive on the separate pieces (the projections of the closure of F onto R1 and onto R2) must together still imply the whole of the original F.

If an FD X → Y ends up with X in R1 but Y in R2, neither piece can see both columns at once. Unless that FD is implied by FDs that do fit inside single pieces, you can only check it by joining first. That FD has been lost: not from the data, but from the set you can cheaply enforce.
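Dependency-preservation can be tested without materialising the projected FD sets. The sketch below follows the usual textbook procedure: grow the left side using only what each piece can see locally, then ask whether the right side was reached. The function names and the example relation R(A, B, C) are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(lhs, rhs, pieces, fds):
    """Can lhs -> rhs be derived from FDs checkable on single pieces?"""
    reach = set(lhs)
    changed = True
    while changed:
        changed = False
        for piece in pieces:
            # What this piece alone can conclude from the part of reach it sees.
            gained = closure(reach & piece, fds) & piece
            if not gained <= reach:
                reach |= gained
                changed = True
    return rhs <= reach

def dependency_preserving(pieces, fds):
    return all(is_preserved(lhs, rhs, pieces, fds) for lhs, rhs in fds)

# Hypothetical relation R(A, B, C) with A -> B and B -> C.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]

good = dependency_preserving([{"A", "B"}, {"B", "C"}], F)  # each FD fits in a piece
bad = dependency_preserving([{"A", "B"}, {"A", "C"}], F)   # B -> C straddles the cut

# A -> C also straddles the first split, yet it is implied by A -> B and
# B -> C, which are both locally checkable, so it is not lost.
a_to_c = is_preserved({"A"}, {"C"}, [{"A", "B"}, {"B", "C"}], F)

print(good, bad, a_to_c)  # True False True
```

Note that the second split, R1(A, B) and R2(A, C), is lossless (the shared attribute A is a key) yet loses B → C, a small instance of one property holding without the other.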

The two are independent

This is the part most students under-weigh. A decomposition can have either property, both, or neither — the two do not imply each other.

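Picture a 2×2 grid with lossless-join on the rows (yes on top) and dependency-preservation on the columns (yes on the left). 3NF synthesis always lands top-left; BCNF guarantees only the top row, and sometimes lands top-right.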

The BCNF trade-off

Here is the rule of thumb the exam leans on, and it falls straight out of that matrix:

BCNF decomposition — always lossless, but not always dependency-preserving.
3NF decomposition (via the synthesis algorithm) — always lossless and always dependency-preserving.

So when chasing BCNF would force you to scatter an FD across two fragments, the textbook move is to settle for 3NF and keep both properties. That single trade-off is why production schemas so often stop at 3NF — and it is exactly what the next worked example puts a price tag on.
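A classic textbook illustration of this trade-off (not taken from this lesson) is an address relation with {street, city} → zip and zip → city. The sketch below checks by brute force that the relation already satisfies 3NF while violating BCNF, which is the situation where settling for 3NF pays off. All function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if closure(attrs, fds) >= schema and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left side is not a superkey."""
    return [(l, r) for l, r in fds if not r <= l and not closure(l, fds) >= schema]

def third_nf_violations(schema, fds):
    """BCNF violations that 3NF does not excuse (right side is not all prime)."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(l, r) for l, r in bcnf_violations(schema, fds) if not (r - l) <= prime]

schema = {"street", "city", "zip"}
F = [({"street", "city"}, {"zip"}), ({"zip"}, {"city"})]

print(candidate_keys(schema, F))       # the keys {city, street} and {street, zip}
print(bcnf_violations(schema, F))      # zip -> city: zip is not a superkey
print(third_nf_violations(schema, F))  # []: city is a prime attribute
```

Splitting on zip → city to reach BCNF gives (zip, city) and (street, zip), and no single fragment then contains street, city, and zip together, so {street, city} → zip can no longer be checked locally.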

How GATE asks this

MCQ — given a relation and its FDs, decide whether a proposed decomposition is lossless, dependency-preserving, both, or neither.
MCQ — the consequence question: if a decomposition is not dependency-preserving, which relational-algebra operator must run more often to enforce the original FDs? That is the 2025 question worked below.
MSQ — properties: BCNF versus 3NF, the superkey test for lossless join, when both can be had at once.

Worked example — GATE DA 2025, Q6

A relation R is split into pieces, and the decomposition is not dependency-preserving. To check the lost FDs, the system must reconstruct the original relation, which means computing a particular relational-algebra operation more often than before. Which operation? (A) σ selection (B) π projection (C) ⋈ join (D) ÷ division.

Trace what “not dependency-preserving” actually costs. There is some FD X → Y whose attributes X ∪ Y no longer all live in one piece — they straddle two fragments. The DBMS must still enforce that FD on every update, yet it cannot see both X and Y inside any single fragment. To even look at the data it needs, it has to glue the fragments back together — that is, take their join.

So JOIN (option C) is the operator that fires more often. The official 2025 answer is C. And this is the whole reason 3NF is preferred when BCNF would force a dependency-non-preserving split: 3NF lets the database enforce every FD locally, with no join tax on each update.
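The join tax can be made concrete with data. This sketch assumes a hypothetical address relation with {street, city} → zip and zip → city, already split into the fragments (zip, city) and (street, zip); the rows are invented for illustration.

```python
def violates(rows, lhs, rhs):
    """True if two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return True
    return False

# The two fragments after the split.
zip_city = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10002", "city": "NYC"},
]
street_zip = [
    {"street": "Main St", "zip": "10001"},
    {"street": "Main St", "zip": "10002"},
]

# zip -> city lives entirely inside zip_city and can be checked locally.
# street_zip has no non-trivial FD of its own to check.
local_ok = not violates(zip_city, ["zip"], ["city"])

# {street, city} -> zip spans both fragments, so the only way to check it
# is to join them back together first.
joined = [{**s, **z} for s in street_zip for z in zip_city if s["zip"] == z["zip"]]
global_violation = violates(joined, ["street", "city"], ["zip"])

print(local_ok, global_violation)  # True True
```

Every local check passes, yet the joined data maps (Main St, NYC) to two different zips. Catching that on each update means running the join each time.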

In one breath

A decomposition is lossless when joining the pieces rebuilds the original exactly — guaranteed precisely when the shared attributes R1 ∩ R2 form a superkey of one piece — and dependency-preserving when every FD can be checked on a single piece without a join. The two are independent: BCNF always buys you lossless but can scatter a dependency across fragments, while 3NF synthesis buys you both, which is why a non-dependency-preserving split makes the engine pay an extra JOIN on every update.

Practice

Quick check


Q1. Recall: Which statements about decomposition are TRUE? (Select all that apply.)

Q2. Trace: R(A,B,C,D) with F = {A → B, B → C, C → D} is decomposed into R1(A,B) and R2(B,C,D). Is the decomposition lossless? Enter 1 for yes, 0 for no.

Q3. Trace: R(A,B,C) with F = {A → B, B → C} is split into R1(A,C) and R2(B,C). Is this decomposition lossless?

Q4. Apply: A relation R is decomposed and the decomposition is NOT dependency-preserving. To enforce the lost FDs, which relational-algebra operator must run more often?

Q5. Apply: Which conditions guarantee a lossless decomposition of R into R1, R2? (Select all that apply.)

Q6. Create: Orders(order_id, customer_id, customer_city) with FDs order_id → customer_id and customer_id → customer_city is split into Orders1(order_id, customer_id) and Customers(customer_id, customer_city) to stop repeating the city. Is this split lossless-join?

A question to carry forward

With this lesson the logical design of a schema is essentially done. You can normalize a table, split it safely, and prove that nothing is lost and no rule is dropped — every table well-shaped, every cut clean.

But a normalized schema is still only a picture on paper. The database has to store those rows somewhere physical and find any one of them quickly. When a query asks for the single row where id = 814302 in a table of ten million, what spares it from reading all ten million? Here is the thread onward: how does a database actually lay rows down on disk, and what small side-structure lets it leap straight to the row it wants instead of scanning the whole file?
