datarekha

Lossless-Join vs Dependency-Preservation

Splitting a relation into two should not lose any rows when you join back — and should let you check every FD on the pieces. Two desirable properties, and they don't always come together.

8 min read Advanced GATE DA Lesson 72 of 122

What you'll learn

  • Lossless-join: R1 join R2 recovers R exactly when R1 ∩ R2 is a superkey of one piece
  • Dependency-preservation: every FD on R can be checked locally on a piece
  • BCNF is always lossless but may lose dependencies; 3NF guarantees both
  • The GATE-favourite consequence — JOIN runs more often after a non-dependency-preserving split

Before you start

You have a wide table that’s misbehaving (think repeated dept-head names every time a department appears). The natural fix is to split it into two narrower tables. Sounds easy. The catch: not every split is harmless. Some splits silently invent new rows when you join back. Others let an FD slip through the cracks so you can’t enforce it without first re-joining the pieces. Both failures cost real performance — and one of them showed up in GATE DA 2025.

Two properties decide whether the split is safe. Names that sound intimidating, ideas that are simple.

Property 1 — lossless-join

Split R into R1 and R2 (their attribute lists overlap on a common set). The decomposition is lossless-join when the natural join R1 ⋈ R2 recovers R exactly — no spurious extra rows.

The GATE-level test is one line: the intersection R1 ∩ R2 must be a superkey of R1 or of R2 (under the given FDs). If yes, the join is lossless; otherwise it leaks.

Intuitively the common attributes act like a glue that uniquely identifies rows on at least one side, so the join can’t combine the wrong halves.

Property 2 — dependency-preservation

Take every FD that held on R and ask: can I check it on R1 alone or R2 alone, without ever joining them back? The decomposition is dependency-preserving when the answer is “yes” for every FD in the closure of F — equivalently, the union of the FDs that survive on the pieces still generates the original F.

If an FD X → Y has X in R1 but Y in R2, you can’t check that FD on either piece by itself; you’d have to join. That FD has been “lost.”

The two are independent

This is the property students under-appreciate. A decomposition can have either, both, or neither.

Lossless?Dep-preserving?Yes / Yes3NF goalYes / Noclassic BCNFNo / Yesunsafe splitNo / Noworst case← Yes← NoLossless and dependency-preserving are independent — a split can land in any of the four cells.
3NF always lands top-left; BCNF guarantees only the top row, sometimes top-right.

The BCNF trade-off

Here’s the rule of thumb you’ll lean on in exam questions:

  • BCNF decomposition — always lossless, not always dependency-preserving.
  • 3NF decomposition (via the synthesis algorithm) — always lossless and always dependency-preserving.

That’s why production schemas often stop at 3NF. The cost of giving up dependency-preservation is exactly what the next worked example asks about.

How GATE asks this

  • MCQ — given a relation and FDs, decide whether a proposed decomposition is lossless / dependency-preserving / both / neither.
  • MCQ — the consequence question: if a decomposition is not dependency-preserving, which relational-algebra operator must be invoked more often to enforce the original FDs? This is the 2025 question below.
  • MSQ — properties: BCNF vs 3NF, the superkey test for lossless join, when both are possible.

Worked example — GATE DA 2025, Q6

A relation R is split into pieces and the decomposition is not dependency-preserving. To check the lost FDs, the system must reconstruct the original relation, which means it has to compute a particular relational-algebra operation more often than before. Which operation? (A) σ (selection) (B) π (projection) (C) ⋈ (join) (D) ÷ (division).

Think about what “not dependency-preserving” actually costs. There’s some FD X → Y where the attributes of X ∪ Y no longer all live in the same piece — they straddle two (or more) fragments. The DBMS still has to enforce that FD on every update; but it can’t peek at both X and Y in any single fragment. To even see the data needed, it has to glue the fragments back together — that is, take their join.

So JOIN (option C) is the operator that fires more often. (The official 2025 answer: C.)

This is the entire reason 3NF is often preferred when BCNF would force a dependency-non-preserving split: the database can enforce every FD locally without paying a join on every update.

Quick check

Quick check

0/6
Q1A relation R is decomposed into pieces and the decomposition is NOT dependency-preserving. To enforce the lost FDs, which relational-algebra operator must run more often?
Q2R(A,B,C,D) with F = {A → B, B → C, C → D} is decomposed into R1(A,B) and R2(B,C,D). Is the decomposition lossless? (NAT — answer 1 for yes, 0 for no.)numerical answer — type a number
Q3Which statements about decomposition are TRUE? (select all that apply)select all that apply
Q4R(A,B,C) with F = {A → B, B → C} is split into R1(A,C) and R2(B,C). Is this decomposition lossless?
Q5Which conditions guarantee a lossless decomposition of R into R1, R2? (select all that apply)select all that apply
Q6An Orders(order_id, customer_id, customer_city) table with FDs order_id → customer_id and customer_id → customer_city is split into Orders1(order_id, customer_id) and Customers(customer_id, customer_city) to remove the repeated city. Is this split lossless-join?

Practice this in an interview

All questions
What are 1NF, 2NF, and 3NF, and when would you intentionally denormalize?

1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.

How do you safely join two tables in a many-to-many relationship without creating a row explosion?

Many-to-many joins produce a Cartesian product of each matching subset, multiplying row counts exponentially. The correct approach is to pre-aggregate at least one side to a unique grain before joining, or to use a bridge/junction table that resolves the relationship into two one-to-many joins.

Should you normalize or denormalize tables in a data warehouse, and why?

Data warehouses favor denormalization — wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is negligible, so denormalized star schemas consistently outperform normalized models for analytical workloads.

What is an anti-join, how do you implement one in SQL, and which implementation is most reliable?

An anti-join returns rows from the left table that have no matching row in the right table — the inverse of a semi-join. The three implementations are NOT EXISTS, NOT IN, and a LEFT JOIN with a NULL filter; NOT EXISTS is the most reliable because it is NULL-safe and communicates intent clearly.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content