Lossless-Join vs Dependency-Preservation
Splitting a relation into two should not lose any rows when you join back — and should let you check every FD on the pieces. Two desirable properties, and they don't always come together.
What you'll learn
- Lossless-join: R1 join R2 recovers R exactly when R1 ∩ R2 is a superkey of one piece
- Dependency-preservation: every FD on R can be checked locally on a piece
- BCNF is always lossless but may lose dependencies; 3NF guarantees both
- The GATE-favourite consequence — JOIN runs more often after a non-dependency-preserving split
Before you start
You have a wide table that’s misbehaving (think repeated dept-head names every time a department appears). The natural fix is to split it into two narrower tables. Sounds easy. The catch: not every split is harmless. Some splits silently invent new rows when you join back. Others let an FD slip through the cracks so you can’t enforce it without first re-joining the pieces. Both failures cost real performance — and one of them showed up in GATE DA 2025.
Two properties decide whether the split is safe. Names that sound intimidating, ideas that are simple.
Property 1 — lossless-join
Split R into R1 and R2 (their attribute lists overlap on a common set). The decomposition is lossless-join when the natural join R1 ⋈ R2 recovers R exactly — no spurious extra rows.
The GATE-level test is one line: the intersection R1 ∩ R2 must be a superkey of R1 or of R2 (under the given FDs). If yes, the join is lossless; otherwise it leaks.
Intuitively the common attributes act like a glue that uniquely identifies rows on at least one side, so the join can’t combine the wrong halves.
Property 2 — dependency-preservation
Take every FD that held on R and ask: can I check it on R1 alone or R2 alone, without ever joining them back? The decomposition is dependency-preserving when the answer is “yes” for every FD in the closure of F — equivalently, the union of the FDs that survive on the pieces still generates the original F.
If an FD X → Y has X in R1 but Y in R2, you can’t check that FD on either piece by itself; you’d have to join. That FD has been “lost.”
The two are independent
This is the property students under-appreciate. A decomposition can have either, both, or neither.
The BCNF trade-off
Here’s the rule of thumb you’ll lean on in exam questions:
- BCNF decomposition — always lossless, not always dependency-preserving.
- 3NF decomposition (via the synthesis algorithm) — always lossless and always dependency-preserving.
That’s why production schemas often stop at 3NF. The cost of giving up dependency-preservation is exactly what the next worked example asks about.
How GATE asks this
- MCQ — given a relation and FDs, decide whether a proposed decomposition is lossless / dependency-preserving / both / neither.
- MCQ — the consequence question: if a decomposition is not dependency-preserving, which relational-algebra operator must be invoked more often to enforce the original FDs? This is the 2025 question below.
- MSQ — properties: BCNF vs 3NF, the superkey test for lossless join, when both are possible.
Worked example — GATE DA 2025, Q6
A relation
Ris split into pieces and the decomposition is not dependency-preserving. To check the lost FDs, the system must reconstruct the original relation, which means it has to compute a particular relational-algebra operation more often than before. Which operation?(A) σ (selection) (B) π (projection) (C) ⋈ (join) (D) ÷ (division).
Think about what “not dependency-preserving” actually costs. There’s some FD X → Y where the attributes of X ∪ Y no longer all live in the same piece — they straddle two (or more) fragments. The DBMS still has to enforce that FD on every update; but it can’t peek at both X and Y in any single fragment. To even see the data needed, it has to glue the fragments back together — that is, take their join.
So JOIN (option C) is the operator that fires more often. (The official 2025 answer: C.)
This is the entire reason 3NF is often preferred when BCNF would force a dependency-non-preserving split: the database can enforce every FD locally without paying a join on every update.
Quick check
Quick check
Practice this in an interview
All questions1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.
Many-to-many joins produce a Cartesian product of each matching subset, multiplying row counts exponentially. The correct approach is to pre-aggregate at least one side to a unique grain before joining, or to use a bridge/junction table that resolves the relationship into two one-to-many joins.
Data warehouses favor denormalization — wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is negligible, so denormalized star schemas consistently outperform normalized models for analytical workloads.
An anti-join returns rows from the left table that have no matching row in the right table — the inverse of a semi-join. The three implementations are NOT EXISTS, NOT IN, and a LEFT JOIN with a NULL filter; NOT EXISTS is the most reliable because it is NULL-safe and communicates intent clearly.