datarekha
Data Engineering Easy Asked at SnowflakeAsked at BigQueryAsked at dbt LabsAsked at Databricks

Should you normalize or denormalize tables in a data warehouse, and why?

The short answer

Data warehouses favor denormalization — wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is negligible, so denormalized star schemas consistently outperform normalized models for analytical workloads.

How to think about it

Normalization rules (1NF, 2NF, 3NF, BCNF) exist to eliminate update anomalies in transactional databases. In a warehouse where data is written once and read many times, most reasons for normalization evaporate.

Normalization rationale (OLTP context)

In an OLTP system, a customers table is normalized so that if a customer changes their address, you update one row in one table — not 10,000 rows in an orders table. Redundancy creates update anomalies; normalization prevents them.

-- Normalized (OLTP): customer address lives in one place
orders (order_id, customer_id, amount)
customers (customer_id, name, city)

Why warehouses denormalize

In a warehouse:

  1. Data is not updated in place (historically immutable fact records).
  2. Columnar compression eliminates the storage cost of redundancy (repeated city strings compress to a dictionary of unique values + an integer index).
  3. Fewer joins means simpler queries, better optimizer plans, and easier self-service for analysts who should not need to know the schema.
-- Denormalized wide table (warehouse):
fact_orders (order_id, customer_id, customer_name, customer_city, amount, product_name, category)
-- Analyst can answer 90% of questions with no joins.

One-big-table (OBT) vs star schema

ApproachUse when
Star schemaMultiple fact tables share the same dimensions; space/complexity balance needed
One-big-tableA single dominant fact with frequent self-serve BI; simplicity beats everything
Normalized (3NF)Rarely in a warehouse; only for operational reporting directly off CDC-replicated data

dbt modeling layers

A practical dbt project uses three layers that progress from normalized to denormalized:

  1. Staging — 1:1 with source, light cleaning, still normalized.
  2. Intermediate — business logic joins, narrow tables, still somewhat normalized.
  3. Marts — wide, denormalized, consumer-facing. Analysts query only this layer.
-- marts/orders_mart.sql
SELECT
    o.order_id,
    o.created_at,
    c.name   AS customer_name,
    c.country,
    p.category,
    o.amount_usd
FROM {{ ref('stg_orders') }}    o
JOIN {{ ref('stg_customers') }} c USING (customer_id)
JOIN {{ ref('stg_products') }}  p USING (product_id)

The mart is a denormalized wide table built once by dbt; analysts query it without joins.

Learn it properly Dimensional Modeling

Keep practising

All Data Engineering questions

Explore further

Skip to content