Should you normalize or denormalize tables in a data warehouse, and why?
Data warehouses favor denormalization — wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is negligible, so denormalized star schemas consistently outperform normalized models for analytical workloads.
How to think about it
Normalization rules (1NF, 2NF, 3NF, BCNF) exist to eliminate update anomalies in transactional databases. In a warehouse where data is written once and read many times, most reasons for normalization evaporate.
Normalization rationale (OLTP context)
In an OLTP system, a customers table is normalized so that if a customer changes their address, you update one row in one table — not 10,000 rows in an orders table. Redundancy creates update anomalies; normalization prevents them.
-- Normalized (OLTP): customer address lives in one place
orders (order_id, customer_id, amount)
customers (customer_id, name, city)
Why warehouses denormalize
In a warehouse:
- Data is not updated in place (historically immutable fact records).
- Columnar compression eliminates the storage cost of redundancy (repeated
citystrings compress to a dictionary of unique values + an integer index). - Fewer joins means simpler queries, better optimizer plans, and easier self-service for analysts who should not need to know the schema.
-- Denormalized wide table (warehouse):
fact_orders (order_id, customer_id, customer_name, customer_city, amount, product_name, category)
-- Analyst can answer 90% of questions with no joins.
One-big-table (OBT) vs star schema
| Approach | Use when |
|---|---|
| Star schema | Multiple fact tables share the same dimensions; space/complexity balance needed |
| One-big-table | A single dominant fact with frequent self-serve BI; simplicity beats everything |
| Normalized (3NF) | Rarely in a warehouse; only for operational reporting directly off CDC-replicated data |
dbt modeling layers
A practical dbt project uses three layers that progress from normalized to denormalized:
- Staging — 1:1 with source, light cleaning, still normalized.
- Intermediate — business logic joins, narrow tables, still somewhat normalized.
- Marts — wide, denormalized, consumer-facing. Analysts query only this layer.
-- marts/orders_mart.sql
SELECT
o.order_id,
o.created_at,
c.name AS customer_name,
c.country,
p.category,
o.amount_usd
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('stg_customers') }} c USING (customer_id)
JOIN {{ ref('stg_products') }} p USING (product_id)
The mart is a denormalized wide table built once by dbt; analysts query it without joins.