datarekha
Data Engineering Easy Asked at Kimball GroupAsked at dbt LabsAsked at SnowflakeAsked at Amazon

What is the difference between a star schema and a snowflake schema in dimensional modeling?

The short answer

A star schema has a central fact table joined directly to denormalized dimension tables — one join hop per dimension, simple queries, better query performance. A snowflake schema normalizes dimension tables into sub-dimensions, reducing storage redundancy but requiring more joins. Star schemas are preferred for analytics workloads; snowflake schemas are sometimes used when a dimension is very large and has many redundant attribute values.

How to think about it

Both schemas are dimensional models — they split data into facts (measurable events) and dimensions (descriptive attributes). The difference is how far dimensions are normalized.

Star schema

Every dimension is a single flat table. A query joining the fact table to dimensions touches at most N+1 tables (fact + N dimensions).

-- Star schema: one join per dimension
SELECT
    d.category,
    t.month,
    SUM(f.revenue) AS revenue
FROM fact_orders f
JOIN dim_product d ON f.product_key = d.product_key
JOIN dim_date   t ON f.date_key    = t.date_key
GROUP BY 1, 2;

dim_product contains all product attributes in one wide table: category, subcategory, brand, supplier_country. Some values repeat across rows, but columnar compression makes this storage cost negligible.

Snowflake schema

dim_product is split: a dim_brand table holds brand attributes, a dim_supplier table holds supplier attributes. dim_product holds foreign keys to both.

-- Snowflake schema: extra joins required
SELECT p.subcategory, b.brand_name
FROM fact_orders f
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_brand    b ON p.brand_key    = b.brand_key
JOIN dim_supplier s ON p.supplier_key = s.supplier_key;

When to choose snowflake: when a dimension has millions of rows and a repeated attribute (e.g., supplier country repeated for every product) would cause measurable storage bloat — rare in practice with columnar compression.

Practical guidance in 2026

Star schemas dominate. Modern columnar stores compress repeated strings extremely well (dictionary encoding), so the storage savings of normalization are minimal. Query simplicity and optimizer efficiency favor fewer joins. dbt’s dimensional modeling guide and the Kimball methodology both default to star.

Learn it properly Dimensional Modeling

Keep practising

All Data Engineering questions

Explore further

Skip to content