What is the difference between a star schema and a snowflake schema in dimensional modeling?
A star schema has a central fact table joined directly to denormalized dimension tables — one join hop per dimension, simple queries, better query performance. A snowflake schema normalizes dimension tables into sub-dimensions, reducing storage redundancy but requiring more joins. Star schemas are preferred for analytics workloads; snowflake schemas are sometimes used when a dimension is very large and has many redundant attribute values.
How to think about it
Both schemas are dimensional models — they split data into facts (measurable events) and dimensions (descriptive attributes). The difference is how far dimensions are normalized.
Star schema
Every dimension is a single flat table. A query joining the fact table to dimensions touches at most N+1 tables (fact + N dimensions).
-- Star schema: one join per dimension
SELECT
d.category,
t.month,
SUM(f.revenue) AS revenue
FROM fact_orders f
JOIN dim_product d ON f.product_key = d.product_key
JOIN dim_date t ON f.date_key = t.date_key
GROUP BY 1, 2;
dim_product contains all product attributes in one wide table: category, subcategory, brand, supplier_country. Some values repeat across rows, but columnar compression makes this storage cost negligible.
Snowflake schema
dim_product is split: a dim_brand table holds brand attributes, a dim_supplier table holds supplier attributes. dim_product holds foreign keys to both.
-- Snowflake schema: extra joins required
SELECT p.subcategory, b.brand_name
FROM fact_orders f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_brand b ON p.brand_key = b.brand_key
JOIN dim_supplier s ON p.supplier_key = s.supplier_key;
When to choose snowflake: when a dimension has millions of rows and a repeated attribute (e.g., supplier country repeated for every product) would cause measurable storage bloat — rare in practice with columnar compression.
Practical guidance in 2026
Star schemas dominate. Modern columnar stores compress repeated strings extremely well (dictionary encoding), so the storage savings of normalization are minimal. Query simplicity and optimizer efficiency favor fewer joins. dbt’s dimensional modeling guide and the Kimball methodology both default to star.