SQL Medium Asked at AmazonAsked at SnowflakeAsked at DatabricksAsked at dbt Labs

What is the difference between a star schema and a snowflake schema, and which should you choose?

For Data Engineer Data Analyst Data Scientist

The short answer

A star schema has a central fact table joined directly to denormalized dimension tables, giving simple two-table joins and fast query performance at the cost of some data redundancy. A snowflake schema normalizes dimensions into sub-dimension tables, reducing storage and update anomalies but requiring more joins that can slow analytical queries.

How to think about it

Both schemas are dimensional models built for OLAP queries. The only real difference is how much you normalise the dimension tables.

What is really being tested

The interviewer wants to see that you can reason about the trade-off between join complexity and data redundancy — and that you know why modern columnar warehouses (BigQuery, Redshift, Snowflake the product) strongly favour the star schema.

Star schema: flat dimensions, one join per dimension

Star schema: one join from fact to each dimension. Simple, fast, BI-tool friendly.

Each dimension table stores everything about that entity in a single flat row. dim_product has category_name and department_name directly — no need to join anywhere else.

Snowflake schema: normalised dimensions, more joins

In a snowflake schema dim_product no longer stores the category name directly. Instead it holds a foreign key to dim_category, which itself might reference dim_department. This looks like a snowflake branching out.

-- Snowflake: dim_product references dim_category (normalised)
CREATE TABLE dim_category (
  category_key    INT PRIMARY KEY,
  category_name   TEXT,
  department_name TEXT
);
CREATE TABLE dim_product (
  product_key  INT PRIMARY KEY,
  product_name TEXT,
  category_key INT REFERENCES dim_category(category_key)
);
-- To query product + category you now need two joins:
-- fact_sales → dim_product → dim_category

Head-to-head comparison

Dimension	Star	Snowflake
Dimension structure	Flat, denormalized	Normalized into sub-tables
Join count per query	Low (1 per dimension)	Higher (chains of joins)
Storage	More redundancy	Less redundancy
Query performance	Faster (fewer joins)	Slower for deep hierarchies
ETL complexity	Simpler	More referential integrity to maintain
Typical use	BI dashboards, ad-hoc OLAP	Very high-cardinality dimensions

Which should you choose?

Star schema is the default for modern columnar warehouses (BigQuery, Redshift, Snowflake the product). These engines are optimised for wide table scans across many columns; extra joins hurt more than denormalization does.

Snowflake schema makes sense when a dimension has millions of distinct values and a deep hierarchy that would bloat a flat table — or when your OLTP-style load process already maintains the normalised structure.

Hybrid is common in practice: flatten product and customer, but normalise geographic hierarchies or product category trees.

Learn it properly Dimensional Modeling