What is the difference between a star schema and a snowflake schema, and which should you choose?
A star schema has a central fact table joined directly to denormalized dimension tables, giving simple two-table joins and fast query performance at the cost of some data redundancy. A snowflake schema normalizes dimensions into sub-dimension tables, reducing storage and update anomalies but requiring more joins that can slow analytical queries.
How to think about it
Both schemas are dimensional models built for OLAP queries. The only real difference is how much you normalise the dimension tables.
What is really being tested
The interviewer wants to see that you can reason about the trade-off between join complexity and data redundancy — and that you know why modern columnar warehouses (BigQuery, Redshift, Snowflake the product) strongly favour the star schema.
Star schema: flat dimensions, one join per dimension
Each dimension table stores everything about that entity in a single flat row. dim_product has category_name and department_name directly — no need to join anywhere else.
Snowflake schema: normalised dimensions, more joins
In a snowflake schema dim_product no longer stores the category name directly. Instead it holds a foreign key to dim_category, which itself might reference dim_department. This looks like a snowflake branching out.
-- Snowflake: dim_product references dim_category (normalised)
CREATE TABLE dim_category (
category_key INT PRIMARY KEY,
category_name TEXT,
department_name TEXT
);
CREATE TABLE dim_product (
product_key INT PRIMARY KEY,
product_name TEXT,
category_key INT REFERENCES dim_category(category_key)
);
-- To query product + category you now need two joins:
-- fact_sales → dim_product → dim_category
Head-to-head comparison
| Dimension | Star | Snowflake |
|---|---|---|
| Dimension structure | Flat, denormalized | Normalized into sub-tables |
| Join count per query | Low (1 per dimension) | Higher (chains of joins) |
| Storage | More redundancy | Less redundancy |
| Query performance | Faster (fewer joins) | Slower for deep hierarchies |
| ETL complexity | Simpler | More referential integrity to maintain |
| Typical use | BI dashboards, ad-hoc OLAP | Very high-cardinality dimensions |
Which should you choose?
Star schema is the default for modern columnar warehouses (BigQuery, Redshift, Snowflake the product). These engines are optimised for wide table scans across many columns; extra joins hurt more than denormalization does.
Snowflake schema makes sense when a dimension has millions of distinct values and a deep hierarchy that would bloat a flat table — or when your OLTP-style load process already maintains the normalised structure.
Hybrid is common in practice: flatten product and customer, but normalise geographic hierarchies or product category trees.