Snowflake Schema
A snowflake schema is a variation of the Star Schema where dimension tables are normalized into multiple related tables. The name comes from the branching pattern that resembles a snowflake when visualized.
erDiagram fact_sales ||--o{ dim_date : "date_key" fact_sales ||--o{ dim_product : "product_key" fact_sales ||--o{ dim_store : "store_key" dim_product ||--o{ dim_category : "category_id" dim_product ||--o{ dim_brand : "brand_id" dim_store ||--o{ dim_city : "city_id" dim_city ||--o{ dim_country : "country_id" fact_sales { int date_key FK int product_key FK int store_key FK decimal revenue int quantity } dim_product { int product_key PK string name int category_id FK int brand_id FK } dim_category { int category_id PK string category_name string department } dim_brand { int brand_id PK string brand_name } dim_store { int store_key PK string name int city_id FK } dim_city { int city_id PK string city_name int country_id FK } dim_country { int country_id PK string country_name string region }
Star vs Snowflake
| Aspect | Star Schema | Snowflake |
|---|---|---|
| Dimension tables | Denormalized (flat) | Normalized (branching) |
| Number of tables | Fewer | More |
| Joins per query | Fewer (faster) | More (slower) |
| Storage | More redundancy | Less redundancy |
| Query complexity | Simpler | More complex |
| ETL | Simpler loads | More complex transformations |
| Maintenance | Harder (update many rows) | Easier (update one row) |
When to Use Snowflake
- Dimension tables are very large and redundancy wastes significant storage
- Dimension attributes change frequently (normalized = update in one place)
- Strict data governance requires 3NF even in the warehouse
- The query engine handles the extra joins well (columnar databases like Snowflake, Redshift)
When to Prefer Star
- Query performance is the top priority
- Business users write ad-hoc queries (simpler to understand)
- OLAP tools expect star schemas
- Dimension tables are small (redundancy is negligible)