Snowflake Schema

A snowflake schema is a variation of the Star Schema where dimension tables are normalized into multiple related tables. The name comes from the branching pattern that resembles a snowflake when visualized.

erDiagram
    fact_sales ||--o{ dim_date : "date_key"
    fact_sales ||--o{ dim_product : "product_key"
    fact_sales ||--o{ dim_store : "store_key"
    dim_product ||--o{ dim_category : "category_id"
    dim_product ||--o{ dim_brand : "brand_id"
    dim_store ||--o{ dim_city : "city_id"
    dim_city ||--o{ dim_country : "country_id"

    fact_sales {
        int date_key FK
        int product_key FK
        int store_key FK
        decimal revenue
        int quantity
    }

    dim_product {
        int product_key PK
        string name
        int category_id FK
        int brand_id FK
    }

    dim_category {
        int category_id PK
        string category_name
        string department
    }

    dim_brand {
        int brand_id PK
        string brand_name
    }

    dim_store {
        int store_key PK
        string name
        int city_id FK
    }

    dim_city {
        int city_id PK
        string city_name
        int country_id FK
    }

    dim_country {
        int country_id PK
        string country_name
        string region
    }

Star vs Snowflake

AspectStar SchemaSnowflake
Dimension tablesDenormalized (flat)Normalized (branching)
Number of tablesFewerMore
Joins per queryFewer (faster)More (slower)
StorageMore redundancyLess redundancy
Query complexitySimplerMore complex
ETLSimpler loadsMore complex transformations
MaintenanceHarder (update many rows)Easier (update one row)

When to Use Snowflake

  • Dimension tables are very large and redundancy wastes significant storage
  • Dimension attributes change frequently (normalized = update in one place)
  • Strict data governance requires 3NF even in the warehouse
  • The query engine handles the extra joins well (columnar databases like Snowflake, Redshift)

When to Prefer Star

  • Query performance is the top priority
  • Business users write ad-hoc queries (simpler to understand)
  • OLAP tools expect star schemas
  • Dimension tables are small (redundancy is negligible)