Star Schema

A star schema is a dimensional modeling pattern for data warehouses where a central fact table containing quantitative measures is surrounded by dimension tables providing descriptive context. The schema resembles a star when visualized — the fact table at the center with dimension tables radiating outward. Introduced by Ralph Kimball as the foundation of dimensional modeling.

Structure

erDiagram
    fact_sales ||--o{ dim_date : "date_key"
    fact_sales ||--o{ dim_product : "product_key"
    fact_sales ||--o{ dim_store : "store_key"
    fact_sales ||--o{ dim_customer : "customer_key"

    fact_sales {
        int date_key FK
        int product_key FK
        int store_key FK
        int customer_key FK
        decimal revenue
        int quantity
        decimal cost
    }

    dim_date {
        int date_key PK
        date full_date
        int year
        int quarter
        int month
        string weekday
        boolean is_holiday
    }

    dim_product {
        int product_key PK
        string name
        string category
        string subcategory
        string brand
    }

    dim_store {
        int store_key PK
        string name
        string city
        string state
        string country
    }

    dim_customer {
        int customer_key PK
        string name
        string segment
        string region
        string tier
    }

Fact Table

The central table containing:

  • Measures — Numeric, additive values (revenue, quantity, cost, duration)
  • Foreign keys — References to each dimension table
  • Degenerate dimensions — Dimension attributes stored directly in the fact table (e.g., invoice number)

Fact types:

  • Transaction facts — One row per event (each sale, each click)
  • Periodic snapshot facts — One row per time period (daily balance, monthly inventory)
  • Accumulating snapshot facts — One row per entity lifecycle (order tracking through stages)

Dimension Tables

Wide, denormalized tables providing the “who, what, where, when, why” context:

  • dim_date — Calendar attributes (year, quarter, month, day, weekday, holiday flag)
  • dim_product — Product hierarchy (category, subcategory, brand, name, SKU)
  • dim_customer — Customer demographics (name, segment, region, tier)
  • dim_store — Location details (city, state, country, format)

Dimensions are intentionally denormalized — redundancy trades storage for query performance by eliminating joins between normalized dimension tables.

Star Schema vs Snowflake Schema

AspectStar SchemaSnowflake Schema
Dimension tablesDenormalized (flat)Normalized (multiple tables)
Joins for a queryFewer (fact + dimensions)More (fact + dimension chains)
Query performanceFaster (fewer joins)Slower (more joins)
StorageMore redundancyLess redundancy
ComplexitySimpler to understandMore complex
ETLSimpler loadsMore complex transformations
OLAP tool supportBetter (most tools assume star)Good but more configuration

Advantages

  • Query performance — Simple joins between fact and dimension tables; easily optimized with indexes on foreign keys
  • User-friendly — Business users understand the structure intuitively
  • OLAP compatible — Natural fit for slice, dice, drill-down, roll-up operations
  • Scalable — Add new dimensions without restructuring; add new facts by adding columns
  • Aggregation-friendly — Pre-computed aggregate tables (summary tables) can accelerate common queries

Design Process

  1. Select the business process — What are you measuring? (sales, shipments, clicks)
  2. Declare the grain — What does one row in the fact table represent? (one transaction, one daily snapshot)
  3. Identify the dimensions — What context describes each fact? (date, product, customer, location)
  4. Identify the measures — What numeric values do you need? (revenue, quantity, discount)

Slowly Changing Dimensions (SCD)

Dimension attributes change over time. Strategies:

  • Type 1 — Overwrite old value. No history preserved.
  • Type 2 — Add new row with version tracking (start_date, end_date, is_current). Full history.
  • Type 3 — Add column for previous value. Limited history (one level).

Sources