Star Schema
A star schema is a dimensional modeling pattern for data warehouses where a central fact table containing quantitative measures is surrounded by dimension tables providing descriptive context. The schema resembles a star when visualized — the fact table at the center with dimension tables radiating outward. Introduced by Ralph Kimball as the foundation of dimensional modeling.
Structure
erDiagram fact_sales ||--o{ dim_date : "date_key" fact_sales ||--o{ dim_product : "product_key" fact_sales ||--o{ dim_store : "store_key" fact_sales ||--o{ dim_customer : "customer_key" fact_sales { int date_key FK int product_key FK int store_key FK int customer_key FK decimal revenue int quantity decimal cost } dim_date { int date_key PK date full_date int year int quarter int month string weekday boolean is_holiday } dim_product { int product_key PK string name string category string subcategory string brand } dim_store { int store_key PK string name string city string state string country } dim_customer { int customer_key PK string name string segment string region string tier }
Fact Table
The central table containing:
- Measures — Numeric, additive values (revenue, quantity, cost, duration)
- Foreign keys — References to each dimension table
- Degenerate dimensions — Dimension attributes stored directly in the fact table (e.g., invoice number)
Fact types:
- Transaction facts — One row per event (each sale, each click)
- Periodic snapshot facts — One row per time period (daily balance, monthly inventory)
- Accumulating snapshot facts — One row per entity lifecycle (order tracking through stages)
Dimension Tables
Wide, denormalized tables providing the “who, what, where, when, why” context:
- dim_date — Calendar attributes (year, quarter, month, day, weekday, holiday flag)
- dim_product — Product hierarchy (category, subcategory, brand, name, SKU)
- dim_customer — Customer demographics (name, segment, region, tier)
- dim_store — Location details (city, state, country, format)
Dimensions are intentionally denormalized — redundancy trades storage for query performance by eliminating joins between normalized dimension tables.
Star Schema vs Snowflake Schema
| Aspect | Star Schema | Snowflake Schema |
|---|---|---|
| Dimension tables | Denormalized (flat) | Normalized (multiple tables) |
| Joins for a query | Fewer (fact + dimensions) | More (fact + dimension chains) |
| Query performance | Faster (fewer joins) | Slower (more joins) |
| Storage | More redundancy | Less redundancy |
| Complexity | Simpler to understand | More complex |
| ETL | Simpler loads | More complex transformations |
| OLAP tool support | Better (most tools assume star) | Good but more configuration |
Advantages
- Query performance — Simple joins between fact and dimension tables; easily optimized with indexes on foreign keys
- User-friendly — Business users understand the structure intuitively
- OLAP compatible — Natural fit for slice, dice, drill-down, roll-up operations
- Scalable — Add new dimensions without restructuring; add new facts by adding columns
- Aggregation-friendly — Pre-computed aggregate tables (summary tables) can accelerate common queries
Design Process
- Select the business process — What are you measuring? (sales, shipments, clicks)
- Declare the grain — What does one row in the fact table represent? (one transaction, one daily snapshot)
- Identify the dimensions — What context describes each fact? (date, product, customer, location)
- Identify the measures — What numeric values do you need? (revenue, quantity, discount)
Slowly Changing Dimensions (SCD)
Dimension attributes change over time. Strategies:
- Type 1 — Overwrite old value. No history preserved.
- Type 2 — Add new row with version tracking (start_date, end_date, is_current). Full history.
- Type 3 — Add column for previous value. Limited history (one level).