Lesson 48 • Advanced
Data Warehousing & Star Schemas
Design analytical data warehouses with star schemas, fact tables, dimensions, and slowly changing dimensions.
✅ What You'll Learn
- • Star schema design with facts and dimensions
- • Writing analytical queries against a warehouse
- • Slowly Changing Dimensions (SCD Types 1, 2, 3)
- • ETL loading from OLTP to warehouse
🏗️ What Is a Data Warehouse?
🎯 Real-World Analogy: Your OLTP database is like a supermarket checkout — fast, one customer at a time. A data warehouse is like the supermarket's headquarters, where analysts look at sales across all 500 stores over the past 5 years to spot trends and make decisions.
Data warehouses are optimised for reading huge volumes. They sacrifice write speed (data loads happen in batches) to deliver blazing-fast analytical queries across billions of rows.
⚠️ Common Mistake: Running analytical queries directly on your production OLTP database. This slows down your app for all users. Always separate transactional and analytical workloads.
Try It: Star Schema Design
Build a star schema with fact_sales and dimension tables
-- Star Schema: The foundation of data warehousing
-- Central fact table surrounded by dimension tables
-- Dimension: Date (pre-populated calendar table)
CREATE TABLE dim_date (
date_key INT PRIMARY KEY, -- YYYYMMDD format
full_date DATE NOT NULL,
year SMALLINT NOT NULL,
quarter SMALLINT NOT NULL,
month SMALLINT NOT NULL,
month_name VARCHAR(10) NOT NULL,
week SMALLINT NOT NULL,
day_of_week VARCHAR(10) NOT NULL,
is_weekend BOOLEAN NOT NULL,
is_holiday BOO
...Try It: Analytical Queries
Write revenue reports, YoY comparisons, and running totals
-- Warehouse analytical queries
-- Aggregate across dimensions for business insights
-- Revenue by category and month
SELECT
d.year, d.month_name,
p.category,
SUM(f.total_amount) AS revenue,
SUM(f.quantity) AS units_sold,
ROUND(AVG(f.total_amount), 2) AS avg_sale
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = 2024
GROUP BY d.year, d.month, d.month_name, p.category
ORDER BY d.month, revenue DESC;
...Try It: Slowly Changing Dimensions & ETL
Track dimension changes over time and load data from OLTP
-- Slowly Changing Dimensions (SCD)
-- Track how dimension attributes change over time
-- SCD Type 1: Overwrite (lose history)
UPDATE dim_product
SET category = 'Premium Electronics'
WHERE product_id = 1001 AND is_current = true;
-- SCD Type 2: Add new row (preserve history)
-- Step 1: Expire the current row
UPDATE dim_product
SET effective_to = CURRENT_DATE - 1, is_current = false
WHERE product_id = 1001 AND is_current = true;
-- Step 2: Insert the new version
INSERT INTO dim_product (produc
...📋 Quick Reference
| Concept | Description | Example |
|---|---|---|
| Fact Table | Measurable events | fact_sales, fact_clicks |
| Dimension | Descriptive context | dim_date, dim_product |
| Star Schema | Fact + dimensions | Simple, fast queries |
| Snowflake | Normalised dimensions | Less redundancy |
| SCD Type 2 | New row per change | Full history preserved |
🎉 Lesson Complete!
You can now design data warehouses! Next, learn about using SQL with big data tools like Hive, SparkSQL, and DuckDB.
Sign up for free to track which lessons you've completed and get learning reminders.