Lessons

Data Modelling

Dimensional modelling — SCD types and the star schema

Model dimensions with the right slowly-changing-dimension (SCD) type, build star schemas on Delta, and understand informational constraints.

A customer moves house; a product changes category; an employee changes department. A dimension attribute has changed — and the one decision that defines dimensional modelling is: do you keep the old value, or overwrite it? That policy has a name and a number — the SCD type — the most tested idea in this section.


The spine

Beat 1 — the anchor: the SCD number is the history policy

Predict: a customer moves house. Depending on the business need, name the three things you might do with the old address.

Keep it forever, throw it away, or never allow it to change at all — and each is a numbered policy:

Anchor. A Slowly Changing Dimension (SCD) type describes what you do when a dimension attribute changes; the number encodes how much history you keep. Type 0 = never change; Type 1 = overwrite (no history); Type 2 = keep every version (full history). Pick the type from the question's history requirement.

Beat 2 — the three you must know

One line: overwrite, latest only1; new row per change, old row closed2; never changes0.

Lock it. SCD type = history policy: 0 never / 1 overwrite / 2 versioned. Implement 1 & 2 with MERGE (1 updates in place; 2 closes current + inserts new).


The dials (skim now; return when a question needs one)

◆ Time travel is NOT SCD 2

A set trap: Delta time travel looks like free history, but it's for short-term recovery/audit, not long-term dimensional history — old versions get vacuumed ([How Delta Lake works — the transaction log](/lessons/f2-delta-transaction-log/)) and querying deep history doesn't scale in cost/latency. For durable, auditable dimension history, build SCD Type 2, don't lean on time travel. Tell: "keep long-term history/audit of changes" → SCD 2, not time travel.

◆ The star schema, and informational constraints

Dimensional models are star schemas: a central fact table (measurements/events) surrounded by dimension tables (descriptive context), joined by keys. Migrating from a relational DB onto Delta, mind two things:

◆ The date dimension — pre-compute the descriptors

A well-built date dimension pre-computes descriptive attributes — quarter, month_name, fiscal_period, day_of_week, is_holiday — so a query grouping "by quarter/fiscal period" just joins and reads instead of recomputing date math on the fact table each time. Do the descriptor work once, in the dimension.

Takeaways (rebuild it from these)

  1. The SCD type is the history policy: 0 = never change, 1 = overwrite (latest only), 2 = new row per change + close the old (full history).
  2. Implement Type 1/2 with MERGE ([APPLY CHANGES — CDC and SCD, declaratively](/lessons/s1-apply-changes-cdc/) does it declaratively): Type 1 updates in place; Type 2 closes the current row and inserts the new.
  3. Time travel ≠ SCD 2 — short-term recovery/audit (versions get vacuumed); use SCD 2 for durable history.
  4. Star schema on Delta: ACID per table, but PK/FK are informational, not enforced — don't rely on FK integrity.
  5. A date dimension pre-computes descriptors (quarter/month/fiscal) so queries join instead of recomputing.

Before you move on — say these without scrolling up

  1. Customer moves house, business wants full history — which SCD type, and how is the old row handled?
  2. "Only the current value matters" vs "attribute never changes" — which types?
  3. Why isn't Delta time travel a substitute for SCD 2?
  4. On a Delta star schema, are FK constraints enforced?

Next: the physical modelling choices underneath — managed vs external tables, clones, and materialization → [Delta data models — managed vs external, clones, and materialization](/lessons/s10-data-models/).

Prerequisites

Leads to