Lessons

Foundations

How Delta Lake works — the transaction log

Foundation for Cost & Performance (§6) and Data Modelling (§10); Delta Lake is named across the exam.

In [The one job — and the two axes everything lives on](/lessons/f1-the-one-job/) the target of every pipeline was "a Delta table," and I asked you to take that on trust. This lesson pays that debt. Almost every storage, optimisation, and modelling question on the exam is really a question about the one mechanism we build here — so if you own this, a whole third of the bank becomes derivable.

First — what Delta Lake is (and is not)

Get the family right before any mechanics (this is the trap most people fall into). Delta Lake is not a database, and not a new storage system. It does not hold your data in some special place. Your data is, and stays, ordinary Parquet files sitting in cheap cloud object storage (ADLS, S3, GCS). What Delta Lake adds is a metadata layer — a transaction log — written next to those Parquet files, in a folder called _delta_log. That log is the entire difference between "a pile of files" and "a reliable table."

So the relationship, plainly: object storage holds the files; Parquet is the file format; Delta Lake is the log laid on top that turns those files into a table with guarantees. It doesn't move or rewrite your data to exist — it just starts keeping a ledger about it. (This is why the lakehouse from [The one job — and the two axes everything lives on](/lessons/f1-the-one-job/) is possible at all: warehouse-grade reliability on top of lake-grade cheap files.)

Why a plain pile of files isn't a table — the pain

Object storage on its own is "a filesystem with a billing model." It has no notion of a transaction, no schema enforcement, no way to update one row, no cleanup when a write dies halfway. Three concrete failures follow, and each maps to one letter of ACID — worth feeling before we fix them:

Delta fixes all three with one idea: the log.

The anchor: current state = every file added, minus every file removed

This single sentence generates everything else, so plant it and keep returning to it:

Anchor. A Delta table's current contents = the set of data files that have been added and not later removed, according to the log. Delta never edits a Parquet file's bytes; it only records "this file is now part of the table" or "this file no longer is."

Everything below — reading, updating, deleting, time travel, ACID — is just a consequence of keeping an honest add/remove ledger.

The transaction log, concretely

A Delta table on disk is two things: the Parquet data files, and the _delta_log folder beside them.

my_table/
├── _delta_log/
│     ├── 00000000000000000000.json   ← version 0 (e.g. CREATE)
│     ├── 00000000000000000001.json   ← version 1 (INSERT)
│     ├── 00000000000000000002.json   ← version 2 (UPDATE)
│     └── ...
├── part-00000-aaa.parquet            ← data files (Parquet, immutable)
├── part-00001-bbb.parquet
└── part-00002-ccc.parquet

Each NN...N.json is one commit — the atomic record of one transaction. A commit holds up to four kinds of entry, but only the last matters for the anchor:

Reading the table is then trivial to state: Delta does not replay history row-by-row. It walks the log, takes every add, subtracts every remove, and reads exactly the Parquet files still standing. A worked example — five commits on an invoices table:

VersionOperationActive files after
v0CREATE + load invoices 101–200file1
v1INSERT invoices 1–100file1, file2
v2UPDATE one rowfile1, file2 (+deletion vector), file3
v3DELETE one rowfile1, file2 (+updated deletion vector), file3
v4INSERT invoices 201–99457file1, file2, file3, file4

SELECT * at v4 reads the four active files (applying deletion vectors — defined below); any file that was ever removed is simply ignored.

Scaling the log — and the checkpoint collision resolved. Thousands of tiny JSON commits would be slow to read, so every ~10 commits Delta writes a Delta-log checkpoint: a single Parquet snapshot of the net current state (adds minus removes so far). A read then loads the latest checkpoint plus only the handful of JSON commits after it — cost is O(commits since last checkpoint), never O(all history).

Now, the word you were promised in [Structured Streaming & the state model](/lessons/s1-structured-streaming-state/): this Delta-log checkpoint is a different machine from the streaming checkpoint. Same word, two meanings — ask "which is in play?":

"checkpoint"What it isWhere it lives
Streaming checkpointa streaming query's progress bookmark + operation state, for crash recoverythe query's checkpointLocation
Delta-log checkpointa periodic Parquet snapshot of a table's transaction log, so reads stay fastinside the table's _delta_log/

One belongs to a running query; the other belongs to a table at rest. They never touch.

ACID, each solving one of the three pains

What actually happens to files on each DML — the rule that makes you dangerous

The rule: Parquet files are IMMUTABLE. Delta never changes bytes inside an existing file. Every change writes new files and records add/remove. (This is forced by physics — you cannot seek into a file on S3/ADLS and overwrite four bytes; object storage is write-once. Delta works with that constraint, not against it.)

Batch writes: CTAS, overwrite, append, and DROP

The DML above is row-level; you also write tables in bulk, and a cluster of exam questions lives here. All of it is still just adds/removes on the log — but the behaviours are worth stating.

Time travel — a free consequence

Because the log preserves every add/remove, the state at any past version is fully reconstructable. SELECT * FROM t VERSION AS OF 3 or TIMESTAMP AS OF '2026-01-15' reads the files that were active then. RESTORE creates a new version matching an old state (history is preserved, not erased). This works right up until VACUUM physically deletes the old files — the one action that truly breaks time travel (again, [Right-sizing files — OPTIMIZE, optimized writes, auto compaction, VACUUM](/lessons/s6-compaction/)).

Takeaways (rebuild it from these)

  1. Delta Lake = a transaction log (_delta_log) over immutable Parquet files in object storage. Not a database, not new storage — a ledger on top.
  2. Current state = adds − removes. Every capability derives from that ledger.
  3. A commit JSON carries commitInfo / metaData / protocol / add-remove; Delta-log checkpoints snapshot the net state so reads stay O(since last checkpoint). This checkpoint is not the streaming checkpoint.
  4. ACID, each solving a real failure: atomic JSON commit; schema-on-write; OCC file-level isolation (no locks, conflict at commit); durable object storage.
  5. Parquet is immutable — every change writes new files. UPDATE/DELETE use deletion vectors (mark rows dead, don't rewrite); MERGE is the keyed upsert (maps by name). Time travel falls out of the preserved log.

From this foundation, Section 6 becomes "what you do to these files to keep them fast" — compaction and cleanup ([Right-sizing files — OPTIMIZE, optimized writes, auto compaction, VACUUM](/lessons/s6-compaction/)), layout ([Organizing files — partitioning, Z-order, liquid clustering (and deletion vectors)](/lessons/s6-data-layout/), [The performance model — why a query is slow, and the one lever](/lessons/s6-performance-model/)), and change streams ([Change Data Feed — emitting a table's changes downstream](/lessons/s6-cdf/)) — and Section 10 becomes "how you shape them into models" ([Delta data models — managed vs external, clones, and materialization](/lessons/s10-data-models/)).

Prerequisites

Leads to