Query the Lakeflow pipeline event log to extract execution progress and data-quality expectation metrics programmatically.

Back in [Data quality — expectations and constraints, and what happens to a bad row](/lessons/s3-data-quality/) you declared expectations — EXPECT (amount > 0) — and I said every violation is "counted in the event log." This is that lesson. When a Lakeflow pipeline runs, it keeps a running record of itself: what updated, how many rows flowed, and — crucially — how many rows passed and failed each expectation. That record is the pipeline event log, and querying it is how you turn declared rules into a dashboard or alert.

The spine

Beat 1 — the anchor: two coordinates

The expectations lesson was the write side (define the rule, choose warn/drop/fail). This is the read side — get the numbers back out.

Predict: a pipeline runs, expectations pass and fail. Where in the event log do those pass/fail counts actually live — and how are they stored?

…

Anchor. Every Lakeflow pipeline run writes a structured event log — progress, flow metrics, data-quality results, lineage, errors — read with SQL. The metrics you want live in rows where event_type = 'flow_progress', inside the JSON details column. Master those two coordinates (flow_progress + parse details) and every event-log question falls out.

Beat 2 — read it with `event_log()`, then drill the JSON

You don't need to know where the log is physically stored — use the event_log('<pipeline-id>') table-valued function (on a SQL warehouse or shared cluster):

SELECT timestamp, event_type, details
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress'
ORDER BY timestamp DESC;

details is JSON — drill in with Databricks SQL's : path syntax. The data-quality block lists each expectation with pass/fail counts:

SELECT
  timestamp,
  details:flow_progress.status AS status,
  explode(from_json(
    details:flow_progress.data_quality.expectations,
    'array<struct<name:string,dataset:string,passed_records:bigint,failed_records:bigint>>'
  )) AS e
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress'
  AND details:flow_progress.data_quality IS NOT NULL;

The path to hold: details:flow_progress.data_quality.expectations — an array of {name, dataset, passed_records, failed_records}. That's the whole answer to "extract data-quality results programmatically." flow_progress also carries throughput metrics (num_output_rows, num_upserted_rows, num_deleted_rows).

Lock it. event_log('<id>') → filter event_type = 'flow_progress' → parse details:flow_progress.data_quality.expectations for passed_records / failed_records.

The dials (skim now; return when a question needs one)

◆ The renamed-shape trap (freshness)

Older material (and the exam's vintage) describes the log as action-based: rows with action values like START_UPDATE, EXPECTATION_PASSED, EXPECTATION_FAILED — one row per outcome. The current model keys rows by event_type, with data quality aggregated under flow_progress → data_quality.expectations (pass/fail counts, not one row per failed record). Recognise the old EXPECTATION_FAILED phrasing, but reach for event_type = 'flow_progress' as the correct current answer.

◆ Where it plugs in

The pipeline-scoped surface from [The monitoring map — which surface answers which question](/lessons/s5-observability-surfaces/) — deliberately not the cluster or audit log (the three-way collision). It closes a loop:

Upstream: [Data quality — expectations and constraints, and what happens to a bad row](/lessons/s3-data-quality/) defines the expectations whose results land here (define with @dp.expect*, read with flow_progress).
Downstream: feed a SQL Alert off a query like the above — fire when failed_records for a critical expectation exceeds a threshold — so bad quality pages someone ([SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/)).

Takeaways (rebuild it from these)

The pipeline event log is a Lakeflow pipeline's structured self-record (progress, flow metrics, data-quality, lineage, errors), queried with SQL.
Read with event_log('<pipeline-id>') — no storage path needed.
Data-quality metrics live in event_type = 'flow_progress', JSON path details:flow_progress.data_quality.expectations → name, dataset, passed_records, failed_records.
Freshness: current model is event_type/flow_progress (aggregated counts), not the older action-based EXPECTATION_FAILED — recognise both.
Pipeline-scoped (not cluster/audit); pairs with [Data quality — expectations and constraints, and what happens to a bad row](/lessons/s3-data-quality/) (define) upstream and [SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/) (notify) downstream.

Before you move on — say these without scrolling up

The two coordinates that locate data-quality metrics in the event log.
What function reads the log, and what's the JSON path to the expectations array?
The four fields in each expectation entry.
Old EXPECTATION_FAILED rows vs the current shape — which do you reach for?

Next: the "notify" half — SQL Alerts, and the single-value rule that decides whether your alert even works → [SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/).

The pipeline event log — where a Lakeflow pipeline records itself