Back in [Data quality — expectations and constraints, and what happens to a bad row](/lessons/s3-data-quality/) you declared expectations — EXPECT (amount > 0) — and I said every violation is "counted in the event log." This is that lesson. When a Lakeflow pipeline runs, it keeps a running record of itself: what updated, how many rows flowed, and — crucially — how many rows passed and failed each expectation. That record is the pipeline event log, and querying it is how you turn declared rules into a dashboard or alert.
The spine
Beat 1 — the anchor: two coordinates
The expectations lesson was the write side (define the rule, choose warn/drop/fail). This is the read side — get the numbers back out.
Predict: a pipeline runs, expectations pass and fail. Where in the event log do those pass/fail counts actually live — and how are they stored?
…
Anchor. Every Lakeflow pipeline run writes a structured event log (progress, flow metrics, data-quality results, lineage, errors) that you read with SQL. The metrics you want live in rows where event_type = 'flow_progress', inside the JSON details column. Master those two coordinates (filter on flow_progress, then parse details) and most event-log questions fall out.
Beat 2 — read it with event_log(), then drill the JSON
You don't need to know where the log is physically stored — use the event_log('<pipeline-id>') table-valued function (on a SQL warehouse or shared cluster):
SELECT timestamp, event_type, details
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress'
ORDER BY timestamp DESC;
details is JSON — drill in with Databricks SQL's : path syntax. The data-quality block lists each expectation with pass/fail counts:
SELECT
  timestamp,
  details:flow_progress.status AS status,
  explode(from_json(
    details:flow_progress.data_quality.expectations,
    'array<struct<name:string,dataset:string,passed_records:bigint,failed_records:bigint>>'
  )) AS e
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress'
  AND details:flow_progress.data_quality IS NOT NULL;
The path to hold: details:flow_progress.data_quality.expectations, an array of {name, dataset, passed_records, failed_records} entries. That is the core of any answer to "extract data-quality results programmatically." The same flow_progress rows also carry throughput metrics (num_output_rows, num_upserted_rows, num_deleted_rows) under details:flow_progress.metrics.
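The exploded rows are usually more useful once rolled up, since each flow_progress event reports its own counts. A minimal sketch of that roll-up follows, assuming '<pipeline-id>' is replaced with a real pipeline ID and that you are allowed to call event_log() for it. The CTE name expectation_rows is illustrative only.

```sql
-- Sketch: total pass/fail counts per expectation across the whole log.
-- Replace '<pipeline-id>' with a real pipeline ID before running.
WITH expectation_rows AS (
  SELECT
    explode(from_json(
      details:flow_progress.data_quality.expectations,
      'array<struct<name:string,dataset:string,passed_records:bigint,failed_records:bigint>>'
    )) AS e
  FROM event_log('<pipeline-id>')
  WHERE event_type = 'flow_progress'
    AND details:flow_progress.data_quality IS NOT NULL
)
SELECT
  e.dataset,
  e.name AS expectation,
  SUM(e.passed_records) AS passed_records,
  SUM(e.failed_records) AS failed_records
FROM expectation_rows
GROUP BY e.dataset, e.name
ORDER BY failed_records DESC;
```

As written, this sums over every update retained in the log. To report on a single run, you would likely add a filter on origin.update_id in the inner query.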
Lock it.
event_log('<id>') → filter event_type = 'flow_progress' → parse details:flow_progress.data_quality.expectations for passed_records / failed_records.
The dials (skim now; return when a question needs one)
◆ The renamed-shape trap (freshness)
Older material (and the exam's vintage) describes the log as action-based: rows with action values like START_UPDATE, EXPECTATION_PASSED, EXPECTATION_FAILED — one row per outcome. The current model keys rows by event_type, with data quality aggregated under flow_progress → data_quality.expectations (pass/fail counts, not one row per failed record). Recognise the old EXPECTATION_FAILED phrasing, but reach for event_type = 'flow_progress' as the correct current answer.
◆ Where it plugs in
The event log is the pipeline-scoped surface from [The monitoring map — which surface answers which question](/lessons/s5-observability-surfaces/), deliberately not the cluster log or the audit log (the three-way collision). It closes a loop:
- Upstream: [Data quality — expectations and constraints, and what happens to a bad row](/lessons/s3-data-quality/) defines the expectations whose results land here (define with @dp.expect*, read with flow_progress).
- Downstream: feed a SQL Alert off a query like the above, firing when failed_records for a critical expectation exceeds a threshold, so bad quality pages someone ([SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/)).
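For the downstream half, an alert needs a query that returns a single number. One possible shape is sketched below. The expectation name 'valid_amount' is hypothetical, and the 24-hour window is an illustrative choice rather than anything the event log requires.

```sql
-- Sketch of an alert source query: always one row, one numeric column.
-- 'valid_amount' is a hypothetical expectation name; substitute your own.
-- Replace '<pipeline-id>' with a real pipeline ID before running.
SELECT
  COALESCE(SUM(e.failed_records), 0) AS failed_records
FROM (
  SELECT
    explode(from_json(
      details:flow_progress.data_quality.expectations,
      'array<struct<name:string,dataset:string,passed_records:bigint,failed_records:bigint>>'
    )) AS e
  FROM event_log('<pipeline-id>')
  WHERE event_type = 'flow_progress'
    AND details:flow_progress.data_quality IS NOT NULL
    AND timestamp >= current_timestamp() - INTERVAL 24 HOURS
) AS exploded
WHERE e.name = 'valid_amount';
```

The COALESCE keeps the result at 0 instead of NULL when no matching events exist in the window, so a threshold comparison on failed_records still has a number to evaluate.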
Takeaways (rebuild it from these)
- The pipeline event log is a Lakeflow pipeline's structured self-record (progress, flow metrics, data-quality, lineage, errors), queried with SQL.
- Read with event_log('<pipeline-id>'); no storage path needed.
- Data-quality metrics live in event_type = 'flow_progress', JSON path details:flow_progress.data_quality.expectations → name, dataset, passed_records, failed_records.
- Freshness: the current model is event_type / flow_progress (aggregated counts), not the older action-based EXPECTATION_FAILED; recognise both.
- Pipeline-scoped (not cluster/audit); pairs with [Data quality — expectations and constraints, and what happens to a bad row](/lessons/s3-data-quality/) (define) upstream and [SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/) (notify) downstream.
Before you move on — say these without scrolling up
- The two coordinates that locate data-quality metrics in the event log.
- What function reads the log, and what's the JSON path to the expectations array?
- The four fields in each expectation entry.
- Old EXPECTATION_FAILED rows vs the current shape: which do you reach for?
Next: the "notify" half — SQL Alerts, and the single-value rule that decides whether your alert even works → [SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/).