Lessons

Monitoring & Alerting

The pipeline event log — where a Lakeflow pipeline records itself

Query the Lakeflow pipeline event log to extract execution progress and data-quality expectation metrics programmatically.

Back in [Data quality — expectations and constraints, and what happens to a bad row](/lessons/s3-data-quality/) you declared expectations — EXPECT (amount > 0) — and I said every violation is "counted in the event log." This is that lesson. When a Lakeflow pipeline runs, it keeps a running record of itself: what updated, how many rows flowed, and — crucially — how many rows passed and failed each expectation. That record is the pipeline event log, and querying it is how you turn declared rules into a dashboard or alert.


The spine

Beat 1 — the anchor: two coordinates

The expectations lesson was the write side (define the rule, choose warn/drop/fail). This is the read side — get the numbers back out.

Predict: a pipeline runs, expectations pass and fail. Where in the event log do those pass/fail counts actually live — and how are they stored?

Anchor. Every Lakeflow pipeline run writes a structured event log — progress, flow metrics, data-quality results, lineage, errors — read with SQL. The metrics you want live in rows where event_type = 'flow_progress', inside the JSON details column. Master those two coordinates (flow_progress + parse details) and every event-log question falls out.

Beat 2 — read it with event_log(), then drill the JSON

You don't need to know where the log is physically stored — use the event_log('<pipeline-id>') table-valued function (on a SQL warehouse or shared cluster):

SELECT timestamp, event_type, details
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress'
ORDER BY timestamp DESC;

details is JSON — drill in with Databricks SQL's : path syntax. The data-quality block lists each expectation with pass/fail counts:

SELECT
  timestamp,
  details:flow_progress.status AS status,
  explode(from_json(
    details:flow_progress.data_quality.expectations,
    'array<struct<name:string,dataset:string,passed_records:bigint,failed_records:bigint>>'
  )) AS e
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress'
  AND details:flow_progress.data_quality IS NOT NULL;

The path to hold: details:flow_progress.data_quality.expectations — an array of {name, dataset, passed_records, failed_records}. That's the whole answer to "extract data-quality results programmatically." flow_progress also carries throughput metrics (num_output_rows, num_upserted_rows, num_deleted_rows).

Lock it. event_log('<id>') → filter event_type = 'flow_progress' → parse details:flow_progress.data_quality.expectations for passed_records / failed_records.


The dials (skim now; return when a question needs one)

◆ The renamed-shape trap (freshness)

Older material (and the exam's vintage) describes the log as action-based: rows with action values like START_UPDATE, EXPECTATION_PASSED, EXPECTATION_FAILED — one row per outcome. The current model keys rows by event_type, with data quality aggregated under flow_progress → data_quality.expectations (pass/fail counts, not one row per failed record). Recognise the old EXPECTATION_FAILED phrasing, but reach for event_type = 'flow_progress' as the correct current answer.

◆ Where it plugs in

The pipeline-scoped surface from [The monitoring map — which surface answers which question](/lessons/s5-observability-surfaces/) — deliberately not the cluster or audit log (the three-way collision). It closes a loop:

Takeaways (rebuild it from these)

  1. The pipeline event log is a Lakeflow pipeline's structured self-record (progress, flow metrics, data-quality, lineage, errors), queried with SQL.
  2. Read with event_log('<pipeline-id>') — no storage path needed.
  3. Data-quality metrics live in event_type = 'flow_progress', JSON path details:flow_progress.data_quality.expectationsname, dataset, passed_records, failed_records.
  4. Freshness: current model is event_type/flow_progress (aggregated counts), not the older action-based EXPECTATION_FAILED — recognise both.
  5. Pipeline-scoped (not cluster/audit); pairs with [Data quality — expectations and constraints, and what happens to a bad row](/lessons/s3-data-quality/) (define) upstream and [SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/) (notify) downstream.

Before you move on — say these without scrolling up

  1. The two coordinates that locate data-quality metrics in the event log.
  2. What function reads the log, and what's the JSON path to the expectations array?
  3. The four fields in each expectation entry.
  4. Old EXPECTATION_FAILED rows vs the current shape — which do you reach for?

Next: the "notify" half — SQL Alerts, and the single-value rule that decides whether your alert even works → [SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/).

Prerequisites

Leads to