Lessons

Monitoring & Alerting

System tables — the account's durable, queryable memory

Use system tables (billing usage, audit access, query history) to build observability, cost-attribution, and compliance reporting with SQL.

The Spark UI forgets everything when the cluster stops; run history expires in 60 days. So how does a platform lead answer "what did each team spend last quarter" or an auditor's "who read the customer table in the last 90 days"? Those need a surface that is durable, account-wide, and queryable with plain SQL — system tables. If the last lesson was the map, this is its one durable continent.


The spine

Beat 1 — the anchor: monitoring becomes a SELECT

Anchor. System tables are Databricks-managed Delta tables, in the system catalog, that record what happens across your whole account — cost, access, query execution, job runs — queried with ordinary SQL. No agent, no export, no extra infra. The skill is knowing which table holds your answer and which columns to group by.

Layout is system.<schema>.<table>, three-part names, read-only (Databricks writes; you query):

TableHoldsClassic use
system.billing.usageone row per workload's usage, in DBUscost attribution / chargeback
system.billing.list_pricesprice per SKU over timeconvert DBUs → dollars
system.access.auditevery permission-checked action (who/what/when)compliance, security
system.query.historyevery query's text, duration, rowsfind slow/expensive queries
system.compute.* / system.lakeflow.*cluster, warehouse, job metadatacorrelate cost to compute/jobs

(Some schemas must be enabled by an account admin — treat "is it enabled" as a prerequisite.)

Beat 2 — system.billing.usage, and the #1 trap

The exam leans on this table. Its columns:

Predict: a query sums usage_quantity per team and labels the result "cost in dollars." What's wrong?

usage_quantity is in DBUs, not money. To get dollars you join list_prices on sku_name and multiply. A "cost" question whose query sums usage_quantity alone is reporting DBUs — the correct answer includes the price join. (That's the single most-tested trap here.)

The attribution pattern (both exam questions are this):

SELECT usage_date,
       identity_metadata.run_as AS run_as,   -- who
       sku_name,                             -- which compute
       SUM(usage_quantity)      AS dbus      -- how much (DBUs)
FROM system.billing.usage
WHERE usage_date >= DATE_SUB(CURRENT_DATE, 30)
GROUP BY usage_date, run_as, sku_name
ORDER BY usage_date DESC, dbus DESC;

GROUP BY run_as, sku_name per day = per-user, per-compute consumption → build a dashboard, schedule a daily refresh, share with execs. (For dollars, join list_prices.)

Lock it. billing.usage: usage_date / sku_name / usage_quantity (DBUs) / run_as. Attribution = GROUP BY run_as, sku_name. DBUs ≠ dollars → join list_prices.


The dials (skim now; return when a question needs one)

◆ The other two you'll be asked about

(Note: a full cost+usage dashboard often draws on both billing and access — usage for spend, audit for who/what.)

◆ Gotchas

Takeaways (rebuild it from these)

  1. System tables = Databricks-managed Delta tables in system, account-wide, durable, read-only, queried with SQL. Monitoring becomes a SELECT.
  2. system.billing.usage: usage_date, sku_name, usage_quantity in DBUs, identity_metadata.run_as. Attribution = GROUP BY run_as, sku_name per day → dashboard + daily refresh.
  3. DBUs are not dollars — join system.billing.list_prices on sku_name.
  4. system.access.audit = who-did-what (compliance, most-accessed table); system.query.history = durable per-query record (slowest queries).
  5. Watch retention (copy out for long compliance) and cost (queries use compute); some schemas need admin enablement.

Before you move on — say these without scrolling up

  1. Where does durable, account-wide activity live, and how do you read it?
  2. A "cost per team" query sums usage_quantity — why is that not cost, and the fix?
  3. Which columns give you per-user, per-compute attribution?
  4. "Who accessed table X for the auditor" vs "the slowest queries this week" — which system table each?

Next: once a monitoring query returns a number, how do you get told when it crosses a line — SQL Alerts → [SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/).

Prerequisites

Leads to