The Spark UI forgets everything when the cluster stops; run history expires in 60 days. So how does a platform lead answer "what did each team spend last quarter" or an auditor's "who read the customer table in the last 90 days"? Those need a surface that is durable, account-wide, and queryable with plain SQL — system tables. If the last lesson was the map, this is its one durable continent.
The spine
Beat 1 — the anchor: monitoring becomes a SELECT
Anchor. System tables are Databricks-managed Delta tables, in the
systemcatalog, that record what happens across your whole account — cost, access, query execution, job runs — queried with ordinary SQL. No agent, no export, no extra infra. The skill is knowing which table holds your answer and which columns to group by.
Layout is system.<schema>.<table>, three-part names, read-only (Databricks writes; you query):
| Table | Holds | Classic use |
|---|---|---|
system.billing.usage | one row per workload's usage, in DBUs | cost attribution / chargeback |
system.billing.list_prices | price per SKU over time | convert DBUs → dollars |
system.access.audit | every permission-checked action (who/what/when) | compliance, security |
system.query.history | every query's text, duration, rows | find slow/expensive queries |
system.compute.* / system.lakeflow.* | cluster, warehouse, job metadata | correlate cost to compute/jobs |
(Some schemas must be enabled by an account admin — treat "is it enabled" as a prerequisite.)
Beat 2 — system.billing.usage, and the #1 trap
The exam leans on this table. Its columns:
usage_date— the day (group by for daily reports).sku_name— what kind of compute (all-purpose / jobs / SQL-warehouse / serverless — the menu from[Job & environment configuration — compute and Spark tuning](/lessons/s1-job-env-config/)).usage_quantity— how much, in DBUs (a Databricks Unit — the normalized billing unit).identity_metadata.run_as— who ran it (user or service principal).
Predict: a query sums
usage_quantityper team and labels the result "cost in dollars." What's wrong?
…
usage_quantity is in DBUs, not money. To get dollars you join list_prices on sku_name and multiply. A "cost" question whose query sums usage_quantity alone is reporting DBUs — the correct answer includes the price join. (That's the single most-tested trap here.)
The attribution pattern (both exam questions are this):
SELECT usage_date,
identity_metadata.run_as AS run_as, -- who
sku_name, -- which compute
SUM(usage_quantity) AS dbus -- how much (DBUs)
FROM system.billing.usage
WHERE usage_date >= DATE_SUB(CURRENT_DATE, 30)
GROUP BY usage_date, run_as, sku_name
ORDER BY usage_date DESC, dbus DESC;
GROUP BY run_as, sku_name per day = per-user, per-compute consumption → build a dashboard, schedule a daily refresh, share with execs. (For dollars, join list_prices.)
Lock it.
billing.usage:usage_date/sku_name/usage_quantity(DBUs) /run_as. Attribution =GROUP BY run_as, sku_name. DBUs ≠ dollars → joinlist_prices.
The dials (skim now; return when a question needs one)
◆ The other two you'll be asked about
system.access.audit— the authoritative who-did-what: timestamp,user_identity,action(SELECT/INSERT/…), object. The answer to "auditor wants who accessed/modified table X" and "which table is queried most." It's the account-wide audit log from the collision (distinct from cluster/pipeline event logs), tied to governance in[Access control — least privilege and the object permission ladders](/lessons/s7-access-control/).system.query.history— every query's text, duration, rows scanned. The durable, cross-warehouse cousin of the Query Profile: Query Profile explains one query interactively ([Reading the evidence — Query Profile & Spark UI](/lessons/s6-query-profile/));query.historyfinds the slowest queries of the week with aSELECT.
(Note: a full cost+usage dashboard often draws on both billing and access — usage for spend, audit for who/what.)
◆ Gotchas
- DBUs ≠ dollars — join
list_prices(the #1 trap). - System tables have their own retention — copy out for very long compliance windows.
- Querying them uses compute — schedule/materialize heavy monitoring queries rather than rescanning constantly.
Takeaways (rebuild it from these)
- System tables = Databricks-managed Delta tables in
system, account-wide, durable, read-only, queried with SQL. Monitoring becomes aSELECT. system.billing.usage:usage_date,sku_name,usage_quantityin DBUs,identity_metadata.run_as. Attribution =GROUP BY run_as, sku_nameper day → dashboard + daily refresh.- DBUs are not dollars — join
system.billing.list_pricesonsku_name. system.access.audit= who-did-what (compliance, most-accessed table);system.query.history= durable per-query record (slowest queries).- Watch retention (copy out for long compliance) and cost (queries use compute); some schemas need admin enablement.
Before you move on — say these without scrolling up
- Where does durable, account-wide activity live, and how do you read it?
- A "cost per team" query sums
usage_quantity— why is that not cost, and the fix? - Which columns give you per-user, per-compute attribution?
- "Who accessed table X for the auditor" vs "the slowest queries this week" — which system table each?
Next: once a monitoring query returns a number, how do you get told when it crosses a line — SQL Alerts → [SQL Alerts — the single-value rule that makes or breaks them](/lessons/s5-sql-alerts/).