Design Databricks SQL Alerts that evaluate a query result against a threshold, including collapsing multi-condition logic into a single evaluated value.

You've built the monitoring queries — cost per user from system tables, failed-records from the pipeline event log, null-rate on a table. But nobody watches a dashboard at 3 a.m. A SQL Alert runs a query on a schedule and notifies you when the result crosses a line. Simple idea — with exactly one sharp edge that every exam question here pokes at.

The spine

Beat 1 — the anchor: the alert only compares; the query must think

Predict: you want to be paged when "missing-email % > 15% AND invalid-phone % > 10%." Can you just hand the alert those two columns and an AND?

…

No — and that "no" is the whole lesson:

Anchor. A SQL Alert evaluates a single value — one field from the query result — against a threshold, and notifies when true. It cannot natively "trigger when A > x AND B > y." All the intelligence lives in the SQL query: write a query whose one returned field is true exactly when you want to be paged. The alert only compares; the query must think.

Internalize the inversion and every SQL-alert question becomes: how do I make the query emit a single field meaning "alert now"?

Beat 2 — the consequence: collapse conditions into one field

The most-tested move. Two conditions → one field, two clean ways:

1. A CASE returning 1/0 — evaluate the boolean in SQL, alert on = 1:

SELECT CASE
         WHEN new_tickets > 200 AND under_processing > 150 THEN 1 ELSE 0
       END AS should_alert
FROM ( SELECT SUM(CASE WHEN status='new' THEN 1 ELSE 0 END) AS new_tickets,
              SUM(CASE WHEN status='in_progress' THEN 1 ELSE 0 END) AS under_processing
       FROM tickets );
-- Alert condition: should_alert = 1

Both thresholds live inside the CASE; the alert checks one scalar. (The call-center/ticket question and its siblings.)

2. Compute the metrics as aggregate columns and point the alert's evaluated field at the one that matters. Same principle — query does the arithmetic, alert reads one number.

The wrong answers always return multiple columns/rows and expect the alert to AND them — it can't.

Lock it. Multiple conditions → CASE WHEN c1 AND c2 THEN 1 ELSE 0 END, alert on = 1 (or a single aggregate). One field carries the verdict.

The dials (skim now; return when a question needs one)

◆ Grouped queries — query decides what, alert config decides how often

When the query groups (e.g. mean(temperature) > 120 per sensor_id), the alert evaluates its configured column against the threshold across the returned rows — firing per group as framed. If it sends three consecutive notifications, that's the alert re-evaluating on its schedule and re-notifying while the condition holds — cadence is an alert setting, not query logic. Keep straight: the query decides what is compared; the alert schedule/notification config decides how often you hear it. (So "average temp for at least one sensor exceeded 120 on three consecutive executions" = the query's per-sensor condition held across three scheduled runs.)

◆ Temporal conditions — push the time logic into the query

"Alert when the daily average of percent_null exceeds 5% for three consecutive days, notify once":

Daily average + three-consecutive-days = data logic → in the query (GROUP BY DATE_TRUNC('DAY', window.end) for the daily average, then window/self-join to check the run of days), emitting one true/false field.
Notify once = an alert configuration choice.

That percent_null typically comes from a Lakehouse Monitoring profile-metrics table — Databricks auto-profiles a table and writes metrics (percent_null, drift) to a metrics table on a schedule, which you then alert on. The standard "monitor a column's null rate over time" pipeline.

◆ The current model — Alerts v2 (freshness)

Alerts v2 define an evaluation: a source (which column to evaluate), a comparison_operator (GREATER_THAN, EQUAL, …), a threshold (a literal or another column). States: OK / TRIGGERED / ERROR — legacy UNKNOWN is gone.
Legacy alerts framed it as "trigger when a value passes a threshold / when any row is returned." Single-value principle identical; only the config surface and state names changed.

Channels (email, Slack, PagerDuty, webhooks), schedule, destinations are configuration. And tune thresholds / "notify once" to avoid alert fatigue — an ignored alert is worse than none.

Takeaways (rebuild it from these)

A SQL Alert compares one field to a threshold — the alert only compares; the query must do the thinking.
Multiple conditions → collapse into one field: CASE WHEN c1 AND c2 THEN 1 ELSE 0 END, alert on = 1. Returning many columns for the alert to AND is the classic wrong answer.
Grouped query → alert evaluates the chosen column per result; notification cadence ("three consecutive", "notify once") is an alert setting, not query logic.
Temporal conditions → time logic in the query (GROUP BY DATE_TRUNC, window/self-join); often fed by Lakehouse Monitoring profile-metrics (percent_null, drift).
Alerts v2 = source + comparison_operator + threshold; states OK/TRIGGERED/ERROR (no UNKNOWN).

Before you move on — say these without scrolling up

Why can't an alert do "A > x AND B > y" directly — and what do you do instead?
A grouped alert fires three times — is that the query or the alert config?
"Daily avg over 3 consecutive days, notify once" — which part is query, which is alert setting?
Alerts v2 evaluation — the three pieces, and the states.

Next: the operational side — driving jobs via REST/CLI, notifications, and retry settings → [Operational job monitoring — REST/CLI, notifications, retry policy](/lessons/s5-job-monitoring/).

SQL Alerts — the single-value rule that makes or breaks them