Section 6 gave you the levers — skip files, right-size, organize, automate. This lesson tells you which lever a slow query actually needs, so you fix the real bottleneck instead of guessing. It's the capstone: every symptom points back to a fix from an earlier lesson.
The spine
Beat 1 — the anchor: don't guess, read the profile
Predict: a query is slow. Before changing anything, what should you do first?
…
Read the evidence. Open the Query Profile (or Spark UI) — it tells you which of four problems it is, and each has one specific fix you already know:
Anchor. A slow query is one of four things: bad data skipping (read too many files), a wrong join strategy (shuffled when it could broadcast), data spill (partitions too big for memory), or data skew (one task dwarfs the rest). Read the profile, name the symptom, pull the matching lever. Measure, don't guess.
Beat 2 — the four diagnoses → the four fixes
This is the whole objective — each row a callback:
- Bad data skipping — the scan read far more files/bytes than the filter needs (a
WHERE event_date=…scanned the whole table). Cause ([The performance model — why a query is slow, and the one lever](/lessons/s6-performance-model/)): the filter column isn't in the layout, so file ranges all overlap. Fix: liquid-cluster / Z-order on that column ([Organizing files — partitioning, Z-order, liquid clustering (and deletion vectors)](/lessons/s6-data-layout/)) + ensure stats (first-32 rule). - Inefficient join — a big shuffle feeding a sort-merge join where one side is actually small. A broadcast join ships the small table to every executor so the big one is never shuffled. Fix: let AQE flip it (
[Job & environment configuration — compute and Spark tuning](/lessons/s1-job-env-config/)), or a/*+ BROADCAST(dim) */hint /broadcast(); raisebroadcastTimeoutfor a largish dim. - Data spill — tasks spill to disk (profile shows spill size / Size on Disk > 0) because shuffle partitions are too big for memory. Fix: raise
spark.sql.shuffle.partitionsso each is smaller. - Data skew — one task takes far longer / reads far more than its siblings (uneven task-time distribution in Spark UI). Fix: AQE skew handling (splits the hot partition at runtime) + liquid clustering (balances on disk) + salting if needed.
Lock it. Four symptoms, four fixes: too many files → cluster on filter column; big shuffle + small side → broadcast; spill → more shuffle partitions; straggler task → AQE/clustering/salting.
The dials (skim now; return when a question needs one)
◆ The two tools
- Query Profile (Databricks SQL) — the execution plan with per-operator metrics: rows and bytes/files read per scan, join type, shuffle volume, spill. The Top Operators panel surfaces highest-cost operators.
- Spark UI — deeper job/stage/task view: per-stage DAG, per-task shuffle read/write, spill bytes, and the distribution of task times (how you spot skew). Query Profile = "what cost most"; Spark UI = "why, at task level."
◆ UI landmarks (asked directly)
- Query Profile panels: Details, Top operators, Query text. "Query source" is NOT a panel (the classic odd-one-out). And the Query Profile can't edit/re-run parts of the plan — it's read-only evidence.
- Total wall-clock time = read straight from the Query Profile (end-to-end duration). Don't sum task times (parallelism overcounts); a single job's duration ignores other jobs.
- The visualized query plan / DAG lives in Spark UI → SQL/DataFrame tab (logical + physical plans, per-operator metrics).
- A Spark UI stage details page shows Duration, GC time, Shuffle spill (memory/disk). DBU cost is NOT a Spark UI metric — that's billing (
system.billing.usage,[System tables — the account's durable, queryable memory](/lessons/s5-system-tables/)). "Which is not a stage metric" → the cost/billing one.
◆ Two more join/tuning levers
- Range-join hint — for a join on a range/interval condition (
BETWEEN, geospatial bins), a plain join is quadratic; add/*+ RANGE_JOIN(r, 65536) */with a bin size to make it efficient. Tell: "join on a range/interval is slow" → RANGE_JOIN hint. spark.sql.files.maxPartitionBytes— lower it (or pick a higher core-to-memory ratio instance) when tasks are memory-pressured, to read smaller partitions with more parallelism.
◆ Two more signals
- Cluster CPU/memory metrics — tell if you're CPU-bound (~75% → bigger/more nodes) or under-utilised (~25% → over-provisioned, cut cost). The cost side of "performance."
- System tables —
system.billing.usagefor cost/usage over time ([System tables — the account's durable, queryable memory](/lessons/s5-system-tables/)). Query Profile explains one query; system tables explain the bill.
Takeaways (rebuild it from these)
- Don't guess — read the Query Profile / Spark UI. Query Profile = per-operator cost + Top Operators; Spark UI = per-stage/task detail (shuffle, spill, task-time distribution for skew).
- Bad data skipping → cluster/Z-order on the filter column + stats (
[Organizing files — partitioning, Z-order, liquid clustering (and deletion vectors)](/lessons/s6-data-layout/)). - Inefficient join (big shuffle, small side) → broadcast (AQE/hint); range/interval join → RANGE_JOIN hint.
- Spill (Size on Disk > 0) → raise
shuffle.partitions; memory-pressured → lowermaxPartitionBytes/ higher core-to-memory instance. - Skew (one straggler) → AQE + liquid clustering + salting. UI trivia: no "Query source" panel; wall-clock from Query Profile; DAG in Spark UI SQL tab; DBU cost isn't a Spark UI metric.
Before you move on — say these without scrolling up
- The four things a slow query can be — and the one fix for each.
- Big shuffle with one small side vs a slow range join — which hint each?
- Where do you read a query's true total wall-clock time — and why not sum task times?
- Which is not a Spark UI stage metric: Duration, GC time, Shuffle spill, DBU cost?
That completes Section 6: reads = bytes → right-size → organize → automate → move changes (CDF) → measure & fix. Every lever, and how to know which to pull.