We've built the transformations: Structured Streaming, LSDP, AUTO CDC. This lesson covers what runs them: on a schedule, in the right order, recovering when a piece fails. It's a big slice of the exam, but it collapses to one clean distinction.
The spine
Beat 1 — Pipeline vs Job (the anchor)
Two words get blurred; separate them first. Recall from [Lakeflow Spark Declarative Pipelines](/lessons/s1-lakeflow-sdp/) that a pipeline is the transformation logic — the @dlt.table declarations (or notebook code) that say what happens to data. A Lakeflow Job (formerly "Workflow") is the operational wrapper — it says when and how that logic runs.
Anchor. Pipeline = what happens to data. Job = when and how it runs (schedule, cluster, retries, task order, alerts). Every orchestration feature below is just the Job answering "when and how."
A subtlety to bank: a Job has no "declarative vs imperative" flavour — that lives inside the task (a task can run an LSDP pipeline or a notebook). The Job just wraps and sequences tasks.
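To make the split concrete, here is a minimal sketch of a Job definition written as a Python dict in the shape of the Jobs API 2.1 create payload. This assumes you submit it through the API; the UI and CLI express the same fields. The notebook path and pipeline ID are hypothetical placeholders, and compute settings are omitted. The "what" lives inside each task (`notebook_task`, `pipeline_task`), while the "when and how" (`schedule`, `depends_on`) belongs to the Job.

```python
import json

# Hypothetical Job spec shaped like a Jobs API 2.1 "create" payload.
# The notebook_path and pipeline_id values are placeholders, not real resources.
job_spec = {
    "name": "restaurant_orders_daily",
    # WHEN: Quartz cron (sec min hour day-of-month month day-of-week) = 02:00 daily
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
    "tasks": [
        {
            # WHAT, imperative flavour: a notebook task
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/demo/ingest_orders"},
        },
        {
            # WHAT, declarative flavour: an LSDP pipeline task
            "task_key": "transform",
            # HOW: the Job sequences it after ingest
            "depends_on": [{"task_key": "ingest"}],
            "pipeline_task": {"pipeline_id": "<your-pipeline-id>"},
        },
    ],
}

print(json.dumps(job_spec, indent=2))
```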
Beat 2 — a Job is a graph of tasks
A real Job is rarely one step. It's a multi-task Job — tasks wired into a dependency graph (the same graph idea as LSDP in [Lakeflow Spark Declarative Pipelines](/lessons/s1-lakeflow-sdp/), but now at the task level, and you wire it explicitly):
- `depends_on`: task B starts only after task A succeeds. This enforces order: ingest → silver → gold.
- Parallel tasks: tasks with no mutual dependency run simultaneously. If B and C both depend only on A, they run in parallel once A finishes.
Ground it in the restaurant pipeline from [The one job — and the two axes everything lives on](/lessons/f1-the-one-job/): an ingest task, then silver (`depends_on: ingest`), then gold (`depends_on: silver`). The Job runs this chain top to bottom, starting each task only after its upstream has succeeded.
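The scheduling rule can be modeled in a few lines of plain Python. This is a simplified model for intuition, not Databricks' scheduler: `execution_waves` is a hypothetical helper that groups tasks into waves, where every task in a wave has all of its `depends_on` upstreams already finished, so tasks within one wave may run in parallel.

```python
from typing import Dict, List, Set


def execution_waves(depends_on: Dict[str, List[str]]) -> List[Set[str]]:
    """Group tasks into waves; tasks in the same wave can run in parallel.

    depends_on maps each task key to the task keys it depends on.
    Simplified model only: it assumes every upstream succeeds.
    """
    done: Set[str] = set()
    remaining = set(depends_on)
    waves: List[Set[str]] = []
    while remaining:
        # A task is ready once all of its upstreams are done
        ready = {t for t in remaining if set(depends_on[t]) <= done}
        if not ready:
            raise ValueError("cycle detected: a Job's task graph must be a DAG")
        waves.append(ready)
        done |= ready
        remaining -= ready
    return waves


restaurant = {"ingest": [], "silver": ["ingest"], "gold": ["silver"]}
fan_out = {"A": [], "B": ["A"], "C": ["A"]}

print([sorted(w) for w in execution_waves(restaurant)])  # [['ingest'], ['silver'], ['gold']]
print([sorted(w) for w in execution_waves(fan_out)])     # [['A'], ['B', 'C']]
```

The restaurant chain yields one task per wave (pure ordering), while the fan-out puts B and C in the same wave, which is exactly the "run in parallel once A finishes" rule.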
Lock it. Job = a graph of tasks; `depends_on` orders them; independent tasks run in parallel.
Beat 3 — the surprise: what rolls back when a task fails?
Here's the buried doubt, and the official sample question tests it directly. A Job has tasks A → (B, C in parallel). A and B succeed; C fails.
Predict: what's the state of the data now? Does the Job roll back because one task failed?
…
A and B's work is fully committed, and some of C's operations may have already completed. There is no automatic cross-task rollback. A Job is a dependency graph for orchestration, not a single database transaction — each task commits its own work as it goes. (Recall [How Delta Lake works — the transaction log](/lessons/f2-delta-transaction-log/): atomicity is per Delta commit, not per Job. A task's individual writes are atomic; the Job as a whole is not.) So "because C failed, everything rolls back" is always wrong.
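A toy simulation shows why nothing rolls back. It assumes each `write` call is one atomic commit (standing in for a single Delta commit) and runs the tasks sequentially for simplicity, even though B and C would run in parallel in the real Job. `write`, `run_job`, and the table names are hypothetical. Like a Job, the runner records the failure but has no undo step.

```python
from typing import Callable, Dict, List

# Hypothetical storage: each write is its own atomic commit, like one Delta commit.
committed: List[str] = []


def write(table: str) -> None:
    committed.append(table)  # once appended, nothing in this model undoes it


def task_a() -> None:
    write("bronze_orders")


def task_b() -> None:
    write("silver_orders")


def task_c() -> None:
    write("silver_customers")               # this commit succeeds...
    raise RuntimeError("C failed midway")   # ...then the task fails


def run_job(order: List[str], tasks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run tasks and record each outcome. Deliberately no rollback step:
    a failure marks the task FAILED and leaves earlier commits in place."""
    status: Dict[str, str] = {}
    for key in order:
        try:
            tasks[key]()
            status[key] = "SUCCESS"
        except RuntimeError:
            status[key] = "FAILED"
    return status


status = run_job(["A", "B", "C"], {"A": task_a, "B": task_b, "C": task_c})
print(status)     # {'A': 'SUCCESS', 'B': 'SUCCESS', 'C': 'FAILED'}
print(committed)  # ['bronze_orders', 'silver_orders', 'silver_customers']
```

A and B's tables stay committed, and so does the one write C finished before it failed: the partial-completion case from the exam question.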
Lock it. No cross-task rollback. Succeeded tasks stay committed; a failed task may have partially completed.
The dials (skim now; return when a question needs one)
◆ Control flow — passing data and branching
The exam objective names control flow (if/else, for-each). These are Job-level operators:
- Task Values: one task hands a small value to another, with `dbutils.jobs.taskValues.set(key, value)` upstream and `dbutils.jobs.taskValues.get(taskKey, key)` downstream. Gotcha: task values come back as strings, so cast to int/float yourself.
- Condition task (if/else): routes on a boolean. Classic: a `validate` task sets a `quality_score` task value; a condition task reads it and sends `> 0.95` → `process_good`, else → `quarantine_bad`. The branch not taken is skipped, not failed.
- For-Each task: one task template run over a list, with instances running in parallel. Trigger it with `["US","EU","APAC"]` → three parallel instances, each reading its slice via `dbutils.widgets.get("region")`. It scales to 50 tenants without 50 tasks. (Set a concurrency limit.)
- Run-if-failed dependency: a downstream task whose "Run if" condition is `AT_LEAST_ONE_FAILED` (or `ALL_FAILED`) runs only when its upstream failed (alerts, cleanup).
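Here is how those four operators might look together as a task list in the Jobs API 2.1 JSON shape, written as a Python dict. Treat it as a sketch: the task keys, notebook paths and `quality_score` key are hypothetical, and the `{{tasks.validate.values.quality_score}}` and `{{input}}` strings are dynamic value references that Databricks substitutes at run time, not Python placeholders.

```python
import json

# Hypothetical task list shaped like the Jobs API 2.1 "tasks" array.
# Task keys, notebook paths and the quality_score key are illustrative only.
tasks = [
    {
        # This notebook would call dbutils.jobs.taskValues.set("quality_score", ...)
        "task_key": "validate",
        "notebook_task": {"notebook_path": "/Repos/demo/validate"},
    },
    {
        # if/else: compare the upstream task value against a threshold
        "task_key": "check_quality",
        "depends_on": [{"task_key": "validate"}],
        "condition_task": {
            "op": "GREATER_THAN",
            "left": "{{tasks.validate.values.quality_score}}",
            "right": "0.95",
        },
    },
    {
        # Runs only on the "true" branch of the condition
        "task_key": "process_good",
        "depends_on": [{"task_key": "check_quality", "outcome": "true"}],
        "notebook_task": {"notebook_path": "/Repos/demo/process_good"},
    },
    {
        # Runs only on the "false" branch of the condition
        "task_key": "quarantine_bad",
        "depends_on": [{"task_key": "check_quality", "outcome": "false"}],
        "notebook_task": {"notebook_path": "/Repos/demo/quarantine_bad"},
    },
    {
        # for-each: one template, one iteration per region, up to 3 at once
        "task_key": "per_region",
        "depends_on": [{"task_key": "process_good"}],
        "for_each_task": {
            "inputs": json.dumps(["US", "EU", "APAC"]),
            "concurrency": 3,
            "task": {
                "task_key": "per_region_iteration",
                "notebook_task": {
                    "notebook_path": "/Repos/demo/process_region",
                    # The notebook reads this with dbutils.widgets.get("region")
                    "base_parameters": {"region": "{{input}}"},
                },
            },
        },
    },
    {
        # Failure handler: runs only if its upstream failed
        "task_key": "alert_on_failure",
        "depends_on": [{"task_key": "per_region"}],
        "run_if": "AT_LEAST_ONE_FAILED",
        "notebook_task": {"notebook_path": "/Repos/demo/send_alert"},
    },
]

print(json.dumps(tasks, indent=2))
```

Note that `outcome` on a dependency only selects a condition task's branch (`"true"` or `"false"`); failure handling is expressed with `run_if`.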
◆ Which cluster does a Job run on?
Match the cluster to the work:
| Cluster type | Use for | Why |
|---|---|---|
| New job cluster | production batch jobs | fresh, isolated, terminates after the run → pay only during execution |
| All-purpose cluster | interactive development only | shared, always-on → wasteful and un-isolated for production |
| Serverless | lightweight/bursty tasks, SQL warehouse tasks | auto-scales, no provisioning overhead |
| Instance pool | frequent short jobs needing fast startup | pre-warmed VMs → sub-minute startup |
Tell: "production job, lowest cost" → new job cluster, never all-purpose. And recall the streaming recovery config from [Structured Streaming & the state model](/lessons/s1-structured-streaming-state/): a streaming Job wants new job cluster + unlimited retries + max concurrent runs = 1.
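The streaming recovery recipe maps onto three Job settings. Below is a minimal sketch in the Jobs API 2.1 shape, assuming a notebook-based streaming task; `spark_version`, `node_type_id`, and the notebook path are placeholders you would replace with workspace-specific values.

```python
# Hypothetical Jobs API 2.1 settings for a long-running streaming job.
# spark_version, node_type_id and notebook_path are placeholders.
streaming_job = {
    "name": "orders_stream",
    # Never two concurrent runs writing through the same checkpoint
    "max_concurrent_runs": 1,
    "job_clusters": [
        {
            "job_cluster_key": "stream_cluster",
            # New job cluster: created for this job, isolated from other work
            "new_cluster": {
                "spark_version": "<runtime-version>",
                "node_type_id": "<node-type>",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "stream_orders",
            "job_cluster_key": "stream_cluster",
            "notebook_task": {"notebook_path": "/Repos/demo/stream_orders"},
            # -1 means retry indefinitely (0 means never retry)
            "max_retries": -1,
        }
    ],
}
```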
◆ Repair vs rerun
When one task fails, you don't rerun the whole Job. repair-run reruns only the failed task and its downstream dependents, reusing successful upstream results — cheaper and safer. run-now reruns the entire Job from the start (expensive, risks duplicate work). Always prefer repair-run for a single failed task. (The REST/CLI form is [Jobs via REST API and CLI](/lessons/s1-jobs-api-cli/); monitoring is [Operational job monitoring — REST/CLI, notifications, retry policy](/lessons/s5-job-monitoring/).)
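The rerun scope of a repair can be modeled as "failed tasks plus everything downstream of them." The sketch below is a hypothetical simplification of that rule for intuition; it ignores the API's options for choosing exactly which tasks to rerun, and `repair_set` and the status labels are illustrative rather than Databricks identifiers.

```python
from typing import Dict, List, Set


def repair_set(depends_on: Dict[str, List[str]], status: Dict[str, str]) -> Set[str]:
    """Tasks a repair would rerun in this simplified model: every FAILED task
    plus all tasks downstream of it. Successful upstream tasks are reused."""
    # Invert the graph: upstream task -> tasks that depend on it
    children: Dict[str, List[str]] = {t: [] for t in depends_on}
    for task, ups in depends_on.items():
        for up in ups:
            children[up].append(task)

    to_rerun: Set[str] = set()
    stack = [t for t, s in status.items() if s == "FAILED"]
    while stack:
        t = stack.pop()
        if t not in to_rerun:
            to_rerun.add(t)
            stack.extend(children[t])  # walk downstream dependents
    return to_rerun


graph = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"]}
status = {"A": "SUCCESS", "B": "SUCCESS", "C": "FAILED", "D": "UPSTREAM_FAILED"}

print(sorted(repair_set(graph, status)))  # ['C', 'D']
print(sorted(graph))                      # run-now would rerun all: ['A', 'B', 'C', 'D']
```

A and B are not touched by the repair, which is why it is cheaper and avoids duplicate writes from rerunning already-committed work.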
Takeaways (rebuild it from these)
- Pipeline = what; Job = when/how. The Job wraps tasks and sequences them.
- A Job is a graph of tasks: `depends_on` orders them; independent tasks run in parallel.
- Control flow: Task Values (pass data, cast from string), Condition task (if/else; untaken branch skipped), For-Each (one template, parallel instances), run-if-failed dependencies (cleanup/alerts).
- New job cluster for production (isolated, pay-per-run); all-purpose is dev-only. Streaming recovery = new job cluster + unlimited retries + max concurrent runs = 1.
- `repair-run` reruns only failed + downstream (preferred); `run-now` reruns everything. No automatic cross-task rollback.
Before you move on — say these without scrolling up
- Pipeline vs Job — which is "what," which is "when/how"?
- B and C both depend only on A — when do they run?
- C fails after A and B succeeded — what's committed, and what does NOT happen?
- One task failed: `repair-run` or `run-now`, and what's the difference?
Next: the same Jobs, driven programmatically — the REST API and CLI the exam quizzes verbatim. → [Jobs via REST API and CLI](/lessons/s1-jobs-api-cli/)