Create and automate ETL workloads using Jobs via UI/APIs/CLI; create pipeline components that use control-flow operators (if/else, for-each).

We've built the transformations — Structured Streaming, LSDP, AUTO CDC. This lesson is the thing that runs them: on a schedule, in the right order, recovering when a piece fails. It's a big slice of the exam, but it collapses to one clean distinction.

The spine

Beat 1 — Pipeline vs Job (the anchor)

Two words get blurred; separate them first. Recall from [Lakeflow Spark Declarative Pipelines](/lessons/s1-lakeflow-sdp/) that a pipeline is the transformation logic — the @dlt.table declarations (or notebook code) that say what happens to data. A Lakeflow Job (formerly "Workflow") is the operational wrapper — it says when and how that logic runs.

Anchor. Pipeline = what happens to data. Job = when and how it runs (schedule, cluster, retries, task order, alerts). Every orchestration feature below is just the Job answering "when and how."

A subtlety to bank: a Job has no "declarative vs imperative" flavour — that lives inside the task (a task can run an LSDP pipeline or a notebook). The Job just wraps and sequences tasks.
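You can see the split in the shape of a job definition. The sketch below is a plain Python dict in the style of a Jobs API payload; the job name, pipeline ID, notebook path, cron expression and email address are hypothetical placeholders, and the field names should be checked against the current API reference before you rely on them.

```python
# Sketch of a job definition as a plain dict (Jobs API style).
# All names, IDs, paths and addresses are hypothetical placeholders.
job = {
    "name": "restaurant_nightly",
    # "When": a Quartz cron schedule, here 02:00 every day.
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",
        "timezone_id": "UTC",
    },
    # "How": alerting lives on the Job, not in the transformation code.
    "email_notifications": {"on_failure": ["oncall@example.com"]},
    "tasks": [
        {
            # "What", declarative flavour: the task points at a pipeline.
            "task_key": "run_pipeline",
            "pipeline_task": {"pipeline_id": "<pipeline-id>"},
        },
        {
            # "What", imperative flavour: the task points at a notebook.
            "task_key": "export_report",
            "depends_on": [{"task_key": "run_pipeline"}],
            "notebook_task": {"notebook_path": "/Workspace/reports/export"},
            "max_retries": 2,
        },
    ],
}

# The Job only wraps and sequences; the flavour is a property of each task.
task_kinds = {
    t["task_key"]: "pipeline" if "pipeline_task" in t else "notebook"
    for t in job["tasks"]
}
print(task_kinds)
```

Notice that nothing at the top level of the dict says "declarative" or "imperative": schedule, alerts and retries are Job concerns, while the kind of work is decided task by task.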

Beat 2 — a Job is a graph of tasks

A real Job is rarely one step. It's a multi-task Job — tasks wired into a dependency graph (the same graph idea as LSDP in [Lakeflow Spark Declarative Pipelines](/lessons/s1-lakeflow-sdp/), but now at the task level, and you wire it explicitly):

depends_on — task B starts only after task A succeeds. This enforces order: ingest → silver → gold.
Parallel tasks — tasks with no mutual dependency run simultaneously. If B and C both depend only on A, they run in parallel once A finishes.

Ground it in the restaurant pipeline from [The one job — and the two axes everything lives on](/lessons/f1-the-one-job/): an ingest task, then silver (depends_on: ingest), then gold (depends_on: silver). The Job runs this chain in dependency order, and each task that succeeds stays succeeded: a later failure does not force the earlier tasks to start over.
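To make the ordering rule concrete, here is a minimal sketch that groups tasks into "waves" from their depends_on lists, using Python's standard graphlib. It is a model of the idea, not Databricks code: the function name execution_waves is made up for illustration, and a real scheduler starts each task as soon as its own dependencies finish rather than waiting for a whole wave.

```python
from graphlib import TopologicalSorter

def execution_waves(depends_on):
    """Group tasks into waves; tasks in one wave have no unmet
    dependencies on each other and could run in parallel."""
    sorter = TopologicalSorter(depends_on)  # maps task -> its upstream tasks
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave as succeeded
    return waves

# The restaurant chain: strictly one task after another.
chain = {"ingest": [], "silver": ["ingest"], "gold": ["silver"]}
# A fan-out: B and C both depend only on A.
fan_out = {"A": [], "B": ["A"], "C": ["A"]}

print(execution_waves(chain))    # [['ingest'], ['silver'], ['gold']]
print(execution_waves(fan_out))  # [['A'], ['B', 'C']]
```

The chain produces three waves of one task each; the fan-out puts B and C in the same wave, which is the "run in parallel once A finishes" case.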

Lock it. Job = a graph of tasks; depends_on orders them; independent tasks run in parallel.

Beat 3 — the surprise: what rolls back when a task fails?

Here's the buried doubt, and the official sample question tests it directly. A Job has tasks A → (B, C in parallel). A and B succeed; C fails.

Predict: what's the state of the data now? Does the Job roll back because one task failed?

…

A and B's work is fully committed, and some of C's operations may have already completed. There is no automatic cross-task rollback. A Job is a dependency graph for orchestration, not a single database transaction — each task commits its own work as it goes. (Recall [How Delta Lake works — the transaction log](/lessons/f2-delta-transaction-log/): atomicity is per Delta commit, not per Job. A task's individual writes are atomic; the Job as a whole is not.) So "because C failed, everything rolls back" is always wrong.

Lock it. No cross-task rollback. Succeeded tasks stay committed; a failed task may have partially completed.
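If the prediction still feels slippery, a toy simulation helps. In the sketch below the committed list stands in for Delta commits that have already landed; the task functions and the write labels are invented for illustration, and B and C run one after the other here only to keep the code short.

```python
committed = []  # stands in for Delta commits that have already landed

def task_a():
    committed.append("A: bronze write")

def task_b():
    committed.append("B: silver_orders write")

def task_c():
    # C's first write commits, then the task dies before the second one.
    committed.append("C: silver_menu write 1 of 2")
    raise RuntimeError("C failed before its second write")

status = {}
for name, fn in [("A", task_a), ("B", task_b), ("C", task_c)]:
    try:
        fn()
        status[name] = "SUCCESS"
    except RuntimeError:
        status[name] = "FAILED"  # nothing here undoes earlier commits

print(status)
print(committed)
```

After the loop, status reports C as failed, yet committed still holds A's write, B's write and C's first write. Nothing in the orchestration layer removed them, which is the behaviour the sample question is probing.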

The dials (skim now; return when a question needs one)

◆ Control flow — passing data and branching

The exam objective names control flow (if/else, for-each). These are Job-level operators:

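Task Values: one task hands a small value to another with dbutils.jobs.taskValues.set(key, value) upstream and dbutils.jobs.taskValues.get(taskKey, key) downstream. The value must be JSON-serializable. Gotcha: when a task value is passed into a parameter through a {{tasks.<task_key>.values.<key>}} reference, it arrives as a string, so cast to int/float yourself before comparing or doing arithmetic.
Condition task (if/else): routes on a boolean. Classic: a validate task sets a quality_score task value; a condition task reads it: > 0.95 → process_good, else → quarantine_bad. The branch not taken is skipped, not failed.
For-Each task: one task template run over a list of inputs. Trigger with ["US","EU","APAC"] → three instances, each reading its slice via dbutils.widgets.get("region"). Scales to 50 tenants without 50 tasks. Instances run in parallel only up to the task's concurrency setting, so set it deliberately; left at its default, iterations run one at a time.
Run-on-failure dependency: a downstream task that runs only when its upstream failed (alerts, cleanup). In the Jobs API this is the task's run_if setting (for example AT_LEAST_ONE_FAILED or ALL_FAILED); the outcome field on depends_on is what selects the true or false branch of a condition task.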
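The operators above can be sketched as one task list. This is a hedged illustration in Jobs API style, written as a Python dict: the task keys and notebook paths are hypothetical, and the field names (condition_task, for_each_task, run_if, the {{...}} references) reflect my understanding of the API and should be verified against the current reference. The failure-triggered task is expressed here with a task-level run_if rather than an outcome on the dependency.

```python
import json

regions = ["US", "EU", "APAC"]

# Hypothetical task list; all paths and task keys are placeholders.
tasks = [
    {"task_key": "validate",
     "notebook_task": {"notebook_path": "/Workspace/etl/validate"}},
    {
        # if/else: compares a task value against a threshold.
        # Both operands are strings in the job definition.
        "task_key": "check_quality",
        "depends_on": [{"task_key": "validate"}],
        "condition_task": {
            "op": "GREATER_THAN",
            "left": "{{tasks.validate.values.quality_score}}",
            "right": "0.95",
        },
    },
    {
        # "true" branch: one template fanned out over the regions.
        "task_key": "process_good",
        "depends_on": [{"task_key": "check_quality", "outcome": "true"}],
        "for_each_task": {
            "inputs": json.dumps(regions),
            "concurrency": 3,  # cap on simultaneous iterations
            "task": {
                "task_key": "process_region",
                "notebook_task": {
                    "notebook_path": "/Workspace/etl/process",
                    # Each iteration reads this via dbutils.widgets.get("region").
                    "base_parameters": {"region": "{{input}}"},
                },
            },
        },
    },
    {
        # "false" branch: skipped (not failed) when the check passes.
        "task_key": "quarantine_bad",
        "depends_on": [{"task_key": "check_quality", "outcome": "false"}],
        "notebook_task": {"notebook_path": "/Workspace/etl/quarantine"},
    },
    {
        # Runs only if its upstream failed: alerts, cleanup.
        "task_key": "alert_on_failure",
        "depends_on": [{"task_key": "process_good"}],
        "run_if": "AT_LEAST_ONE_FAILED",
        "notebook_task": {"notebook_path": "/Workspace/etl/alert"},
    },
]

# Which branch of the condition does each direct dependent sit on?
branch_of = {
    t["task_key"]: dep.get("outcome")
    for t in tasks
    for dep in t.get("depends_on", [])
    if dep["task_key"] == "check_quality"
}
print(branch_of)
```

Inside the process notebook, the region widget value is a string; any numeric parameter passed the same way would need an explicit int() or float().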

◆ Which cluster does a Job run on?

Match the cluster to the work:

Cluster type	Use for	Why
New job cluster	production batch jobs	fresh, isolated, terminates after the run → pay only during execution
All-purpose cluster	interactive development only	shared, always-on → wasteful and un-isolated for production
Serverless	lightweight/bursty tasks, SQL warehouse tasks	auto-scales, no provisioning overhead
Instance pool	frequent short jobs needing fast startup	pre-warmed VMs → sub-minute startup

Tell: "production job, lowest cost" → new job cluster, never all-purpose. And recall the streaming recovery config from [Structured Streaming & the state model](/lessons/s1-structured-streaming-state/): a streaming Job wants new job cluster + unlimited retries + max concurrent runs = 1.
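As a rough sketch, the streaming recovery recipe maps onto three settings in a job definition. The dict below follows Jobs API naming as I understand it (max_retries of -1 meaning retry indefinitely); the job name, notebook path, runtime version and node type are placeholders, and recovery_gaps is a made-up helper for checking the recipe, not a Databricks function.

```python
# Hypothetical streaming job; names, paths and cluster sizes are placeholders.
streaming_job = {
    "name": "orders_stream",
    "max_concurrent_runs": 1,  # never two copies of the same stream
    "tasks": [
        {
            "task_key": "stream_orders",
            "notebook_task": {"notebook_path": "/Workspace/streaming/orders"},
            # New job cluster: created for the run, terminated afterwards.
            "new_cluster": {
                "spark_version": "<runtime-version>",
                "node_type_id": "<node-type>",
                "num_workers": 2,
            },
            "max_retries": -1,  # assumption: -1 means retry indefinitely
        }
    ],
}

def recovery_gaps(job):
    """Return the parts of the streaming recovery recipe this job is missing."""
    gaps = []
    if job.get("max_concurrent_runs") != 1:
        gaps.append("max_concurrent_runs should be 1")
    for task in job["tasks"]:
        if "new_cluster" not in task:
            gaps.append(f"{task['task_key']}: no new job cluster")
        if task.get("max_retries") != -1:
            gaps.append(f"{task['task_key']}: retries are not unlimited")
    return gaps

print(recovery_gaps(streaming_job))  # []
```

A task that pointed at an existing all-purpose cluster instead of new_cluster would show up in the returned list, which mirrors the exam tell: production work belongs on a new job cluster.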

◆ Repair vs rerun

When one task fails, you don't rerun the whole Job. repair-run reruns only the failed task and its downstream dependents, reusing successful upstream results — cheaper and safer. run-now reruns the entire Job from the start (expensive, risks duplicate work). Always prefer repair-run for a single failed task. (The REST/CLI form is [Jobs via REST API and CLI](/lessons/s1-jobs-api-cli/); monitoring is [Operational job monitoring — REST/CLI, notifications, retry policy](/lessons/s5-job-monitoring/).)
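The difference between the two comes down to which tasks get selected. Here is a small model of the repair selection rule (the failed task plus everything downstream of it); tasks_to_repair and the extra audit task are invented for illustration and are not part of any Databricks API.

```python
from collections import deque

def tasks_to_repair(depends_on, failed):
    """Return the failed tasks plus all of their downstream dependents."""
    # Invert the graph: for each task, which tasks depend on it?
    children = {task: [] for task in depends_on}
    for task, parents in depends_on.items():
        for parent in parents:
            children[parent].append(task)

    to_rerun = set(failed)
    queue = deque(failed)
    while queue:  # breadth-first walk downstream from each failure
        for child in children[queue.popleft()]:
            if child not in to_rerun:
                to_rerun.add(child)
                queue.append(child)
    return sorted(to_rerun)

# The restaurant chain plus a hypothetical audit task hanging off ingest.
graph = {
    "ingest": [],
    "silver": ["ingest"],
    "gold": ["silver"],
    "audit": ["ingest"],
}

print(tasks_to_repair(graph, ["silver"]))  # ['gold', 'silver']
print(sorted(graph))                       # what a full rerun would touch
```

With silver failed, the repair set is silver and gold only; ingest and audit already succeeded and are left alone, whereas a full rerun would execute all four again.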

Takeaways (rebuild it from these)

Pipeline = what; Job = when/how. The Job wraps tasks and sequences them.
A Job is a graph of tasks: depends_on orders them; independent tasks run in parallel.
Control flow: Task Values (pass data; cast when they arrive as strings through parameters), Condition task (if/else; untaken branch skipped), For-Each (one template, instances parallel up to the concurrency setting), run-on-failure dependency via run_if (cleanup/alerts).
New job cluster for production (isolated, pay-per-run); all-purpose is dev-only. Streaming recovery = new job cluster + unlimited retries + max concurrent runs 1.
repair-run reruns only failed + downstream (preferred); run-now reruns everything. No automatic cross-task rollback.

Before you move on — say these without scrolling up

Pipeline vs Job — which is "what," which is "when/how"?
B and C both depend only on A — when do they run?
C fails after A and B succeeded — what's committed, and what does NOT happen?
One task failed — repair-run or run-now, and what's the difference?

Next: the same Jobs, driven programmatically — the REST API and CLI the exam quizzes verbatim. → [Jobs via REST API and CLI](/lessons/s1-jobs-api-cli/)

Jobs & orchestration — multi-task, dependencies, control flow