Declarative Automation Bundles — deploying Databricks as code

Package and deploy Databricks resources with Declarative Automation Bundles (Databricks Asset Bundles): validate → deploy → run, targets, and binding existing jobs.

[Python project structure for Databricks Asset Bundles](/lessons/s1-dabs-project/) gave the project structure; this lesson is the deploy. You built a job by clicking through the Jobs UI in dev. Now you need the exact same job — tasks, schedule, cluster — in staging and prod, versioned so every change is reviewed and reversible.

Predict: the naive way is to rebuild it by hand in three workspaces. What three things does that cost you?

Slow, error-prone, and no history (no review, no rollback). That's the problem bundles solve.


The spine

Beat 1 — the anchor: resources as YAML, deployed to any target

Anchor. A bundle declares your Databricks resources — jobs, pipelines, apps — and their config as YAML in your repo, and deploys that single definition to any target workspace. Infrastructure-as-code for Databricks: the source of truth is a file in git, not clicks in a workspace. (Current name Declarative Automation Bundles; the exam says Databricks Asset Bundles — same thing, acronym still DAB.)

The databricks.yml at the root defines: the bundle (name/settings), its resources (jobs/pipelines/apps/volumes — a resource can even carry a permissions mapping, so ACLs are code too, [Access control — least privilege and the object permission ladders](/lessons/s7-access-control/)), and its targets (named environments: dev, staging, prod). Resources say what exists; targets say where and how it deploys.
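As a rough illustration of those three top-level mappings, a minimal databricks.yml might look like the sketch below. The bundle name, job key, notebook path, group name, and workspace hosts are hypothetical placeholders, not values from this lesson.

```yaml
# databricks.yml (sketch; every name, path, and URL below is a placeholder)
bundle:
  name: nightly_etl_bundle

resources:              # WHAT exists
  jobs:
    nightly_etl:        # resource key: the <key> passed to `bundle run`
      name: nightly_etl
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./src/ingest.ipynb
      permissions:      # ACLs as code, attached to the job itself
        - group_name: data-engineers
          level: CAN_MANAGE_RUN

targets:                # WHERE and HOW it deploys
  dev:
    default: true
    workspace:
      host: https://dev.example.cloud.databricks.com
  prod:
    workspace:
      host: https://prod.example.cloud.databricks.com
```

The job is defined once under resources; each entry under targets only changes where that same definition lands.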

Beat 2 — the lifecycle, and the CI/CD trap

| Command | Does | Note |
| --- | --- | --- |
| databricks bundle init | scaffold from a template | one-time only |
| databricks bundle validate | check config is well-formed | before every deploy |
| databricks bundle deploy -t <target> | push resources to the target | |
| databricks bundle run <key> -t <target> | run a bundle job/pipeline in that target | |

Predict: which of these four is not part of a CI/CD pipeline?

init: the bundle already lives in the repo, and re-running init would re-scaffold from the template and could overwrite what is there. So CI/CD = validate → deploy → run (never init).

Lock it. Bundle = resources + config as YAML, deployed per target. -t selects the environment: development mode isolates each developer's copy (prefixed resource names) and pauses schedules and triggers; production mode deploys as-is. CI/CD sequence = validate → deploy → run.
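To make the dev/prod distinction concrete, here is a sketch of the targets mapping on its own (a fragment that would sit in databricks.yml alongside bundle and resources). The hosts and the root path are assumed values; mode: development and mode: production are the settings that switch the behaviour.

```yaml
# Fragment of databricks.yml; hosts and paths are placeholders
targets:
  dev:
    mode: development   # prefixes resource names per user, pauses schedules and triggers
    default: true       # used when -t is omitted
    workspace:
      host: https://dev.example.cloud.databricks.com
  staging:
    workspace:
      host: https://staging.example.cloud.databricks.com
  prod:
    mode: production    # deploys resources as defined
    workspace:
      host: https://prod.example.cloud.databricks.com
      # shared, non-user-specific deployment path (assumed layout)
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
```

With this in place, databricks bundle deploy -t prod goes to the prod workspace, and omitting -t falls back to dev.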


The dials (skim now; return when a question needs one)

◆ Adopt an EXISTING job — generate, then bind

The scenario that trips people: a production job built in the UI, and you want to manage it as code without recreating it or making a duplicate. Two steps:

  1. databricks bundle generate job --existing-job-id <id> — capture the live job's config into YAML (+ download referenced files) into your bundle.
  2. databricks bundle deployment bind <bundle_job> <remote-job-id> — link the bundle resource to the existing remote job by id, so future deploys update the real job instead of creating a second one.

Tell: "adopt an existing job into a bundle without losing/duplicating it" → generate, then bind.
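As a sketch of what the two steps leave you with: generate writes a job definition into the bundle, and the resource key of that definition is what you pass to bind. The key, job name, schedule, task, and file path below are hypothetical; the real output mirrors whatever the live job contains.

```yaml
# Job definition as it might look after `bundle generate` (illustrative values only)
resources:
  jobs:
    nightly_etl:              # resource key: the <bundle_job> argument to `bundle deployment bind`
      name: nightly_etl       # name carried over from the existing UI-built job
      schedule:
        quartz_cron_expression: "0 0 2 * * ?"
        timezone_id: UTC
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../src/ingest.ipynb   # downloaded copy of the referenced notebook
```

Once nightly_etl is bound to the remote job id, a deploy updates that job in place rather than creating a second one.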

◆ CI/CD identity, and where bundles sit

In an automated pipeline (GitHub Actions), authenticate as a service principal via OAuth token federation — short-lived, no stored long-lived secret — not a PAT ([Secrets — storing credentials, redaction, and scope ACLs](/lessons/s7-secrets/)). Bundles are the recommended CI/CD path; the lighter alternative deploys code only — a production Git folder ([Git Folders & CI/CD — version control inside the workspace](/lessons/s9-git-cicd/)). Bundles manage resources; a Git folder just syncs files.
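One possible GitHub Actions workflow for this, offered as a sketch rather than a reference setup. It assumes a service principal with a federation policy already configured for the repository; the host, client ID, branch, target name, and job key are placeholders, and the github-oidc auth type should be checked against the current Databricks CLI documentation before relying on it.

```yaml
# .github/workflows/deploy.yml (sketch; all values are placeholders)
name: deploy-bundle
on:
  push:
    branches: [main]

permissions:
  id-token: write     # lets the job request a short-lived OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_AUTH_TYPE: github-oidc
      DATABRICKS_HOST: https://prod.example.cloud.databricks.com
      DATABRICKS_CLIENT_ID: 00000000-0000-0000-0000-000000000000  # service principal application ID
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle validate -t prod
      - run: databricks bundle deploy -t prod
      - run: databricks bundle run nightly_etl -t prod
```

No PAT or client secret is stored in the repository; the steps follow the validate → deploy → run sequence and never call init.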

Takeaways (rebuild it from these)

  1. A bundle = Databricks resources + config as versioned YAML (databricks.yml), deployed to any target. Current name Declarative Automation Bundles; exam says Databricks Asset Bundles.
  2. Lifecycle: init (one-time) → validate → deploy -t → run -t. CI/CD = validate → deploy → run (never init).
  3. Targets = named environments; -t/--target picks one; dev mode isolates each developer's copy and pauses schedules and triggers, prod deploys as-is.
  4. Adopt an existing job: bundle generate (capture to YAML) → bundle deployment bind (link by id → deploys update, don't duplicate).
  5. Bundles carry permissions (ACLs as code) + variables; CI/CD authenticates as a service principal via OAuth.

Before you move on — say these without scrolling up

  1. What a bundle declares, and what a "target" is.
  2. The four lifecycle commands — and which is not in CI/CD, and why.
  3. Adopt a UI-built job into a bundle without duplicating it — the two commands.
  4. Bundle vs production Git folder — what does each deploy?

Next: the version-control layer underneath — Git Folders, branching, and how a .py file becomes a notebook → [Git Folders & CI/CD — version control inside the workspace](/lessons/s9-git-cicd/).
