You've built pipelines ([Lakeflow Spark Declarative Pipelines](/lessons/s1-lakeflow-sdp/)), jobs ([Jobs & orchestration — multi-task, dependencies, control flow](/lessons/s1-jobs-orchestration/)), and their dependencies ([Managing third-party libraries](/lessons/s1-third-party-libs/)). This lesson packages all of it as code in one versioned project so it deploys the same way to dev and prod — the foundation of CI/CD.
The spine
Beat 1 — the pain, then the anchor
Naming caveat up front (renamed-product trap): current docs call this Declarative Automation Bundles; the Nov-2025 exam guide and questions still say Databricks Asset Bundles (DABs) — I'll say DABs to match.
Predict: without DABs, your job/pipeline config lives in the UI, hand-clicked per environment. What goes wrong across dev, uat, prod?
…
Configuration drift (dev and prod quietly diverge), no version control on the config, and manual deployment mistakes. DABs fixes all three by making your Databricks resources — jobs, pipelines, clusters — infrastructure as code:
Anchor. Define your Databricks resources once, as code, in a versioned bundle; deploy that same definition to any environment, with only the per-environment values swapped. One source of truth, many targets.
Beat 2 — how one definition serves many environments
The mechanism is targets + variable substitution. A target is a named environment (dev, uat, prod) with its own workspace URL, catalog, cluster size, identity. Variable substitution — ${var.catalog} — resolves to dev_catalog under the dev target and prod_catalog under prod.
Predict: so how do you deploy to prod without copy-pasting a whole second config?
…
You don't copy anything — the one definition stays fixed; only the target's values swap in. That's the anchor made concrete: same recipe, different ingredients per kitchen.
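To make the swap concrete, here is a minimal sketch of the idea in plain Python. It is a toy model, not how the Databricks CLI resolves variables internally: the dictionary layout, the `resolve` function, and the `sales.orders` table name are all hypothetical, while the target names and catalog values are the ones used above.

```python
import re

# Toy model: ONE fixed definition, written once.
# "ingest_job" and "sales.orders" are illustrative names, not real resources.
DEFINITION = {
    "job_name": "ingest_job",
    "table": "${var.catalog}.sales.orders",
}

# Per-target values: the only thing that differs between environments.
TARGET_VARIABLES = {
    "dev": {"catalog": "dev_catalog"},
    "prod": {"catalog": "prod_catalog"},
}

_VAR_PATTERN = re.compile(r"\$\{var\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(definition, target):
    """Return a copy of the definition with ${var.NAME} filled in for one target."""
    values = TARGET_VARIABLES[target]

    def substitute(match):
        # Raises KeyError if the target does not define the variable.
        return values[match.group(1)]

    return {key: _VAR_PATTERN.sub(substitute, text) for key, text in definition.items()}


if __name__ == "__main__":
    for target in ("dev", "prod"):
        print(target, resolve(DEFINITION, target)["table"])
    # dev dev_catalog.sales.orders
    # prod prod_catalog.sales.orders
```

The definition itself is never edited per environment; only the lookup table of target values changes, which is the property the bundle gives you.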
Lock it. One versioned definition + per-target values (${var.…}) = one source of truth deployed to many environments. Prod runs as a service principal (recall [Jobs & orchestration — multi-task, dependencies, control flow](/lessons/s1-jobs-orchestration/)).
The dials (skim now; return when a question needs one)
◆ The project shape
my_bundle/
├── databricks.yml ← the ONE root config: bundle name, targets, includes
├── resources/ ← one YAML per job/pipeline (separate files avoid merge conflicts)
│ ├── ingest_job.yml
│ └── etl_pipeline.yml
├── src/ ← your Python package (modular, importable, testable)
│ └── my_pkg/…
└── requirements.txt / *.whl ← dependencies (from [Managing third-party libraries](/lessons/s1-third-party-libs/))
- `databricks.yml`: exactly one, at the root. Declares the bundle name, the targets, and `include:` pulling in `resources/*.yml`.
- `resources/`: one YAML per job/pipeline. Splitting keeps large teams from colliding in one giant file.
- Targets: `mode: development` relaxes settings; `mode: production` enforces stricter ones (service principal).
◆ The src/ package and sys.path
Keep transformation logic in an importable Python package under src/ (built into a wheel per [Managing third-party libraries](/lessons/s1-third-party-libs/)), not pasted into notebooks — so it's modular and unit-testable ([Unit & integration testing on Databricks](/lessons/s1-testing/)). That raises the one Python-internals fact the exam asks directly:
`sys.path` = the list of directories Python searches when you `import` a module. To see it: `import sys; print(sys.path)`.
For import my_pkg to work, the package's location must be on sys.path — which a good bundle handles by installing your wheel (it lands on the path) rather than relying on fragile relative paths. "Which variable lists the directories searched for modules?" → sys.path.
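The effect of `sys.path` is easy to observe with the standard library alone. The sketch below writes a throwaway module into a temporary directory, shows that importing it fails while that directory is absent from `sys.path`, then succeeds once it is added. The module name `demo_pkg_for_sys_path` and the helper names are hypothetical stand-ins for `my_pkg`. In a real bundle you would install the wheel rather than edit `sys.path` by hand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_pkg_for_sys_path"  # hypothetical stand-in for my_pkg


def import_from_directory(module_name, directory):
    """Import a module after making sure its directory is on sys.path."""
    directory = str(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # let the import system notice the new file
    return importlib.import_module(module_name)


def demo():
    """Return (importable_before, greeting_after) for a throwaway module."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{MODULE_NAME}.py").write_text("GREETING = 'hello from demo'\n")

        # The directory is not on sys.path yet, so Python cannot find the module.
        try:
            importlib.import_module(MODULE_NAME)
            importable_before = True
        except ModuleNotFoundError:
            importable_before = False

        module = import_from_directory(MODULE_NAME, tmp)
        greeting = module.GREETING

        # Undo the changes so the demo leaves the interpreter as it found it.
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)
        return importable_before, greeting


if __name__ == "__main__":
    print(demo())
```

Manual `sys.path` edits like this are exactly the fragile approach the lesson warns against; the point of the sketch is only to show what the variable controls.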
◆ The four CLI commands (and the CI/CD order)
| Command | Does | When |
|---|---|---|
| `databricks bundle init` | scaffolds a new project (like `git init`) | once, at project start |
| `databricks bundle validate` | checks the configuration; changes nothing in the workspace | before every deploy |
| `databricks bundle deploy -t dev` | creates/updates resources in the target | on every change |
| `databricks bundle run <resource> -t dev` | triggers a job/pipeline to verify | after deploy |
Tell: the CI/CD sequence is validate → deploy → run — not init, because init is a one-time developer action, not part of an automated pipeline. (Deploy mechanics are [Declarative Automation Bundles — deploying Databricks as code](/lessons/s9-dabs-deploy/); the Git side is [Git Folders & CI/CD — version control inside the workspace](/lessons/s9-git-cicd/).)
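As a rough sketch of that sequence in a CI script, the Python below builds the three commands in order and, unless `dry_run` is left on, executes them and stops at the first failure. The function names and the resource key `ingest_job` are hypothetical, passing `-t` to `validate` is a choice made here so all three steps check the same target, and actually running it assumes the Databricks CLI is installed and authenticated on the CI runner.

```python
import subprocess


def ci_commands(target, resource):
    """Build the validate -> deploy -> run sequence for one target (no init)."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
        ["databricks", "bundle", "run", resource, "-t", target],
    ]


def run_ci(target, resource, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for command in ci_commands(target, resource):
        print("$", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError, so a failed validate
            # prevents the deploy, and a failed deploy prevents the run.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # "ingest_job" is an assumed resource key; use the key from your own bundle.
    run_ci(target="dev", resource="ingest_job", dry_run=True)
```

Keeping the command list in one function makes the order itself testable, independent of whether a workspace is reachable.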
Takeaways (rebuild it from these)
- DABs (current docs: Declarative Automation Bundles) = infrastructure as code: one versioned definition, deployed per-target.
- `databricks.yml` (one, at root) + `resources/` (one YAML per job) + targets + `${var.…}` substitution + `mode: production` (service principal).
- Logic lives in an importable `src/` package (a wheel), which is why `sys.path` matters and what makes the code unit-testable.
- CLI: `init` (once) · `validate` (no workspace changes) · `deploy -t` · `run -t`. CI/CD = validate → deploy → run (never `init`).
Before you move on — say these without scrolling up
- Three things that go wrong without DABs — and the one idea that fixes all three.
- How does one definition deploy to prod without copy-pasting config?
- `sys.path`: what is it, and why does a bundle care?
- The CI/CD command sequence, and which command is not in it?
Next: how you prove that src/ package is correct before it ships → [Unit & integration testing on Databricks](/lessons/s1-testing/).