Design and implement a scalable Python project structure optimized for Databricks Asset Bundles (DABs), enabling modular development, deployment automation, and CI/CD integration.

You've built pipelines ([Lakeflow Spark Declarative Pipelines](/lessons/s1-lakeflow-sdp/)), jobs ([Jobs & orchestration — multi-task, dependencies, control flow](/lessons/s1-jobs-orchestration/)), and their dependencies ([Managing third-party libraries](/lessons/s1-third-party-libs/)). This lesson packages all of it as code in one versioned project so it deploys the same way to dev and prod — the foundation of CI/CD.

The spine

Beat 1 — the pain, then the anchor

Naming caveat up front (renamed-product trap): current docs call this Declarative Automation Bundles; the Nov-2025 exam guide and questions still say Databricks Asset Bundles (DABs) — I'll say DABs to match.

Predict: without DABs, your job/pipeline config lives in the UI, hand-clicked per environment. What goes wrong across dev, uat, prod?

…

Configuration drift (dev and prod quietly diverge), no version control on the config, and manual deployment mistakes. DABs fixes all three by making your Databricks resources — jobs, pipelines, clusters — infrastructure as code:

Anchor. Define your Databricks resources once, as code, in a versioned bundle; deploy that same definition to any environment, with only the per-environment values swapped. One source of truth, many targets.

Beat 2 — how one definition serves many environments

The mechanism is targets + variable substitution. A target is a named environment (dev, uat, prod) with its own workspace URL, catalog, cluster size, identity. Variable substitution — ${var.catalog} — resolves to dev_catalog under the dev target and prod_catalog under prod.

Predict: so how do you deploy to prod without copy-pasting a whole second config?

…

You don't copy anything — the one definition stays fixed; only the target's values swap in. That's the anchor made concrete: same recipe, different ingredients per kitchen.

Lock it. One versioned definition + per-target values (${var.…}) = one source of truth deployed to many environments. Prod runs as a service principal (recall [Jobs & orchestration — multi-task, dependencies, control flow](/lessons/s1-jobs-orchestration/)).
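To make the substitution idea concrete, here is a toy sketch in plain Python. It only imitates the behavior described above and is not how the Databricks CLI implements it. The TARGET_VARIABLES dictionary, the resolve function, and the sales.orders table name are all hypothetical, chosen for illustration.

```python
import re

# Hypothetical per-target values; the catalog names mirror the lesson's example.
TARGET_VARIABLES = {
    "dev": {"catalog": "dev_catalog"},
    "prod": {"catalog": "prod_catalog"},
}

# Matches ${var.some_name} and captures some_name.
_VAR_PATTERN = re.compile(r"\$\{var\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(template: str, target: str) -> str:
    """Replace each ${var.name} with the value defined for the given target."""
    values = TARGET_VARIABLES[target]

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"variable {name!r} is not defined for target {target!r}")
        return values[name]

    return _VAR_PATTERN.sub(substitute, template)


# One definition, written once.
definition = "${var.catalog}.sales.orders"

print(resolve(definition, "dev"))   # dev_catalog.sales.orders
print(resolve(definition, "prod"))  # prod_catalog.sales.orders
```

The definition string never changes; only the target name passed in does. That is the "same recipe, different ingredients" idea in miniature.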

The dials (skim now; return when a question needs one)

◆ The project shape

my_bundle/
├── databricks.yml            ← the ONE root config: bundle name, targets, includes
├── resources/                ← one YAML per job/pipeline (separate files avoid merge conflicts)
│    ├── ingest_job.yml
│    └── etl_pipeline.yml
├── src/                      ← your Python package (modular, importable, testable)
│    └── my_pkg/…
└── requirements.txt / *.whl  ← dependencies (from [Managing third-party libraries](/lessons/s1-third-party-libs/))

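databricks.yml: exactly one, at the root. It declares the bundle name, the targets, and an include: entry that pulls in resources/*.yml.
resources/: one YAML file per job or pipeline. Splitting keeps large teams from colliding in one giant file.
Targets: a target with mode: development relaxes settings for iteration (for example, deployed resources get a per-user name prefix and schedules are paused); a target with mode: production applies stricter validation and is meant to run as a service principal.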
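If it helps to see the shape on disk, the sketch below scaffolds the layout into a temporary directory with plain Python. It is an illustration under assumptions, not a replacement for databricks bundle init: the workspace URLs and the service principal ID are placeholders, the catalog variable is borrowed from the substitution example, the resource files are left as stubs, and the YAML has not been validated against a real workspace.

```python
import tempfile
from pathlib import Path

# Minimal root config. Hosts and the service principal ID are placeholders.
DATABRICKS_YML = """\
bundle:
  name: my_bundle

include:
  - resources/*.yml

variables:
  catalog:
    description: Catalog that the jobs and pipelines write to
    default: dev_catalog

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://dev.example.com    # placeholder URL
  prod:
    mode: production
    workspace:
      host: https://prod.example.com   # placeholder URL
    run_as:
      service_principal_name: 00000000-0000-0000-0000-000000000000  # placeholder
    variables:
      catalog: prod_catalog
"""


def scaffold(root: Path) -> list[str]:
    """Create the bundle layout under root and return the files written."""
    (root / "resources").mkdir(parents=True)
    (root / "src" / "my_pkg").mkdir(parents=True)

    (root / "databricks.yml").write_text(DATABRICKS_YML)
    # Stubs only: real files would hold the job and pipeline definitions.
    (root / "resources" / "ingest_job.yml").write_text("# job definition goes here\n")
    (root / "resources" / "etl_pipeline.yml").write_text("# pipeline definition goes here\n")
    (root / "src" / "my_pkg" / "__init__.py").write_text("")
    (root / "requirements.txt").write_text("")

    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    for name in scaffold(Path(tmp) / "my_bundle"):
        print(name)
```

Note that the prod target overrides only the values that differ (host, identity, catalog); everything else is inherited from the shared definition.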

◆ The `src/` package and `sys.path`

Keep transformation logic in an importable Python package under src/ (built into a wheel per [Managing third-party libraries](/lessons/s1-third-party-libs/)), not pasted into notebooks — so it's modular and unit-testable ([Unit & integration testing on Databricks](/lessons/s1-testing/)). That raises the one Python-internals fact the exam asks directly:

sys.path = the list of directories Python searches, in order, when you import a module. To inspect it, run import sys; print(sys.path).

For import my_pkg to work, the package's location must be on sys.path. A good bundle handles this by installing your wheel, which places the package in site-packages (a directory already on sys.path), rather than relying on fragile relative paths. "Which variable lists the directories searched for modules?" → sys.path.
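You can watch sys.path decide whether an import succeeds with a small experiment. The sketch below writes a throwaway package (the name my_pkg_demo is hypothetical) into a temporary directory, tries to import it before and after that directory is added to sys.path, then cleans up. Editing sys.path by hand is used here only to expose the mechanism; in a bundle you would install the wheel instead.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_new_dir(pkg_name: str = "my_pkg_demo") -> tuple[bool, str]:
    """Return (importable before the path change?, value read after it)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / pkg_name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("GREETING = 'hello from the package'\n")

        # Before: the directory is not on sys.path, so Python cannot find the package.
        try:
            importlib.import_module(pkg_name)
            found_before = True
        except ModuleNotFoundError:
            found_before = False

        # After: put the parent directory on sys.path and try again.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(pkg_name)
            greeting = module.GREETING
        finally:
            # Undo the changes so the experiment leaves no trace.
            sys.path.remove(tmp)
            sys.modules.pop(pkg_name, None)

    return found_before, greeting


found_before, greeting = import_from_new_dir()
print("importable before:", found_before)
print("value after:", greeting)
```

The package itself never changed between the two attempts; only the search path did.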

◆ The four CLI commands (and the CI/CD order)

| Command | Does | When |
| --- | --- | --- |
| `databricks bundle init` | scaffolds a new project (like `git init`) | once, at project start |
| `databricks bundle validate` | checks the bundle configuration; deploys nothing | before every deploy |
| `databricks bundle deploy -t dev` | creates/updates resources in the target | on every change |
| `databricks bundle run <resource> -t dev` | triggers a job/pipeline to verify | after deploy |

Tell: the CI/CD sequence is validate → deploy → run — not init, because init is a one-time developer action, not part of an automated pipeline. (Deploy mechanics are [Declarative Automation Bundles — deploying Databricks as code](/lessons/s9-dabs-deploy/); the Git side is [Git Folders & CI/CD — version control inside the workspace](/lessons/s9-git-cicd/).)
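As a sketch of what a CI step might do, the Python below builds the three commands in order and, unless dry_run is switched off, only prints them. The function names and the ingest_job resource key are hypothetical; the commands themselves are the ones from the table. With dry_run=False it assumes the databricks CLI is installed and authenticated on the runner.

```python
import subprocess


def bundle_commands(target: str, resource_key: str) -> list[list[str]]:
    """Return the CI/CD commands in the order they should run (no init)."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
        ["databricks", "bundle", "run", resource_key, "-t", target],
    ]


def run_pipeline(target: str, resource_key: str, dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for command in bundle_commands(target, resource_key):
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing step,
            # so a failed validate never reaches deploy.
            subprocess.run(command, check=True)


# Dry run: shows the sequence without needing the CLI installed.
run_pipeline("dev", "ingest_job")
```

Stopping on the first failure is the point of the ordering: validate guards deploy, and deploy guards run.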

Takeaways (rebuild it from these)

DABs (current docs: Declarative Automation Bundles) = infrastructure as code: one versioned definition, deployed per-target.
databricks.yml (one, at root) + resources/ (one YAML per job) + targets + ${var.…} substitution + mode: production (service principal).
Logic lives in an importable src/ package (a wheel), which is why sys.path matters and what makes the code unit-testable.
CLI: init (once) · validate (deploys nothing) · deploy -t · run -t. CI/CD = validate → deploy → run (never init).

Before you move on — say these without scrolling up

Three things that go wrong without DABs — and the one idea that fixes all three.
How does one definition deploy to prod without copy-pasting config?
sys.path — what is it, and why does a bundle care?
The CI/CD command sequence — and which command is not in it?

Next: how you prove that src/ package is correct before it ships → [Unit & integration testing on Databricks](/lessons/s1-testing/).

Python project structure for Databricks Asset Bundles