Git Folders & CI/CD — version control inside the workspace

Use Git Folders for branch-based development, collaboration, and CI/CD; understand the notebook source format and how it enables testing and version control.

Two analysts edit the same Databricks notebook, relying on built-in revision history. It works until one overwrites the other — no branches, no review, no way back. Real teams solved this with Git; Git Folders bring that discipline inside the workspace.


The spine

Beat 1 — the anchor: a Git client in the workspace

Anchor. A Git Folder is Databricks' built-in visual Git client: clone a remote repo into the workspace and develop notebooks/files with the full Git workflow — branches, commits, push/pull, merges, diffs. Version control, isolated collaboration, and the on-ramp to CI/CD. (Git Folders were called Repos — the exam uses both.)

Beat 2 — collaboration = branches, not overwrites

Predict: you can't push straight to main, and you don't want to touch anyone else's work. How do you share your changes?

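Feature branch → commit → push → open a pull request. Each engineer works on their own branch (often in their own Git folder mapped to the same repo), isolated until merged. Two gotchas: if an expected branch is missing from the branch dropdown, pull to refresh the list of remote branches; if a pull or merge reports conflicts, resolve them in the Git Folder UI.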

Lock it. Feature branch → commit → push → PR; rarely push to main. Missing branch → pull; conflicts → resolve in the UI.
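The Git Folder UI performs these steps through buttons, but the equivalent command-line sequence can make the order easier to remember. The following is a minimal sketch: the branch name, commit message, and the remote name origin are hypothetical, and the pull request itself is opened in the Git provider rather than by git.

```python
import shlex


def feature_branch_commands(branch: str, message: str, remote: str = "origin") -> list[list[str]]:
    """Return the git commands for feature branch -> commit -> push, in order."""
    return [
        ["git", "switch", "-c", branch],                    # create and check out the feature branch
        ["git", "add", "--all"],                            # stage every change in the working tree
        ["git", "commit", "-m", message],                   # record the staged changes
        ["git", "push", "--set-upstream", remote, branch],  # publish the branch to the remote
    ]


# Hypothetical branch name and message, printed as shell-ready commands.
for argv in feature_branch_commands("feature/dedupe-orders", "Deduplicate orders"):
    print(shlex.join(argv))
```

Nothing here touches main: the only branch pushed is the feature branch, and the merge happens through the reviewed pull request.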


The dials (skim now; return when a question needs one)

◆ The magic first line — how a .py file is a notebook

Open a Databricks Python notebook as plain text and the first line is # Databricks notebook source. That magic comment is what marks a plain source file (.py, .sql, .scala, .r) as a Databricks notebook ("source format"); the comment marker follows the file's language (for example -- in .sql and // in .scala). This is why notebooks live in git as reviewable, diffable text. Contrast .ipynb (Jupyter): heavier, but it preserves outputs and dashboard/visualization definitions that source format drops. Tells: "what makes a .py a notebook" → the # Databricks notebook source first line; "keep dashboards/outputs in git" → .ipynb.
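To make the source format concrete, here is a small sketch of what such a file looks like as text and how the first-line check could be expressed. The sample content, the table name, and both helper functions are hypothetical. The # COMMAND ---------- cell separator and the # MAGIC prefix are how source-format Python notebooks encode cell boundaries and magic commands.

```python
NOTEBOOK_MARKER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"

# A hypothetical two-cell notebook, exactly as it would be stored in git.
SOURCE = """# Databricks notebook source
# MAGIC %md
# MAGIC # Orders cleanup

# COMMAND ----------

df = spark.table("orders")  # hypothetical table name
"""


def is_source_notebook(text: str) -> bool:
    """True when the first line is the notebook marker comment."""
    lines = text.splitlines()
    return bool(lines) and lines[0].strip() == NOTEBOOK_MARKER


def split_cells(text: str) -> list[str]:
    """Drop the marker line and split the remainder into cells."""
    body = text.split("\n", 1)[1] if "\n" in text else ""
    return [cell.strip() for cell in body.split(CELL_SEPARATOR)]


print(is_source_notebook(SOURCE))           # marker present on line 1
print(is_source_notebook("print('hi')\n"))  # ordinary Python file
print(len(split_cells(SOURCE)))             # number of cells
```

Because the whole notebook is ordinary text, a pull request diff shows cell-level changes line by line, which is the reviewability the source format is chosen for.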

◆ Why this unlocks testing

Recall [Unit & integration testing on Databricks](/lessons/s1-testing/): to run pytest, functions must live in importable .py modules, not be trapped in notebooks. Git Folders let you keep arbitrary .py files alongside notebooks — factor logic into modules, import, unit-test (with the sys.path care from [Python project structure for Databricks Asset Bundles](/lessons/s1-dabs-project/)). Version control + testability together. Tell: "unit test functions in the workspace" → define + test functions in Files in Git Folders/Repos.
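As an illustration of factoring logic out of a notebook, the sketch below shows a pure function and a pytest-style test for it. The file names in the comments and the function itself are hypothetical. In a real Git Folder the two parts would live in separate files, and the test would import the function from the module.

```python
# transforms.py (hypothetical module kept next to the notebooks)
def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase an email address."""
    return raw.strip().lower()


# test_transforms.py (hypothetical); pytest collects functions named test_*.
# In a separate file this would start with: from transforms import normalize_email
def test_normalize_email():
    assert normalize_email("  Ana@Example.COM ") == "ana@example.com"
    assert normalize_email("bob@example.com") == "bob@example.com"
```

The notebook then imports normalize_email rather than defining it inline, so the same code is exercised by the test and by the job.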

◆ CI/CD — two paths (name the collision)

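Two paths that are easy to confuse. A production Git folder is code-only: it syncs files from the repo and is kept current by GitHub Actions or a scheduled job. A bundle deploys resources (jobs, pipelines) together with the code. "Deploy just the notebooks" → production Git folder; "deploy the jobs and pipelines as code" → a bundle.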
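To keep a production Git folder current, an automation step can ask the workspace to check out the latest commit of a branch. The sketch below only builds that request and does not send it. It assumes the Repos REST endpoint PATCH /api/2.0/repos/{repo_id}; the host, token, and repo ID are hypothetical placeholders that a CI job would supply from its secrets.

```python
import json
import urllib.request


def build_repo_update_request(host: str, token: str, repo_id: int, branch: str = "main") -> urllib.request.Request:
    """Build (but do not send) a request that points a Git folder at a branch head."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    payload = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values; a CI job would read these from its secret store.
req = build_repo_update_request("https://example.cloud.databricks.com", "TOKEN", 123)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

This updates files only. Jobs and pipelines that run those files are not created or changed by it, which is the gap a bundle fills.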

Takeaways (rebuild it from these)

  1. Git Folder (was Repos) = the workspace's built-in Git client: clone, branch, commit, push/pull, merge, diff.
  2. Collaboration = feature branch → commit → push → PR; rarely push to main. Missing branch → pull; conflicts → resolve in the UI.
  3. # Databricks notebook source (first line) marks a plain .py/.sql/… file as a notebook (source format). .ipynb preserves outputs/dashboards.
  4. Git Folders let you keep importable .py modules → makes pytest-style unit testing possible ([Unit & integration testing on Databricks](/lessons/s1-testing/)).
  5. CI/CD: bundles (deploy resources + code) vs a production Git folder (code-only, synced via GitHub Actions/scheduled job). Git folder = files; bundle = resources.

Before you move on — say these without scrolling up

  1. Share your work without pushing to main or touching others — the workflow?
  2. Expected branch missing from the dropdown — what do you do?
  3. What single line makes a .py file a Databricks notebook — and when do you use .ipynb instead?
  4. Production Git folder vs bundle — which deploys files, which deploys resources?

That completes Section 9: version control (Git Folders) → automated deployment (bundles).
