Two analysts edit the same Databricks notebook, relying on built-in revision history. It works until one overwrites the other — no branches, no review, no way back. Real teams solved this with Git; Git Folders bring that discipline inside the workspace.
The spine
Beat 1 — the anchor: a Git client in the workspace
Anchor. A Git Folder is Databricks' built-in visual Git client: clone a remote repo into the workspace and develop notebooks/files with the full Git workflow — branches, commits, push/pull, merges, diffs. Version control, isolated collaboration, and the on-ramp to CI/CD. (Git Folders were called Repos — the exam uses both.)
Beat 2 — collaboration = branches, not overwrites
Predict: you can't push straight to main, and you don't want to touch anyone else's work. How do you share your changes?
…
Feature branch → commit → push → open a pull request. Each engineer works on their own branch (often in their own Git folder mapped to the same repo), isolated until merged. Two gotchas:
- A branch isn't in the dropdown? Your local Git folder hasn't fetched it — pull from the remote to refresh the branch list.
- Merge conflict? Resolve it in the Git Folders UI: manually edit out the <<<<<<< / ======= / >>>>>>> markers (or accept incoming/current), then mark resolved.
Lock it. Feature branch → commit → push → PR; rarely push to main. Missing branch → pull; conflicts → resolve in the UI.
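To make "accept incoming/current" concrete, here is a minimal Python sketch of what choosing one side of a conflict means for the file text. It is not how the Git Folders UI is implemented; the function name, the sample notebook lines, and the branch name feature/allow-zero are all hypothetical.

```python
def resolve_conflicts(text: str, keep: str = "current") -> str:
    """Return text with every conflict block replaced by one side.

    keep="current"  keeps the lines between <<<<<<< and =======
    keep="incoming" keeps the lines between ======= and >>>>>>>
    """
    if keep not in ("current", "incoming"):
        raise ValueError("keep must be 'current' or 'incoming'")

    resolved = []
    side = None  # None = outside a conflict, else "current" or "incoming"
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "current"
        elif line.startswith("=======") and side == "current":
            side = "incoming"
        elif line.startswith(">>>>>>>") and side == "incoming":
            side = None
        elif side is None or side == keep:
            resolved.append(line)
    return "\n".join(resolved)


# Hypothetical conflicted file: two branches changed the same filter.
conflicted = "\n".join([
    "df = spark.table('sales')",
    "<<<<<<< HEAD",
    "df = df.filter('amount > 0')",
    "=======",
    "df = df.filter('amount >= 0')",
    ">>>>>>> feature/allow-zero",
    "display(df)",
])

print(resolve_conflicts(conflicted, keep="incoming"))
```

Editing the markers out by hand is the same operation, except that you can also mix lines from both sides before marking the file resolved.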
The dials (skim now; return when a question needs one)
◆ The magic first line — how a .py file is a notebook
Open a Databricks Python notebook as plain text and the first line is # Databricks notebook source. That magic comment (written with each language's comment prefix, e.g. -- in .sql and // in .scala) is what marks a plain source file (.py, .sql, .scala, .r) as a Databricks notebook ("source format"), which is why notebooks live in git as reviewable, diffable text. Contrast .ipynb (Jupyter): heavier, but it preserves outputs and dashboard/visualization definitions that source format drops. Tells: "what makes a .py a notebook" → the # Databricks notebook source first line; "keep dashboards/outputs in git" → .ipynb.
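A small sketch of how that first line separates a notebook from an ordinary module. The helper names are hypothetical, and the # COMMAND ---------- cell separator is an assumption based on what exported Python notebooks typically contain; check it against your own files.

```python
NOTEBOOK_HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"  # assumed separator, verify in your exports


def is_databricks_notebook(source: str) -> bool:
    """True if the first line of a .py source is the notebook marker."""
    lines = source.splitlines()
    return bool(lines) and lines[0].strip() == NOTEBOOK_HEADER


def split_cells(source: str) -> list[str]:
    """Split a source-format Python notebook into its non-empty cells."""
    if not is_databricks_notebook(source):
        raise ValueError("not a source-format Databricks notebook")
    # Drop the header line, then cut the rest at each separator.
    body = source.split("\n", 1)[1] if "\n" in source else ""
    return [cell.strip() for cell in body.split(CELL_SEPARATOR) if cell.strip()]


notebook = "\n".join([
    "# Databricks notebook source",
    "x = 1",
    "",
    "# COMMAND ----------",
    "",
    "print(x + 1)",
])
plain_module = "def add(a, b):\n    return a + b\n"

print(is_databricks_notebook(notebook))      # True
print(is_databricks_notebook(plain_module))  # False
print(split_cells(notebook))                 # ['x = 1', 'print(x + 1)']
```

Because the whole notebook is ordinary text, a pull request diff shows exactly which cell changed, which is the point of source format.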
◆ Why this unlocks testing
Recall [Unit & integration testing on Databricks](/lessons/s1-testing/): to run pytest, functions must live in importable .py modules, not be trapped in notebooks. Git Folders let you keep arbitrary .py files alongside notebooks — factor logic into modules, import, unit-test (with the sys.path care from [Python project structure for Databricks Asset Bundles](/lessons/s1-dabs-project/)). Version control + testability together. Tell: "unit test functions in the workspace" → define + test functions in Files in Git Folders/Repos.
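As an illustration of "factor logic into modules, import, unit-test", the sketch below shows a pure function and a pytest-style test for it. The file names transforms.py and test_transforms.py and the function add_revenue are hypothetical; the two are shown in one block for brevity, whereas in a Git Folder they would be separate files and the test would import the function.

```python
# transforms.py (hypothetical module kept next to the notebooks)
def add_revenue(rows: list[dict]) -> list[dict]:
    """Return new rows with revenue = price * quantity."""
    return [{**row, "revenue": row["price"] * row["quantity"]} for row in rows]


# test_transforms.py (hypothetical; in a real layout it would start with
# `from transforms import add_revenue`)
def test_add_revenue():
    rows = [{"price": 2.5, "quantity": 4}, {"price": 10, "quantity": 0}]
    result = add_revenue(rows)
    assert [r["revenue"] for r in result] == [10.0, 0]
    # The input rows must not be mutated.
    assert "revenue" not in rows[0]
```

The function takes and returns plain Python data, so the test needs no cluster or Spark session; a notebook can import the same function and apply it to real data.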
◆ CI/CD — two paths (name the collision)
- Declarative Automation Bundles (recommended): deploy resources (jobs, pipelines) and code as one versioned unit ([Declarative Automation Bundles — deploying Databricks as code](/lessons/s9-dabs-deploy/)). The primary path.
- Production Git folder (code-only): an admin creates a top-level folder cloned to a branch; a GitHub Action on merge (or scheduled job) updates the Git folder to the latest commit. No resource management, just synced files.
Git Folder syncs files/code; a bundle deploys resources. "Deploy just the notebooks" → production Git folder; "deploy the jobs and pipelines as code" → a bundle.
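The "update the Git folder to the latest commit" step can be sketched as a single REST call. This example assumes the Repos API endpoint PATCH /api/2.0/repos/{repo_id} with a branch field in the body; verify that against the current API reference before relying on it. The host, token, and repo ID are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_update_request(host: str, token: str, repo_id: int, branch: str) -> urllib.request.Request:
    """Build (but do not send) a request that moves a Git folder to the head of `branch`."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/repos/{repo_id}",  # assumed endpoint
        data=json.dumps({"branch": branch}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


# All values below are placeholders, not real workspace details.
request = build_update_request(
    host="https://example.cloud.databricks.com",
    token="dapi-REDACTED",
    repo_id=123,
    branch="main",
)
print(request.get_method(), request.full_url)
# A CI step would send it with: urllib.request.urlopen(request)
```

Note what the call does not do: it creates no jobs or pipelines. It only moves files, which is the line between a production Git folder and a bundle.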
Takeaways (rebuild it from these)
- Git Folder (was Repos) = the workspace's built-in Git client: clone, branch, commit, push/pull, merge, diff.
- Collaboration = feature branch → commit → push → PR; rarely push to main. Missing branch → pull; conflicts → resolve in the UI.
- # Databricks notebook source (first line) marks a plain .py/.sql/… file as a notebook (source format). .ipynb preserves outputs/dashboards.
- Git Folders let you keep importable .py modules → makes pytest-style unit testing possible ([Unit & integration testing on Databricks](/lessons/s1-testing/)).
- CI/CD: bundles (deploy resources + code) vs a production Git folder (code-only, synced via GitHub Actions/scheduled job). Git folder = files; bundle = resources.
Before you move on — say these without scrolling up
- Share your work without pushing to main or touching others: what is the workflow?
- Expected branch missing from the dropdown: what do you do?
- What single line makes a .py file a Databricks notebook, and when do you use .ipynb instead?
- Production Git folder vs bundle: which deploys files, which deploys resources?
That completes Section 9: version control (Git Folders) → automated deployment (bundles).