Secrets — storing credentials, redaction, and scope ACLs

Store and retrieve credentials with Databricks secret scopes, understand output redaction and its limits, and control access with scope-level ACLs.

A pipeline needs a database password, an API key, an OpenAI token. Put any of them in the notebook as a string and you've leaked it: into the code, git history, cell output, and logs. Databricks secrets let you store the credential once, reference it at runtime, and have it redacted from notebook output. Simple, but the exam probes two things people get wrong: redaction is not security, and access is controlled per scope, not per key.


The spine

Beat 1 — the anchor: scope + key, read at runtime, ACL on the scope

Anchor. A secret lives in a scope (namespace) as a key → value pair. Create it once (CLI/UI), read it with dbutils.secrets.get(scope, key), and Databricks redacts the value from output. Who may read it is decided by an ACL on the scope — not the individual key. Every secrets question is one of those four moves.

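# Shell (Databricks CLI): create the scope, then store the secret in it
databricks secrets create-scope my_scope
databricks secrets put-secret my_scope db_password   # prompts for value

# Python (notebook): read the same scope and key back at runtime
db_password = dbutils.secrets.get(scope="my_scope", key="db_password")  # scope first, then key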

The argument order (scope, then key) mirrors the storage model and is itself a tested detail.
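
To make "read at runtime" concrete, here is a minimal sketch of assembling JDBC connection options from a secret lookup. The helper `build_jdbc_options`, the stand-in `fake_get`, and the URL, user, and table values are all hypothetical, invented for illustration; the lookup is passed in as a parameter so the sketch also runs outside a notebook.

```python
from typing import Callable, Dict


def build_jdbc_options(
    get_secret: Callable[[str, str], str],
    scope: str,
    key: str,
    url: str,
    user: str,
    table: str,
) -> Dict[str, str]:
    """Assemble JDBC reader options, fetching the password at call time."""
    return {
        "url": url,
        "user": user,
        "dbtable": table,
        # Scope first, then key: the same order as dbutils.secrets.get.
        "password": get_secret(scope, key),
    }


# Stand-in for the secret store so the sketch runs anywhere (hypothetical value).
_fake_store = {("my_scope", "db_password"): "s3cr3t"}


def fake_get(scope: str, key: str) -> str:
    return _fake_store[(scope, key)]


options = build_jdbc_options(
    fake_get,
    "my_scope",
    "db_password",
    url="jdbc:postgresql://db.example.com:5432/sales",  # hypothetical host
    user="etl_user",                                    # hypothetical user
    table="public.orders",                              # hypothetical table
)
```

In a notebook you would pass `dbutils.secrets.get` in place of `fake_get` and hand the result to something like `spark.read.format("jdbc").options(**options).load()`. The password never appears in the source; only the scope and key names do.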

Beat 2 — redaction, and why it isn't security

Predict: you print(db_password). What appears — and does a DB connection built from it still work?

It prints [REDACTED], but the connection still works — redaction affects only display; the value in memory is the true credential. And it's a naive string match, so it's bypassable:

for ch in db_password: print(ch)   # leaks the secret one character at a time

Lock it. dbutils.secrets.get returns a real, usable string; output shows [REDACTED]; char-by-char printing defeats it. Redaction stops accidental display, not a determined user — real protection is limiting who can read the scope.
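
Why the bypass works can be seen in a toy model of literal-match redaction. This is only a sketch of the idea described above, not Databricks' actual implementation; the `redact` function and the secret value `hunter2` are made up for illustration.

```python
from typing import List

REDACTED = "[REDACTED]"


def redact(output: str, secret_values: List[str]) -> str:
    """Toy model: replace every literal occurrence of a known secret value."""
    for value in secret_values:
        output = output.replace(value, REDACTED)
    return output


secret = "hunter2"  # hypothetical secret value

# Printing the whole value: the literal match fires and the value is hidden.
whole = redact(f"password={secret}\n", [secret])

# Printing one character per line: the full value never appears contiguously,
# so a literal match has nothing to replace.
per_char = redact("".join(ch + "\n" for ch in secret), [secret])

# Anyone reading that output can reassemble the secret.
recovered = per_char.replace("\n", "")
```

Here `whole` contains `[REDACTED]` while `recovered` equals the original secret. Any transformation that breaks up the literal value slips past a match like this, which is why the real control has to be who can read the scope.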


The dials (skim now; return when a question needs one)

◆ Scope ACLs — per scope, not per key

Access is granted on the scope, escalating: READ < WRITE < MANAGE.

Because ACLs are scope-level, not key-level, least privilege has a shape: to give a team access to just their credential, put it in a dedicated scope and grant READ — you can't grant read on one key inside a shared scope. Tell: "minimal access to one credential" → dedicated scope + READ. "Who can use a secret here?" → workspace admins, the creator, and anyone with READ/WRITE/MANAGE on that scope.
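
The shape of scope-level access can be sketched as a small lookup model. Everything here is hypothetical (the `acls` table, the scope names, and the principals), and it deliberately ignores the implicit access of workspace admins; the point is only that the check takes a scope and has no key parameter at all.

```python
from enum import IntEnum
from typing import Dict


class Permission(IntEnum):
    """Scope permissions in escalating order: READ < WRITE < MANAGE."""
    READ = 1
    WRITE = 2
    MANAGE = 3


# Hypothetical ACL table: scope -> principal -> permission. There is no key level.
acls: Dict[str, Dict[str, Permission]] = {
    "shared_scope": {"platform-admins": Permission.MANAGE},
    "team_a_scope": {
        "team-a": Permission.READ,
        "platform-admins": Permission.MANAGE,
    },
}


def can_read(principal: str, scope: str) -> bool:
    """A principal can read every key in a scope if it holds READ or higher on it."""
    granted = acls.get(scope, {}).get(principal)
    return granted is not None and granted >= Permission.READ


team_a_own = can_read("team-a", "team_a_scope")     # dedicated scope + READ
team_a_shared = can_read("team-a", "shared_scope")  # no grant on the shared scope
```

Because the grant is on the scope, giving `team-a` READ on `shared_scope` would expose every key in it. Moving the team's credential into `team_a_scope` is what keeps the grant minimal.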

◆ Authenticating automation — OAuth for service principals

One layer up: how does the CLI or an automated deploy authenticate? The most secure option is OAuth token federation for a service principal: short-lived federated tokens from a trusted identity provider (IdP), with no long-lived secret to store. Prefer it over a static personal access token (PAT), which is a long-lived credential you must then protect. This continues the production-identity thread from [Access control — least privilege and the object permission ladders](/lessons/s7-access-control/): automation runs as a service principal, with the shortest-lived credential possible.

Takeaways (rebuild it from these)

  1. Never hardcode: store as scope + key (create-scope, then put-secret); read with dbutils.secrets.get(scope, key) (scope first).
  2. Redaction shows [REDACTED] but the value is real — the connection works; and it's bypassable char-by-char. Redaction ≠ security.
  3. Real control = scope-level ACLs: READ < WRITE < MANAGE; READ is the minimum to use; creator/admin hold MANAGE.
  4. ACLs are per scope, not per key → least privilege = dedicated scope + READ for a team's credential.
  5. Automation authenticates best as a service principal via OAuth (short-lived), not a long-lived PAT.

Before you move on — say these without scrolling up

  1. The four moves every secrets question is about.
  2. print(secret) shows [REDACTED] — does the connection still work, and how is redaction defeated?
  3. ACLs are per scope or per key — and what does that mean for giving one team one credential?
  4. Best way for a CLI/deploy to authenticate?

Next: when hiding a value at display time isn't enough and you must transform the stored value itself → [Anonymization — hashing, pseudonymization, and protecting values at rest](/lessons/s7-anonymization/).
