Developing Code for Data Processing

Reference card — job parameters & secrets in notebooks

Understand the notebook development environment, variable management, and creating secure, configurable code.

Reference card (not a deep concept). One idea: a notebook is configured from outside, and each source of outside values has exactly one correct API. Three sources, three APIs — plus one security behaviour that surprises people.

Reading a parameter the Job passed in → widgets

dbutils.widgets.text("date", "null")   # declare the widget (with a default)
date = dbutils.widgets.get("date")     # read the value the Job passed
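
Widget values always arrive as strings, so the "null" default above is the four-character string, not Python's None. A minimal sketch of turning the raw value into a date, assuming the Job passes ISO dates (YYYY-MM-DD); the helper name parse_date_param and that date format are assumptions for illustration, not part of the Databricks API:

```python
from datetime import date
from typing import Optional


def parse_date_param(raw: str) -> Optional[date]:
    """Convert a widget string into a date, treating the "null" default as missing."""
    cleaned = raw.strip()
    if cleaned == "" or cleaned.lower() == "null":
        return None  # the Job did not supply a value
    # Assumption: the Job passes ISO dates such as "2024-05-01".
    return date.fromisoformat(cleaned)


# In a notebook task (dbutils exists only on Databricks):
#   dbutils.widgets.text("date", "null")
#   run_date = parse_date_param(dbutils.widgets.get("date"))
```

Keeping the parsing in a plain function means it can be checked without a cluster; only the two dbutils lines need the notebook environment.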

Why the others are wrong (this exact set is tested):

| Attempt | Why it fails |
| --- | --- |
| spark.conf.get("date") | reads Spark runtime config, not job/notebook parameters |
| input() | blocks waiting for interactive stdin, so it never works in a scheduled job |
| sys.argv[1] | for Python script tasks (spark_python_task), not notebook tasks |
| dbutils.notebooks.getParam("date") | not a real API: the namespace is dbutils.notebook (singular), which offers run/exit and no getParam |

Passing a value between tasks → task values (not widgets)

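Recall [Jobs & orchestration — multi-task, dependencies, control flow](/lessons/s1-jobs-orchestration/): call dbutils.jobs.taskValues.set(key, value) in the upstream task and dbutils.jobs.taskValues.get(taskKey, key) in the downstream one. This is distinct from widgets, which read the parameters passed into the task. Task values must be JSON-serializable. Widget values are always strings, and a task value routed into a parameter through a {{tasks.<task>.values.<key>}} reference also arrives as a string, so cast to int/float before doing arithmetic.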
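
The set/get pair can be sketched as below. The task key "ingest", the key "row_count", and both helper names are hypothetical; the functions take the taskValues object as an argument so the sketch also runs outside Databricks with a stand-in. The int() cast is defensive and covers a value that arrives as a string.

```python
def publish_row_count(task_values, row_count: int) -> None:
    """Upstream task: publish a value for downstream tasks."""
    task_values.set(key="row_count", value=row_count)


def read_row_count(task_values, upstream_task: str = "ingest") -> int:
    """Downstream task: read the upstream value and normalise it to int."""
    raw = task_values.get(
        taskKey=upstream_task,  # name of the task that called set()
        key="row_count",
        default=0,      # used when the key was never set
        debugValue=0,   # used when the notebook runs outside a job
    )
    return int(raw)


# On Databricks:
#   publish_row_count(dbutils.jobs.taskValues, 1250)   # in task "ingest"
#   n = read_row_count(dbutils.jobs.taskValues)         # in a later task
```

Supplying debugValue is what lets the downstream notebook run interactively, where no upstream task has set anything.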

Reading a secret → dbutils.secrets (and the REDACTED surprise)

pw = dbutils.secrets.get(scope="db_creds", key="jdbc_password")
print(pw)     # prints [REDACTED], never the real value

The behaviour to lock in (a favourite question): the retrieved secret is a real, usable string, so a DB connection built with it succeeds. But Databricks automatically redacts secret values in notebook output, so printing one shows the literal [REDACTED]. No interactive box appears; nothing is written to DBFS. Never hardcode credentials; access is controlled by secret ACLs on the scope.
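
A minimal sketch of using the secret without ever displaying it: fetch it, place it straight into connection options, and hand those to the reader. The scope and key come from the example above; the helper name, JDBC URL, user, and table are hypothetical, and the secrets object is passed in so the sketch runs with a stand-in outside Databricks.

```python
def build_jdbc_options(secrets, url: str, user: str, table: str) -> dict:
    """Assemble JDBC reader options, pulling the password from a secret scope."""
    password = secrets.get(scope="db_creds", key="jdbc_password")
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,  # real value in memory; redacted only when displayed
    }


# On Databricks (hypothetical host, user and table):
#   opts = build_jdbc_options(
#       dbutils.secrets,
#       url="jdbc:postgresql://db.example.com:5432/sales",
#       user="etl_user",
#       table="public.orders",
#   )
#   df = spark.read.format("jdbc").options(**opts).load()
```

The password never appears in the notebook source, and the connection receives the actual string even though print() would show only the redacted placeholder.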

The one-line separators

Recall (say without scrolling up)

  1. Three outside sources of values, three APIs — match them.
  2. You print() a secret — what shows, and does a connection built from it still work?
  3. Why is spark.conf.get / sys.argv wrong for a notebook job param?

Prerequisites