
Data Sharing & Federation

Delta Sharing — live data out, without copies

Share data with Delta Sharing — Databricks-to-Databricks vs open protocol, creating shares, WITH HISTORY, egress, and shareable object types.

Section 4 is one idea seen from two directions: move data without copying it. This lesson is the outbound half — you have data a partner needs. The old way (export a CSV, run an ETL into their system) is stale the moment it lands, ungoverned, and racks up transfer cost. Delta Sharing lets the recipient read your live data directly from your cloud storage, no copy made.


The spine

Beat 1 — the anchor, and the one decision

Anchor. Delta Sharing is an open protocol that lets a recipient read your data directly from your cloud storage — no copy, always live, governed by Unity Catalog. Everything else hangs on one decision: who is the recipient — another Unity Catalog org, or someone outside Databricks?

Beat 2 — the fork: who's the recipient?

Predict: Partner A runs Databricks + Unity Catalog. Partner B is a pandas/Power BI shop, no Databricks. Can you share the same things to both?

No — and that's the whole exam axis:

| | Databricks-to-Databricks (D2D) | Open sharing (D2O, open protocol) |
|---|---|---|
| Recipient | another Unity Catalog-enabled Databricks org | any tool or platform (non-Databricks) |
| Can share | tables, plus notebooks, volumes, ML models | Delta tables only |
| Identifies recipient by | their sharing identifier | a credential/token file |

Tells: "share with a non-Databricks partner / open tools" → open (D2O), tables only. "share tables and notebooks/models with another UC org" → D2D.

Lock it. No-copy, live, UC-governed. Recipient on UC → D2D (tables + notebooks/volumes/models, via sharing identifier). Recipient off Databricks → open/D2O (Delta tables only, token file).
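The fork reduces to a tiny decision function. This is a hypothetical helper (`SharingPlan`, `plan_share`, and `can_share` are illustrative names, not a Databricks API), and its object-type lists mirror this lesson's summary rather than a full product matrix.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SharingPlan:
    mode: str                      # "D2D" or "open"
    shareable: Tuple[str, ...]     # object types this mode can put in a share
    recipient_identified_by: str   # what the provider needs from the recipient


def plan_share(recipient_on_unity_catalog: bool) -> SharingPlan:
    """Hypothetical helper: answer the one question, get the sharing mode."""
    if recipient_on_unity_catalog:
        # D2D: the recipient runs Databricks with Unity Catalog.
        return SharingPlan("D2D", ("table", "notebook", "volume", "model"),
                           "sharing identifier")
    # Open sharing: any non-Databricks tool, Delta tables only.
    return SharingPlan("open", ("table",), "credential/token file")


def can_share(plan: SharingPlan, object_type: str) -> bool:
    """True if this sharing mode can carry the given object type."""
    return object_type in plan.shareable
```

For the Predict question: `plan_share(True)` (Partner A) allows notebooks and models, while `can_share(plan_share(False), "notebook")` is `False` for Partner B.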


The dials (skim now; return when a question needs one)

◆ Creating and populating a share

You must be a metastore admin or hold the CREATE SHARE privilege on the metastore (the UC privilege model, [Unity Catalog privileges — the three-level traversal and delegation](/lessons/s7-uc-privileges/)). Define the share first, then add objects to it: CREATE SHARE …, then ALTER SHARE … ADD TABLE ….
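A minimal sketch of the provider-side sequence, emitted as SQL strings so the order is explicit. `share_ddl` and every object name are hypothetical; the statements follow Databricks' CREATE SHARE, ALTER SHARE … ADD TABLE, CREATE RECIPIENT, and GRANT SELECT ON SHARE syntax, and in a notebook you would run each one with `spark.sql(...)`. The recipient and grant steps go one step past this section, to show where the sharing identifier (D2D) or the credential file (open) enters.

```python
from typing import List, Optional


def share_ddl(share: str, tables: List[str], recipient: str,
              sharing_identifier: Optional[str] = None) -> List[str]:
    """Hypothetical helper: provider-side SQL for one share and one recipient."""
    stmts = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Add each table by its three-level name (catalog.schema.table).
    stmts += [f"ALTER SHARE {share} ADD TABLE {t}" for t in tables]
    if sharing_identifier:
        # D2D: identify the recipient by its metastore's sharing identifier.
        stmts.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient} "
                     f"USING ID '{sharing_identifier}'")
    else:
        # Open sharing: no identifier; Databricks issues an activation link
        # from which the recipient downloads its credential file.
        stmts.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    # The recipient can read nothing until the share is granted to it.
    stmts.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return stmts
```

For example, `share_ddl("sales_share", ["main.sales.products"], "partner_a", "aws:us-west-2:<metastore-uuid>")` produces the D2D sequence; omitting the identifier produces the open-sharing one.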

◆ WITH HISTORY: time travel, streaming, CDF, and read speed

To let a D2D recipient run time-travel queries, stream from a shared table, or read its Change Data Feed, and to improve read performance, share the table WITH HISTORY:

ALTER SHARE sales_share ADD TABLE products WITH HISTORY;

It shares the table's version history, so the recipient can query past versions and stream from the table. Tell: "recipient needs time travel / streaming / CDF" → WITH HISTORY. For optimal read performance specifically, the tested combination is: share WITH HISTORY, enable CDF on the source table, and leave the shared table unpartitioned.
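The tested read-performance combination works as a checklist over the table's current state. `TableState` and `history_share_plan` are hypothetical names; the sketch assumes the table has not been added to the share yet, and it flags partitioning as a SQL comment because removing partitioning from a Delta table means rewriting the table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableState:
    """Hypothetical snapshot of a table you are about to add to a share."""
    name: str                      # three-level name, e.g. catalog.schema.table
    cdf_enabled: bool
    partition_columns: List[str] = field(default_factory=list)


def history_share_plan(share: str, t: TableState) -> List[str]:
    """Steps toward the tested combo: CDF on, no partitioning, WITH HISTORY.
    Assumes the table has not been added to the share yet."""
    steps = []
    if not t.cdf_enabled:
        steps.append(f"ALTER TABLE {t.name} SET TBLPROPERTIES "
                     "(delta.enableChangeDataFeed = true)")
    if t.partition_columns:
        # Dropping partitioning means rewriting the table, so flag it
        # as a SQL comment instead of emitting a runnable statement.
        cols = ", ".join(t.partition_columns)
        steps.append(f"-- rewrite {t.name} unpartitioned (partitioned by: {cols})")
    steps.append(f"ALTER SHARE {share} ADD TABLE {t.name} WITH HISTORY")
    return steps
```

A table that already has CDF on and no partitions needs only the final `ALTER SHARE … WITH HISTORY` step.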

◆ How open sharing reaches the bytes (the mechanism)

An open-sharing recipient reading a table without history still reads straight from your storage — via temporary, scoped-down security credentials from the cloud storage, restricted to the root directory of the shared Delta table. So they get short-lived, least-privilege access to exactly that table's files, nothing else.
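On the recipient's side, an open-sharing consumer never handles those storage credentials by hand: it holds a credential ("profile") file, and a Delta Sharing connector uses the bearer token in it to talk to the provider's sharing server. The sketch below only parses that JSON file and builds the `<profile-path>#<share>.<schema>.<table>` URL that the open-source `delta-sharing` Python connector expects. The helper names and the file path are hypothetical, and the actual read is left as a comment because it needs a live share.

```python
import json


def load_profile(text: str) -> dict:
    """Parse a Delta Sharing credential (profile) file and check required keys."""
    profile = json.loads(text)
    for key in ("shareCredentialsVersion", "endpoint", "bearerToken"):
        if key not in profile:
            raise ValueError(f"profile file is missing {key!r}")
    return profile


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Connector URL format: <profile-path>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"


# With a real share and `pip install delta-sharing`, the read would be:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(
#       table_url("/path/to/profile.share", "sales_share", "sales", "products"))
```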

◆ Egress cost — and the R2 trick

The recipient pulls bytes directly from the provider's storage, so a cross-cloud or cross-region share incurs egress fees (the provider's cloud charges outbound transfer). Tested mitigation: store the shared dataset in Cloudflare R2, which charges zero egress, before sharing widely across AWS/Azure/GCP. Tell: "minimize/eliminate egress cost sharing across clouds" → R2.
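The egress rule reduces to simple arithmetic: the provider pays per GB only when the reader sits in a different cloud or region from the storage, unless that storage is R2. `Location` and `monthly_egress_cost` are hypothetical, `rate_per_gb` is a placeholder you would replace with your cloud's actual outbound rate, and treating same-cloud, same-region reads as free is a simplifying assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    cloud: str   # e.g. "aws", "azure", "gcp", or "r2" for Cloudflare R2
    region: str


def monthly_egress_cost(storage: Location, reader: Location,
                        gb_read: float, rate_per_gb: float) -> float:
    """Hypothetical cost model; rate_per_gb is a placeholder, not a real price."""
    if storage.cloud == "r2":
        return 0.0  # R2 charges zero egress, wherever the reader is
    if storage.cloud == reader.cloud and storage.region == reader.region:
        return 0.0  # same cloud and region: treated as free in this sketch
    # Cross-cloud or cross-region: the provider's cloud bills outbound bytes.
    return gb_read * rate_per_gb
```

Moving the same dataset from `Location("aws", "us-east-1")` to `Location("r2", "auto")` drops the cost for an Azure reader to zero, which is the whole R2 trick.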

◆ Interop cousin — Delta UniForm

Adjacent to sharing: Delta UniForm makes a Delta table readable by Iceberg (and Hudi) tools by generating their metadata alongside the Delta table, so an Iceberg-only tool can read your Delta table with no copy or conversion. Tell: "let external Iceberg tools read this Delta table" → enable UniForm with Iceberg as the target format.
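Enabling UniForm is a table-property change. The helper below is hypothetical and only builds the statement; the two property names follow Databricks' UniForm documentation, but runtimes differ in prerequisites (for example, column mapping), so check those before running the string with `spark.sql(...)`.

```python
def enable_uniform_iceberg(table: str) -> str:
    """Hypothetical helper: ALTER TABLE that turns on UniForm for Iceberg."""
    props = {
        # Iceberg-compatible writes, required before Iceberg metadata is generated.
        "delta.enableIcebergCompatV2": "true",
        # Ask Delta to also write Iceberg metadata for this table.
        "delta.universalFormat.enabledFormats": "iceberg",
    }
    body = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({body})"
```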

◆ Name the collision (with the next lesson)

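Same "no copy" principle, opposite directions. Delta Sharing sends your data out: a recipient reads your tables straight from your storage. Lakehouse Federation (next lesson) brings external data in: Databricks queries another system's data in place. Tell: "a partner needs to read our data" → Delta Sharing; "we need to query an external database from Databricks" → Federation.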

Takeaways (rebuild it from these)

  1. Delta Sharing = open protocol, live data, no copy, UC-governed. Decision axis: who is the recipient.
  2. D2D (recipient on UC) shares tables + notebooks/volumes/models, via the recipient's sharing identifier (their metastore ref), no token. Open/D2O reaches non-Databricks tools but is Delta tables only (token file).
  3. Creating shares needs metastore admin / CREATE SHARE; add tables with ALTER SHARE … ADD TABLE.
  4. WITH HISTORY enables time travel / streaming / CDF (best read perf = WITH HISTORY + CDF + no partitioning). Open sharing serves bytes via temporary scoped credentials to the table's root dir.
  5. Cross-cloud/region = egress cost; Cloudflare R2 (zero egress) avoids it. Delta UniForm = let Iceberg/Hudi tools read a Delta table.

Before you move on — say these without scrolling up

  1. The one decision that drives every Delta Sharing question.
  2. Partner is on UC vs not — what can you share to each, and how is the recipient identified?
  3. Recipient needs streaming + CDF + time travel on a shared table — what clause, and what else helps read speed?
  4. Sharing across clouds runs up cost — what is it, and the fix?

Next: flip the direction — querying external data into Databricks without copying it → [Lakehouse Federation — query external data in place](/lessons/s4-federation/).
