Delta Sharing was data going out. This lesson is the mirror image — data coming in. Your org's data isn't all in Databricks: some lives in PostgreSQL, some in Snowflake, some in an Oracle box the trading desk owns. You want to query it from Databricks — join it to your lakehouse tables — without an ETL that copies it all in first. That's Lakehouse Federation.
The spine
Beat 1 — the anchor
Predict: to join Oracle portfolio data to your lakehouse market data, the "obvious" move is to ETL Oracle into Delta first. What does Federation let you skip?
…
The copy entirely. You reach the source where it lives:
Anchor. Lakehouse Federation lets you query external data sources — databases, warehouses, other lakes — in place, directly, through Unity Catalog, with no migration or copy. The external data stays put; you just reach it under UC governance.
Beat 2 — the setup: connection → foreign catalog → grant
Two objects, in order — and the order is tested:
- A connection — how to reach the external system (host + credentials). One per source (e.g.
mysql_connection). - A foreign catalog — a UC catalog that mirrors the external system, so its tables appear in your namespace and you query them with normal SQL under UC.
Predict: you've created
mysql_connection. What's the next step?
…
Create the foreign catalog from that connection — then grant UC permissions on the foreign catalog / schemas / tables so teams can use it. (Connection first, foreign catalog second, governance third — every question tests this sequence.)
Lock it. Federation = query external sources in place, through UC, no copy. Setup = connection → foreign catalog → grant.
The dials (skim now; return when a question needs one)
◆ Read-only and always current
Federation hits the source system directly, read-only. Two consequences the exam raises:
- Consistency / no staleness: with no copy, results always reflect the current state of the source — nothing to go stale, no sync job to fall behind. "Worried about consistency/staleness across teams" → federation reads live, read-only, in place.
- It's for querying external data (and joining it to lakehouse data), not for writing back or replacing ingestion of data you truly own.
The whole-feature tell: "query Oracle / Snowflake / SQL Server / Postgres from Databricks without moving the data" → Lakehouse Federation.
◆ Name the collision (with Delta Sharing)
- Delta Sharing (
[Delta Sharing — live data out, without copies](/lessons/s4-delta-sharing/)) — share your data out to a recipient, in place. - Lakehouse Federation — query someone else's / another system's data in, in place.
Both honour "don't copy the data." Providing data to a partner → Sharing; reaching into an external database → Federation.
Takeaways (rebuild it from these)
- Lakehouse Federation = query external sources in place, through UC, no copy/migration.
- Setup order: connection (reach the source) → foreign catalog (mirror it into UC) → grant UC permissions. Query with normal SQL.
- Read-only and live — results always reflect the current source, so no staleness/consistency gap and no ETL to maintain.
- Collision: Federation = data in (query external); Delta Sharing = data out (share yours). Both avoid copying.
Before you move on — say these without scrolling up
- What does Federation let you skip versus the "ETL it in first" approach?
- You've made the connection — the next two steps, in order?
- "Worried about staleness across teams querying external data" — why is Federation the answer?
- Providing data to a partner vs reaching into an external DB — which feature each?
That closes Section 4 — the no-copy pair. Next: monitoring and alerting on everything the pipelines produce.