Lessons

Security & Compliance

Row filters and column masks — access control inside a table

Enforce row-level and column-level security in Unity Catalog using row filters and column masks (SQL UDFs), and contrast with dynamic views.

SELECT is blunt: grant it and the user sees every row and every column value; withhold it and they see nothing. But real policy is finer — "sales reps see only their region," "everyone but HR sees a masked SSN."

Predict: you could keep a separately-filtered copy of the table per group. Why is that a bad idea?

Data duplication nightmare — N copies to keep in sync. Unity Catalog does it in place, at query time, with two native tools.


The spine

Beat 1 — the anchor: a SQL UDF attached to the table, run per caller

Anchor. Both tools are the same mechanism: write a SQL UDF, then attach it to the table. On every query Databricks runs it as the caller's identity/group, transparently — a row filter returns a boolean (is this row visible?); a column mask returns the (possibly redacted) value for a column. No duplicated tables; the user's SQL is unchanged — results just differ by who runs it.

The most useful tell is the return type: row filter → boolean ("show this row?"); column mask → the column's value ("what should this cell read?").

Beat 2 — the two, side by side

Row filter — UDF returns BOOLEAN, attach with SET ROW FILTER … ON (cols):

CREATE FUNCTION region_filter(region STRING)
  RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), true, region = 'US');
ALTER TABLE sales SET ROW FILTER region_filter ON (region);

Admins get true for every row; everyone else sees only region = 'US'. Rows returning false vanish from every query — no user WHERE needed. Columns in ON (…) are passed as the function's arguments.

Column mask — UDF receives the value, returns masked-or-not, attach with ALTER COLUMN … SET MASK:

CREATE FUNCTION ssn_mask(ssn STRING)
  RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('HRAdmin') THEN ssn ELSE 'XXX-XX-XXXX' END;
ALTER TABLE users ALTER COLUMN ssn SET MASK ssn_mask;

HRAdmin sees the real SSN; others see the literal. A mask can even be dynamic against a mapping table — the UDF looks the user up in a group_access table to decide what to reveal.

They compose: "sales see only their region and non-admins see masked email" = a row filter on region + a column mask on email, both on the same table.

Lock it. Filter → boolean (which rows); mask → value (which cell contents). Both = a UDF attached to the table, evaluated per caller. Branch on IS_ACCOUNT_GROUP_MEMBER.


The dials (skim now; return when a question needs one)

◆ The older alternative — dynamic views (and the collision)

Before native filters/masks, a dynamic view did the same with CASE + a group function:

CREATE VIEW students_vw AS
SELECT * FROM students
WHERE CASE WHEN is_member('instructors') THEN TRUE
           ELSE is_active IS FALSE END;

Read that carefully (a favourite question): instructors get TRUE for every row → all students; non-members fall to is_active IS FALSEonly inactive students. (A column version wraps each column in CASE WHEN is_member(...) THEN col ELSE 'REDACTED' END.)

Collision — two ways to do fine-grained access:

◆ The group-function collision

IS_ACCOUNT_GROUP_MEMBER('g') = account-level membership (preferred for UC filters/masks); is_member('g') = workspace-level (the classic dynamic-view function). Same shape, different scope — recognise which the question uses.

◆ The at-rest caveat (sets up the next lesson)

Masks are query-time, not at-rest. They change what a query returns, but the files still hold the real values — anyone with direct object-storage access reads them unmasked. So masks pair with access control; where the stored value must be protected, you transform it (hashing/pseudonymization) — the next lesson.

Takeaways (rebuild it from these)

  1. Row filters/column masks = a SQL UDF attached to the table, run per query as the caller. Filter → boolean; mask → the column value. They compose.
  2. Row filter: RETURN <boolean> + SET ROW FILTER fn ON (cols). Column mask: RETURN <value> + ALTER COLUMN c SET MASK fn.
  3. Branch on IS_ACCOUNT_GROUP_MEMBER (account, preferred) vs is_member (workspace, dynamic-view classic).
  4. Dynamic view = older alternative: separate view + CASE/group-function; users must query the view and the base table must be locked down. Native filters/masks live on the table (transparent).
  5. Masks are query-time, not at-rest — protect stored values via hashing/pseudonymization ([Anonymization — hashing, pseudonymization, and protecting values at rest](/lessons/s7-anonymization/)).

Before you move on — say these without scrolling up

  1. Why not just keep a filtered copy per group — and what does UC do instead?
  2. Row filter vs column mask — what does each UDF return?
  3. Native filters/masks vs dynamic views — where does each live, and which is recommended?
  4. Why isn't a column mask enough to protect data at rest?

Next: protecting the values themselves — hashing, tokenization, and the SHA-2 details the exam checks → [Anonymization — hashing, pseudonymization, and protecting values at rest](/lessons/s7-anonymization/).

Prerequisites

Leads to