Discoverability & metadata — comments, tags, and DESCRIBE

Make data discoverable and documented with comments, tags, AI-generated comments, and the metadata inspection commands (DESCRIBE EXTENDED).

Governance isn't only about who can touch the data; it's also about whether people can find and understand it. A catalog with thousands of undocumented tables is technically governed and practically useless. This lesson is the toolkit for that second half: document it (comments), classify it (tags), and inspect it (DESCRIBE).


The spine

Beat 1 — the anchor: describe, comment, tag

Anchor. Discoverability metadata is three moves — comments (human descriptions on tables/columns), tags (queryable key–value labels for classification, e.g. PII), and the inspection commands that read it all back. Every question here is one of those three.

Beat 2 — DESCRIBE EXTENDED shows everything at once

Predict: you need one command to confirm a table's column comments, a contains_pii property, and a CHECK constraint. Which one?

DESCRIBE EXTENDED (= DESCRIBE TABLE EXTENDED) — it returns the complete picture in one output. The narrower commands each show only a slice:

| Command | Shows | Misses |
| --- | --- | --- |
| DESCRIBE EXTENDED | columns + comments, table comment, properties, constraints | nothing (the complete view) |
| SHOW TBLPROPERTIES | just the properties | every comment |
| DESCRIBE DETAIL | file/format/location details | column comments, custom properties |
| DESCRIBE HISTORY | the version log ([How Delta Lake works — the transaction log](/lessons/f2-delta-transaction-log/)) | schema annotations |

Lock it. "Confirm comments + properties + constraints together" → DESCRIBE EXTENDED. The others are slices.
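
To see the difference between the complete view and the slices, here is a minimal sketch. It assumes a Databricks SQL session and a hypothetical Delta table main.finance.payments; the catalog, schema, column names, and constraint name are illustrative and not part of the lesson.

```sql
-- Hypothetical setup: a table carrying column comments, a table comment,
-- a custom property, and a CHECK constraint.
CREATE TABLE IF NOT EXISTS main.finance.payments (
  payment_id  BIGINT         COMMENT 'Unique payment identifier',
  payer_email STRING         COMMENT 'Email address of the payer',
  amount      DECIMAL(12, 2) COMMENT 'Settled amount'
)
COMMENT 'settled payments'
TBLPROPERTIES ('contains_pii' = 'true');

-- Add the CHECK constraint (fails if a constraint with this name already exists).
ALTER TABLE main.finance.payments
  ADD CONSTRAINT positive_amount CHECK (amount > 0);

-- The complete view: columns with comments, then the detailed table section.
DESCRIBE EXTENDED main.finance.payments;

-- The slices, for comparison.
SHOW TBLPROPERTIES main.finance.payments;  -- properties only
DESCRIBE DETAIL main.finance.payments;     -- format, location, file-level details
DESCRIBE HISTORY main.finance.payments;    -- one row per table version
```

Running the four inspection commands side by side on the same table is a quick way to check which one answers a given question.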


The dials (skim now; return when a question needs one)

◆ Comments — and AI-generated comments at scale

Document a table or column with COMMENT, either on the table (CREATE TABLE payments … COMMENT 'settled payments') or on individual columns. Writing hundreds by hand is the bottleneck, so Catalog Explorer offers AI-generated comments (the "AI Generate" option): an LLM inspects column names, types, and sample values and drafts descriptions you review and accept. Tell: "improve discoverability across hundreds of tables with minimal manual effort" → AI-generated comments in Catalog Explorer.
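
Comments can also be set on objects that already exist. The sketch below assumes a hypothetical table main.finance.payments with an amount column; both names are illustrative.

```sql
-- Set or replace the table-level comment on an existing table.
COMMENT ON TABLE main.finance.payments IS 'settled payments';

-- Set or replace the comment on a single column.
ALTER TABLE main.finance.payments
  ALTER COLUMN amount COMMENT 'Settled amount';
```

Both statements overwrite any previous comment, so they are safe to re-run from a documentation script.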

◆ Tags — classification you can query

Tags are key–value labels for governance classification (mark PII, domain, sensitivity). Syntax for multiple tags:

ALTER TABLE t SET TAGS ('key1' = 'value1', 'key2' = 'value2');

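Note the plural TAGS and the parentheses around a set of key = value pairs (a single tag can also be set the same way). Because tagging is a plain SQL statement, it is programmatic (it works inside an automated ETL step), and because tags are stored as metadata, they are queryable, so "all PII tables" becomes a metadata query.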
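
A minimal sketch of the tag lifecycle follows. It assumes a Unity Catalog workspace where the system.information_schema views are available; the table name, column name, and tag keys are hypothetical.

```sql
-- Classify the table with two tags in one statement.
ALTER TABLE main.finance.payments
  SET TAGS ('contains_pii' = 'true', 'domain' = 'finance');

-- Tags can also be attached to a single column.
ALTER TABLE main.finance.payments
  ALTER COLUMN payer_email SET TAGS ('pii' = 'email');

-- "All PII tables" as a metadata query.
SELECT catalog_name, schema_name, table_name
FROM system.information_schema.table_tags
WHERE tag_name = 'contains_pii'
  AND tag_value = 'true';

-- Remove a tag by key when the classification no longer applies.
ALTER TABLE main.finance.payments UNSET TAGS ('domain');
```

Column-level tags are exposed separately through system.information_schema.column_tags, which adds a column_name field.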

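◆ Two more governance facts

RENAME changes only the metastore pointer; the underlying data files are untouched. For policy logic shared across teams, govern it with one central UDF rather than per-team copies.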
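
Both facts can be sketched in a few statements. This is an illustration under assumptions, not the lesson's own example: the table and function names are hypothetical, the group name pii_readers is invented, and the choice of a column mask as the shared policy is an assumption about what the central UDF is used for.

```sql
-- Rename: a metadata operation. The table gets a new name;
-- the data files are not rewritten.
ALTER TABLE main.finance.payments RENAME TO main.finance.settled_payments;

-- One central policy function, owned in a single governance schema
-- (assumed here to be a masking function).
CREATE OR REPLACE FUNCTION main.governance.mask_email(email STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN email
  ELSE '***'
END;

-- Each team's table references the same function instead of keeping its own copy.
ALTER TABLE main.finance.settled_payments
  ALTER COLUMN payer_email SET MASK main.governance.mask_email;
```

With a single function, a policy change is one CREATE OR REPLACE rather than an edit in every team's copy.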

Takeaways (rebuild it from these)

  1. Discoverability = comments (describe) + tags (classify) + DESCRIBE (inspect).
  2. DESCRIBE EXTENDED (= DESCRIBE TABLE EXTENDED) shows column + table comments, properties, and constraints together; SHOW TBLPROPERTIES / DESCRIBE DETAIL / DESCRIBE HISTORY each show only a slice.
  3. AI-generated comments (Catalog Explorer "AI Generate") draft descriptions across many tables — the scale answer for documentation.
  4. ALTER TABLE t SET TAGS ('k'='v', …) — programmatic, queryable classification (e.g. PII).
  5. RENAME changes only the metastore pointer (files untouched); govern shared policy with one central UDF, not per-team copies.

Before you move on — say these without scrolling up

  1. One command to confirm comments + a property + a constraint together — which, and why not the others?
  2. Document hundreds of tables with minimal effort — what feature?
  3. The multi-tag syntax, and two things tags let you do (automated + queryable).
  4. RENAME a table — what happens to the data files?

That completes Section 8's governance story: how grants cascade (inheritance) → make the governed data discoverable (metadata).

Prerequisites