Unity Catalog Flashcards
4 key areas of Data Governance
- Data Access Control
- Data Access Audit
- Data Lineage
- Data Discovery
Unity Catalog capabilities
- Centralised metadata and user management
- Centralised data access controls
- Data access auditing
- Data lineage
- Data search and discovery
- Secure data with Delta Sharing
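Data Governance with vs without Unity Catalog
Without Unity Catalog, access control, auditing, lineage and discovery are set up separately in each workspace (e.g. each workspace's own Hive metastore and table ACLs). With Unity Catalog, they are centralised and apply across all Databricks workspaces attached to the same metastore.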
Unity Catalog object model
- Metastore: top-level container for metadata. Registers metadata about data and AI assets, and the permissions that govern access to them.
- Catalog: organises your data and is the top level of your data isolation scheme. Often split by organisational unit or software development lifecycle scope.
- Schema (i.e. database): contains tables, views, volumes, AI models and functions, organising data and AI assets into logical categories.
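A minimal sketch of creating one object at each level of the hierarchy. The names `dev`, `sales` and `orders` are hypothetical, the caller is assumed to hold CREATE CATALOG on the metastore, and depending on how the metastore's storage is configured, CREATE CATALOG may also need a MANAGED LOCATION clause.

```sql
-- Hypothetical names; requires CREATE CATALOG on the metastore
CREATE CATALOG IF NOT EXISTS dev
  COMMENT 'Development data for the sales team';

-- A schema lives inside a catalog
CREATE SCHEMA IF NOT EXISTS dev.sales;

-- Tables (and views, volumes, functions, models) live inside a schema
CREATE TABLE IF NOT EXISTS dev.sales.orders (
  order_id       BIGINT,
  region         STRING,
  customer_email STRING,
  amount         DECIMAL(10, 2)
);
```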
Unity Catalog three-level namespace
- Traditional SQL (e.g. the legacy Hive metastore) uses a two-level namespace, i.e. SELECT * FROM schema.table
- Unity Catalog adds the catalog as a third level, i.e. SELECT * FROM catalog.schema.table
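A short sketch of how names resolve, assuming a hypothetical table `dev.sales.orders`. A fully qualified three-level name works from any session; shorter names are resolved against the session's current catalog and schema.

```sql
-- Fully qualified: resolves the same way from any session
SELECT * FROM dev.sales.orders;

-- Set the session's current catalog and schema
USE CATALOG dev;
USE SCHEMA sales;

-- Both queries below now read dev.sales.orders
SELECT * FROM sales.orders;
SELECT * FROM orders;

-- Show what unqualified names currently resolve against
SELECT current_catalog(), current_schema();
```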
List the 6 roles in Unity Catalog
- Cloud Admin: manage underlying cloud resources
- Identity Admin: manage users and groups and provision to the account
- Account Admin: create or delete metastores, manage users and groups, full access to all data objects
- Metastore Admin: create or drop, grant privileges on, and change ownership of catalogs (and below)
- Data Owner: owns data objects they created
- Workspace Admin: manages permissions on workspace assets, restricts access to cluster creation, adds/removes users, grants privileges, etc.
List the 5 Identities in Unity Catalog
- Users
- Account Administrators
- Service Principals
- Service Principals with administrative privileges
- Groups (e.g. “allusers” includes “analysts” and “developers”)
Privileges for Metastore
CREATE CATALOG
CREATE EXTERNAL LOCATION
CREATE SHARE
CREATE RECIPIENT
CREATE PROVIDER
Privileges for Catalog
USE CATALOG
CREATE SCHEMA
Privileges for Schema
USE SCHEMA
CREATE TABLE
CREATE FUNCTION
Privileges for Table
SELECT
MODIFY
Privileges for View
SELECT
Privileges for External Location
CREATE EXTERNAL TABLE
READ FILES
WRITE FILES
CREATE MANAGED STORAGE
Privileges for Storage credential
CREATE EXTERNAL TABLE
READ FILES
WRITE FILES
CREATE EXTERNAL LOCATION
Privileges for Function
EXECUTE
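SELECT on a table is not enough on its own: the principal also needs USE CATALOG on the parent catalog and USE SCHEMA on the parent schema. A minimal sketch of the full chain, assuming a hypothetical group `analysts`, table `dev.sales.orders` and function `dev.sales.mask_email`; principal names are quoted with backticks.

```sql
-- Read access requires a privilege at every level of the namespace
GRANT USE CATALOG ON CATALOG dev TO `analysts`;
GRANT USE SCHEMA ON SCHEMA dev.sales TO `analysts`;
GRANT SELECT ON TABLE dev.sales.orders TO `analysts`;

-- Functions need EXECUTE (dev.sales.mask_email is hypothetical)
GRANT EXECUTE ON FUNCTION dev.sales.mask_email TO `analysts`;

-- Review the resulting grants
SHOW GRANTS ON TABLE dev.sales.orders;
SHOW GRANTS `analysts` ON SCHEMA dev.sales;
```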
Dynamic Views (3)
- Limit access to columns (omit column values from output)
- Limit access to rows (omit rows from output)
- Data masking (obscure data in certain fields)
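A minimal sketch of one dynamic view combining all three techniques. It assumes a hypothetical table `dev.sales.orders` and hypothetical account groups `admins`, `managers` and `analysts`, and uses Databricks' `is_account_group_member()`, which returns true when the querying user belongs to the named account-level group. Readers are granted SELECT on the view rather than on the underlying table.

```sql
CREATE OR REPLACE VIEW dev.sales.orders_redacted AS
SELECT
  order_id,
  region,
  -- Data masking: only admins see the real email address
  CASE
    WHEN is_account_group_member('admins') THEN customer_email
    ELSE 'REDACTED'
  END AS customer_email,
  -- Column-level: only managers see the amount, everyone else gets NULL
  CASE
    WHEN is_account_group_member('managers') THEN amount
    ELSE NULL
  END AS amount
FROM dev.sales.orders
-- Row-level: admins see all rows, everyone else only US rows
WHERE is_account_group_member('admins') OR region = 'US';

-- Expose the view, not the base table, to analysts
GRANT SELECT ON VIEW dev.sales.orders_redacted TO `analysts`;
```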
Unity Catalog Storage Credential
- Enables Unity Catalog to connect to external cloud storage (e.g. an IAM role for AWS S3, a Service Principal for Azure Storage)
Unity Catalog External Location
Cloud storage path + storage credential
- Self-contained object for accessing specific locations in cloud storage
- Fine-grained control over external storage
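A minimal sketch of defining an external location and granting access to it. The storage credential `my_s3_cred` is assumed to exist already (for example, created in Catalog Explorer); the location name, bucket path and group are hypothetical.

```sql
-- External location = cloud storage path + existing storage credential
CREATE EXTERNAL LOCATION IF NOT EXISTS raw_landing
  URL 's3://my-bucket/raw'
  WITH (STORAGE CREDENTIAL my_s3_cred)
  COMMENT 'Landing zone for raw files';

-- Fine-grained control: let one group read files and define external tables here
GRANT READ FILES ON EXTERNAL LOCATION raw_landing TO `data_engineers`;
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION raw_landing TO `data_engineers`;
```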
Managing Owner Permissions (SQL)
ALTER SCHEMA schema_name OWNER TO username
ALTER TABLE table_name OWNER TO username
ALTER VIEW view_name OWNER TO username
ALTER FUNCTION function_name OWNER TO username
- Only the owner (or a metastore admin, or the owner of the parent catalog or schema) can grant access to an object
- The user who creates an object becomes its owner, regardless of who owns the parent object
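A short sketch of handing ownership of an object to a group and confirming the change; `dev.sales.orders` and `data_engineers` are hypothetical names. Group ownership avoids objects being tied to one individual's account.

```sql
-- Transfer ownership from the creating user to a group
ALTER TABLE dev.sales.orders OWNER TO `data_engineers`;

-- The Owner field in the output shows the new owner
DESCRIBE TABLE EXTENDED dev.sales.orders;
```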
Revoking Permissions
REVOKE [privilege_type] ON [data_object_type] [data_object_name] FROM [user_or_group_name]
e.g.
REVOKE ALL PRIVILEGES ON SCHEMA default FROM `alf@melmak.et`;
REVOKE SELECT ON TABLE t FROM aliens;
Grant Permissions
GRANT privilege_types ON securable_object TO principal
GRANT CREATE TABLE ON SCHEMA <schema-name> TO `alf@melmak.et`;
GRANT ALL PRIVILEGES ON TABLE forecasts TO finance;
GRANT SELECT ON TABLE sample_data TO `account users`;
Unity Catalog best practices
- Only 1 metastore per geographic region (share data between regions with Delta Sharing)
- Use catalogs (not metastores) to segregate data, then grant privileges to groups on specific catalogs (USE CATALOG on catalog -> USE SCHEMA on schema -> SELECT on tables)
- Segregate data by catalog, e.g. by business unit + environment scope
- Manage all identities at the account level
- Use groups rather than users to assign access and ownership to objects
- Use Service Principals to run production jobs
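A minimal sketch of these practices, assuming hypothetical catalogs `finance_dev` and `finance_prod` (one per business unit + environment) and hypothetical groups. Privileges granted on a catalog are inherited by the schemas and tables inside it, so one statement can cover the whole USE CATALOG -> USE SCHEMA -> SELECT chain.

```sql
-- Read-only access for a group across the whole prod catalog (inherited downwards)
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG finance_prod TO `finance_analysts`;

-- Broader access for engineers on the dev catalog
GRANT ALL PRIVILEGES ON CATALOG finance_dev TO `finance_engineers`;

-- Ownership held by a group rather than an individual user
ALTER CATALOG finance_prod OWNER TO `finance_admins`;
```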