Unity Catalog Flashcards

1
Q

4 key areas of Data Governance

A
  1. Data Access Control
  2. Data Access Audit
  3. Data Lineage
  4. Data Discovery
2
Q

Unity Catalog capabilities

A
  • Centralised metadata and user management
  • Centralised data access controls
  • Data access auditing
  • Data lineage
  • Data search and discovery
  • Secure data with Delta Sharing
3
Q

Data Governance with vs without Unity Catalog

A

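  • With Unity Catalog: centralised access control, auditing, lineage, and data discovery across Databricks workspaces.
  • Without Unity Catalog: each workspace manages its own metastore, users, and access controls separately.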

4
Q

Unity Catalog object model

A
  • Metastore: top-level container for metadata. Registers metadata about data and AI assets and the permissions that govern access to them.
  • Catalog: organises your data and serves as the top level of your data isolation scheme. Often split by organisational unit or software development lifecycle scope.
  • Schema (also called a database): contains tables, views, volumes, AI models and functions. Organises data and AI assets into logical categories.
5
Q

Unity Catalog three-level namespace

A
  • Traditional SQL uses a two-level namespace, e.g. SELECT * FROM schema.table
  • Unity Catalog uses a three-level namespace, e.g. SELECT * FROM catalog.schema.table
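
The two forms above can be sketched in Databricks SQL as follows. The catalog, schema and table names (`main`, `sales`, `orders`) are hypothetical and chosen only for illustration.

```sql
-- Hypothetical catalog, schema and table names, for illustration only.
-- Fully qualified three-level reference:
SELECT * FROM main.sales.orders;

-- Set a default catalog and schema, then use the shorter name:
USE CATALOG main;
USE SCHEMA sales;
SELECT * FROM orders;
```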
6
Q

List the 6 types of principals in Unity Catalog

A
  1. Cloud Admin: manages underlying cloud resources
  2. Identity Admin: manages users and groups and provisions them into the account
  3. Account Admin: creates or deletes metastores, manages users and groups, has full access to all data objects
  4. Metastore Admin: creates or drops catalogs, grants privileges on them, and changes their ownership (the same applies to the objects below them)
  5. Data Owner: owns the data objects they created
  6. Workspace Admin: manages permissions on workspace assets, restricts access to cluster creation, adds/removes users, grants privileges, etc.
7
Q

List the 5 Identities in Unity Catalog

A
  1. Users
  2. Account Administrators
  3. Service Principals
  4. Service Principals with administrative privileges
  5. Groups (e.g. “allusers” includes “analysts” and “developers”)
8
Q

Privileges for Metastore

A

CREATE CATALOG
CREATE EXTERNAL LOCATION
CREATE SHARE
CREATE RECIPIENT
CREATE PROVIDER

9
Q

Privileges for Catalog

A

USE CATALOG
CREATE SCHEMA

10
Q

Privileges for Schema

A

USE SCHEMA
CREATE TABLE
CREATE FUNCTION

11
Q

Privileges for Table

A

SELECT
MODIFY

12
Q

Privileges for View

A

SELECT

13
Q

Privileges for External Location

A

CREATE EXTERNAL TABLE
READ FILES
WRITE FILES
CREATE MANAGED STORAGE

14
Q

Privileges for Storage credential

A

CREATE EXTERNAL TABLE
READ FILES
WRITE FILES
CREATE EXTERNAL LOCATION

15
Q

Privileges for Function

A

EXECUTE

16
Q

Dynamic Views (3)

A
  1. Limit access to columns (omit column values from output)
  2. Limit access to rows (omit rows from output)
  3. Data masking (obscure data in certain fields)
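
A minimal sketch of one dynamic view that combines all three techniques. It relies on the Databricks SQL function `is_account_group_member`; the table, column and group names are hypothetical and only illustrate the pattern.

```sql
-- Hypothetical table, columns and group names, for illustration only.
CREATE OR REPLACE VIEW main.sales.orders_redacted AS
SELECT
  order_id,
  -- Data masking: only members of the 'auditors' group see the real email.
  CASE
    WHEN is_account_group_member('auditors') THEN email
    ELSE 'REDACTED'
  END AS email,
  region,
  amount
  -- Column restriction: sensitive columns (e.g. card_number) are simply not selected.
FROM main.sales.orders
-- Row restriction: users outside 'managers' only see rows for the 'EU' region.
WHERE is_account_group_member('managers') OR region = 'EU';
```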
17
Q

Unity Catalog Storage Credential

A
  • Enables Unity Catalog to connect to external cloud storage (e.g. an IAM role for AWS S3, a service principal for Azure Storage)
18
Q

Unity Catalog External Location

A

Cloud storage path + storage credential
  • Self-contained object for accessing specific locations in cloud storage
  • Fine-grained control over external storage
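
As a sketch of how a path and a credential combine into one securable object, the Databricks SQL below creates an external location and grants access on it. The location name, bucket path, credential name and group are hypothetical, and the storage credential is assumed to exist already.

```sql
-- Hypothetical names and path; the storage credential is assumed to exist already.
CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
  URL 's3://example-bucket/landing'
  WITH (STORAGE CREDENTIAL example_credential)
  COMMENT 'Raw files landed by ingestion jobs';

-- Grant file-level read access on just this location to a group.
GRANT READ FILES ON EXTERNAL LOCATION landing_zone TO `data_engineers`;
```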

19
Q

Managing Owner Permissions (sql)

A

ALTER SCHEMA schema_name OWNER TO username
ALTER TABLE table_name OWNER TO username
ALTER VIEW view_name OWNER TO username
ALTER FUNCTION function_name OWNER TO username

  • The owner can grant access to an object (as can a metastore admin or the owner of the parent catalog or schema)
  • User who creates the object becomes its owner, regardless of the owner of the parent object
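
A short concrete instance of the syntax above, assuming a hypothetical table and a hypothetical group named `data_stewards`. Principal names are wrapped in backticks, which is required when they contain special characters such as `@`.

```sql
-- Hypothetical table and group names, for illustration only.
-- Transfer ownership of a table from its creator to a group.
ALTER TABLE main.sales.orders OWNER TO `data_stewards`;
```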
20
Q

Revoking Permissions

A

REVOKE [privilege_type] ON [data_object_type] [data_object_name] FROM [user_or_group_name]

e.g.
REVOKE ALL PRIVILEGES ON SCHEMA default FROM `alf@melmak.et`;
REVOKE SELECT ON TABLE t FROM aliens;

21
Q

Grant Permissions

A

GRANT privilege_types ON securable_object TO principal

GRANT CREATE TABLE ON SCHEMA <schema-name> TO `alf@melmak.et`;
GRANT ALL PRIVILEGES ON TABLE forecasts TO finance;
GRANT SELECT ON TABLE sample_data TO USERS;

22
Q

Unity Catalog best practices

A
  1. Only 1 metastore per geographic region (share data between regions with Delta Sharing)
  2. Use catalogs (not metastores) to segregate data, then grant privileges on specific catalogs to groups (USE CATALOG on the catalog -> USE SCHEMA on the schema -> SELECT on tables)
  3. Segregate data by catalog, e.g. by business unit + environment scope
  4. Manage all identities at the account level
  5. Use groups rather than users to assign access and ownership to objects
  6. Use Service Principals to run production jobs
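
The privilege chain from practice 2, granted to a group as recommended in practice 5, might look like the sketch below. The catalog, schema, table and group names are hypothetical.

```sql
-- Hypothetical catalog, schema, table and group names.
GRANT USE CATALOG ON CATALOG finance_prod TO `analysts`;
GRANT USE SCHEMA ON SCHEMA finance_prod.reporting TO `analysts`;
GRANT SELECT ON TABLE finance_prod.reporting.forecasts TO `analysts`;

-- Review what the group now holds on the table.
SHOW GRANTS `analysts` ON TABLE finance_prod.reporting.forecasts;
```

All three grants are needed: SELECT on the table alone does not let the group read it without USE CATALOG and USE SCHEMA on its parents.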