Administration Flashcards

Revise the administration concepts in Databricks.

1
Q

What are workspace admins?

A

Workspace admins have admin privileges within a single workspace. They can
* manage workspace-level identities,
* regulate compute use, and
* enable and delegate role-based access control (Premium plan or above only).

2
Q

What is the effect of Unity Catalog on identity management in a workspace?

A

If your workspace is enabled for Unity Catalog, identities should be added at the account level. Workspace admins can then assign users, groups, and service principals to their workspace.

3
Q

What compute resources can a workspace admin create?

A

Workspace admins can create SQL warehouses (a compute resource that lets you run SQL commands on data objects within Databricks SQL) and clusters for their workspace users.

4
Q

How does a workspace admin regulate compute usage?

A

Workspace admins have the following tools:

  1. Limit workspace users’ cluster creation options with cluster policies (see the sketch below).
  2. Learn which compute resources have Unity Catalog access.
  3. Grant S3 bucket access through clusters using instance profiles.

Databricks recommends managing all init scripts as cluster-scoped init scripts. Instead of using global init scripts, manage init scripts using cluster policies.
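
To make the first tool concrete, here is a minimal sketch in Python (using the `requests` library) that creates a cluster policy through the REST API endpoint POST /api/2.0/policies/clusters/create. The host, token, policy name, and rule values are illustrative placeholders, not values from this deck.

```python
# Minimal sketch: create a cluster policy that caps cost-relevant settings.
# HOST, TOKEN, and all rule values below are illustrative placeholders.
import json
import requests

HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

definition = {
    # Cap autoscaling to limit per-cluster cost.
    "autoscale.max_workers": {"type": "range", "maxValue": 10},
    # Force auto-termination and hide the field from users.
    "autotermination_minutes": {"type": "fixed", "value": 60, "hidden": True},
    # Restrict node types to a known-cost allowlist.
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"]},
}

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    # The policy definition is sent as a JSON *string* inside the request body.
    json={"name": "cost-capped-clusters", "definition": json.dumps(definition)},
)
resp.raise_for_status()
print(resp.json()["policy_id"])
```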
5
Q

What are the different admin types in Databricks?

A

There are two main levels of admin privileges available on the Databricks platform:

Account admins: Manage the Databricks account, including workspace creation, user management, cloud resources, and account usage monitoring.

Workspace admins: Manage workspace identities, access control, settings, and features for individual workspaces in the account.

Additionally, users can be assigned these feature-specific admin roles, which have narrower sets of privileges:

Marketplace admins: Manage their account’s Databricks Marketplace provider profile, including creating and managing Marketplace listings.

Metastore admins: Manage privileges and ownership for all securable objects within a Unity Catalog metastore, such as who can create catalogs or query a table.

6
Q

What are account admins?

A

Account admins have privileges over the entire Databricks account. As an account admin, you can
* create workspaces,
* configure cloud resources,
* enable Unity Catalog (if your Databricks account was created after November 8, 2023, your workspaces might have Unity Catalog enabled by default),
* view usage data (monitor the account with system tables),
* manage account identities, settings, and subscriptions, and
* manage previews.

Account admins can also delegate the account admin and workspace admin roles to any other user.

7
Q

How does an account admin manage identities?

A

If you’ve enabled Unity Catalog for at least one workspace in your account, identities (users, groups, and service principals) should be managed in the account console. Account admins can grant permissions and assign workspaces to these identities.

8
Q

What are the different support options in Databricks?

A

If you have any questions about setting up Databricks and need live help, please e-mail onboarding-help@databricks.com.

If you have a Databricks support package, you can open and manage support cases with Databricks.

If your organization does not have a Databricks support subscription, or if you are not an authorized contact for your company’s support subscription, you can get answers to many questions in Databricks Office Hours or from the Databricks Community.

9
Q

How do you locate the account ID?

A

To retrieve your account ID, go to the account console and click the down arrow next to your username in the upper right corner. In the drop-down menu you can view and copy your Account ID.

You must be in the account console to retrieve the account ID; it is not displayed inside a workspace.

10
Q

What are the different billing methods in Databricks?

A
  1. Pay-as-you-go accounts through AWS Marketplace
  2. Pay-as-you-go accounts paid by credit card to Databricks
  3. Contract accounts

Your account’s billing method is permanent and cannot be changed after you sign up for your account.

11
Q

What is the default pricing plan for new accounts?

A

By default, new accounts are on the Premium plan, which adds audit logging, role-based access control, and other features that give you more control over security, governance, and more.

12
Q

Are access control settings enabled if you upgrade from the Standard plan to a higher plan?

A

Access control settings are disabled by default on workspaces that are upgraded from the Standard plan to the Premium plan or above. Once an access control setting is enabled, it cannot be disabled.

13
Q

What should you keep in mind after canceling a subscription?

A

After you cancel your subscription:

  1. You can no longer access workspaces, notebooks, or data in your Databricks account.
  2. In accordance with the Databricks terms of service, any Customer Content contained within workspaces tied to your subscription will be deleted within 30 days of cancellation.
  3. You can’t sign up for a new subscription using the same email address. You must provide a new email address in the sign-up form.

Once a subscription is canceled, Databricks is not responsible for cleaning up the resources attached to the account. Databricks recommends terminating all compute resources before you cancel your subscription. Additionally, terminate any Databricks-associated resources from your AWS console.

14
Q

What are the steps to delete a databricks account?

A

Before you delete a Databricks account, you must first cancel your Databricks subscription and delete all Unity Catalog metastores in the account. After you delete all metastores associated with your organization’s account, you can start the process to delete your account.

If you need to delete your Databricks account, reach out to your account team for assistance or file a ticket at help.databricks.com.

15
Q

Where can you import a usage dashboard?

A

Account admins can import cost management AI/BI dashboards to any Unity Catalog-enabled workspace in their account.

16
Q

Which underlying tables does the usage dashboard use?

A

To use the imported dashboard, a user must have SELECT permissions on the system.billing.usage and system.billing.list_prices tables. The dashboard’s data is subject to the usage table’s retention policies.
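
To illustrate, here is a hedged sketch of the kind of query the dashboard derives its numbers from, written for a Databricks notebook where `spark` is predefined. The join condition and column choices are my construction, not the dashboard's exact SQL.

```python
# Sketch: estimate list-price spend per SKU from the two tables the
# dashboard reads. Assumes a Unity Catalog-enabled workspace and SELECT
# on both tables; `spark` is the session Databricks provides in notebooks.
usage_by_sku = spark.sql("""
    SELECT u.sku_name,
           SUM(u.usage_quantity * p.pricing.default) AS est_list_cost
    FROM system.billing.usage AS u
    JOIN system.billing.list_prices AS p
      ON u.sku_name = p.sku_name
     AND u.usage_end_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_end_time < p.price_end_time)
    GROUP BY u.sku_name
    ORDER BY est_list_cost DESC
""")
usage_by_sku.show()
```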

17
Q

What is the maximum number of data lines shown in the usage graph?

A

If there are more than 10 workspaces, SKUs, or tag values, the chart displays the nine with the highest usage. The usage of the remaining workspaces, SKUs, or tag values is aggregated and displayed in a single line, which is labeled as combined.

18
Q

What are the limitations of budgets?

A
  1. There could be up to a 24-hour delay between usage occurring and an email notification being sent.
  2. After you create a new budget, there could be a delay before you can see the budget details.
  3. Budgets do not factor in any billing credits or negotiated discounts your account might have. The spent amount is calculated by multiplying usage by the SKU list price.
19
Q

What is the default level of network access for serverless compute?

A

Serverless compute for notebooks and jobs has unrestricted access to the public internet by default.

20
Q

Who can enable serverless compute in an account, and what can it be used for?

A

After an account admin enables serverless compute, all eligible workspaces in the account will have access to use serverless compute for notebooks, jobs, and Delta Live Tables.

If your account was created after March 28, 2022, serverless compute is enabled by default for your workspaces.

21
Q

What are the eligibility requirements for serverless compute enablement in a workspace?

A

To be eligible for serverless compute for notebooks and jobs, your workspace must meet the following requirements:

  1. Must have Unity Catalog enabled.
  2. Must be in a supported region.
22
Q

What are No Isolation Shared clusters?

A

No Isolation Shared clusters run arbitrary code from multiple users in the same shared environment, similar to what happens on a cloud Virtual Machine that is shared across multiple users.

Data or internal credentials provisioned to that environment might be accessible to any code running within that environment. To call Databricks APIs for normal operations, access tokens are provisioned on behalf of users to these clusters.
When a higher-privileged user, such as a workspace administrator, runs commands on a cluster, their higher-privileged token is visible in the same environment.

23
Q

Who can protect admin credentials from being shared on No Isolation Shared clusters?

A

**Account admins** can prevent internal credentials from being automatically generated for Databricks workspace admins on No Isolation Shared clusters.

24
Q

What are the limitations of enabling admin protection for No Isolation Shared clusters?

A

The following Databricks features do not work if you enable admin protection for No Isolation Shared clusters on your account:

  1. Machine Learning Runtime workloads.
  2. Workspace files.
  3. dbutils Secrets utility.
  4. dbutils Notebook utility.
  5. Delta Lake operations by admins that create, modify, or update data.

Other features might not work for admin users on this cluster type because these features rely on automatically generated internal credentials.

In those cases, Databricks recommends that admins do one of the following:

  1. Use a different cluster type than “No isolation shared” or its equivalent legacy cluster types.
  2. Create a non-admin user when using No Isolation Shared clusters.
25
Q

What are the prerequisites for enabling automatic cluster update?

A

Enabling this feature on a workspace requires that you add the Enhanced Security and Compliance add-on as described on the pricing page. This feature also requires the Enterprise pricing tier.

26
Q

What is the default schedule for automatic cluster update?

A

By default, automatic cluster update is scheduled for **the first Sunday of every month at 1:00 AM UTC**. Account admins can use the Databricks account console to change the maintenance window frequency, start date, and start time.

27
Q

Which resources does automatic cluster update apply to?

A

This applies to all compute resources that run in the classic compute plane:
1. clusters,
2. pools,
3. classic SQL warehouses, and
4. legacy Model Serving.

It does not apply to serverless compute resources.

28
Q

Will machines restart during the maintenance window of automatic cluster update?

A

If a running compute resource has no image updates, it is not restarted by default, but you can configure the feature to force a restart during the maintenance window.

29
Q

Who can enable automatic cluster update on a workspace?

A

You must be an account admin to configure automatic cluster update. Although this setting is configured for each workspace, the controls for this feature are part of the account console UI, not the workspace admin console.

30
Q

What are serverless quotas?

A

Serverless quotas are a safety measure for serverless compute. Quotas are enforced on serverless compute for notebooks, jobs, Delta Live Tables, and for serverless SQL warehouses.

Quotas are measured in Databricks Units (DBUs) per hour. A DBU is a normalized unit of processing power on the Databricks platform used for measurement and pricing purposes.

31
Q

What are the serverless quotas for serverless compute for notebooks, jobs, and Delta Live Tables?

A

Serverless compute for notebooks, jobs, and Delta Live Tables includes a scale-up limit that imposes a maximum cost per workload per hour.
Because this limit is enforced per workload, it does not prevent you from launching new jobs or notebooks using serverless compute.

32
Q

What are the serverless quotas for serverless SQL warehouses?

A

Quotas for serverless SQL warehouses restrict the number of serverless compute resources a customer can have at a time. This quota is enforced at the regional level for all workspaces in your account.

When you reach your serverless quota in a region, workspaces cannot launch new serverless SQL warehouses. Reaching this limit does not terminate existing SQL warehouses in the region. However, if you reach the limit, Databricks prevents increasing the number of compute resources in the warehouse.

There are initial default quotas for accounts, but Databricks automatically and proactively increases quotas based on your type of account and how you use Databricks. **Databricks intends to prevent typical customers from reaching quota limits during normal usage.**

33
Q

What are the two subscription tiers available in Databricks?

A

Premium and Enterprise. The Standard tier is now deprecated.

34
Q

What is a workspace in Databricks?

A

A workspace is a Databricks deployment in a cloud service account. It provides a unified environment for working with Databricks assets for a specified set of users.

35
Q

How long does it take for usage information to update?

A

If you subscribed to Databricks through AWS Marketplace, Databricks usage takes 24 hours to appear in the AWS Billing & Cloud Management dashboard. Usage appears in the Databricks account console after one hour.

36
Q

What is the relationship between the workspace deployment name and the workspace URL?

A

The deployment name defines part of the subdomain for the workspace. The workspace URL for access to the Databricks web application and REST APIs is {workspace-deployment-name}.cloud.databricks.com.

This value must be unique across all workspaces across all AWS regions, not including deleted workspaces.

Some Databricks accounts have a deployment name prefix that interacts with this feature. Contact your Databricks account team to set a deployment name prefix for your account. If your account has a deployment name prefix, the final workspace deployment name includes the account prefix followed by a hyphen. For example, if your account’s deployment prefix is acme and you enter the deployment name as workspace-1, the new workspace’s deployment name becomes acme-workspace-1. The final workspace URL is acme-workspace-1.cloud.databricks.com.

37
Q

What are the configuration options for a workspace administrator on SQL warehouses?

A

Workspace administrators can configure the following permissions for a Databricks workspace:

  1. Revoke all access to SQL warehouses.
  2. Grant the ability to create SQL warehouses.
  3. Configure default parameters that control the SQL warehouse compute environment.
  4. Configure data access policies for SQL warehouses.

By default, all users have access to Databricks SQL. To onboard users to Databricks SQL, you should deploy a SQL warehouse, grant users access to the SQL warehouse, and grant access to data using Unity Catalog.
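
As a sketch of that onboarding flow (hedged: payloads trimmed to essentials, and `analysts` is a hypothetical group), the following creates a pro SQL warehouse via POST /api/2.0/sql/warehouses and then grants the group CAN_USE through the Permissions API.

```python
# Sketch: deploy a SQL warehouse and grant a group access to it.
# HOST, TOKEN, and the `analysts` group are illustrative placeholders.
import requests

HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
headers = {"Authorization": f"Bearer {TOKEN}"}

# 1. Deploy a pro SQL warehouse.
wh = requests.post(
    f"{HOST}/api/2.0/sql/warehouses",
    headers=headers,
    json={
        "name": "analyst-warehouse",
        "cluster_size": "Small",
        "warehouse_type": "PRO",
        "auto_stop_mins": 30,
    },
)
wh.raise_for_status()
warehouse_id = wh.json()["id"]

# 2. Grant the group permission to use (but not manage) the warehouse.
acl = requests.patch(
    f"{HOST}/api/2.0/permissions/sql/warehouses/{warehouse_id}",
    headers=headers,
    json={"access_control_list": [
        {"group_name": "analysts", "permission_level": "CAN_USE"}
    ]},
)
acl.raise_for_status()
```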

38
Q

What happens if I change SQL warehouse configuration parameters in a workspace?

A

When you change a SQL configuration parameter, all running SQL warehouses are automatically restarted.

39
Q

What are the limitations on transferring ownership of a SQL warehouse?

A
  1. The user you transfer ownership of a SQL warehouse to must have the Allow unrestricted cluster creation entitlement.
  2. Service principals and groups cannot be assigned ownership of a SQL warehouse.
40
Q

What are the different types of SQL warehouses Databricks supports?

A

Databricks SQL supports the following SQL warehouse types:

  • Serverless
  • Pro
  • Classic
41
Q

What are the differences between the SQL warehouse types in Databricks?

A

Each SQL warehouse type has different performance capabilities.

Serverless

  1. Photon Engine
  2. Predictive IO
  3. Intelligent Workload Management

Pro

  1. Photon Engine
  2. Predictive IO

Classic

  1. Photon Engine

The following list describes each performance feature:

Photon: The native vectorized query engine on Databricks. It makes your existing SQL and DataFrame API calls faster and reduces your total cost per workload.

Predictive IO: A suite of features for speeding up selective scan operations in SQL queries. Predictive IO can provide a wide range of speedups.

Intelligent workload management (IWM): A set of features that enhances Databricks SQL Serverless’s ability to process large numbers of queries quickly and cost-effectively. Using AI-powered prediction and dynamic management techniques, IWM works to ensure that workloads have the right amount of resources quickly. The key difference lies in the AI capabilities in Databricks SQL to respond dynamically to workload demands rather than using static thresholds.

42
Q

What are the benefits of a serverless SQL warehouse?

A
  1. Rapid startup time (typically between 2 and 6 seconds).
  2. Rapid upscaling to acquire more compute when needed for maintaining low latency.
  3. Query admittance closer to the hardware’s limitation rather than the virtual machine.
  4. Quick downscaling to minimize costs when demand is low, providing consistent performance with optimized costs and resources.
43
Q

What are the benefits of a pro SQL warehouse?

A
  1. A pro SQL warehouse supports Photon and Predictive IO, but does not support Intelligent Workload Management.
  2. With a pro SQL warehouse (unlike a serverless SQL warehouse), the compute layer exists in your AWS account rather than in your Databricks account. As a result, a pro SQL warehouse does not support Intelligent Workload Management, making it less responsive to query demand that varies greatly over time and unable to autoscale as rapidly as a serverless SQL warehouse.
  3. A pro SQL warehouse takes several minutes to start up (typically approximately 4 minutes) and scales up and down with less responsiveness than a serverless SQL warehouse.
44
Q

What are the benefits of a classic SQL warehouse?

A
  1. A classic SQL warehouse supports Photon, but does not support Predictive IO or Intelligent Workload Management.
  2. With a classic SQL warehouse (unlike a serverless SQL warehouse), the compute layer exists in your AWS account rather than in your Databricks account.
  3. Without support for Predictive IO or Intelligent Workload Management, a classic SQL warehouse provides only entry level performance and less performance than either a serverless or a pro SQL warehouse.
  4. A classic SQL warehouse also takes several minutes to start up (typically approximately 4 minutes) and scales up and down with less responsiveness than a serverless SQL warehouse.
45
Q

What are the default warehouse types in Databricks?

A

In regions where serverless is supported:
  • via UI => Serverless
  • via API => Classic
  • in a legacy Hive metastore workspace => same as unsupported regions

In regions where serverless is unsupported:
  • via UI => Pro
  • via API => Classic

46
Q

What are the requirements for enabling serverless SQL warehouses in a workspace?

A
  1. Your Databricks account must not be on a free trial.
  2. Your Databricks workspace must be on the Premium plan or above.
  3. Your workspace control plane and serverless compute plane must be in a region that supports Databricks SQL Serverless.
  4. Your workspace must not use S3 access policies.
  5. Your workspace must not use an external Hive legacy metastore. See Remove Hive metastore credentials to enable serverless.
47
Q

Which metastores do serverless SQL warehouses support?

A

Serverless SQL warehouses support the default Databricks metastore and AWS Glue as a metastore, but do not support external Hive metastores.

48
Q

Who can manage Previews in Databricks?

A

Managing Previews is only available to workspace and account admins. Account admins can manage Databricks account-level Previews from the account console.

49
Q

What is the default access mode for jobs compute in a workspace?

A

This can be set via the workspace settings. However, if you choose not to select a default access mode for your workspace, jobs compute with an undefined access mode will default to No Isolation Shared, which allows multiple users to share the compute resource with no isolation between users.

50
Q

How can the web terminal be enabled in a workspace?

A

To use the web terminal on your clusters, an account admin can configure whether the web terminal feature should be **on or off** and whether this value should be enforced on all workspaces.

If the account administrator chooses not to enforce the web terminal feature, workspace administrators can choose to enable or disable the web terminal on a per-workspace basis.

If the account administrator has enforced the web terminal setting on all workspaces, changes in the workspace admin settings page will have no effect.

51
Q

How do you permanently delete workspace objects?

A

You can delete workspace objects such as entire notebooks, individual notebook cells, individual notebook comments, and experiments, but they are recoverable.

To permanently purge deleted workspace objects, go to the workspace settings.

52
Q

Where are notebook results stored in Databricks?

A

Notebook command output is **stored differently depending on how you run the notebook**.

By default, when you run a notebook interactively by clicking Run in the notebook:

If the results are small, they are stored in the Databricks control plane, along with the notebook’s command contents and metadata.

Larger results are stored in the workspace storage bucket in your AWS account. Databricks uses this bucket for workspace system data and your workspace’s DBFS root. Notebook results are stored in the workspace system data part of the bucket, which is not accessible by users.

When you run a notebook as a job, by scheduling it or by clicking Run now on the Jobs page, all results are stored in the workspace storage bucket in your account.

53
Q

Can we enforce storing all notebook results in the workspace storage bucket?

A

Yes, in the Workspace settings page.

54
Q

How can an account admin restrict workspace admins?

A

By default, workspace admins can change a job owner **to any user or service principal in their workspace**. Workspace admins can change the job run as setting to service principals they have the Service Principal User role on or to any user in their workspace.

Workspace admins can also create a personal access token on behalf of any service principal in their workspace by default.

Account admins can configure a workspace setting called RestrictWorkspaceAdmins to restrict workspace admins to only change a job owner to themselves and the job run as setting to a service principal that they have the Service Principal User role on.

The setting also restricts workspace admins to only create a personal access token for service principals that they have the Service Principal User role on.
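
For reference, a minimal sketch of the token flow this setting governs: a workspace admin minting a personal access token on behalf of a service principal via the Token Management API (POST /api/2.0/token-management/on-behalf-of/tokens). All identifiers are placeholders.

```python
# Sketch: create a PAT on behalf of a service principal. With
# RestrictWorkspaceAdmins enabled, this succeeds only if the caller holds
# the Service Principal User role on that service principal.
import requests

HOST = "https://<workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<workspace-admin-token>"                    # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/token-management/on-behalf-of/tokens",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "application_id": "<service-principal-application-id>",  # placeholder
        "lifetime_seconds": 3600,
        "comment": "token for nightly jobs",
    },
)
resp.raise_for_status()
print(resp.json()["token_value"])  # returned once; store it securely
```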

55
Q

What are Databricks identities?

A

There are three types of Databricks identity:

Users: User identities recognized by Databricks and represented by email addresses.

Service principals: Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.

Groups: A collection of identities used by admins to manage group access to workspaces, data, and other securable objects. All Databricks identities can be assigned as members of groups.
There are two types of groups in Databricks: account groups and workspace-local groups.

56
Q

Who can manage Databricks identities?

A

To manage identities in Databricks, you must have one of the following:
1. the account admin role,
2. the workspace admin role, or
3. the manager role on a service principal or group.

57
Q

How can an account admin manage identities in the account?

A

Account admins can add users, service principals, and groups to the account and assign them admin roles. Account admins can update and delete users, service principals, and groups in the account.
They can give users access to workspaces, as long as those workspaces use identity federation.
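
A minimal sketch of the account-level flow, assuming an account admin token: adding a user through the account SCIM API. All identifiers are illustrative placeholders.

```python
# Sketch: add a user at the account level via the account SCIM API.
# ACCOUNT_ID, TOKEN, and the email are illustrative placeholders.
import requests

ACCOUNT_ID = "<account-id>"
TOKEN = "<account-admin-token>"

resp = requests.post(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/scim/v2/Users",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": "new.user@example.com",
    },
)
resp.raise_for_status()
```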

58
Q

How can a workspace admin manage identities in the account?

A

Workspace admins can add users and service principals to the Databricks account. They can also add groups to the Databricks account if their workspaces are enabled for identity federation.
Workspace admins can grant users, service principals, and groups access to their workspaces.
They cannot delete users and service principals from the account.

Workspace admins can also manage workspace-local groups.

59
Q

How can group managers manage identities in the account?

A

Group Managers can manage group membership and delete the group.
They can also assign other users the group manager role.
Account admins have the group manager role on all groups in the account.
Workspace admins have the group manager role on account groups that they create.

60
Q

How can service principal managers manage identities in the account?

A

Service principal managers can manage roles on a service principal.
Account admins have the service principal manager role on all service principals in the account.
Workspace admins have the service principal manager role on service principals that they create.

61
Q

What is the default access level of a Databricks user?

A

Users in a Databricks account do not have any default access to a workspace, data, or compute resources.

Account admins and workspace admins can assign account users to workspaces.

Workspace admins can also add a new user directly to a workspace, which both automatically adds the user to the account and assigns them to that workspace.

62
Q

What are view-only users in Databricks?

A

Users can share published dashboards with other users in the Databricks account, even if those users are not members of their workspace.
Users in the Databricks account who are not members of any workspace are the equivalent of view-only users in other tools. They can view objects that have been shared with them, but they cannot modify objects.

63
Q

What are workspace-local groups?

A

Workspace admins can also create legacy workspace-local groups in workspaces using the Workspace Groups API. Workspace-local groups are not automatically added to the account. Workspace-local groups cannot be assigned to additional workspaces, or granted access to data in a Unity Catalog metastore.

64
Q

How are identities managed in a non-identity-federated workspace?

A

For those workspaces that aren’t enabled for identity federation, workspace admins manage their workspace users, service principals, and groups entirely within the scope of the workspace.
Users and service principals added to non-identity federated workspaces are automatically added to the account.

Groups added to non-identity federated workspaces are legacy workspace-local groups that are not added to the account.

65
Q

How do you enable identity federation on a workspace?

A

If your workspace is enabled for identity federation by default, it cannot be disabled.

To enable identity federation in a workspace, an account admin needs to enable the workspace for Unity Catalog by assigning a Unity Catalog metastore.

When the assignment is complete, identity federation is marked as Enabled on the workspace’s Configuration tab in the account console.

66
Q

How are admin roles assigned to users?

A

Account admins can assign other users as account admins. They can also become Unity Catalog metastore admins by virtue of creating a metastore, and they can transfer the metastore admin role to another user or group.

Both account admins and workspace admins can assign other users as **workspace admins**. The workspace admin role is determined by membership in the admins group in the workspace, which is a reserved default group in Databricks and cannot be deleted.

Account admins can also assign other users as Marketplace admins.

67
Q

What are deactivated users?

A

Account admins can deactivate users across a Databricks account. A deactivated user cannot log in to the Databricks account or its workspaces. However, all of the user’s permissions and workspace objects remain unchanged. When a user is deactivated, the following is true:

  1. The user cannot log in to the account or any of their workspaces by any method.
  2. Applications or scripts that use the tokens generated by the user can no longer access the Databricks API. The tokens remain but cannot be used to authenticate while a user is deactivated.
  3. Notebooks owned by the user remain.
  4. Clusters owned by the user remain running.
  5. Scheduled jobs created by the user have to be assigned to a new owner to prevent them from failing.

When a user is reactivated, they can log in to Databricks with the same permissions. Databricks recommends deactivating users from the account instead of removing them, because removing a user is a destructive action. A deactivated user’s status is labeled Inactive in the account console.

You can also deactivate a user from a specific workspace. You cannot deactivate a user using the workspace admin settings page; instead, use the Workspace Users API.

You cannot deactivate a user across the account using the account console. Instead, use the Account Users API.
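
A minimal sketch of deactivation through the Account Users (SCIM) API, flipping the user's `active` flag; all identifiers are placeholders.

```python
# Sketch: deactivate (not delete) a user with the Account Users SCIM API.
# ACCOUNT_ID, TOKEN, and USER_ID are illustrative placeholders.
import requests

ACCOUNT_ID = "<account-id>"
TOKEN = "<account-admin-token>"
USER_ID = "<scim-user-id>"

resp = requests.patch(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/scim/v2/Users/{USER_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    },
)
resp.raise_for_status()
```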

68
Q

What are the consequences of deleting a user?

A

When you remove a user from the account, that user is also removed from their workspaces, regardless of whether or not identity federation has been enabled. We recommend that you refrain from deleting account-level users unless you want them to lose access to all workspaces in the account. Be aware of the following consequences of deleting users:

  1. Applications or scripts that use the tokens generated by the user can no longer access Databricks APIs
  2. Jobs owned by the user fail
  3. Clusters owned by the user stop
  4. Queries or dashboards created by the user and shared using the Run as Owner credential have to be assigned to a new owner to prevent sharing from failing
69
Q

What are service principals?

A

A service principal is an identity that you create in Databricks for use with automated tools, jobs, and applications. Service principals give automated tools and scripts **API-only access** to Databricks resources, providing greater security than using users or groups.

You can grant and restrict a service principal’s access to resources in the same way as you can a Databricks user.

You can also grant Databricks users, service principals, and groups permissions to use a service principal. This allows users to run jobs as the service principal, instead of as their identity. This prevents jobs from failing if a user leaves your organization or a group is modified.

Unlike a Databricks user, a service principal is an API-only identity; it cannot be used to access the Databricks UI.
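
A minimal sketch of registering a service principal through the workspace SCIM API; the host, token, and display name are placeholders.

```python
# Sketch: register a service principal in a workspace via the SCIM API.
import requests

HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.0/preview/scim/v2/ServicePrincipals",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "displayName": "etl-runner",
    },
)
resp.raise_for_status()
print(resp.json()["applicationId"])  # use this ID when granting roles
```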

70
Q

What are service principal roles?

A

Service principal roles are account-level roles. This means that they only need to be defined once, in your account, and apply across all workspaces. There are two roles that you can grant on a service principal: Service Principal Manager and Service Principal User.

Service Principal Manager allows you to manage roles on a service principal. The creator of a service principal has the Service Principal Manager role on the service principal. Account admins have the Service Principal Manager role on all service principals in an account.

Service Principal User allows workspace users to run jobs as the service principal. The job will run with the identity of the service principal, instead of the identity of the job owner. The Service Principal User role also allows workspace admins to create tokens on behalf of the service principal.

Users with the Service Principal Manager role do not inherit the Service Principal User role. If you want to use the service principal to execute jobs, you need to explicitly assign yourself the service principal user role, even after creating the service principal.

71
Q

What is the difference between account groups and workspace-local groups?

A

Account groups can be granted access to data in a Unity Catalog metastore, granted roles on service principals and groups, and permissions to identity federated workspaces.

Workspace-local groups are legacy groups. These groups are identified as workspace-local in the workspace admin settings page. Workspace-local groups cannot be assigned to additional workspaces or granted access to data in a Unity Catalog metastore. Workspace-local groups cannot be granted account-level roles.

72
Q

What are the two default groups in a workspace?

A

There are two system groups in each workspace: users and admins. All workspace users are members of the users group and all workspace admins are members of the admins group. System groups are workspace-local groups. System groups cannot be deleted.

73
Q

What are the different ways group managers can manage groups?

A

Account admins can manage group roles using the account console, and workspace admins can manage group roles using the workspace admin settings page.
**Group managers that are not workspace admins** can manage group roles using the Accounts Access Control API.

Account admins have the group manager role at the account level, which means they have the group manager role on all groups in the account.
Workspace admins have the group manager role on account groups that they create.

74
Q

What are the effects of removing a group from a workspace?

A

Removing a group from a workspace does not delete the group in the account. When a group is removed from a workspace, group members can no longer access the workspace; however, permissions are maintained on the group. If the group is later added back to the workspace, the group regains its previous permissions.

75
Q

What are the effects of removing a group from the account?

A

When you remove a group, all users in that group are deleted from the account and lose access to any workspaces they had access to (unless they are members of another group or have been directly granted access to the account or any workspaces). Databricks recommends that you refrain from deleting account-level groups unless you want them to lose access to all workspaces in the account. Be aware of the following consequences of deleting users:

  1. Applications or scripts that use the tokens generated by the user can no longer access Databricks APIs
  2. Jobs owned by the user fail
  3. Clusters owned by the user stop
  4. Queries or dashboards created by the user and shared using the Run as Owner credential have to be assigned to a new owner to prevent sharing from failing
76
Q

How does a workspace admin manage workspace-local groups?

A

Workspace admins can add and manage workspace-local groups using the workspace-level SCIM API. **In identity-federated workspaces, workspace-local groups can only be managed using the API.**

Workspace admins can add and manage workspace-local groups using the workspace admin settings page in non-identity federated workspaces.

77
Q

How do you convert a workspace-local group to an account group?

A
  1. Rename the workspace-local group (for example, by appending "workspace") so its original name becomes available.
  2. Create an account group with the original name, replicating the local group's membership.
  3. Delete the workspace-local group.
78
Q

What are compute policies?

A

A policy is a tool workspace admins can use to limit a user or group’s compute creation permissions based on a set of policy rules.

Policies provide the following benefits:

  1. Limit users to creating clusters with prescribed settings.
  2. Limit users to creating a certain number of clusters.
  3. Simplify the user interface and enable more users to create their own clusters (by fixing and hiding some values).
  4. Control cost by limiting per cluster maximum cost (by setting limits on attributes whose values contribute to hourly price).
  5. Enforce cluster-scoped library installations.

Policies require the Premium plan or above.

79
Q

How can you add libraries to policies?

A

You can add libraries to a policy so libraries are automatically installed on compute resources. You can add a maximum of 500 libraries to a policy.

You may have previously added compute-scoped libraries using init scripts. Databricks recommends using compute policies instead of init scripts to install libraries.
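
A minimal sketch, assuming placeholder host and token; the policy definition is abbreviated and the package pins are illustrative.

```python
# Sketch: create a policy that carries a `libraries` list, so the listed
# packages install automatically on compute that uses the policy.
import json
import requests

HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "policy-with-libraries",
        "definition": json.dumps(
            {"autotermination_minutes": {"type": "fixed", "value": 60}}
        ),
        "libraries": [
            {"pypi": {"package": "pandas==2.2.2"}},
            {"pypi": {"package": "scikit-learn"}},
        ],
    },
)
resp.raise_for_status()
```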

80
Q

What is the Personal Compute policy?

A

Personal Compute is a **Databricks-managed policy** available, by default, on all Databricks workspaces. Granting users access to this policy enables them to create single-machine compute resources in Databricks for their individual use. Users can create the personal compute resource quickly using shortcuts in either a notebook or the Compute page.

By default, all users in a Databricks account have access to the Personal Compute default policy.

81
Q

Who can grant access to the Personal Compute policy?

A

Workspace admins can manage access to the Personal Compute policy on individual workspaces using the policies UI.

Account admins can enable or disable access to the Personal Compute policy for all users in an account using the Personal Compute account setting, or switch the setting to Delegate if they want the policy to be managed at the workspace level.

82
Q

What are the default policies available in a workspace?

A

In your workspace, four default policies are designed for four different use cases. The policies are the following:

Personal Compute: single node, single user, all-purpose (no jobs)

Shared Compute: multi node, multi user, all-purpose (no jobs)

Power User Compute: multi node, single user, all-purpose (no jobs)

Job Compute: jobs compute

By default, workspace admins have access to all four policies. **Non-admins only have access to the Personal Compute policy but can be granted access to any default policy.**

83
Q

What are system tables?

A

System tables are a **Databricks-hosted analytical store of your account’s operational data** found in the system catalog. System tables can be used for historical observability across your account.

84
Q

What are the requirements for system tables?

A

To access system tables, your workspace must be enabled for Unity Catalog.

Since system tables are governed by Unity Catalog, you need to have at least one Unity Catalog-enabled workspace in your account to enable and access system tables.

System tables include data from all workspaces in your account but they can only be accessed from a Unity Catalog-enabled workspace.

System tables are enabled at the schema level. If you enable a system schema, you enable all the tables within that schema. When new schemas are released, an account admin needs to manually enable the schema.

System tables must be enabled by an account admin. You can enable system tables using the SystemSchemas API.
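
A minimal sketch of enabling one schema with that API; HOST, TOKEN, and METASTORE_ID are placeholders, and the caller must be an account admin.

```python
# Sketch: enable the `access` system schema on your metastore with the
# SystemSchemas API (PUT .../systemschemas/{schema_name}).
import requests

HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<account-admin-token>"
METASTORE_ID = "<metastore-id>"

resp = requests.put(
    f"{HOST}/api/2.0/unity-catalog/metastores/{METASTORE_ID}/systemschemas/access",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
```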

85
Q

How is access to system tables granted?

A

Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant USE and SELECT permissions on the system schemas.
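
A minimal sketch of those grants as SQL, run from a Databricks notebook where `spark` is predefined; `data_analysts` is a hypothetical account group.

```python
# Sketch: grant USE and SELECT on the billing system schema and its usage
# table. The caller must be both a metastore admin and an account admin.
for stmt in [
    "GRANT USE CATALOG ON CATALOG system TO `data_analysts`",
    "GRANT USE SCHEMA ON SCHEMA system.billing TO `data_analysts`",
    "GRANT SELECT ON TABLE system.billing.usage TO `data_analysts`",
]:
    spark.sql(stmt)
```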

86
Q

Do system tables contain data from all workspaces in an account?

A

System tables contain operational data for all workspaces in your account deployed within the same cloud region. Billing system tables contain account-wide data.

Even though system tables can only be accessed through a Unity Catalog workspace, the tables also include operational data for the non-Unity Catalog workspaces in your account.

87
Q

Where is system table data stored?

A

Your account’s system table data is stored in a Databricks-hosted storage account located in the same region as your metastore. The data is securely shared with you using Delta Sharing.

Each table has a free data retention period.

88
Q

Where are system tables located in Catalog Explorer?

A

The system tables in your account are located in a catalog called system, which is included in every Unity Catalog metastore. In the system catalog you’ll see schemas such as access and billing that contain the system tables.