Databricks Pro Flashcards

1
Q

What does the transaction log do?

A

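Records every change to the table as an ordered series of commits: inserts, updates, and deletes (as data files added and removed), plus table-level metadata changes such as schema updates. This is what enables table versioning.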
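
A minimal sketch of how a table version can be rebuilt by replaying the log, assuming each commit is a file of JSON actions (`add`, `remove`, `metaData`) as in the open Delta Lake protocol. The sample commits are hypothetical and heavily simplified (a real `schemaString` is JSON, for instance), and `replay` is an illustrative helper, not a Delta API.

```python
import json

# Hypothetical, simplified commit files: one JSON action per line.
COMMITS = [
    # version 0: table created, one data file added
    '{"metaData": {"schemaString": "id INT"}}\n{"add": {"path": "part-0.parquet"}}',
    # version 1: an update rewrites part-0 as part-1
    '{"remove": {"path": "part-0.parquet"}}\n{"add": {"path": "part-1.parquet"}}',
    # version 2: a schema change plus an insert
    '{"metaData": {"schemaString": "id INT, name STRING"}}\n{"add": {"path": "part-2.parquet"}}',
]


def replay(commits, version):
    """Return (schema, live data files) after applying commits 0..version."""
    schema, files = None, set()
    for commit in commits[: version + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "metaData" in action:
                schema = action["metaData"]["schemaString"]
            elif "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return schema, sorted(files)
```

Calling `replay(COMMITS, 1)` gives the table as of version 1, which is the idea behind time travel: older versions are recovered by stopping the replay early.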

2
Q

What is the metastore, and where does it live?

A

High-level metadata about a table: name, schema, location. This information is held in Unity Catalog.

3
Q

Where would schema changes go?

A

Schema changes would be in the metastore and the transaction log.

4
Q

What are pip and PyPI?

A

pip is Python's package installer, and PyPI (the Python Package Index) is the index that pip searches by default for Python packages to install.

5
Q

What is the purpose of a wheel?

A

It's a way to package Python code: a wheel (.whl) is a built distribution that pip can install without a build step.

6
Q

What are the keywords for a shallow clone?

A

CREATE TABLE ... SHALLOW CLONE (copies the metadata only; the clone still references the source table's data files)

7
Q

What are the keywords for a deep clone?

A

CREATE TABLE ... DEEP CLONE (copies both the metadata and the data files)
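
A small sketch of the two clone statements, built as SQL strings in Python. The table names and the `clone_sql` helper are hypothetical; in a Databricks notebook the resulting strings could be passed to `spark.sql(...)`.

```python
def clone_sql(target, source, deep=True):
    """Build a Delta CLONE statement (table names are placeholders)."""
    kind = "DEEP" if deep else "SHALLOW"
    return f"CREATE TABLE {target} {kind} CLONE {source}"


# Shallow clone: metadata only, data files stay with the source table.
shallow = clone_sql("dev.sales_copy", "prod.sales", deep=False)

# Deep clone: metadata and data files are both copied.
deep = clone_sql("backup.sales", "prod.sales")
```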

8
Q

Which statement regarding static Delta tables in Stream-Static joins is correct?

A

The latest version of the static Delta table is returned each time it is queried by a micro-batch of the stream-static join.

9
Q

What statistics are captured in the transaction log?

A

Total number of records

Minimum value in each column of the first 32 columns of the table

Maximum value in each column of the first 32 columns of the table

Null value counts for each column of the first 32 columns of the table
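
A rough illustration of these four statistics, computed in plain Python for the rows of a single data file. The field names follow the Delta Lake protocol (`numRecords`, `minValues`, `maxValues`, `nullCount`); the `file_stats` helper and the sample rows are hypothetical, and the 32-column limit is modelled as a simple slice.

```python
def file_stats(rows, columns, max_stat_cols=32):
    """Compute record count, min, max, and null count for the leading columns."""
    stats = {"numRecords": len(rows), "minValues": {}, "maxValues": {}, "nullCount": {}}
    for col in columns[:max_stat_cols]:  # only the first N columns get statistics
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        stats["nullCount"][col] = len(values) - len(present)
        if present:
            stats["minValues"][col] = min(present)
            stats["maxValues"][col] = max(present)
    return stats


rows = [{"id": 1, "price": 9.5}, {"id": 2, "price": None}, {"id": 3, "price": 4.0}]
stats = file_stats(rows, ["id", "price"])
```

Min and max values like these are what allow a query engine to skip files whose value range cannot match a filter.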

10
Q

Databricks secret scope permissions

A

The secret access permissions are as follows:

MANAGE - Allowed to change ACLs, and read and write to this secret scope.

WRITE - Allowed to read and write to this secret scope.

READ - Allowed to read this secret scope and list what secrets are available.

11
Q

What does sys.path do?

A

The sys.path variable contains a list of directories where the Python interpreter searches for modules.
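
A short sketch of the search path in action: a module in a directory that is not yet on `sys.path` becomes importable once the directory is appended. The module name `flashcard_helper` and the temporary directory are made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory holding a tiny module (hypothetical name).
module_dir = tempfile.mkdtemp()
Path(module_dir, "flashcard_helper.py").write_text("ANSWER = 42\n")

# The directory is not on the search path yet, so add it.
sys.path.append(module_dir)
importlib.invalidate_caches()

# The interpreter now finds the module by scanning the sys.path entries.
helper = importlib.import_module("flashcard_helper")
```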

12
Q

What is DBFS?

A

The Databricks File System: an abstraction on top of scalable object storage such as ADLS.

13
Q

What does the CAN ATTACH TO cluster permission allow?

A

Attaching to the cluster, viewing the Spark UI, and viewing cluster metrics.

14
Q

Which cluster permission allows attaching to the cluster, viewing the UI and metrics, and terminating, starting, and restarting the compute?

A

CAN RESTART

15
Q

Which cluster permission allows everything, including editing the cluster?

A

CAN MANAGE

16
Q

MLflow model as a Spark UDF

A

The mlflow.pyfunc.spark_udf function wraps an MLflow model as an Apache Spark UDF.

This lets you use the trained model within Spark and apply it to new data in a distributed manner.

17
Q

OPTIMIZE file size

A

1 GB

18
Q

Auto Optimize file size

A

128 MB
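
A back-of-the-envelope sketch of what the 1 GB and 128 MB targets imply for file counts, assuming a hypothetical 10 GB table and assuming every file lands exactly on the target size (real file sizes vary, so treat the numbers as rough).

```python
import math

GB = 1024 ** 3
MB = 1024 ** 2


def approx_file_count(table_bytes, target_file_bytes):
    """Rough number of files if every file reached the target size."""
    return math.ceil(table_bytes / target_file_bytes)


table_size = 10 * GB  # hypothetical table size

optimize_files = approx_file_count(table_size, 1 * GB)  # 10 files
auto_files = approx_file_count(table_size, 128 * MB)    # 80 files
```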

19
Q

Does auto-compact support z-ordering?

A

No

20
Q

What's the recommended minimum size for a partition?

A

At least 1 GB of data per partition

21
Q

Spark UI

A

A compute in Databricks reads and writes data, and Input Size appears to refer to the data read. If so, does Min: 78 KiB / 16 records mean that the smallest amount of data read by a task was 16 records, taking up 78 KiB?