2025 Clustering and Micro-Partitions Flashcards

1
Q

How is Snowflake data stored?

A

Data is structured in a columnar fashion as encrypted, compressed files called micro-partitions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the uncompressed size of data stored in each partition

A

50 MB to 500 MB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Are the micro-partitions immutable?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is query pruning?

A

The metadata contains information to directly identify the micro-partition that contains data corresponding to the user query, instead of scanning the entire dataset.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What guidelines are followed in query pruning

A

Prune micro-partitions not needed
Prune columns not needed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What does Clustering mean

A

Dividing datasets into small groups based on data similarity. Used by Snowflake for efficient data pruning, resulting in optimized query performance. It involves organizing data based on the contents of one or more columns, called clustering keys.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Clustering is recommended for what size tables

A

Tables more than 1 TB, less than this there is good chance that clustering cost might surpass its benefits

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How can you check if a table might benefit from a clustering approach

A

Clone the table and apply the clustering approach on the cloned table and see if the query performance improves.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Besides size, what makes a table a good candidate for clustering

A

Tables that do not change frequently and are queried regurlarly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Is reclustering a table maintained by Snowflake

A

Yes, and it will consume credits and have associated storage costs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How can you set a cluster keys on a table

A

With the CREATE or ALTER statements

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Is there a default clustering key in Snowflake

A

No, if none is defined, clusters are created during data inserts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

To manage costs, Snowflake recommends what percentage of the columns/expressions to be used as clustering keys

A

3/4

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What columns are recommended for clustering

A

Columns frequently used in selective filters
Columns frequently used in joining predicates
Number of distinct values, large enough distinct values for effective. query pruning and small enough for co-locating data in the same micro-partitions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

In the case of multicolumn clustering, how does Snowflake recommend ordering the columns

A

From lowest to highest cardinality.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do you calculate the average depth of a table according to the clustering keys

A

SYSTEM$CLUSTERING_DEPTH(‘<t1>', '(<c1>,<c2>,..)'[,'<p>'])</c2></c1></t1>

17
Q

What is the clustering depth for a table with no micro-partitions

18
Q

What does SYSTEM$CLUSTERING_INFORMATION do

A

Provides useful metrics like overlapping micro-partitions and partition depth, but it does not independently determine clustering efficiency. Information is returned as a JSON object

19
Q

Using clustering information, what is a good indicator the table is not well clustered

A

High value of Average_overlaps or Average_depth

20
Q

What does it mean one there is one micro partition and the average depth is 1

A

The whole table will always be read for a any query

21
Q

In clustering information, when looking at constant micro-partitions, what is good and what is bad

A

The higher the number of constant micro-partitions is, the more micro-partitions can be pruned from queries executed on the table