2025 Clustering and Micro-Partitions Flashcards

Question 1

Q

How is Snowflake data stored?

Answer

A

Data is structured in a columnar fashion as encrypted, compressed files called micro-partitions.

Question 2

Q

What is the uncompressed size of the data stored in each micro-partition?

Answer

A

50 MB to 500 MB

Question 3

Q

Are the micro-partitions immutable?

Question 4

Q

What is query pruning?

Answer

A

Snowflake uses micro-partition metadata to identify the micro-partitions that contain data relevant to a query, so it scans only those instead of the entire dataset.
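
A minimal sketch of the idea, assuming each micro-partition keeps only a minimum and maximum value for the filtered column. The `MicroPartition` class and `prune` function are hypothetical names for illustration, not Snowflake's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    name: str
    min_value: int  # smallest value of the filtered column in this partition
    max_value: int  # largest value of the filtered column in this partition


def prune(partitions, lo, hi):
    """Keep only partitions whose [min, max] range can hold values in [lo, hi]."""
    return [p for p in partitions if p.max_value >= lo and p.min_value <= hi]


partitions = [
    MicroPartition("mp1", 1, 100),
    MicroPartition("mp2", 101, 200),
    MicroPartition("mp3", 201, 300),
]

# A filter such as WHERE id BETWEEN 150 AND 180 only needs mp2.
to_scan = prune(partitions, 150, 180)
print([p.name for p in to_scan])  # ['mp2']
```

The data itself is never read to make this decision; only the per-partition ranges are compared with the filter.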

Question 5

Q

What guidelines are followed in query pruning?

Answer

A

First, prune micro-partitions that are not needed for the query
Then, within the remaining micro-partitions, prune columns that are not needed

Question 6

Q

What does clustering mean?

Answer

A

Dividing datasets into small groups based on data similarity. Used by Snowflake for efficient data pruning, resulting in optimized query performance. It involves organizing data based on the contents of one or more columns, called clustering keys.

Question 7

Q

Clustering is recommended for tables of what size?

Answer

A

Tables larger than 1 TB. Below that size, there is a good chance that the cost of clustering will surpass its benefits.

Question 8

Q

How can you check if a table might benefit from a clustering approach?

Answer

A

Clone the table and apply the clustering approach on the cloned table and see if the query performance improves.

Question 9

Q

Besides size, what makes a table a good candidate for clustering?

Answer

A

Tables that do not change frequently and are queried regularly

Question 10

Q

Is reclustering of a table maintained by Snowflake?

Answer

A

Yes, and it will consume credits and have associated storage costs

Question 11

Q

How can you set a clustering key on a table?

Answer

A

With the CLUSTER BY clause of a CREATE TABLE or ALTER TABLE statement
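
A minimal sketch of both forms, held as SQL strings in Python so they could be passed to any DB-API style cursor. The table and column names (`t1`, `c1`, `c2`, `c3`) are hypothetical, and `run_all` is an illustrative helper, not part of any Snowflake library.

```python
# Hypothetical table and column names; the CLUSTER BY clauses are the point.
CREATE_WITH_KEY = (
    "CREATE OR REPLACE TABLE t1 (c1 DATE, c2 STRING, c3 NUMBER) "
    "CLUSTER BY (c1, c2)"
)
CHANGE_KEY = "ALTER TABLE t1 CLUSTER BY (c1, c3)"
DROP_KEY = "ALTER TABLE t1 DROP CLUSTERING KEY"


def run_all(cursor, statements):
    """Execute each statement on a DB-API style cursor (any object with .execute)."""
    for sql in statements:
        cursor.execute(sql)
```

Calling `run_all(cursor, [CREATE_WITH_KEY, CHANGE_KEY])` would define the key at creation time and then change it on the existing table.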

Question 12

Q

Is there a default clustering key in Snowflake?

Answer

A

No. If none is defined, data is clustered naturally in the order in which it is inserted

Question 13

Q

To manage costs, Snowflake recommends what percentage of the columns/expressions to be used as clustering keys?

Question 14

Q

What columns are recommended for clustering?

Answer

A

Columns frequently used in selective filters
Columns frequently used in joining predicates
Columns with a suitable number of distinct values: large enough for effective query pruning, and small enough to co-locate rows with the same value in the same micro-partitions

Question 15

Q

In the case of multicolumn clustering, how does Snowflake recommend ordering the columns?

Answer

A

From lowest to highest cardinality.
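
A small sketch of that ordering rule, assuming a sample of rows is available as Python dictionaries. The function name `order_cluster_columns` and the sample columns are hypothetical.

```python
def order_cluster_columns(rows, columns):
    """Order candidate columns from lowest to highest number of distinct values."""
    distinct = {c: len({row[c] for row in rows}) for c in columns}
    return sorted(columns, key=lambda c: distinct[c])


rows = [
    {"region": "EU", "order_date": "2025-01-01", "order_id": 1},
    {"region": "EU", "order_date": "2025-01-02", "order_id": 2},
    {"region": "US", "order_date": "2025-01-02", "order_id": 3},
    {"region": "US", "order_date": "2025-01-03", "order_id": 4},
]

print(order_cluster_columns(rows, ["order_id", "order_date", "region"]))
# ['region', 'order_date', 'order_id']
```

Here `region` has 2 distinct values, `order_date` has 3, and `order_id` has 4, so the suggested key order would be `(region, order_date, order_id)`.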

Question 16

Q

How do you calculate the average depth of a table according to the clustering keys?

Answer

A

SYSTEM$CLUSTERING_DEPTH('<t1>', '(<c1>, <c2>, ...)' [, '<p>'])
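
To build intuition for what an average depth measures, here is a toy model in Python. It is a deliberate simplification and not Snowflake's actual algorithm: each micro-partition is reduced to a hypothetical `(min, max)` range on one column, and a partition's depth is taken to be the number of partitions whose ranges overlap it, itself included.

```python
def average_depth(ranges):
    """Toy average depth: for each partition, count the partitions whose
    [min, max] range overlaps it (itself included), then average the counts."""
    if not ranges:
        return 0.0  # avoid dividing by zero for an empty list
    depths = [
        sum(1 for (lo2, hi2) in ranges if lo2 <= hi and hi2 >= lo)
        for (lo, hi) in ranges
    ]
    return sum(depths) / len(depths)


well_clustered = [(1, 10), (11, 20), (21, 30)]   # no ranges overlap
poorly_clustered = [(1, 30), (5, 25), (10, 20)]  # every range overlaps the others

print(average_depth(well_clustered))    # 1.0
print(average_depth(poorly_clustered))  # 3.0
```

A lower value means fewer overlapping ranges, so a filter on the column can rule out more micro-partitions.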

Question 17

Q

What is the clustering depth for a table with no micro-partitions?

Question 18

Q

What does SYSTEM$CLUSTERING_INFORMATION do?

Answer

A

Provides useful metrics like overlapping micro-partitions and partition depth, but it does not independently determine clustering efficiency. Information is returned as a JSON object
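
A short sketch of reading such a JSON result in Python. The numbers below are made up purely for illustration, and the key names are assumed to match the fields the function returns.

```python
import json

# Illustrative, made-up values; not output from a real table.
sample = """
{
  "cluster_by_keys": "LINEAR(c1, c2)",
  "total_partition_count": 1000,
  "total_constant_partition_count": 600,
  "average_overlaps": 2.5,
  "average_depth": 3.1,
  "partition_depth_histogram": {"00001": 600, "00002": 250, "00004": 150}
}
"""

info = json.loads(sample)

# Share of micro-partitions that are already in a constant state.
constant_share = (
    info["total_constant_partition_count"] / info["total_partition_count"]
)
print(info["average_depth"], info["average_overlaps"], constant_share)
```

These values are inputs for a judgment about clustering quality rather than a verdict on their own.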

Question 19

Q

Using clustering information, what is a good indicator that the table is not well clustered?

Answer

A

A high value of average_overlaps or average_depth

Question 20

Q

What does it mean when there is one micro-partition and the average depth is 1?

Answer

A

The whole table will always be read for any query

Question 21

Q

In clustering information, when looking at constant micro-partitions, what is good and what is bad?

Answer

A

A higher number is better: the more constant micro-partitions there are, the more micro-partitions can be pruned from queries executed on the table. A low number is a sign that pruning will be less effective.