Performance Management Flashcards
Clustering in Snowflake
As data is loaded into Snowflake, it is micro-partitioned and these partitions are clustered based on its metadata (by similarity). This is done automatically and requires no user intervention/management.
Can Materialized Views be clustered?
Yes
Cluster Key
To manually improve clustering, Snowflake supports the addition of a Cluster Key to a table. A clustering key is deigned to co-locate data in a table in the same micro-partition. This improves overall clustering and query performance.
(Generally Snowflake does a good job doing this, but at times it can be better to manually intervene)
How to determine if a table is clustered?
A table is considered to be clustered if a clustering key is present.
Should materialized views be clustered?
x
How does clustering work in the background?
As data is inserted into a table, Clustering metadata is collected and recorded for each micro-partition created during the process. Snowflake leverages this clustering information to avoid unnecessary scanning of micro-partitions, and also uses this info for clustering key purposes.
When can clustering keys be defined?
CREATE TABLE
ALTER TABLE
What should determine whether to define a clustering key?
- Queries on the table are running slower than expected/degraded over time.
- The clustering depth for a table is large.
Impact of Clustering on Query Pruning
Micro-partition metadata maintained by Snowflake enables precise and optimal pruning of columns in micro-partitions, including columns with semi-structured data.
Clustering Query Pruning Example
If a query filters on value X, and value X is only present in 10% of all micro-partitions, a well clustered table will only scan 10% of its micro-partitions and ignore the rest of the micro-partitions.
What keys are best clustered? (in order of priority)
- Cluster columns that are most actively used by select filters
- Columns frequently joined on
Recommended Maximum Columns per Cluster Key
4
What qualities make for a good cluster key
A column with low cardinality is a bad candidate, and a column with high cardinality is a bad candidate as well.
A column in the middle in terms of cardinality is best.
What is a View?
A view is a table-like object in Snowflake derived from a query. A view functions effectively like a table and can be used interchangeably with a table.
What is a Materialized View?
Identical to a normal view except that the predefined query is pre-computed and the output is stored for later use when the view is called.
Are materialized views faster than regular queries or views?
Yes, because the compute on a materialized view is done ahead of time unlike the compute for a regular query and view.