Performance Management Flashcards
Clustering in Snowflake
As data is loaded into Snowflake, it is micro-partitioned and these partitions are clustered based on its metadata (by similarity). This is done automatically and requires no user intervention/management.
Can Materialized Views be clustered?
Yes
Cluster Key
To manually improve clustering, Snowflake supports the addition of a Cluster Key to a table. A clustering key is deigned to co-locate data in a table in the same micro-partition. This improves overall clustering and query performance.
(Generally Snowflake does a good job doing this, but at times it can be better to manually intervene)
How to determine if a table is clustered?
A table is considered to be clustered if a clustering key is present.
Should materialized views be clustered?
x
How does clustering work in the background?
As data is inserted into a table, Clustering metadata is collected and recorded for each micro-partition created during the process. Snowflake leverages this clustering information to avoid unnecessary scanning of micro-partitions, and also uses this info for clustering key purposes.
When can clustering keys be defined?
CREATE TABLE
ALTER TABLE
What should determine whether to define a clustering key?
- Queries on the table are running slower than expected/degraded over time.
- The clustering depth for a table is large.
Impact of Clustering on Query Pruning
Micro-partition metadata maintained by Snowflake enables precise and optimal pruning of columns in micro-partitions, including columns with semi-structured data.
Clustering Query Pruning Example
If a query filters on value X, and value X is only present in 10% of all micro-partitions, a well clustered table will only scan 10% of its micro-partitions and ignore the rest of the micro-partitions.
What keys are best clustered? (in order of priority)
- Cluster columns that are most actively used by select filters
- Columns frequently joined on
Recommended Maximum Columns per Cluster Key
4
What qualities make for a good cluster key
A column with low cardinality is a bad candidate, and a column with high cardinality is a bad candidate as well.
A column in the middle in terms of cardinality is best.
What is a View?
A view is a table-like object in Snowflake derived from a query. A view functions effectively like a table and can be used interchangeably with a table.
What is a Materialized View?
Identical to a normal view except that the predefined query is pre-computed and the output is stored for later use when the view is called.
Are materialized views faster than regular queries or views?
Yes, because the compute on a materialized view is done ahead of time unlike the compute for a regular query and view.
Search Optimization
Search Optimization is a service within Snowflake that aims to improve performance of selective lookup queries on tables. A point lookup query returns only one or a small number of distinct rows.
When is Search Optimization useful?
- Business users need fast response times for critical dashboards with highly selective features
- Data scientists exploring large volumes of data and are looking for specific subsets of data
Data types supported by Search Optimization
- Fixed-point numbers (INT/NUMERIC)
- DATE, TIME, and TIMESTAMP
- VARCHAR
- BINARY
Search Optimization works best to improve the performance of a query when the following is true
For tables:
- The table is not clustered
- The table is frequently queried on columns other than the primary cluster key
For queries:
- The query typically runs for 10+ seconds
- At least one column accessed through the filter has 100k+ distinct values
Alternative Solutions to Search Optimization
- Clustering
- Materialized Views
- Caching
Clustering a table can speed up which of the following? (Assuming they are on the clustering key)
- Range searches
- Equality searches
Granting access to Search Optimization
You must have OWNERSHIP on the table you want to optimize.
You must have the ADD SEARCH OPTIMIZATION privilege on the schema that contains the table.
What factors impact query performance?
- Query Queueing
- Disk Spillage
- Control Join Order
- Unsatisfactory Pruning
- Recognizing Row Explosion