Performance Concepts: Query Optimization Flashcards

1
Q

History Tab

A
  • Displays query history for the last 14 days
  • Users can view other users queries but cannot view their query results
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Analysis of Query Duration: Compilation Time

A

Query engine performing operations like cost based optimization and micropartition pruning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Analysis of Query Duration: Execution Time

A

Steps to process the data, processing time on the CPU

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Query Analysis: Profile Overview: Processing

A

The percentage of the total execution time spent on data processing by the CPU of the virtual warehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Query Analysis: Profile Overview: Local Disk I/O

A

-The percentage of time processing was blocked by local disk access
- In other words, when the virtual warehouse had to read or write data from the local SSD storage

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Query Analysis: Profile Overview: Remote Disk I/O

A

When the virtual warehouse had to read or write data from the remote storage layer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Database Order of Execution

A

Rows->Groups->Result

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Rows statements

A

FROM
JOIN
WHERE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Group statements

A

GROUP BY
HAVING

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Result statements

A

SELECT
DISTINCT
ORDER BY
LIMIT

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Bytes spilled to local storage

A

Volume of data spilled to virtual warehouse local disk

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Bytes spilled to remote storage

A

Volume of data spilled to remote disk.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Caching Layers

A

Services Layer
Warehouse Layer
Storage Layer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Services Layer Cache Categories

A

Metadata Cache
Results Cache

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Metadata Cache

A
  • highly available service residing in the cloud services layer
  • maintains metadata on object information and statistics for each cache type
  • sometimes referred to as the metadata store cloud services layer or just services layer
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What does the metadata cache store?

A

-maintains understanding of what tables create, what databases those are in, what are the micro partitions that make up those tables, clustering information and more
- enables queries to be executed without the need of a virtual warehouse
- For example, the row count of each table is stored in a cache allowing for executing something such as a select count without using the virtual warehouse
- Functions, context functions, describe commands, and show commands make use of metadata

17
Q

Results Cache

A
  • Also referred to as Result 24 hour
  • Each time a persistent result is reused, the 24 hour retention period is extended up to a maximum of 31, after which the result is purged
18
Q

To reuse a result, ‘Result Cache’ Rule

A
  • New query exactly matches previous query
  • The underlying table data has not changed
  • The same role is used as the previous query
  • If time context functions are used, such as CURRENT_TIME(), the result cache will not be used
  • Result reuse is disabled using the session parameter USE_CACHED_RESULT
19
Q

Warehouse Cache

A
  • Virtual warehouses have local SSD storage which maintains raw table data used for processing a query
  • The larger the virtual warehouse the greater the local cache
  • It is purged when the virtual warehouse is resized, suspended, or dropped
  • Can be used partially, retrieving the rest of the data required for a query from remote storage
20
Q

Materialized View

A
  • a pre-computed and persisted data set derived from a SELECT query
  • are updated via a background process ensuring data is current and consistent with the base table
  • improve query performance by making complex queries that are commonly executed, readily available
  • an enterprise edition and above serverless feature
  • use compute resources to perform automatic background maintenance
  • use storage to store query results, adding to the monthly storage usage for an account
21
Q

Materialized Views can be created on top of ____________ _____________ to improve their query performance.

A

external tables

22
Q

Materialized Views are limited in the following ways:

A
  • Can only query a single table
  • Cannot use JOIN
  • Cannot use UDF, HAVING, ORDER BY, LIMIT, WINDOW FUNCTIONS
23
Q

Clustering

A

a way describing the distribution of a column’s values

24
Q

Automatic Clustering

A
  • Snowflake supports specifying one or more table columns/expressions as a clustering key for a table
  • Clustering aims to co-locate data of the clustering key in the same micro-partitions
  • Clustering improves performance of queries that frequently filter or sort on the clustered keys
  • Clustering should be reserved for large tables in the multi-terabyte range
25
Q

When choosing a clustering key, Snowflake recommends a maximum of __ or __ columns (or expressions) per key.

A

3, 4

26
Q

A __________ cardinality column would be something like a unique ID or a timestamp down to a millisecond.

A

high

27
Q

A ________ cardinality would be something like country or gender.

A

Low

28
Q

When choosing a clustering key, a cardinality that is too _____ will negatively impact pruning because the values are too unique to be grouped in micro-partitions.

A

high

29
Q

If multiple columns are chosen to be the clustering key, it is suggested that the ______ cardinality is selected first followed by the ________ cardinality.

A

lowest, highest

30
Q

Clustering Key Code Examples

A
31
Q

As _____ operations are performed on a clustered table, the data in the table might become less clustered.

A

DML

32
Q

________________ is a background process which transparently reorganizes data in the micro-partitions by the clustering key.

A

reclustering

33
Q

Initial clustering and subsequent reclustering operations consume __________ and ________ credits.

A

clustering, reclustering, compute, storage

34
Q

Clustering is recommended for large tables which do not frequently _________ and are frequently ___________.

A

change, queried

35
Q

__________ ____________ ___________ is a table level property aimed at improving the performance of selective point lookup queries.

A

Search optimization service