Virtual Warehouses Flashcards

1
Q

Snowflake Cost Components

A

Virtual Warehouse usage
Data Storage usage
Cloud Services usage

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Misc. Costs

A

Data transfers from one region/cloud platform to another

Compute resources for serverless features such as auto-clustering, scheduled tasks, and Snowpipe.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is used to pay for Snowflake?

A

Snowflake credits

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Virtual Warehouse Credit Usage

A

Based on the number of virtual warehouses one uses, how long they run, and their size.

Warehouses come in 10 sizes, each increasingly corresponding to the amount of compute resource’s and credits billed per hour.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Cloud Services Credit Usage

A

Snowflake credits are used to pay for usage of cloud services that exceed 10% of the average daily usage of compute resources.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Data Storage Credit Usage

A

Data storage is calculated monthly based on the average number of on-disk bytes for all data stored each day in one’s Snowflake account.

Monthly costs for storing data in Snowflake is based on a flat rate per TB. The amount per TB is based on your account type (Capacity/On Demand) and Region (US/EU).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Benefits of having separate compute/storage environments?

A

Instant and simple resizing of both compute and storage resources

Pausing of the data warehouse.

Concurrent and independent workloads can run without impacting eachother.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What does a virtual warehouse represent?

A

a number of physical nodes a user can provision to perform data warehousing tasks

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How do Multi-cluster warehouses improve concurrency

A

Multi-cluster warehouses can help better scale (out) to accommodate for more users or more concurrent queries. Virtual multi-cluster warehouses can also better scale up/down as the number of queries fluctuate.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Warehouse Scaling Modes

A

Maximized

Auto Scaling

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Maximized Scaling Mode

A

Snowflake always has all clusters online and available to ensure maximum resources are available at all times.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Auto Scaling

A

Snowflake starts/stops clusters as needed to dynamically manage the workload as the workload spikes/drops within a warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Snowflake Caching

A

When a query is executed the result is cached for a period of time (usually 24 hours). After this time period is passed the result is purged from the cache.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Caching Costs

A

Caching does not incur any compute costs, but does carry storage costs. When a query is run and retrieves a cached result, this is purely a retrieval of something already stored, no need for additional compute.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What parameter can be used to enable/disable cached results? And at what level can this be modified?

A

USE_CACHED_RESULT is a parameter that can be overwritten at the account/user/session levels.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Snowflake Cache Layers

A

Result Cache
Local Disk Cache
Remote Disk

17
Q

Result Cache

A

Holds the results of every query executed in the past 24 hours, available across virtual warehouses so query results are available to all users on the system who execute the same query.

18
Q

Local Disk Cache

A

Used to cache data USED by a SQL query (not the results themselves). When data is needed for a given query it is retrieved and cached in SSD and memory.

19
Q

Remote Disk Cache

A

Holds the long-term storage. This level is also responsible for data resilience, even in the event of an entire data center failure.

20
Q

Ways to scale warehouses in Snowflake

A
Scaling up/down (resizing a warehouse)
Scaling out (adding clusters to a multi-cluster warehouse)
21
Q

Scaling up/down improvements

A

Generally improves query performance and can reduce the queueing that occurs if a warehouse does not have the compute resources to process a query promptly.

22
Q

Scaling out improvements

A

Multi-cluster warehouses can handle more concurrent queries and users.

23
Q

Types of Virtual Warehouses in Snowflake

A

Multi-cluster

Single Cluster

24
Q

Multi-cluster benefits

A

Multi cluster warehouses support all the same features as a single cluster warehouse and more, including: dynamic warehouse resizing, auto-suspension due to inactivity, and auto-resuming due to activity.

25
Q

Resource Monitor

A

A resource monitor is used to monitor credit usage on user-managed virtual warehouses. Resource monitors can suspend warehouses and notify users based on credit usage.

26
Q

Who can create a Resource Monitor?

A

Resource monitors can ONLY be created by account admins. Account admins can choose to enable other users to view/modify resource monitors using SQL though.

27
Q

Credit Quota

A

Within a resource monitor, credit quotas specify the number of Snowflake credits allocated to monitor for a specified frequency interval. Snowflake tracks the used credits within the specified frequency interval by all warehouses assigned to the monitor. This number resets to 0 after the interval.

28
Q

Do credit quotas/resource monitors account for the 10% adjustment for cloud services?

A

No

29
Q

Monitor Level

A

Specifies whether a resource monitor is used to monitor credit usage for an entire account or specified warehouses. If there is no specification, it remains dormant and no monitoring is done.

30
Q

Resource Monitor Schedule

A

The schedule set for monitoring, defaults to a month.

Parameters being Frequency, Start, and End.

31
Q

Resource Monitor Actions

A

AKA Triggers, an action specifies a threshold (by percentage) of credit quota for a resource monitor, and an action to perform when the threshold is reached within the specified interval of a resource monitor.

32
Q

Resource Monitor Action Options

A
  • Notify & Suspend
  • Notify & Suspend Immediately (doesn’t wait running queries)
  • Notify
33
Q

FLATTEN Function

A

A table function that takes a VARIANT/OBJECT/ARRAY column and produces a lateral view (inline with a table).
FLATTEN can be used to convert semi-structured data to a relational respresentation.