Virtual Warehouses Flashcards
Snowflake Cost Components
Virtual Warehouse usage
Data Storage usage
Cloud Services usage
Misc. Costs
Data transfers from one region/cloud platform to another
Compute resources for serverless features such as auto-clustering, scheduled tasks, and Snowpipe.
What is used to pay for Snowflake?
Snowflake credits
Virtual Warehouse Credit Usage
Based on the number of virtual warehouses one uses, how long they run, and their size.
Warehouses come in 10 sizes, each increasingly corresponding to the amount of compute resource’s and credits billed per hour.
Cloud Services Credit Usage
Snowflake credits are used to pay for usage of cloud services that exceed 10% of the average daily usage of compute resources.
Data Storage Credit Usage
Data storage is calculated monthly based on the average number of on-disk bytes for all data stored each day in one’s Snowflake account.
Monthly costs for storing data in Snowflake is based on a flat rate per TB. The amount per TB is based on your account type (Capacity/On Demand) and Region (US/EU).
Benefits of having separate compute/storage environments?
Instant and simple resizing of both compute and storage resources
Pausing of the data warehouse.
Concurrent and independent workloads can run without impacting eachother.
What does a virtual warehouse represent?
a number of physical nodes a user can provision to perform data warehousing tasks
How do Multi-cluster warehouses improve concurrency
Multi-cluster warehouses can help better scale (out) to accommodate for more users or more concurrent queries. Virtual multi-cluster warehouses can also better scale up/down as the number of queries fluctuate.
Warehouse Scaling Modes
Maximized
Auto Scaling
Maximized Scaling Mode
Snowflake always has all clusters online and available to ensure maximum resources are available at all times.
Auto Scaling
Snowflake starts/stops clusters as needed to dynamically manage the workload as the workload spikes/drops within a warehouse.
Snowflake Caching
When a query is executed the result is cached for a period of time (usually 24 hours). After this time period is passed the result is purged from the cache.
Caching Costs
Caching does not incur any compute costs, but does carry storage costs. When a query is run and retrieves a cached result, this is purely a retrieval of something already stored, no need for additional compute.
What parameter can be used to enable/disable cached results? And at what level can this be modified?
USE_CACHED_RESULT is a parameter that can be overwritten at the account/user/session levels.
Snowflake Cache Layers
Result Cache
Local Disk Cache
Remote Disk
Result Cache
Holds the results of every query executed in the past 24 hours, available across virtual warehouses so query results are available to all users on the system who execute the same query.
Local Disk Cache
Used to cache data USED by a SQL query (not the results themselves). When data is needed for a given query it is retrieved and cached in SSD and memory.
Remote Disk Cache
Holds the long-term storage. This level is also responsible for data resilience, even in the event of an entire data center failure.
Ways to scale warehouses in Snowflake
Scaling up/down (resizing a warehouse) Scaling out (adding clusters to a multi-cluster warehouse)
Scaling up/down improvements
Generally improves query performance and can reduce the queueing that occurs if a warehouse does not have the compute resources to process a query promptly.
Scaling out improvements
Multi-cluster warehouses can handle more concurrent queries and users.
Types of Virtual Warehouses in Snowflake
Multi-cluster
Single Cluster
Multi-cluster benefits
Multi cluster warehouses support all the same features as a single cluster warehouse and more, including: dynamic warehouse resizing, auto-suspension due to inactivity, and auto-resuming due to activity.
Resource Monitor
A resource monitor is used to monitor credit usage on user-managed virtual warehouses. Resource monitors can suspend warehouses and notify users based on credit usage.
Who can create a Resource Monitor?
Resource monitors can ONLY be created by account admins. Account admins can choose to enable other users to view/modify resource monitors using SQL though.
Credit Quota
Within a resource monitor, credit quotas specify the number of Snowflake credits allocated to monitor for a specified frequency interval. Snowflake tracks the used credits within the specified frequency interval by all warehouses assigned to the monitor. This number resets to 0 after the interval.
Do credit quotas/resource monitors account for the 10% adjustment for cloud services?
No
Monitor Level
Specifies whether a resource monitor is used to monitor credit usage for an entire account or specified warehouses. If there is no specification, it remains dormant and no monitoring is done.
Resource Monitor Schedule
The schedule set for monitoring, defaults to a month.
Parameters being Frequency, Start, and End.
Resource Monitor Actions
AKA Triggers, an action specifies a threshold (by percentage) of credit quota for a resource monitor, and an action to perform when the threshold is reached within the specified interval of a resource monitor.
Resource Monitor Action Options
- Notify & Suspend
- Notify & Suspend Immediately (doesn’t wait running queries)
- Notify
FLATTEN Function
A table function that takes a VARIANT/OBJECT/ARRAY column and produces a lateral view (inline with a table).
FLATTEN can be used to convert semi-structured data to a relational respresentation.