Virtual Warehouses Flashcards
Snowflake Cost Components
Virtual Warehouse usage
Data Storage usage
Cloud Services usage
Misc. Costs
Data transfers from one region/cloud platform to another
Compute resources for serverless features such as auto-clustering, scheduled tasks, and Snowpipe.
What is used to pay for Snowflake?
Snowflake credits
Virtual Warehouse Credit Usage
Based on the number of virtual warehouses one uses, how long they run, and their size.
Warehouses come in 10 sizes, from X-Small to 6X-Large. Each step up in size doubles the compute resources and the credits billed per hour.
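As a rough illustration of how size and run time combine, here is a minimal Python sketch. The per-hour rates are the standard-warehouse figures as commonly documented (each size doubling the previous one); the function name `warehouse_credits` is hypothetical, and the sketch ignores per-second billing details.

```python
# Credits billed per hour for each standard warehouse size.
# Each size doubles the rate of the one before it.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8, "X-Large": 16,
    "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
    "5X-Large": 256, "6X-Large": 512,
}


def warehouse_credits(size: str, hours_running: float, warehouses: int = 1) -> float:
    """Estimate credits for `warehouses` identical warehouses of the given
    size, each running for `hours_running` hours (hypothetical helper)."""
    return CREDITS_PER_HOUR[size] * hours_running * warehouses


# One Medium warehouse running for two and a half hours.
print(warehouse_credits("Medium", 2.5))
```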
Cloud Services Credit Usage
Cloud services usage is billed in Snowflake credits only for the portion that exceeds 10% of that day's virtual warehouse (compute) usage.
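The 10% rule can be sketched as a small daily calculation. This is an illustration under the assumption that both inputs are credit totals for the same day; `billed_cloud_services` is a hypothetical name, not a Snowflake function.

```python
def billed_cloud_services(compute_credits: float, cloud_services_credits: float) -> float:
    """Return the cloud services credits actually billed for one day.

    Only the portion above 10% of that day's compute credits is charged.
    """
    threshold = compute_credits / 10  # 10% of the day's compute usage
    return max(0.0, cloud_services_credits - threshold)


# 100 compute credits gives a 10-credit allowance for cloud services.
print(billed_cloud_services(100, 8))   # under the allowance
print(billed_cloud_services(100, 15))  # 5 credits over the allowance
```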
Data Storage Credit Usage
Data storage is calculated monthly based on the average number of on-disk bytes for all data stored each day in one’s Snowflake account.
Monthly costs for storing data in Snowflake are based on a flat rate per TB. The rate depends on your account type (Capacity or On Demand) and region (for example, US or EU).
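To make the storage calculation concrete, the sketch below averages daily on-disk bytes over a month and applies a flat per-TB rate. The rate of 23.0 is a placeholder, not a quoted price, and treating 1 TB as 1024^4 bytes is an assumption made for the example.

```python
from typing import Sequence

BYTES_PER_TB = 1024 ** 4  # assumption: binary terabytes


def monthly_storage_cost(daily_bytes: Sequence[int], rate_per_tb: float) -> float:
    """Average the daily on-disk byte counts, convert to TB, and apply the
    flat monthly rate (hypothetical helper)."""
    average_bytes = sum(daily_bytes) / len(daily_bytes)
    return average_bytes / BYTES_PER_TB * rate_per_tb


# 15 days at 1 TB followed by 15 days at 3 TB averages out to 2 TB.
month = [1 * BYTES_PER_TB] * 15 + [3 * BYTES_PER_TB] * 15
print(monthly_storage_cost(month, rate_per_tb=23.0))  # placeholder rate
```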
Benefits of having separate compute/storage environments?
Instant and simple resizing of both compute and storage resources
Pausing of the data warehouse.
Concurrent and independent workloads can run without impacting each other.
What does a virtual warehouse represent?
A cluster of compute nodes that a user can provision to perform data warehousing tasks.
How do multi-cluster warehouses improve concurrency?
Multi-cluster warehouses scale out by adding clusters to accommodate more users or more concurrent queries. They can also scale back in by shutting clusters down as the number of queries drops.
Warehouse Scaling Modes
Maximized
Auto Scaling
Maximized Scaling Mode
Snowflake always has all clusters online and available to ensure maximum resources are available at all times.
Auto Scaling
Snowflake starts/stops clusters as needed to dynamically manage the workload as the workload spikes/drops within a warehouse.
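The mode follows from how the minimum and maximum cluster counts are configured: equal values (greater than 1) mean Maximized, different values mean Auto Scaling. The sketch below captures that rule; `scaling_mode` is a hypothetical helper, and its arguments stand in for the warehouse's minimum and maximum cluster settings.

```python
def scaling_mode(min_clusters: int, max_clusters: int) -> str:
    """Classify a warehouse by its cluster-count settings (hypothetical helper)."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("expected 1 <= min_clusters <= max_clusters")
    if max_clusters == 1:
        # Only one cluster can ever run, so no multi-cluster scaling applies.
        return "single-cluster"
    if min_clusters == max_clusters:
        # All clusters start together and stay online.
        return "maximized"
    # Clusters between min and max start and stop with the workload.
    return "auto-scale"


print(scaling_mode(3, 3))
print(scaling_mode(1, 3))
```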
Snowflake Caching
When a query is executed, its result is cached for a period of time (24 hours by default). Once that period has passed without the result being reused, it is purged from the cache.
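The expiry behaviour can be pictured as a time-to-live cache keyed by query text. This is a simplified sketch, not Snowflake's implementation: the class `ResultCache` and its method are hypothetical, and it measures the 24 hours from when the result was stored rather than modelling any extension on reuse.

```python
import time
from typing import Any, Callable, Dict, Tuple

RESULT_TTL_SECONDS = 24 * 60 * 60  # 24-hour retention


class ResultCache:
    """Toy result cache: entries expire a fixed time after being stored."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_run(self, query_text: str, run_query: Callable[[], Any]) -> Tuple[Any, bool]:
        """Return (result, served_from_cache)."""
        now = self._clock()
        entry = self._entries.get(query_text)
        if entry is not None and now - entry[0] < RESULT_TTL_SECONDS:
            return entry[1], True  # still fresh: no re-execution needed
        result = run_query()  # missing or expired: run the query again
        self._entries[query_text] = (now, result)
        return result, False


cache = ResultCache()
print(cache.get_or_run("SELECT 1", lambda: 1))  # first call runs the query
print(cache.get_or_run("SELECT 1", lambda: 1))  # second call hits the cache
```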
Caching Costs
Caching does not incur any compute costs, but does carry storage costs. When a query retrieves a cached result, it is simply reading something already stored, so no additional compute is needed.
What parameter can be used to enable/disable cached results? And at what level can this be modified?
USE_CACHED_RESULT is a parameter that can be overridden at the account, user, and session levels.
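The override order can be sketched as a layered lookup in which the most specific level wins: session over user over account. The function `effective_parameters` is a hypothetical illustration of that precedence, not part of any Snowflake client library. In Snowflake itself the session-level setting would be changed with a statement such as ALTER SESSION SET USE_CACHED_RESULT = FALSE.

```python
from collections import ChainMap
from typing import Any, Mapping


def effective_parameters(
    account: Mapping[str, Any],
    user: Mapping[str, Any],
    session: Mapping[str, Any],
) -> ChainMap:
    """Layer the three levels so lookups check session first, then user,
    then account (hypothetical helper)."""
    return ChainMap(dict(session), dict(user), dict(account))


# Account leaves result caching on; the session switches it off.
params = effective_parameters(
    account={"USE_CACHED_RESULT": True},
    user={},
    session={"USE_CACHED_RESULT": False},
)
print(params["USE_CACHED_RESULT"])
```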