D2.1 - OUTLINE COMPUTING PRINCIPLES Flashcards
Snowflake Credit Usages and Billing Model
- Certain Snowflake features rely on compute resources provided by Snowflake (serverless compute model) rather than user-managed virtual warehouses.
- Snowflake automatically resizes and scales up or down these resources as required for each workload.
- Efficient for changing use needs which usually require continuous maintenance operations
Snowflake Managed Compute Resources (FUNCTIONAL AREAS)
- Tables
- External Tables
- Views
- Data loading
- Database Replication
- Tasks
Snowflake Managed Compute Resources (FUNCTIONAL AREAS: TABLES)
- Automated Clustering:
- Automated background maintenance of each clustered table, including initial clustering and clustering as needed
- Search Optimization Service:
- Automated background maintenance of the search access paths used by the search optimization service
Snowflake Managed Compute Resources (FUNCTIONAL AREAS: EXTERNAL TABLES)
- Automatically Refreshing External Table Metadata:
- Automated refreshing of the external table metadata with the latest set of associated files in the external stage and path.
Snowflake Managed Compute Resources (FUNCTIONAL AREAS: VIEWS)
- Materialized Views Maintenance:
- Automated background synchronization of each materialized view with changes in the base table for the view.
Snowflake Managed Compute Resources (FUNCTIONAL AREAS: DATA LOADING)
- Snowpipe
- Automated processing of file loading requests for each pipe object
Snowflake Managed Compute Resources (FUNCTIONAL AREAS: DATABASE REPLICATION)
- Database Replication and Failover/Failback:
- Automated copying of data between accounts, including initial data replication and maintenance as needed.
Snowflake Managed Compute Resources (FUNCTIONAL AREAS: TASKS)
- Executing SQL Statements on a Schedule Using Tasks:
- Snowflake-managed compute resources provided to execute SQL code, rather than a user-managed virtual warehouse
How does Snowflake bill for serverless ?
- Features that rely on Snowflake managed compute resources appear on your bill as separate line items
- Charges for these features are calculated based on total usage of the resources (including cloud service usage) measured in compute-hours credit usage
- One compute-hour is comparable to the computing resources utilized when running an X-Small virtual warehouse for an hour
- In contrast, user managed virtual warehouses consume credits while running, regardless of whether they are performing any work, which may cause them to be overutilized or sit idle
Concurrency (Standard Feature)
- The number of queries that can concurrently run is determined by the size and complexity of each query.
- As queries are submitted, the warehouse calculates and reserves the compute resources needed to process each query.
- IF the warehouse does not have enough resources available to process a query, it is queued pending the availability of resources becoming available
- IF queries are queuing more than desired, another warehouse can be created and queries can be manually redirected to the new warehouse.
- In addition, resizing a warehouse can enable limited scaling for query performance and queuing: warehouse resizing is primarily intended for query performance
Concurrency with Multi-cluster Warehouses (Enterprise Edition Feature)
- To enable fully automated scaling for concurrency, Snowflake recommends a multi-cluster warehouses
- They provide essentially the same benefits as creating additional warehouses and redirecting queries, but without requiring manual intervention
Caching Details
- When a query is executed the result is persisted for a period of time. At the end of the time period, the result is purged from the system.
- The security token is used to access large persisted query results expires on a shorter interval.
- A new token can be retrieved to access results while they are still in cache.
- The cache expires after 24 hours
- Snowflake uses persisted query results to avoid re-generating results when nothing has changed (retrieval optimization)
- In addition, you can use persisted query results to post process the results (layering a new query on top of the results already calculated)
What is the purpose of Caching?
- IF a user repeats a query that has already been run, and the data in the table(s) hasnt changed since the last time the query was run, then the result of the query is the same.
- Instead of running the query again, Snowflake simply returns the same result that it returned previously.
- This can substantially reduce query time because Snowflake bypasses query execution and, instead, retrieves the result directly from the cache.
Query Cashing Conditions
- the new query is syntactically matches the previously executed query
- The query DOES NOT include functions that must be evaluated at execution time (e.g. CURRENT_TIMESTAMP() and UUID_STRING()). CURRENT_DATE() function is the exception to this rule
- The query does not contain user defined functions (UDFs) or external functions
- The table data contributing to the query has not changed
- The persisted result for the previous query is still available
- The role accessing the cached results has the required permissions
- If the query was a SELECT query, the role executing the query must have the necessary access privilege’s for all the tables used in the cached query
- if the query was a SHOW query, the role executing the query must match the role that generated the cached results
- No changes in relevant configuration settings
- The tables micro-partitions have not changed
Virtual Warehouse Characteristics
- A virtual warehouse, often referred to simply as a ‘warehouse’, is a cluster of compute resources in Snowflake. A warehouse provides the required resources, such as CPU, memory, and temporary storage, to perform the following operations in a Snowflake session:
- Execute SQL that require compute resources
- Execute DML operations like updating rows, loading data, unloading data etc.
- Required Parameters:
- Unique Identifier for the virtual warehouse, which must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes
- The size determines the amount of compute resources in the warehouse and, therefore, the number of credits consumed while the warehouse is running