D2.1 - OUTLINE COMPUTING PRINCIPLES Flashcards

1
Q

Snowflake Credit Usages and Billing Model

A
  • Certain Snowflake features rely on compute resources provided by Snowflake (serverless compute model) rather than user-managed virtual warehouses.
  • Snowflake automatically resizes and scales up or down these resources as required for each workload.
  • Efficient for changing use needs which usually require continuous maintenance operations
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Snowflake Managed Compute Resources (FUNCTIONAL AREAS)

A
  • Tables
  • External Tables
  • Views
  • Data loading
  • Database Replication
  • Tasks
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Snowflake Managed Compute Resources (FUNCTIONAL AREAS: TABLES)

A
  • Automated Clustering:
  • Automated background maintenance of each clustered table, including initial clustering and clustering as needed
  • Search Optimization Service:
  • Automated background maintenance of the search access paths used by the search optimization service
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Snowflake Managed Compute Resources (FUNCTIONAL AREAS: EXTERNAL TABLES)

A
  • Automatically Refreshing External Table Metadata:
  • Automated refreshing of the external table metadata with the latest set of associated files in the external stage and path.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Snowflake Managed Compute Resources (FUNCTIONAL AREAS: VIEWS)

A
  • Materialized Views Maintenance:

- Automated background synchronization of each materialized view with changes in the base table for the view.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Snowflake Managed Compute Resources (FUNCTIONAL AREAS: DATA LOADING)

A
  • Snowpipe

- Automated processing of file loading requests for each pipe object

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Snowflake Managed Compute Resources (FUNCTIONAL AREAS: DATABASE REPLICATION)

A
  • Database Replication and Failover/Failback:

- Automated copying of data between accounts, including initial data replication and maintenance as needed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Snowflake Managed Compute Resources (FUNCTIONAL AREAS: TASKS)

A
  • Executing SQL Statements on a Schedule Using Tasks:

- Snowflake-managed compute resources provided to execute SQL code, rather than a user-managed virtual warehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How does Snowflake bill for serverless ?

A
  • Features that rely on Snowflake managed compute resources appear on your bill as separate line items
  • Charges for these features are calculated based on total usage of the resources (including cloud service usage) measured in compute-hours credit usage
  • One compute-hour is comparable to the computing resources utilized when running an X-Small virtual warehouse for an hour
  • In contrast, user managed virtual warehouses consume credits while running, regardless of whether they are performing any work, which may cause them to be overutilized or sit idle
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Concurrency (Standard Feature)

A
  • The number of queries that can concurrently run is determined by the size and complexity of each query.
  • As queries are submitted, the warehouse calculates and reserves the compute resources needed to process each query.
  • IF the warehouse does not have enough resources available to process a query, it is queued pending the availability of resources becoming available
  • IF queries are queuing more than desired, another warehouse can be created and queries can be manually redirected to the new warehouse.
  • In addition, resizing a warehouse can enable limited scaling for query performance and queuing: warehouse resizing is primarily intended for query performance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Concurrency with Multi-cluster Warehouses (Enterprise Edition Feature)

A
  • To enable fully automated scaling for concurrency, Snowflake recommends a multi-cluster warehouses
  • They provide essentially the same benefits as creating additional warehouses and redirecting queries, but without requiring manual intervention
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Caching Details

A
  • When a query is executed the result is persisted for a period of time. At the end of the time period, the result is purged from the system.
  • The security token is used to access large persisted query results expires on a shorter interval.
  • A new token can be retrieved to access results while they are still in cache.
  • The cache expires after 24 hours
  • Snowflake uses persisted query results to avoid re-generating results when nothing has changed (retrieval optimization)
  • In addition, you can use persisted query results to post process the results (layering a new query on top of the results already calculated)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the purpose of Caching?

A
  • IF a user repeats a query that has already been run, and the data in the table(s) hasnt changed since the last time the query was run, then the result of the query is the same.
  • Instead of running the query again, Snowflake simply returns the same result that it returned previously.
  • This can substantially reduce query time because Snowflake bypasses query execution and, instead, retrieves the result directly from the cache.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Query Cashing Conditions

A
  • the new query is syntactically matches the previously executed query
  • The query DOES NOT include functions that must be evaluated at execution time (e.g. CURRENT_TIMESTAMP() and UUID_STRING()). CURRENT_DATE() function is the exception to this rule
  • The query does not contain user defined functions (UDFs) or external functions
  • The table data contributing to the query has not changed
  • The persisted result for the previous query is still available
  • The role accessing the cached results has the required permissions
  • If the query was a SELECT query, the role executing the query must have the necessary access privilege’s for all the tables used in the cached query
  • if the query was a SHOW query, the role executing the query must match the role that generated the cached results
  • No changes in relevant configuration settings
  • The tables micro-partitions have not changed
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Virtual Warehouse Characteristics

A
  • A virtual warehouse, often referred to simply as a ‘warehouse’, is a cluster of compute resources in Snowflake. A warehouse provides the required resources, such as CPU, memory, and temporary storage, to perform the following operations in a Snowflake session:
  • Execute SQL that require compute resources
  • Execute DML operations like updating rows, loading data, unloading data etc.
  • Required Parameters:
  • Unique Identifier for the virtual warehouse, which must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes
  • The size determines the amount of compute resources in the warehouse and, therefore, the number of credits consumed while the warehouse is running
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

MAX_CLUSTER_COUNT

A
  • Specifies the maximum number of warehouses for a multi-cluster warehouse
17
Q

Virtual Warehouse: Scaling Policy

A
  • To help control credits consumed by a multi-cluster warehouse running in Auto-scale mode, snowflake provides scaling policies, which are used to determine when to start or shut down a warehouse
  • Scaling policy for multi-cluster warehouse only applied if running in Auto-Scale mode.
18
Q

Scaling Policy: Standard

A

Description: prevents/minimizes queuing by favoring starting additional warehouses over conserving credits.

  • The first warehouse starts immediately when either a query is queued or the system detects that there’s one more query than the currently-running warehouses can execute
  • Warehouse Shuts down: after 2 to 3 consecutive successful checks, which determine whether the load on the least-loaded warehouse could be redistributed to the other warehouses without spinning up the warehouses again
19
Q

Scaling Policy: Economy

A

Description: Conserves credits by favoring running warehouses fully-loaded rather than starting additional warehouses, which may result in queries being queued and taking longer to complete

  • Starts only if the system determines theres enough query load to keep the warehouse busy for at least 6 minutes
  • After 5-6 consecutive successful checks to determine whether the load on the least-loaded warehouse could be redistributed to the other warehouses without spinning up the warehouse again
20
Q

Virtual Warehouse: Auto-Suspension and Auto-Resumption

A

By default, auto suspend are and resume are enabled

  • Auto suspension takes place if the warehouse is inactive for the specified amount of time
  • Auto resumption takes place when any statement is submitted and the warehouse is the current warehouse for the session
21
Q

Virtual Warehouse: Parameters

A
  • Three types of parameters: Account, Session, Object
  • Account admins can set using: ALTER ACCOUNT
  • Admins with appropriate privilege’s can set using: CREATED USER, ALTER USER for session parameters
  • Individual users can set using ALTER SESSION for session parameters
  • Users with appropriate privilege’s can set using: CREATE and ALTER for object parameters
22
Q

Query Profiler

A
  • Query profile, available through Snowflake web interface, provides execution details for a query. For the selected query, it provides a graphical representation of the main components of the processing plan for the query, with statistics for each component, along with details and statistics for the overall query
  • Query profile provides opportunities for identifying mistakes in SQL expressions, potential performance bottlenecks, and improvement opportunities