Snowflake - Udemy - exam 1 Flashcards
Monica ran a SELECT query on a large table t1. The query took longer than expected. She looked into the query profile and found that ‘Bytes spilled to local storage’ and ‘Bytes spilled to remote storage’ are very high. What advice will you give to her to improve the query performance? (Select 3)
- Using a larger warehouse (effectively increasing the available memory/local disk space for the operation).
- Processing data in smaller batches.
- Trying to split processing into several steps.
Explanation
When a Snowflake warehouse cannot fit an operation in memory, it starts spilling (storing) data, first to the local disk of a warehouse node and then to remote storage. Because spilling means extra IO operations, any query that requires it will take longer than a similar query running on similar data whose operations fit in memory. If the local disk is not sufficient to hold the spilled data, Snowflake further writes to remote cloud storage, which is shown in the query profile as “Bytes spilled to remote storage”.
Spilling can’t always be avoided, especially for large batches of data, but it can be decreased by:
- Reducing the amount of data processed (for example, by trying to improve partition pruning, or projecting only the columns that are needed in the output).
- Decreasing the number of parallel queries running in the warehouse.
- Trying to split the processing into several steps (for example, by replacing CTEs with temporary tables).
- Using a larger warehouse - effectively more memory and more local disk space.
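For illustration, heavy spillers can be spotted from the Account Usage schema; a minimal sketch, assuming access to the SNOWFLAKE database:
-- Find recent queries that spilled to remote storage
SELECT query_id,
       warehouse_name,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE bytes_spilled_to_remote_storage > 0
ORDER BY bytes_spilled_to_remote_storage DESC
LIMIT 10;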
Both external (external cloud storage) and internal (i.e., Snowflake) stages support unstructured data. (True / False)
True. Both external (external cloud storage, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage) and internal (i.e., Snowflake) stages support unstructured data.
Which of these are kinds of cache in Snowflake?
Metadata Cache
Query Result Cache
Data / Local Disk Cache
Snowflake has three types of cache.
1. The metadata cache that lives in the cloud services layer.
2. The data cache/local disk cache that lives on the SSD drives in the virtual warehouses, and
3. The query result cache. If a result is small, it will be stored in the cloud services layer, but larger results are going to be stored in the storage layer.
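For example, the query result cache can be toggled per session via the USE_CACHED_RESULT parameter (handy when benchmarking):
-- Disable the result cache for this session
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
-- Re-enable it
ALTER SESSION SET USE_CACHED_RESULT = TRUE;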
Time Travel can be disabled for an account by ACCOUNTADMIN. (True/False)
False
Time Travel cannot be disabled for an account. A user with the ACCOUNTADMIN role can set DATA_RETENTION_TIME_IN_DAYS to 0 at the account level, which means that all databases (and subsequently all schemas and tables) created in the account have no retention period by default; however, this default can be overridden at any time for any database, schema, or table.
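A minimal sketch, using a hypothetical table t1:
-- Account level: no Time Travel retention by default
ALTER ACCOUNT SET DATA_RETENTION_TIME_IN_DAYS = 0;
-- Override at the table level
ALTER TABLE t1 SET DATA_RETENTION_TIME_IN_DAYS = 1;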
Which of the following languages does Snowflake support for writing UDFs (User-Defined Functions)?
JavaScript
Java
SQL
Python
Explanation
User-defined functions (UDFs) let you extend the system to perform operations that are not available through the built-in, system-defined functions provided by Snowflake. Snowflake currently supports the following languages for writing UDFs:
Java: A Java UDF lets you use the Java programming language to manipulate data and return either scalar or tabular results.
JavaScript: A JavaScript UDF lets you use the JavaScript programming language to manipulate data and return either scalar or tabular results.
Python: A Python UDF lets you use the Python programming language to manipulate data and return either scalar or tabular results.
SQL: A SQL UDF evaluates an arbitrary SQL expression and returns either scalar or tabular results.
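For instance, a minimal scalar SQL UDF (hypothetical name):
CREATE OR REPLACE FUNCTION area_of_circle(radius FLOAT)
RETURNS FLOAT
AS
$$
  pi() * radius * radius
$$;
SELECT area_of_circle(2.0);  -- returns ~12.566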
A user’s default role is
the role a user gets set to each time the user logs in to Snowflake.
Explanation
A user’s default role is the role a user gets set to each time the user logs in to Snowflake. Snowflake uses roles to control the objects (virtual warehouses, databases, tables, etc.) that users can access:
Snowflake provides a set of predefined roles, as well as a framework for defining a hierarchy of custom roles.
All Snowflake users are automatically assigned the predefined PUBLIC role, which enables login to Snowflake and basic object access.
In addition to the PUBLIC role, each user can be assigned additional roles, with one of these roles designated as their default role.
A user’s default role determines the role used in the Snowflake sessions initiated by the user; however, this is only a default. Users can change roles within a session at any time.
Roles can be assigned at user creation or afterward.
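A sketch of the relevant commands (hypothetical user and role names):
-- Set a default role at creation or afterward
CREATE USER john DEFAULT_ROLE = analyst;
ALTER USER john SET DEFAULT_ROLE = analyst;
-- Switch roles within a session at any time
USE ROLE analyst;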
Which systems function can help find the overlap depth of a table’s micro-partitions?
SYSTEM$CLUSTERING_INFORMATION
SYSTEM$CLUSTERING_DEPTH
Explanation
For example, if you have an EMPLOYEE table, you can run either of these queries to find the depth:
SELECT SYSTEM$CLUSTERING_INFORMATION('EMPLOYEE');
SELECT SYSTEM$CLUSTERING_DEPTH('EMPLOYEE');
What value will be returned by the following query? SELECT * FROM TABLE(FLATTEN(input => parse_json('[]'))) f;
Nothing will be returned; the output of the input row will be omitted.
Explanation
If you don’t specify the OUTER argument with FLATTEN, it defaults to FALSE. The OUTER => FALSE argument omits the output of input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries.
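In contrast, OUTER => TRUE generates one row, with NULL in the VALUE column, for the empty array:
SELECT * FROM TABLE(FLATTEN(input => parse_json('[]'), outer => TRUE)) f;
-- returns a single row whose VALUE column is NULL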
What happens to the data when the retention period ends for an object?
Data is moved to Snowflake Fail-safe.
Explanation
When the retention period ends for an object, the historical data is moved into Snowflake Fail-safe. Snowflake support needs to be contacted to get the data restored from Fail-safe.
Reader accounts enable providers to share data with consumers who are not already Snowflake customers, without requiring the consumers to become Snowflake customers. Which role can create a Reader account?
ACCOUNTADMIN
Explanation
Only the ACCOUNTADMIN role (or a role granted the CREATE ACCOUNT global privilege) can create a Reader account.
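A minimal sketch, with hypothetical account and admin names:
CREATE MANAGED ACCOUNT reader_acct1
  ADMIN_NAME = reader_admin,
  ADMIN_PASSWORD = '<password>',  -- placeholder
  TYPE = READER;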
Which command will list the pipes for which you have access privileges?
SHOW PIPES;
Explanation
SHOW PIPES Command lists the pipes for which you have access privileges. This command can list the pipes for a specified database or schema (or the current database/schema for the session), or your entire account.
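For example (assuming a database named mydb):
SHOW PIPES;                   -- current database/schema for the session
SHOW PIPES IN DATABASE mydb;  -- all pipes in a specific database
SHOW PIPES IN ACCOUNT;        -- all pipes in the account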
Which of the Snowflake editions provides the federated authentication feature?
All of the Snowflake Editions.
All Snowflake Editions (Standard, Enterprise, Business Critical, Virtual Private Snowflake) provide Federated Authentication.
Which of these SQL functions does Snowflake support?
User-Defined
Table
Window
Scalar
Aggregate
System
There are two modes to set up a multi-cluster warehouse. Select those from the given choices.
Auto-scaling mode
Maximized mode
There are two ways to set up a multi-cluster warehouse: in maximized mode, or auto-scaling mode.
Maximized mode - You simply set the minimum number of clusters equal to the maximum, with a value greater than 1.
Auto-Scaling mode - Specify different values for the maximum and the minimum number of clusters. In this mode, Snowflake starts and stops clusters as needed to dynamically manage the load on the warehouse:
As the number of concurrent user sessions and/or queries for the warehouse increases, and queries start to queue due to insufficient resources, Snowflake automatically starts additional clusters, up to the maximum number defined for the warehouse.
Similarly, as the load on the warehouse decreases, Snowflake automatically shuts down clusters to reduce the number of running clusters and, correspondingly, the number of credits used by the warehouse.
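A sketch of an auto-scaling warehouse (hypothetical name; multi-cluster warehouses require Enterprise Edition or higher):
CREATE WAREHOUSE my_mc_wh WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';
-- Setting MIN_CLUSTER_COUNT = MAX_CLUSTER_COUNT (> 1) would give maximized mode instead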
Micro-partitioning is an on-demand feature of Snowflake and must be enabled explicitly by ACCOUNTADMIN. (True / False)
FALSE
Micro-partitioning is automatically performed on all Snowflake tables. Tables are transparently partitioned using the ordering of the data as it is inserted or loaded.
John is trying to load JSON data sets with a huge array containing multiple records. Considering the VARIANT data type imposed size of 16 MB, what do you recommend to John for optimally loading the data?
Enable the STRIP_OUTER_ARRAY file format option for the COPY INTO <table> command
If the data exceeds 16 MB, enable the STRIP_OUTER_ARRAY file format option for the COPY INTO <table> command to remove the outer array structure and load the records into separate table rows:
copy into <table>
from @~/<file>.json
file_format = (type = 'JSON' strip_outer_array = true);
Snowflake automatically and transparently maintains materialized views. (True/False)
TRUE
Snowflake automatically and transparently maintains materialized views. A background service updates the materialized view after changes to the base table. This is more efficient and less error-prone than manually maintaining the equivalent of a materialized view at the application level.
Which data does not fit into a predefined data model or schema?
Unstructured Data
Unstructured data is information that does not fit into a predefined data model or schema. Typically text-heavy, such as form responses and social media conversations, unstructured data encompasses images, video, and audio. Industry-specific file types such as VCF (genomics), KDF (semiconductors), or HDF5 (aeronautics) are included in this category.
If we make any changes to the original table, then
The changes do not reflect in the cloned table.
Zero-copy cloning allows us to make a snapshot of any table, schema, or database without actually copying data. A clone is writable and is independent of its source (i.e., changes made to the source or clone are not reflected in the other object). A new clone of a table points to the original table’s micro-partitions, using no data storage. If we make any changes in the cloned table, only its changed micro-partitions are written to storage.
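For example (hypothetical names):
CREATE TABLE t1_clone CLONE t1;
-- Changes to t1 after this point do not affect t1_clone, and vice versa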
John has to create a PIPE that will be triggered for loading by calling the Snowpipe REST endpoints. What parameter does he need to specify in CREATE PIPE statement?
AUTO_INGEST = FALSE
AUTO_INGEST = TRUE enables automatic data loading. Snowpipe supports loading from external stages (Amazon S3, Google Cloud Storage, or Microsoft Azure). AUTO_INGEST = FALSE disables automatic data loading. You must make calls to the Snowpipe REST API endpoints to load data files.
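A minimal sketch of a REST-triggered pipe (hypothetical names):
CREATE PIPE my_rest_pipe
  AUTO_INGEST = FALSE
  AS COPY INTO my_table FROM @my_stage FILE_FORMAT = (TYPE = 'JSON');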
What would happen if we suspend the warehouse while it is executing the SQL statement?
Only idle compute resources of the warehouse will be shut down, allowing any compute resources executing statements to continue until the statement is complete.
When we suspend a warehouse, Snowflake immediately shuts down all idle compute resources for the warehouse but allows any compute resources that are executing statements to continue until the statements are complete. At that point, the resources are shut down, and the warehouse status changes to “Suspended”. Compute resources waiting to shut down are considered to be in “quiesce” mode.
Which services are managed by Snowflake’s cloud services layer? (Select all that apply)
Infrastructure Management
Authentication
Metadata Management
Query Parsing and Optimization
Access Control
The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch.
The cloud services layer manages Authentication, Infrastructure Management, Metadata Management, Query Parsing and Optimization, and Access Control services.
File URL is ideal for:
custom applications that require access to unstructured data files
File URL: URL that identifies the database, schema, stage, and file path to a set of files. A role that has sufficient privileges on the stage can access the files. Ideal for custom applications that require access to unstructured data files.
Scoped URL: Encoded URL that permits temporary access to a staged file without granting privileges to the stage. The URL expires when the persisted query result period ends (i.e., the results cache expires), which is currently 24 hours. Ideal for use in custom applications, providing unstructured data to other accounts via a share, or for downloading and ad hoc analysis of unstructured data via Snowsight.
Pre-signed URL: Simple HTTPS URL used to access a file via a web browser. A file is temporarily accessible to users via this URL using a pre-signed access token. The expiration time for the access token is configurable. Ideal for business intelligence applications or reporting tools that need to display unstructured file contents.
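As an illustration, each URL type has a corresponding function; a sketch assuming a stage @my_stage containing images/photo.jpg:
SELECT BUILD_STAGE_FILE_URL(@my_stage, 'images/photo.jpg');    -- file URL (does not expire)
SELECT BUILD_SCOPED_FILE_URL(@my_stage, 'images/photo.jpg');   -- scoped URL (temporary)
SELECT GET_PRESIGNED_URL(@my_stage, 'images/photo.jpg', 3600); -- pre-signed URL, 1-hour token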
Snowflake Query history page allows you to view the details of all the queries executed in the last 31 days. (T/F)
False
The Snowflake Query history page allows you to view the details of the queries executed in the last 14 days. For older queries, you can query the QUERY_HISTORY view in Snowflake’s Account Usage schema.
Which of these types of VIEW does Snowflake support? (Select 3)
Standard View
Secure View
Materialized View
Snowflake supports three types of views.
Standard View, Secure View, and Materialized View.
Standard View: It is a default view type. Its underlying DDL is available to any role with access to the view. When you create a standard view, Snowflake saves a definition of the view. Snowflake does not run the query. When someone accesses the view, that is when the query is run. The standard view will always execute as the owning role.
Secure View: A secure view is exactly like a standard view, except users cannot see how that view was defined. A secure view will sometimes run a little slower than a standard view because, to protect the information in it, Snowflake may bypass some query optimizations.
Materialized View: A materialized view is more like a table. Unlike a standard or secure view, Snowflake runs the query right away when you create a materialized view, takes the result set, and stores it as a table in Snowflake. Because Snowflake stores the materialized view as a table, it creates micro-partitions and metadata about those micro-partitions. So when you query a materialized view with a filter, you get the same benefit of micro-partition pruning that you would get from a table. With Snowflake, the materialized view is automatically refreshed every time there is a transaction against the base table, so it is always going to be in sync. If you want, you can also create a secure materialized view, which again will hide the logic from the user. A note about materialized views: because Snowflake auto-refreshes them in the background, they use some credits, so there is a little bit of a cost there. Moreover, Snowflake stores the result set as a table, so materialized views use more storage and compute than standard or secure views.
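A minimal sketch (hypothetical names; materialized views require Enterprise Edition or higher):
CREATE MATERIALIZED VIEW mv_sales_by_region AS
  SELECT region, SUM(amount) AS total_amount
  FROM sales
  GROUP BY region;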
Suppose you have an auto-scaling mode setup with an Economy policy. In what situation does Snowflake spin up an additional cluster?
Only if the system estimates there’s enough query load to keep the cluster busy for at least 6 minutes.
With the Economy scaling policy, Snowflake spins up an additional cluster only if the system estimates there’s enough query load to keep the cluster busy for at least 6 minutes.
Which of these functions helps generate the FILE URL to access the unstructured data file?
BUILD_STAGE_FILE_URL
BUILD_STAGE_FILE_URL generates a Snowflake-hosted file URL to a staged file using the stage name and relative file path as inputs. A file URL permits prolonged access to a specified file. That is, the file URL does not expire. File URL: URL that identifies the database, schema, stage, and file path to a set of files. A role that has sufficient privileges on the stage can access the files.
Select the type of function that can operate on a subset of rows within the set of input rows.
Window Function
A window function is any function that operates over a window of rows.
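For example, a sketch against a hypothetical employees table:
SELECT department,
       salary,
       AVG(salary) OVER (PARTITION BY department) AS avg_dept_salary
FROM employees;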
Which of these compression methods are supported by Snowflake? (Select all that apply)
Brotli
raw_deflate
deflate
gzip
Zstandard
bzip2
All of these are supported by Snowflake. Snowflake can automatically detect any of these compression methods except Brotli and Zstandard.
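Because Brotli and Zstandard are not auto-detected, specify them explicitly; a sketch with hypothetical names:
COPY INTO my_table
FROM @my_stage/data.csv.br
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = BROTLI);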
Monica wants to delete all the data from table t1. She wants to keep the table structure, so she does not need to create the table again. Which command will be appropriate for her need?
Truncate
TRUNCATE deletes all of the data from a single table. Once Monica truncates table t1, the table structure remains, but the data is deleted. DELETE is typically used to remove specific rows of data.
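For example (the id filter is hypothetical):
TRUNCATE TABLE IF EXISTS t1;  -- removes all rows, keeps the table definition
-- versus removing only selected rows:
DELETE FROM t1 WHERE id = 42;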
How can we add a Directory table explicitly to a stage to store a catalog of staged files?
Using Create Stage command
A directory table is not a separate database object; it stores a catalog of staged files in cloud storage. Roles with sufficient privileges can query a directory table to retrieve file URLs to access the staged files, as well as other metadata. A directory table can be added explicitly to a stage when the stage is created (using CREATE STAGE) or later (using ALTER STAGE) by supplying directoryTableParams:
directoryTableParams (for internal stages) ::=
  [ DIRECTORY = ( ENABLE = { TRUE | FALSE }
                  [ REFRESH_ON_CREATE = { TRUE | FALSE } ] ) ]
ENABLE = TRUE | FALSE specifies whether to add a directory table to the stage. When the value is TRUE, a directory table is created with the stage.
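A sketch (hypothetical stage name):
CREATE STAGE my_int_stage DIRECTORY = (ENABLE = TRUE);
-- Or add a directory table to an existing stage:
ALTER STAGE my_int_stage SET DIRECTORY = (ENABLE = TRUE);
-- Query the catalog of staged files:
SELECT * FROM DIRECTORY(@my_int_stage);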
Only the user who generated the scoped URL can use the URL to access the referenced file. (True/False)
True
True. Only the user who generated the scoped URL can use it to access the referenced file. In the case of a file URL, any role that has sufficient privileges on the stage can access the file.
An account-level resource monitor overrides the resource monitor assignment for individual warehouses. (True/False)
False
An account-level resource monitor does not override resource monitor assignments for individual warehouses. If either the account resource monitor or the warehouse resource monitor reaches its defined threshold and a suspend action has been defined, the warehouse is suspended.
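A sketch of a warehouse-level monitor (hypothetical names):
CREATE RESOURCE MONITOR my_rm WITH
  CREDIT_QUOTA = 100
  TRIGGERS ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE my_wh SET RESOURCE_MONITOR = my_rm;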
Which of these system-defined roles can manage operations at the organization level?
ORGADMIN
ORGADMIN role manages operations at the organizational level. More specifically, this role:
Can create accounts in the organization.
Can view all accounts in the organization (using SHOW ORGANIZATION ACCOUNTS) and all regions enabled for the organization (using SHOW REGIONS).
Can view usage information across the organization.
If you create a Network Policy by providing both ‘Allowed IP Addresses’ and ‘Blocked IP Addresses’, which is applied first by Snowflake while validating the access?
Blocked IP Addresses
If you provide both Allowed IP Addresses and Blocked IP Addresses, Snowflake applies the Blocked List first.
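For example, with a hypothetical policy, the blocked address below is checked before the allowed range:
CREATE NETWORK POLICY my_policy
  ALLOWED_IP_LIST = ('192.168.1.0/24')
  BLOCKED_IP_LIST = ('192.168.1.99');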
The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake. (True/False)
True
Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.
The user access history can be found by querying the
Account Usage ACCESS_HISTORY view
Access History in Snowflake refers to when the user query reads column data and when the SQL statement performs a data write operation, such as INSERT, UPDATE, and DELETE, along with variations of the COPY command, from the source data object to the target data object. The user access history can be found by querying the Account Usage ACCESS_HISTORY view.
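A minimal sketch, assuming access to the SNOWFLAKE database:
SELECT query_id,
       user_name,
       direct_objects_accessed,
       objects_modified
FROM snowflake.account_usage.access_history
ORDER BY query_start_time DESC
LIMIT 10;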
If you drop or disable a user in Snowflake in an Okta IdP federated environment, the user can still log in to Snowflake through Okta. (True/False)
False
Users who are dropped or disabled in Snowflake are still able to log into their Okta accounts, but they will receive an error message when they attempt to connect to Snowflake. You must recreate or enable the user before they can log in.
What options are available for data transformation while loading data into a table using the COPY command? (Select all that apply)
Column omission
Casts
Truncation of Text Strings
Column Reordering
Snowflake supports transforming data while loading it into a table using the COPY command. Options include:
Column reordering
Column omission
Casts
Truncating text strings that exceed the target column length
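A sketch combining reordering, omission, and a cast (hypothetical table and stage; $1, $2, ... refer to columns in the staged file):
COPY INTO my_table (city, zip, sale_date)
FROM (SELECT t.$2, t.$1, t.$4::DATE FROM @my_stage t)
FILE_FORMAT = (TYPE = 'CSV');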
The VALIDATION_MODE parameter does not support COPY statements that transform data during a load. (True / False)
True
Explanation
VALIDATION_MODE instructs the COPY command to validate the data files instead of loading them into the specified table; i.e., the COPY command tests the files for errors but does not load them.
The command validates the data to be loaded and returns results based on the validation option specified:
Syntax: VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS
RETURN_n_ROWS (e.g. RETURN_10_ROWS) - Validates the specified number of rows, if no errors are encountered; otherwise, fails at the first error encountered in the rows.
RETURN_ERRORS - Returns all errors (parsing, conversion, etc.) across all files specified in the COPY statement.
RETURN_ALL_ERRORS - Returns all errors across all files specified in the COPY statement, including files with errors that were partially loaded during an earlier load because the ON_ERROR copy option was set to CONTINUE during the load.
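For example (hypothetical names):
COPY INTO my_table FROM @my_stage
VALIDATION_MODE = 'RETURN_ERRORS';
-- or validate only the first 10 rows:
-- VALIDATION_MODE = 'RETURN_10_ROWS'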