Snowflake Snowpro Core: Certification Exam Flashcards
Search optimization is a Database-level property applied to all the tables within the database with supported data types. (True/False)
- TRUE
- FALSE
FALSE
Search optimization is a table-level property and applies to all columns with supported data types. The search optimization service aims to significantly improve the performance of selective point lookup queries on tables. A point lookup query returns only one or a small number of distinct rows. A user can register one or more tables to the search optimization service.
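For illustration, a minimal sketch of enabling search optimization on a table (the table name EMPLOYEE is hypothetical):
~~~
-- Enable search optimization on a single table
ALTER TABLE EMPLOYEE ADD SEARCH OPTIMIZATION;

-- The SEARCH_OPTIMIZATION and SEARCH_OPTIMIZATION_PROGRESS columns of
-- SHOW TABLES report whether the service is active and how far along it is
SHOW TABLES LIKE 'EMPLOYEE';
~~~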
Which of these system-defined roles can manage operations at the organization level?
- USERADMIN
- SYSADMIN
- ACCOUNTADMIN
- ORGADMIN
- SECURITYADMIN
ORGADMIN
The ORGADMIN role manages operations at the organization level. More specifically, this role:
- Can create accounts in the organization.
- Can view all accounts in the organization (using SHOW ORGANIZATION ACCOUNTS) and all regions enabled for the organization (using SHOW REGIONS).
- Can view usage information across the organization.
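A minimal sketch of what an organization administrator might run:
~~~
USE ROLE ORGADMIN;
SHOW ORGANIZATION ACCOUNTS;   -- all accounts in the organization
SHOW REGIONS;                 -- all regions enabled for the organization
~~~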
If you recreate a pipe using the CREATE OR REPLACE PIPE command, what happens to the load history?
- The pipe can not be recreated
- Snowflake still keeps load history
- The load history gets reset to empty
- The recreated Pipe still has tracks of the files loaded by the old Pipe
The load history gets reset to empty
When you recreate a pipe with CREATE OR REPLACE PIPE, the load history is reset to empty, so Snowflake no longer knows which files have already been loaded.
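A minimal sketch of such a statement (pipe, table, and stage names are hypothetical); once it runs, the new pipe starts with an empty load history:
~~~
CREATE OR REPLACE PIPE my_pipe
  AS COPY INTO my_table FROM @my_stage FILE_FORMAT = (TYPE = 'CSV');
~~~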
Which command will list the pipes for which you have access privileges?
- LIST PIPES();
- SHOW PIPES;
- SHOW PIPES();
- DESCRIBE PIPES;
- LIST PIPES;
SHOW PIPES
The SHOW PIPES command lists the pipes for which you have access privileges. It can list the pipes for a specified database or schema (or the current database/schema for the session), or for your entire account.
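For example (the database name is hypothetical):
~~~
SHOW PIPES;                    -- pipes in the current database/schema
SHOW PIPES IN DATABASE mydb;   -- pipes in a specific database
SHOW PIPES IN ACCOUNT;         -- all pipes you can access in the account
~~~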
The user access history can be found by querying the
- ACCOUNT_USAGE.ACCESS_HISTORY view
- INFORMATION_SCHEMA.ACCESS_REPORT view
- INFORMATION_SCHEMA.ACCESS_HISTORY view
- ACCOUNT_USAGE.ACCESS_REPORT view
snowflake.ACCOUNT_USAGE.ACCESS_HISTORY view
Access History in Snowflake refers to when a user query reads column data and when a SQL statement performs a data write operation, such as INSERT, UPDATE, and DELETE, along with variations of the COPY command, from the source data object to the target data object. The user access history can be found by querying the snowflake.ACCOUNT_USAGE.ACCESS_HISTORY view.
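A sketch of such a query, assuming the active role has been granted access to the SNOWFLAKE database (the user name is hypothetical):
~~~
-- Most recent access events for a given user
SELECT query_id, query_start_time, user_name, direct_objects_accessed
FROM snowflake.account_usage.access_history
WHERE user_name = 'MONICA'
ORDER BY query_start_time DESC
LIMIT 100;
~~~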
Which system functions can help find the overlap depth of a table's micro-partitions?
- SYSTEM$CLUSTERING_WEIGHT
- SYSTEM$CLUSTERING_INFO
- SYSTEM$CLUSTERING_DEPTH
- SYSTEM$CLUSTERING_ALL
- SYSTEM$CLUSTERING_INFORMATION
SYSTEM$CLUSTERING_INFORMATION and SYSTEM$CLUSTERING_DEPTH
For example, if you have an EMPLOYEE table, you can run either of these queries to find the depth:
~~~
SELECT SYSTEM$CLUSTERING_INFORMATION('EMPLOYEE');
SELECT SYSTEM$CLUSTERING_DEPTH('EMPLOYEE');
~~~
What would you create (UDF or Stored procedure) if you need a function that can be called as part of a SQL statement and must return a value that will be used in the statement?
- Stored Procedure
- UDF
A UDF evaluates to a value and can be used in contexts in which a general expression can be used (e.g. SELECT my_function() …).
A stored procedure does not evaluate to a value, and cannot be used in all contexts in which a general expression can be used. For example, you cannot execute SELECT my_stored_procedure() ….
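A minimal sketch of the UDF side (the function name is hypothetical):
~~~
-- A SQL UDF evaluates to a value, so it can appear inside a SELECT
CREATE OR REPLACE FUNCTION area_of_circle(radius FLOAT)
  RETURNS FLOAT
  AS 'PI() * radius * radius';

SELECT area_of_circle(2.0);   -- valid: the UDF is used as an expression
~~~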
The VALIDATION_MODE parameter does not support COPY statements that transform data during a load. (True / False)
- TRUE
- FALSE
TRUE
VALIDATION_MODE instructs the COPY command to validate the data files instead of loading them into the specified table; i.e., the COPY command tests the files for errors but does not load them. The command validates the data to be loaded and returns results based on the validation option specified:
Syntax: VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS
- RETURN_n_ROWS (e.g. RETURN_10_ROWS) - Validates the specified number of rows, if no errors are encountered; otherwise, fails at the first error encountered in the rows.
- RETURN_ERRORS - Returns all errors (parsing, conversion, etc.) across all files specified in the COPY statement.
- RETURN_ALL_ERRORS - Returns all errors across all files specified in the COPY statement, including files with errors that were partially loaded during an earlier load because the ON_ERROR copy option was set to CONTINUE during the load.
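For example (table and stage names are hypothetical):
~~~
-- Validate the staged files without loading them; any errors are returned
COPY INTO my_table
FROM @my_stage
VALIDATION_MODE = RETURN_ERRORS;
~~~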
Which primary tool loads data to Snowflake from a local file system?
- Snowflake UI
- External Stage
- SnowSQL
- ETL tools
SnowSQL is the primary tool used to load data to Snowflake from a local file system. You can run it in either interactive shell or batch mode.
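A typical SnowSQL session looks roughly like this (the local file path and table name are hypothetical):
~~~
-- Upload a local file to the table's internal stage, then load it
PUT file:///tmp/data/mydata.csv @%my_table;
COPY INTO my_table FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
~~~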
Which objects does the Kafka connector create for each topic?
- An internal stage to temporarily store data files for each topic
- A pipe to ingest the data files for each topic partition
- All of these
- A table for each topic, if the table specified for each topic does not exist
All of these
The connector creates the following objects for each topic:
- One internal stage to temporarily store data files for each topic.
- One pipe to ingest the data files for each topic partition.
- One table for each topic. If the table specified for a topic does not exist, the connector creates it; otherwise, the connector creates the RECORD_CONTENT and RECORD_METADATA columns in the existing table and verifies that the other columns are nullable (and produces an error if they are not).
How can the query result cache be turned off?
- Setting the parameter USE_QUERY_CACHED to FALSE
- Setting the parameter USE_CACHED_INFO to FALSE
- Setting the parameter USE_CACHED_RESULT to FALSE
Setting the parameter USE_CACHED_RESULT to FALSE
We can turn off the query result cache by setting the parameter USE_CACHED_RESULT to FALSE. The only reason you would really want to do this is for performance testing.
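For example, for the current session:
~~~
-- Disable the query result cache (typically only for performance testing)
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
~~~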
Which of these are kinds of cache in Snowflake?
- Metadata Cache
- Query Result Cache
- All of these
- Data/Local Disk Cache
All of these
Snowflake has three types of cache.
- The Metadata Cache that lives in the cloud services layer.
- The Data/Local Disk Cache that lives on the SSD drives in the virtual warehouses, and
- The Query Result Cache. If a result is small, it will be stored in the cloud services layer, but larger results are going to be stored in the storage layer.
Which options are available for data transformation while loading data into a table using the COPY command? (Select all that apply)
- Truncation of Text Strings
- Join
- Column reordering
- Column omission
- Casts
Snowflake supports transforming data while loading it into a table using the COPY command. Options include:
- Column reordering
- Column omission
- Casts
- Truncating text strings that exceed the target column length
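A sketch combining several of these options (stage, table, and file column positions are hypothetical); $1, $2, … refer to columns in the staged file:
~~~
-- Reorder and omit columns, and cast values, while loading
COPY INTO home_sales (city, zip, sale_date, price)
FROM (
  SELECT t.$1, t.$2, TO_DATE(t.$6), TO_NUMBER(t.$7)
  FROM @my_stage t
)
FILE_FORMAT = (TYPE = 'CSV');
~~~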
What are the supported file formats for data unloading in Snowflake?
- Avro
- ORC
- JSON
- Parquet
- XML
JSON and Parquet for semi-structured data.
Delimited files (CSV, TSV, etc.) for structured data.
A user can be assigned multiple roles. (True / False)
- FALSE
- TRUE
TRUE
Roles are the entities to which privileges on securable objects can be granted and revoked. Roles are assigned to users to allow them to perform actions required for business functions in their organization. A user can be assigned multiple roles. It allows users to switch roles (i.e., choose which role is active in the current Snowflake session) to perform different actions using separate sets of privileges.
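For example, granting a user several roles and switching between them (user and role names are hypothetical):
~~~
-- Assign multiple roles to one user
GRANT ROLE analyst TO USER monica;
GRANT ROLE developer TO USER monica;

-- Within a session, the user can switch the active role
USE ROLE analyst;
~~~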
Which copy options can help load a file whose metadata has expired (the LAST_MODIFIED date is older than 64 days, the initial set of data was loaded into the table more than 64 days earlier, and, if the file was ever loaded into the table, that also happened more than 64 days earlier)? (Select 2)
- LOAD_FILES = TRUE
- LOAD_CERTAIN_FILES = TRUE
- FORCE = FALSE
- LOAD_UNCERTAIN_FILES = TRUE
- FORCE = TRUE
- ON_ERROR = CONTINUE
FORCE = TRUE
LOAD_UNCERTAIN_FILES = TRUE
To load files whose metadata has expired, set the LOAD_UNCERTAIN_FILES copy option to TRUE. This copy option references load metadata, if available, to avoid data duplication, but also attempts to load files with expired load metadata. Alternatively, set the FORCE option to load all files, ignoring load metadata if it exists. Note that this option reloads files, potentially duplicating data in a table.
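For example (table and stage names are hypothetical):
~~~
-- Attempt files with expired load metadata, still deduplicating where
-- metadata is available
COPY INTO my_table FROM @my_stage LOAD_UNCERTAIN_FILES = TRUE;

-- Or reload everything, ignoring load metadata (may duplicate data)
COPY INTO my_table FROM @my_stage FORCE = TRUE;
~~~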
Which object parameter can users with the ACCOUNTADMIN role use to set the default retention period for their account?
- DATA_RETENTION_TIME_IN_HOURS
- DATA_RETENTION_TIME_IN_DAYS
- DATA_RETENTION_IN_TIME_TRAVEL
- DATA_RETENTION_TIME_MAX
DATA_RETENTION_TIME_IN_DAYS
Users with the ACCOUNTADMIN role can use the DATA_RETENTION_TIME_IN_DAYS object parameter to set the default retention period for their account.
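For example, as a minimal sketch run with the ACCOUNTADMIN role (a retention period above 1 day requires Enterprise Edition or higher):
~~~
-- Set the default Time Travel retention period for the account to 30 days
ALTER ACCOUNT SET DATA_RETENTION_TIME_IN_DAYS = 30;
~~~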
Snowflake automatically and transparently maintains materialized views. (True/False)
- TRUE
- FALSE
TRUE
Snowflake automatically and transparently maintains materialized views. A background service updates the materialized view after changes to the base table. This is more efficient and less error-prone than manually maintaining the equivalent of a materialized view at the application level.
Which of these SQL functions does Snowflake support? (Select all that apply)
- Table
- Scalar
- Window
- System
- User-Defined
- Aggregate
Snowflake Supports all these SQL functions.
- Table
- Scalar
- Window
- System
- User-Defined
- Aggregate
Monica ran a SELECT query on a large table t1. The query took longer than expected. She looked into the query profile and found that 'Bytes spilled to local storage' and 'Bytes spilled to remote storage' are very high. What advice will you give her to improve the query performance? (Select 3)
- Processing data in larger batches
- Increasing the number of parallel queries running in the warehouse
- Trying to split the processing into several steps
- Using a larger warehouse (effectively increasing the available memory/local disk space for the operation)
- Processing data in smaller batches
- Trying to split the processing into several steps
- Using a larger warehouse (effectively increasing the available memory/local disk space for the operation)
- Processing data in smaller batches
When a Snowflake warehouse cannot fit an operation in memory, it starts spilling (storing) data first to the local disk of a warehouse node and then to remote storage. In such a case, Snowflake first tries to temporarily store the data on the warehouse's local disk. As this means extra IO operations, any query that requires spilling will take longer than a similar query running on similar data that fits its operations in memory. Also, if the local disk is insufficient to fit the spilled data, Snowflake further tries to write to remote cloud storage, which will be shown in the query profile as "Bytes spilled to remote storage".
The spilling can’t always be avoided, especially for large batches of data, but it can be decreased by:
- Reducing the amount of data processed. For example, by trying to improve partition pruning or projecting only the columns that are needed in the output.
- Decreasing the number of parallel queries running in the warehouse.
- Trying to split the processing into several steps (for example, by replacing the CTEs with temporary tables).
- Using a larger warehouse - effectively means more memory and more local disk space.
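For the last point, a minimal sketch (the warehouse name is hypothetical):
~~~
-- Move the workload to a larger warehouse for more memory and local disk
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';
~~~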
Which locations does Snowflake support for staging data? (Select all that apply)
- Amazon S3
- Oracle Cloud Storage
- Google Cloud Storage
- Microsoft Azure Blob Storage
- Snowflake Internal - Stages
Snowflake supports loading data from files staged in any of the following locations, regardless of the cloud platform for your Snowflake account (Oracle Cloud Storage is not supported):
- Internal (i.e. Snowflake) stages
- Amazon S3
- Google Cloud Storage
- Microsoft Azure blob storage
An account-level resource monitor overrides the resource monitor assignment for individual warehouses. (True/False)
- TRUE
- FALSE
FALSE
An account-level resource monitor does not override resource monitor assignments for individual warehouses. If either the account resource monitor or the warehouse resource monitor reaches its defined threshold and a suspend action has been defined, the warehouse is suspended.
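A minimal sketch of both levels (monitor and warehouse names are hypothetical); each monitor triggers independently:
~~~
-- Account-level monitor
CREATE RESOURCE MONITOR account_rm WITH CREDIT_QUOTA = 1000
  TRIGGERS ON 100 PERCENT DO SUSPEND;
ALTER ACCOUNT SET RESOURCE_MONITOR = account_rm;

-- Warehouse-level monitor; it is not overridden by the account monitor
CREATE RESOURCE MONITOR wh_rm WITH CREDIT_QUOTA = 100
  TRIGGERS ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE my_wh SET RESOURCE_MONITOR = wh_rm;
~~~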
The major benefits of defining Clustering Keys: (Select 2)
- To help in organizing small tables (<1 GB)
- To help optimize table maintenance
- To help in faster data sharing
- To help improve query performance
- To help optimize table maintenance
- To help improve query performance
Defining clustering keys for very large tables (in the multi-terabyte range) helps optimize table maintenance and query performance. Small tables are not a good candidate for clustering.
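For example (table and column names are hypothetical):
~~~
-- Define a clustering key on a very large table
ALTER TABLE sales CLUSTER BY (sale_date, region);
~~~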
Which of these are not supported by the Search Optimization Service? (Select all that apply)
- External Tables
- Casts on table columns
- Columns defined with a COLLATE clause
- Column Concatenation
- Analytical Expressions
- Materialized Views
- External Tables
- Casts on table columns
- Columns defined with a COLLATE clause
- Column Concatenation
- Analytical Expressions
- Materialized Views
None of these are currently supported by the Search Optimization Service. Additionally, tables and views protected by row access policies cannot be used with the Search Optimization Service.
If an account has federated authentication enabled, can Snowflake admins still maintain user IDs and passwords in Snowflake?
- Yes
- No
Yes
With federated authentication enabled on an account, Snowflake still allows maintaining and using Snowflake user credentials (login name and password). In other words:
- Account and security administrators can still create users with passwords maintained in Snowflake.
- Users can still log into Snowflake using their Snowflake credentials.
However, if federated authentication is enabled for an account, Snowflake does not recommend maintaining user passwords in Snowflake. Instead, user passwords should be maintained solely in your IdP.
What is the expiration period of a File URL?
- Length of time specified in the expiration_time argument
- It is Permanent
- The URL expires when the persisted query result period ends
It is permanent
The expiration period of a scoped URL: the URL expires when the persisted query result period ends.
The expiration period of a file URL: it is permanent.
The expiration period of a pre-signed URL: the length of time specified in the expiration_time argument.
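These three URL types correspond to three system functions (stage name and file path are hypothetical):
~~~
SELECT BUILD_SCOPED_FILE_URL(@my_stage, 'path/data.csv');    -- scoped URL
SELECT BUILD_STAGE_FILE_URL(@my_stage, 'path/data.csv');     -- file URL (permanent)
SELECT GET_PRESIGNED_URL(@my_stage, 'path/data.csv', 3600);  -- pre-signed, 1-hour expiry
~~~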
Select the type of function that can operate on a subset of rows within the set of input rows.
- System Function
- Table Function
- Scalar Function
- Aggregate Function
- User-Defined Function
- Window Function
Window Function
A window function is any function that operates over a window of rows. For example: ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)
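A runnable sketch of that example (table and column names are hypothetical):
~~~
-- Rank employees by salary within each department
SELECT department,
       employee_name,
       salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
~~~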
Which objects are not available for replication in the Standard Edition of Snowflake? (Select 3)
- Integrations
- Roles
- Users
- Shares
- Database
- Integrations
- Roles
- Users
Database and share replication are available in all editions, including the Standard edition. Replication of all other objects is only available for Business Critical Edition (or higher).
Which schema can be used to find out about storage, compute, and objects in a Snowflake account?
- USAGE_SCHEMA
- SNOWFLAKE_SCHEMA
- RESOURCE_SCHEMA
- INFORMATION_SCHEMA
INFORMATION_SCHEMA
The INFORMATION_SCHEMA can be used to find out about storage, compute, and objects in a Snowflake account. Every database that you create on Snowflake has an automatically created schema called INFORMATION_SCHEMA, and inside that schema you can find views and table functions that provide metadata information about objects in your account.
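For example (the database name is hypothetical):
~~~
-- List tables with their row counts and sizes
SELECT table_name, row_count, bytes
FROM mydb.information_schema.tables
WHERE table_schema = 'PUBLIC';
~~~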
A stored procedure can run with both the caller's rights and the owner's rights simultaneously. (True / False)
- TRUE
- FALSE
FALSE
A stored procedure runs with either the caller's rights or the owner's rights; it cannot run with both at the same time.
A caller's rights stored procedure runs with the privileges of the caller. The primary advantage of a caller's rights stored procedure is that it can access information about the caller or the caller's current session. For example, a caller's rights stored procedure can read the caller's session variables and use them in a query.
An owner's rights stored procedure runs mostly with the privileges of the stored procedure's owner. The primary advantage of an owner's rights stored procedure is that the owner can delegate specific administrative tasks, such as cleaning up old data, to another role without granting that role more general privileges, such as privileges to delete all data from a specific table.
At the time the stored procedure is created, the creator specifies whether the procedure runs with owner's rights or caller's rights. The default is owner's rights.
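A sketch of specifying the rights model at creation time, using Snowflake Scripting (the procedure and table names are hypothetical):
~~~
CREATE OR REPLACE PROCEDURE cleanup_old_data()
  RETURNS STRING
  LANGUAGE SQL
  EXECUTE AS OWNER   -- or EXECUTE AS CALLER; OWNER is the default
AS
$$
BEGIN
  -- Delete rows older than 90 days from a hypothetical audit table
  DELETE FROM audit_log WHERE event_time < DATEADD(day, -90, CURRENT_TIMESTAMP());
  RETURN 'done';
END;
$$;
~~~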
How long do results remain in the Query results cache?
- 12 hours
- 1 hour
- 24 hours
- 16 hours
- 31 hours
24 hours
Results are retained for 24 hours in the Query Result Cache. Each time the result is reused, Snowflake resets the 24-hour retention period, up to a maximum of 31 days from the date and time that the query was first executed. After 31 days, the result is purged, and the next time the query is submitted, a new result is generated and persisted.
Which of these types of VIEW does Snowflake support? (Select 3)
- SECURE VIEW
- STANDARD VIEW
- MATERIALIZED VIEW
- TEMPORARY VIEW
- PERMANENT VIEW
- EXTERNAL VIEW
- SECURE VIEW
- STANDARD VIEW
- MATERIALIZED VIEW
Snowflake supports three types of views.
Standard View, Secure View, and Materialized View.
Standard View: The default view type. Its underlying DDL is available to any role with access to the view. When you create a standard view, Snowflake saves a definition of the view; it does not run the query. The query runs when someone accesses the view. A standard view always executes as the owning role.
Secure View: Exactly like a standard view, except users cannot see how the view was defined. A secure view may run a little slower than a standard view because, to protect the information in it, Snowflake may bypass some query optimizations.
Materialized View: A materialized view behaves more like a table. Unlike a standard or secure view, Snowflake runs the query right away when you create a materialized view and stores the result set as a table. Because Snowflake stores the materialized view as a table, it creates micro-partitions and metadata about those micro-partitions, so when you query a materialized view with a filter, you get the same micro-partition pruning benefit you would get from a table. Snowflake automatically refreshes the materialized view every time there is a transaction against the base table, so it is always in sync. If you want, you can also create a secure materialized view, which again hides the logic from the user. Because Snowflake auto-refreshes materialized views in the background, they consume some credits, and because the result set is stored as a table, they also consume storage. So materialized views use more storage and compute than standard or secure views.
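A sketch of each type (view and table names are hypothetical; materialized views require Enterprise Edition or higher):
~~~
CREATE VIEW v_sales AS SELECT * FROM sales;          -- standard
CREATE SECURE VIEW sv_sales AS SELECT * FROM sales;  -- secure (hides the definition)
CREATE MATERIALIZED VIEW mv_sales AS                 -- materialized (precomputed)
  SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
~~~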
Which algorithm does Snowflake use to estimate the approximate number of distinct values in a data set?
- HyperEstimateLog
- HyperMeanLog
- HyerAccumulateLog
- HyperMedianLog
- HyperLogLog
HyperLogLog
Snowflake uses HyperLogLog to estimate the approximate number of distinct values in a data set. HyperLogLog is a state-of-the-art cardinality estimation algorithm, capable of estimating distinct cardinalities of trillions of rows with an average relative error of a few percent.
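In practice this is exposed through APPROX_COUNT_DISTINCT (also aliased as HLL); table and column names here are hypothetical:
~~~
-- Exact vs. HyperLogLog-based approximate distinct counts
SELECT COUNT(DISTINCT customer_id)        AS exact_count,
       APPROX_COUNT_DISTINCT(customer_id) AS approx_count
FROM orders;
~~~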
Which of these Snowflake Editions automatically stores data in an encrypted state?
- Enterprise
- Business Critical
- Standard
- All of the Snowflake Editions
- Virtual Private - Snowflake(VPS)
All of the Snowflake Editions (Standard, Enterprise, Business Critical, Virtual Private Snowflake) automatically store data in an encrypted state.
When deciding whether to suspend a warehouse or leave it running, what should you consider?
- Consider suspending the warehouse if the warehouse is large and there are no active queries.
- Consider the trade-off between saving credits by suspending the warehouse versus the operational cost of resuming the warehouse when needed.
- Consider the trade-off between saving credits by suspending the warehouse versus maintaining the cache of data from the previous queries to help with performance.
Consider the trade-off between saving credits by suspending the warehouse versus maintaining the cache of data from the previous queries to help with performance.
Suppose you have an auto-scaling mode setup with an Economy policy. In what situation does Snowflake spin up an additional cluster?
- The first cluster starts immediately when either a query is queued or the system detects that there’s one more query than the currently-running clusters can execute.
- Only if the system estimates there’s enough query load to keep the cluster busy for at least 6 minutes.
In the Economy scaling policy, Snowflake spins up an additional cluster only if the system estimates there's enough query load to keep the cluster busy for at least 6 minutes.
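A sketch of a multi-cluster warehouse configured with this policy (the warehouse name is hypothetical; multi-cluster warehouses require Enterprise Edition or higher):
~~~
CREATE WAREHOUSE my_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'ECONOMY';
~~~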
Which of the following file formats is not supported by Snowflake?
- AVRO
- PARQUET
- EDI
- JSON
- CSV
- ORC
EDI
Snowflake supports CSV, TSV, JSON, AVRO, ORC, and PARQUET. Snowflake also supports XML, which is a preview feature as of now. The EDI format is not supported by Snowflake.
The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake. (True/False)
FALSE
TRUE
TRUE
Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.
A user’s default role is
- changed each time the user logs in to Snowflake.
- the name used to log in to the Snowflake WebUI.
- the role a user gets set to each time the user logs in to Snowflake.
- always the default PUBLIC role.
the role a user gets set to each time the user logs in to Snowflake.
A user’s default role is the role a user gets set to each time the user logs in to Snowflake. Snowflake uses roles to control the objects (virtual warehouses, databases, tables, etc.) that users can access:
- Snowflake provides a set of predefined roles, as well as a framework for defining a hierarchy of custom roles.
- All Snowflake users are automatically assigned the predefined PUBLIC role, which enables login to Snowflake and basic object access.
- In addition to the PUBLIC role, each user can be assigned additional roles, with one of these roles designated as their default role.
- A user’s default role determines the role used in the Snowflake sessions initiated by the user; however, this is only a default. Users can change roles within a session at any time.
- Roles can be assigned at user creation or afterward.
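For example (user and role names are hypothetical):
~~~
-- Set the role a user's sessions start with
ALTER USER monica SET DEFAULT_ROLE = analyst;
~~~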
Which is generally the slowest option for selecting staged data files to load from a stage?
- Specifying a list of specific files to load
- By path (internal stages) / prefix (Amazon S3 bucket)
- Using pattern matching to identify specific files by pattern
Using pattern matching to identify specific files by pattern
Pattern matching using a regular expression is generally the slowest of the three options for identifying/specifying data files to load from a stage. However, this option works well if you exported your files in named order from your external application and want to batch load the files in the same order.
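For example (table, stage, and pattern are hypothetical):
~~~
-- Slowest option: select staged files by regular expression
COPY INTO my_table
FROM @my_stage
PATTERN = '.*sales_.*[.]csv';
~~~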