snowflake cert pro Flashcards

1
Q

Does Automatic Clustering block DML statements issued against tables while they are being reclustered?

A

FALSE

2
Q

Can you suspend/resume Automatic Clustering for a clustered table?

A

TRUE, using ALTER TABLE … SUSPEND/RESUME RECLUSTER

3
Q

What is Automatic Clustering?

A

Automatic Clustering is the Snowflake service that seamlessly and continually manages all reclustering, as needed, of clustered tables.

4
Q

How can you drop a clustering key?

A

ALTER TABLE <name> DROP CLUSTERING KEY;

5
Q

NOTE - When adding a clustering key to a table already populated with data, not all expressions are allowed in the key. You can check whether a specific function is supported using SHOW FUNCTIONS; the output includes a “valid_for_clustering” column at the end.

6
Q

How can you alter a cluster key?

A

ALTER TABLE <name> CLUSTER BY (<expr1> [, <expr2> ...]);

7
Q

What can a cluster key contain?

A
  • Base columns.
  • Expressions on base columns.
  • Expressions on paths in VARIANT columns.
8
Q

How can you define a clustering key on a table?

A

CREATE TABLE <name> ( … ) CLUSTER BY (<expr1> [, <expr2> ...]);

9
Q

What are the inputs to SYSTEM$CLUSTERING_INFORMATION?

A
  • If a table has an explicit key, the function doesn’t require any input arguments other than the name of the table.
  • If a table doesn’t have an explicit cluster key (or a table has a clustering key, but you want to calculate the ratio on other columns in the table), the function takes the desired column(s) as an additional input argument.
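
For example, both call forms (a minimal sketch; table and column names are hypothetical):

SELECT SYSTEM$CLUSTERING_INFORMATION('mytable');
SELECT SYSTEM$CLUSTERING_INFORMATION('mytable', '(col1, col2)');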
10
Q

When are original micro-partitions purged?

A

After both the Time Travel retention period and subsequent Fail-safe period have passed.

11
Q

Does reclustering result in storage cost?

A

TRUE

12
Q

The number of credits used during reclustering depends on what?

A
  • Size of the table.
  • Amount of data that needs to be clustered.
13
Q

NOTE - In some cases, clustering on columns used in GROUP BY or ORDER BY clauses can be helpful. However, clustering on these columns is usually less helpful than clustering on columns that are heavily used in filter or JOIN operations.

14
Q

NOTE - If you are defining a multi-column clustering key for a table, the order in which columns are specified in the CLUSTER BY is important. As a general rule, Snowflake recommends ordering the columns from lowest cardinality to highest cardinality.

15
Q

NOTE - In general, if a column (or expression) has higher cardinality, then maintaining clustering on that column is more expensive. For example, if a fact table has a TIMESTAMP column containing many discrete values, then a clustering key could be defined on the column by casting the values to dates instead of timestamps. This would reduce cardinality.

16
Q

NOTE - A single clustering key can contain one or more columns or expressions. For most tables, Snowflake recommends a max of 3 or 4 columns (or expressions) per key. Adding more than 3-4 columns tends to increase cost more than benefits.

17
Q

What criteria indicate that a table is a good candidate for a clustering key?

A
  • The table contains a large number of micro-partitions. Typically, this means that the table contains multiple TB of data.
  • The queries can take advantage of clustering. Typically, this means that one or both are true:
    • The queries are selective. In other words, the queries need to read only a small percentage of rows in the table.
    • The queries sort the data.
  • A high percentage of the queries can benefit from the same clustering key(s). In other words, many/most queries select on, or sort on, the same few column(s).
18
Q

What does Snowflake recommend when prioritizing keys?

A
  1. Cluster columns that are most actively used in selective filters. For many fact tables involved in date-based queries, choosing the date columns is a good fit. For event tables, event type might be a good choice, if there are a large number of different event types.
  2. If there is room for additional cluster keys, then consider columns frequently used in JOIN predicates.
19
Q

NOTE - Queries benefit from clustering when the queries filter or sort on the clustering key for the table. Sorting is commonly done for ORDER BY operations, for GROUP BY operations, and for some joins. The more frequently a table is queried, the more benefit clustering provides. However, the more frequently a table changes, the more expensive it will be to keep it clustered. Clustering is generally most cost-effective for tables that are queried frequently and do not change frequently.

20
Q

Does maintaining clustering keys consume credits?

A

The compute resources used to perform clustering consume credits.

21
Q

Can clustering keys be defined on hybrid tables?

A

Clustering keys cannot be defined for hybrid tables. In hybrid tables, data is always ordered by primary key.

22
Q

Which indicators can be used to determine if a table should be clustered?

A
  • Queries on the table are running slower than expected or have noticeably degraded over time.
  • The clustering depth for the table is large.
23
Q

What is a clustering key?

A

A clustering key is a subset of columns in a table (or expressions on a table) that are explicitly designated to co-locate the data in the table in the same micro-partitions.

24
Q

NOTE - Clustering keys are not intended for all tables, due to the costs of initially clustering the data and maintaining the clustering.

25
Q

Which system functions can you use to view/monitor the clustering metadata for a table?

A
  • SYSTEM$CLUSTERING_DEPTH
  • SYSTEM$CLUSTERING_INFORMATION (includes clustering depth)
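
A minimal sketch of each (hypothetical table and column names):

SELECT SYSTEM$CLUSTERING_DEPTH('mytable', '(col1)');
SELECT SYSTEM$CLUSTERING_INFORMATION('mytable');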
26
Q

What are the purposes for clustering depth?

A
  • Monitoring the clustering “health” of a large table, particularly over time as DML is performed on the table.
  • Determining whether a large table would benefit from defining a clustering key.
27
Q

What is clustering depth?

A

The clustering depth for a populated table measures the average depth (1 or greater) of the overlapping micro-partitions for specified columns in a table. The smaller the average depth, the better clustered the table is with regard to the specified columns.

28
Q

What clustering information is maintained for micro-partitions?

A
  • The total number of micro-partitions that comprise the table.
  • The number of micro-partitions containing values that overlap with each other.
  • The depth of the overlapping micro-partitions.
29
Q

What is query pruning?

A

In Snowflake, pruning refers to the process of using micro-partition metadata to skip (prune) micro-partitions, and columns within them, that are not needed to answer a query, narrowing the scope of the data scanned.

30
Q

What are the benefits of micro-partitions?

A
  • In contrast to traditional static partitioning, Snowflake micro-partitions are derived automatically; no maintenance needed by users.
  • Small (50 MB - 500 MB) - enable fast DML commands.
  • Micro-partitions can overlap in their range of values, which combined with their small size, help prevent skew.
  • Columns are stored independently. This enables efficient scanning; only the columns referenced by a query are scanned.
  • Columns are also compressed individually within micro-partitions.
31
Q

When automatically partitioned, what is used to partition?

A

Tables are transparently partitioned using the order of the data as it is inserted/loaded.

32
Q

What metadata is stored about all rows in a micro-partitions?

A
  • The range of values for each of the columns in the micro-partitions.
  • The number of distinct values.
  • Additional properties used for both optimization and efficient query processing.
33
Q

What is a micro-partition?

A

All data in Snowflake tables is automatically divided into micro-partitions. Each contains between 50 MB and 500 MB of uncompressed data. Groups of rows in tables are mapped to individual micro-partitions, organized in a columnar fashion.

34
Q

Which SQL commands are supported by Query Acceleration Service?

A
  • SELECT
  • INSERT
  • CREATE TABLE AS SELECT
  • COPY INTO <table>
34
Q

By default, what does a multi-cluster warehouse consist of?

A

A single cluster.

34
Q

Which are the services managed by the cloud services layer?

A
  • Authentication
  • Infrastructure Management
  • Metadata Management
  • Query Parsing and Optimization
  • Access Control
34
Q

What is the default warehouse for notebooks?

A

A dedicated Snowflake-managed warehouse named SYSTEM$STREAMLIT_NOTEBOOK_WH is automatically provisioned in each account for running Notebooks. This warehouse is owned and managed by Snowflake under the SYSTEM role. You cannot DROP or ALTER it.

34
Q

Which are the scaling policies available when a multi-cluster warehouse is running in Auto-scale mode?

A
  • Standard (default)
  • Economy
34
Q

What is the Standard scaling policy?

A

Prevents/minimizes queuing by favoring starting additional clusters over conserving credits.

34
Q

Can Snowflake be run on private cloud infrastructure (on-premises or hosted)?

A

FALSE

34
Q

What determines the number of queries a warehouse can concurrently process?

A

The size and complexity of each query.

35
Q

A multi-cluster warehouse is defined by specifying which size properties?

A
  • Maximum number of clusters, greater than 1 (up to 10).
  • Minimum number of clusters, equal to or less than max (up to 10).
35
Q

Which properties can be set when defining a multi-cluster warehouse?

A
  • Specifying a warehouse size.
  • Resizing a warehouse at any time.
  • Auto-suspending a running warehouse due to inactivity; this does not apply to individual clusters, but rather to the entire multi-cluster warehouse.
  • Auto-resuming a suspended warehouse when new queries are submitted.
36
Q

What are the modes a multi-cluster warehouse can run in?

A
  • Maximized
  • Auto-scale
37
Q

How is data reorganized when loaded into Snowflake?

A
  • Compressed.
  • Columnar format.
38
Q

What are the object-level parameters that Snowflake provides to help control query processing and concurrency?

A
  • STATEMENT_QUEUED_TIMEOUT_IN_SECONDS
  • STATEMENT_TIMEOUT_IN_SECONDS
39
Q

NOTE - If queries are queuing more than desired, another warehouse can be created and queries can be manually redirected to the new warehouse. In addition, resizing a warehouse can enable limited scaling for query concurrency and queuing; however, warehouse resizing is primarily intended for query performance.

40
Q

What are the properties of a warehouse running in Auto-scale mode?

A

This mode is enabled by specifying different values for the maximum and minimum number of clusters.

41
Q

What is the Economy scaling policy?

A

Conserves credits by favoring keeping running clusters fully-loaded rather than starting additional clusters, which may result in queries being queued and taking longer to complete.

42
Q

How can you identify warehouses that might benefit from the Query Acceleration Service?

A

You can query the QUERY_ACCELERATION_ELIGIBLE view. You can also use the SYSTEM$ESTIMATE_QUERY_ACCELERATION function to assess whether a specific query is eligible for acceleration.
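
A sketch of both approaches; the query ID is a placeholder, and the column list assumes the Account Usage QUERY_ACCELERATION_ELIGIBLE view:

SELECT SYSTEM$ESTIMATE_QUERY_ACCELERATION('<query_id>');

SELECT query_id, eligible_query_acceleration_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
ORDER BY eligible_query_acceleration_time DESC;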

43
Q

What is Snowflake’s architecture a combo of?

A

A hybrid of traditional shared-disk and shared-nothing architectures.

44
Q

How is multi-cluster billing calculated?

A
  • Warehouse size.
  • The number of clusters that run in that period.
45
Q

What is available to control the usage of credits in Auto-scale mode?

A

Snowflake provides a property, SCALING_POLICY, that determines the scaling policy to use when automatically starting or shutting down additional clusters.

46
Q

The amount of compute resources in each cluster is determined by what?

A

Warehouse size.

47
Q

Does increasing the size of a warehouse always improve data loading performance?

A

Increasing the size of a warehouse does not always improve data loading performance. Data loading performance is influenced more by the number of files being loaded (and the size of each file) than the size of the warehouse.

48
Q

NOTE - To enable fully automated scaling for concurrency, Snowflake recommends multi-cluster warehouses.

49
Q

What is a virtual warehouse?

A

A cluster of compute resources.

50
Q

What are the three key layers in Snowflake?

A
  • Cloud Services (Service)
  • Query Processing (Compute)
  • Database Storage (Storage)
51
Q

What are the two types of virtual warehouses?

A
  • Standard
  • Snowpark-optimized
52
Q

NOTE - Unless you are bulk loading a large number of files concurrently (i.e. hundreds or thousands of files), a smaller warehouse (S, M, L) is generally sufficient.

53
Q

Can warehouses be started at any time?

A

TRUE

54
Q

What does Snowflake use for persisted data?

A

A central data repository for persisted data that is accessible from all compute nodes. But, similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters, where each node in the cluster stores a portion of the entire data set locally.

55
Q

What are the warehouse size options?

A
  • x-small
  • small
  • medium
  • large
  • x-large
  • 2x-large
  • 3x-large
  • 4x-large
  • 5x-large
  • 6x-large
56
Q

What are the keys to using warehouses effectively and efficiently?

A
  • Experiment with different types of queries and different warehouse sizes to determine the combo that best meets your query needs and workload.
  • Don’t focus on warehouse size; Snowflake utilizes per-second billing, so you can run larger warehouses and simply suspend them when not in use.
57
Q

Which SQL command can be used to create a multi-cluster warehouse?

A

Execute a CREATE WAREHOUSE command with:
- MAX_CLUSTER_COUNT
- MIN_CLUSTER_COUNT
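
A minimal sketch (hypothetical warehouse name):

CREATE WAREHOUSE my_mc_wh WITH
WAREHOUSE_SIZE = 'MEDIUM'
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 3;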

58
Q

Which command can be used to view warehouses including multi-cluster?

A

SHOW WAREHOUSES

59
Q

How are credits charged for warehouses?

A
  • Warehouse size.
  • Number of clusters.
  • The length of time the compute resources in each cluster run.
60
Q

How does warehouse caching impact queries?

A

Each warehouse, when running, maintains a cache of table data accessed as queries are processed by the warehouse. The larger the warehouse the larger the cache. The cache is dropped when the warehouse is suspended, which might result in slower initial performance. Tradeoff between suspending warehouse and speed.

61
Q

How do you select an initial warehouse size?

A
  • For data loading, the warehouse size should match the number of files being loaded and the amount of data in each file.
  • For queries in small-scale testing environments, smaller warehouse sizes may be sufficient.
  • For queries in larger-scale production environments, larger warehouse sizes may be more cost-effective.
62
Q

What are the effects of resizing a suspended warehouse?

A

Resizing a suspended warehouse does not provision any new compute resources for the warehouse.

63
Q

Can Snowflake be packaged and installed?

A

FALSE

64
Q

What is the SNOWFLAKE database?

A

Snowflake provides a system-defined, read-only shared database named SNOWFLAKE that contains metadata and historical usage data about the objects in your organization and accounts.

65
Q

When can a warehouse be resized?

A

They can be resized at any time, even while running, to accommodate the need for more or less compute resources, based on the type of operations being performed by the warehouse.

66
Q

What is the Snowflake data lifecycle?

A

Organizing Data
(DDL) CREATE/ALTER DATABASE
(DDL) CREATE/ALTER SCHEMA
(DDL) CREATE/ALTER TABLE
Storing Data
(DML) INSERT INTO <table>
Querying Data
(DQL) SELECT … FROM <table>
Working with Data
(DML) UPDATE <table>
(DML) MERGE INTO <table>
(DML) DELETE FROM <table>
(DDL) CREATE TABLE/SCHEMA/DATABASE … CLONE
Removing Data
(DDL) TRUNCATE TABLE
(DDL) DROP TABLE
(DDL) DROP SCHEMA
(DDL) DROP DATABASE

67
Q

What are the four sections in Snowsight?

A
  • Navigation menu
  • Search
  • Quick actions
  • Recently viewed
68
Q

What would be the credit consumption if a 3x-Large multi-cluster warehouse runs 1 cluster for one full hour and then runs 2 clusters for the next full hour?

A

The total number of credits would be 192. A 3X-Large cluster consumes 64 credits per hour, so one cluster for the first hour (64) plus two clusters for the second hour (128) totals 192.

69
Q

What are the properties of a warehouse running in Maximized mode?

A

This mode is enabled by specifying the same value for both the maximum and minimum number of clusters. This mode is effective for statically controlling the available compute resources, particularly if you have a large number of concurrent user sessions and/or queries and the numbers don’t fluctuate significantly.

70
Q

What is the cloud services layer in Snowflake?

A

A collection of services that coordinate activities across Snowflake.

71
Q

What does it mean that Snowflake is a truly self-managed service?

A
  • There is no hardware (virtual or physical) to select, install, configure, or manage.
  • There is virtually no software to install, configure, or manage.
  • Ongoing maintenance, management, upgrades and tuning are handled by Snowflake.
72
Q

Can Snowflake clients (SnowSQL, JDBC driver, ODBC driver, Python connector, etc.) have a default warehouse?

A

TRUE

73
Q

How does auto-suspend and auto-resume apply to multi-cluster warehouses?

A
  • Auto-suspend only occurs when the minimum number of clusters is running and there is no activity for the specified period of time.
  • Auto-resume only applies when the entire warehouse is suspended (i.e. no clusters are running).
74
Q

Can the size of a warehouse impact query processing?

A

The size of a warehouse can impact the amount of time required to execute queries.

75
Q

Can a default warehouse be set per user?

A

TRUE

76
Q

What does Snowflake use to process queries?

A

Virtual warehouses.

77
Q

Can a warehouse be automatically suspended and resumed?

A

TRUE

78
Q

Which are the ways to connect to Snowflake?

A
  • Web-based UI.
  • Command-line clients (SnowSQL).
  • ODBC and JDBC drivers.
  • Native connectors.
79
Q

When a session is initiated in Snowflake, does the session have a default warehouse?

A

FALSE

80
Q

What are the schemas in the SNOWFLAKE database?

A
  • ACCOUNT_USAGE
  • ALERT
  • CORE
  • DATA_PRIVACY
  • DATA_SHARING_USAGE
  • INFORMATION_SCHEMA
  • LOCAL
  • ML
  • MONITORING
  • NOTIFICATION
  • ORGANIZATION_USAGE
  • READER_ACCOUNT_USAGE
  • TELEMETRY
81
Q

What type of billing does Snowflake use?

A

Per-second billing.

82
Q

How is query load calculated?

A

Query load is calculated by dividing the execution time (in seconds) of all queries in an interval by the total time (in seconds) for the interval.
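
For example, if the queries running during a 5-minute (300-second) interval accumulate 150 seconds of execution time, the query load for that interval is 150 / 300 = 0.5.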

83
Q

What are the configuration options for a Snowpark-optimized warehouse?

A

Memory (up to) | CPU architecture | Min warehouse size required
16 GB          | Default or x86   | XS
256 GB         | Default or x86   | M
1 TB           | Default or x86   | L

84
Q

When should you use Snowpark-optimized warehouses?

A

Recommended for running code and workloads that have large memory requirements or dependencies on a specific CPU architecture. Example workloads include ML training use cases.

85
Q

What does Snowpark-optimized warehouses let you configure?

A

The available memory resources and CPU architecture on a single-node instance.

86
Q

How can you define a Snowpark-optimized warehouse in SQL?

A

CREATE OR REPLACE WAREHOUSE snowpark_opt_wh WITH
WAREHOUSE_SIZE = 'LARGE'
WAREHOUSE_TYPE = 'SNOWPARK_OPTIMIZED'
RESOURCE_CONSTRAINT = 'MEMORY_16X_X86'; // optional

87
Q

Which view can you use to evaluate warehouse performance?

A

QUERY_HISTORY

88
Q

What is the Query Acceleration Service?

A

QAS can accelerate parts of the query workload in a warehouse. When it is enabled, it can improve overall warehouse performance by reducing the impact of outlier queries, which are queries that use more resources than the typical query. It does this by offloading portions of the query processing work to shared compute resources provided by the service.

89
Q

How can you enable Query Acceleration Service for a warehouse?

A

CREATE WAREHOUSE my_wh WITH
ENABLE_QUERY_ACCELERATION = true;

90
Q

What are the common reasons that queries are ineligible for Query Acceleration Service?

A
  • There are not enough partitions in the scan.
  • Even if a query has a filter, the filter may not be selective enough. Alternatively, if the query has an aggregation with GROUP BY, the cardinality of the GROUP BY might be too high for eligibility.
  • The query includes a LIMIT clause but does not have an ORDER BY clause.
  • The query includes functions that return nondeterministic results (e.g. RANDOM).
91
Q

Which queries are eligible for the Query Acceleration Service?

A

In general, queries are eligible because they have a portion of the query plan that can be run in parallel using QAS compute resources:
- Large scans with an aggregation or selective filter.
- Large scans that insert many new rows.

92
Q

A warehouse provides the required resources to perform what operations?

A
  • Execute SQL SELECT.
  • Perform DML operations, such as:
    • Updating rows
    • Loading data
    • Unloading data
93
Q

NOTE - In most cases, no tasks are required to enable Automatic Clustering for a table. You simply define a clustering key for the table. However, the rule does not apply to tables created by cloning (CREATE TABLE … CLONE …) from a source table that has clustering keys. The new table starts with Automatic Clustering suspended.

94
Q

Which privileges do you need on the schema and database to add clustering to a table?

A
  • USAGE
  • OWNERSHIP
95
Q

How can you determine if Automatic Clustering is enabled for a table?

A
  • SHOW TABLES command.
  • TABLES view (in the Snowflake Information Schema).
  • TABLES view (in the Account Usage shared database).
96
Q

How can you SUSPEND/RESUME reclustering?

A

ALTER TABLE <name> SUSPEND/RESUME RECLUSTER;

97
Q

What can Automatic Clustering costs be broken down into?

A
  • Compute costs
  • Storage costs
98
Q

Does Automatic Clustering require you to provide a virtual warehouse?

A

FALSE - Automatic Clustering is a serverless service; Snowflake manages the compute resources that recluster the table.

99
Q

Which function can you use to help estimate the compute cost of enabling Automatic Clustering for a table and maintaining the table in a well-clustered state?

A

SYSTEM$ESTIMATE_AUTOMATIC_CLUSTERING_COSTS
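
A sketch (hypothetical table and clustering key):

SELECT SYSTEM$ESTIMATE_AUTOMATIC_CLUSTERING_COSTS('mytable', '(date_col)');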

100
Q

What is the AUTOMATIC_CLUSTERING_HISTORY function used for?

A

This table function is used for querying the Automatic Clustering history for given tables within a specified date range. The information returned by the function includes the credits consumed, bytes updated, and rows updated each time a table is reclustered.
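
A sketch querying the last 7 days for one table (hypothetical names):

SELECT *
FROM TABLE(INFORMATION_SCHEMA.AUTOMATIC_CLUSTERING_HISTORY(
DATE_RANGE_START => DATEADD('day', -7, CURRENT_DATE()),
TABLE_NAME => 'MYDB.MYSCHEMA.MYTABLE'));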

101
Q

What is a stage?

A

A stage specifies where data files are stored so that the data in the files can be loaded into a table.

102
Q

Which types of internal stages are supported?

A
  • User
  • Table
  • Named
103
Q

Is each user and table in Snowflake automatically allocated an internal stage?

A

TRUE - by default, each user has a user stage (@~) and each table has a table stage (@%<table>) allocated to it.

104
Q

What steps are required in the data loading process that involve file staging information?

A
  1. You must specify an internal stage in the PUT command when uploading files.
  2. You must specify the same stage in the COPY INTO <table> command when loading data into a table from the stage files.
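
A minimal two-step sketch of the flow above (hypothetical stage, file, and table names):

PUT file:///data/data.csv @my_stage;
COPY INTO mytable FROM @my_stage;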
105
Q

What is a user stage?

A

Each user has a stage allocated to them by default for storing files. This stage is a convenient option if your files will only be accessed by a single user but need to be copied into multiple tables.

106
Q

What are the characteristics and limitations of user stages?

A
  • User stages are referenced by using “@~”
  • Unlike named stages, user stages cannot be altered or dropped.
  • User stages do not support setting file format options. Instead, you must specify file format and copy options as part of the COPY INTO <table> command.
107
Q

When are user stages not appropriate?

A
  • Multiple users require access to the files.
  • The current user does not have INSERT privileges on the tables the data will be loaded into.
108
Q

What is a table stage?

A

By default, each table has a Snowflake stage allocated to it for storing files.

109
Q

When would you use a table stage?

A

You might use a table stage if you only need to copy files into a single table, but want to make the files accessible to multiple users.

110
Q

What are the characteristics and limitations of table stages?

A
  • A table stage has the same name as the table. For example, a table named mytable has a stage referenced as @%mytable.
  • A table stage is an implicit stage tied to a table object. It’s not a separate database object. As a result, a table stage has no grantable privileges of its own. A table stage is also not appropriate if you need to copy file data into multiple tables.
  • To stage files on a table stage, list the files, query the files, or drop them, you must be the table owner (have the role with the OWNERSHIP privilege on the table).
  • Unlike a named stage, you can’t alter or drop a table stage.
  • Table stages don’t support transforming data while loading it.
111
Q

What is a named stage?

A

Named stages are database objects that provide the greatest degree of flexibility for data loading:

  • Users with the appropriate privileges on the stage can load data into any table.
  • Because the stage is a database object, the security/access rules that apply to all objects apply. The privileges to use a stage can be granted or revoked from roles. In addition, ownership of the stage can be transferred to another role.
112
Q

How can you define a stage using SQL?

A

CREATE STAGE my_stage
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

113
Q

How do you load data into a user stage?

A

Uploads a file named data.csv in the /data directory on your local machine to your user stage and prefixes the file with a folder named staged

Linux or macOS: PUT file:///data/data.csv @~/staged;
Windows: PUT file://C:\data\data.csv @~/staged;

114
Q

How do you load data into a table stage?

A

Uploads a file named data.csv in the /data directory on your local machine to the stage for a table named mytable

Linux or macOS: PUT file:///data/data.csv @%mytable;
Windows: PUT file://C:\data\data.csv @%mytable;

115
Q

How do you load data into a named stage?

A

Uploads a file named data.csv in the /data directory on your local machine to a named internal stage called my_stage

Linux or macOS: PUT file:///data/data.csv @my_stage;
Windows: PUT file://C:\data\data.csv @my_stage;

116
Q

How can you view the files that have been uploaded to a user stage using SQL?

A

LIST @~;

117
Q

How can you view the files that have been uploaded to a table stage using SQL?

A

LIST @%mytable; // table name is mytable

118
Q

How can you view the files that have been uploaded to a named stage using SQL?

A

LIST @my_stage; // stage is named my_stage

119
Q

Which SQL command is used to load staged data into a target table?

A

COPY INTO <table>

120
Q

How can you validate data loads into a target table?

A

To validate data in an uploaded file, execute COPY INTO <table> in validation mode using the VALIDATION_MODE parameter. The VALIDATION_MODE parameter returns any errors that it encounters in a file.
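
For example (hypothetical stage and table names):

COPY INTO mytable FROM @my_stage VALIDATION_MODE = 'RETURN_ERRORS';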

121
Q

What does the ON_ERROR copy option in COPY INTO <table> do?

A

Indicates what action to perform if errors are encountered in a file during loading.
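
For example (hypothetical names; SKIP_FILE is one of the supported values):

COPY INTO mytable FROM @my_stage ON_ERROR = 'SKIP_FILE';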

122
Q

What metadata is maintained for each file uploaded into a stage?

A
  • File name
  • File size
  • Last modified date
123
Q

Which command can be used to view the status of data files that have been staged?

A

LIST

124
Q

Which function can be used to validate the data files you’ve loaded and retrieve any errors encountered during the load?

A

The VALIDATE table function.

125
Q

Which view can be used to retrieve the history of data loaded into tables using the COPY INTO command?

A

LOAD_HISTORY in the Information schema

126
Q

How can staged files be deleted?

A
  • Files that were loaded successfully can be deleted from the stage during a load by specifying the PURGE copy option in the COPY INTO <table> command.
  • After the load completes, use the REMOVE command to remove the files in the stage.
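
A sketch of both options (hypothetical names):

COPY INTO mytable FROM @my_stage PURGE = TRUE;
REMOVE @my_stage;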
127
Q

Does removing files from a stage improve load performance?

A

TRUE - Removing files ensures they aren’t inadvertently loaded again. It also improves load performance, because it reduces the number of files that COPY commands must scan to verify whether existing files in a stage were loaded already.

128
Q

Can you copy files from one stage to another?

A

TRUE - you use the COPY FILES command and specify a source stage and a target stage.
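
A minimal sketch (hypothetical stage names):

COPY FILES INTO @target_stage FROM @source_stage;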

129
Q

What is the difference between internal and external stages?

A

Internal stages are managed by Snowflake, while external stages reference cloud storage locations that your organization owns and manages.

130
Q

Which cloud providers can host external stages?

A
  • AWS
  • GCP
  • Azure
131
Q

What are the types of external stages?

A

Only named external stages are available.

132
Q

What does bulk loading rely on?

A

On user-provided virtual warehouses, which are specified in the COPY statement. Users are required to size the warehouse appropriately to accommodate expected loads.

133
Q

What is Snowpipe?

A

A feature that is designed to load small volumes of data (i.e. micro-batches) and incrementally make them available for analysis.

134
Q

Which compute resources does Snowpipe use?

A

Snowpipe uses compute resources provided by Snowflake (i.e. a serverless compute model). These resources are automatically resized and scaled up or down as required, and are charged and itemized using per-second billing. Data ingestion is charged based upon the actual workloads.

135
Q

What is the Snowpipe Streaming API?

A

The Snowpipe Streaming API writes rows of data directly to Snowflake tables without the requirement of staging files. This architecture results in lower load latencies, with correspondingly lower costs for loading any volume of data, which makes it a powerful tool for handling near real-time data streams.

136
Q

What can you use to load data from Kafka topics?

A

Snowflake Connector for Kafka

137
Q

Which SQL functions can be used to automatically detect the schema in a set of staged semi-structured data files?

A
  • INFER_SCHEMA
  • GENERATE_COLUMN_DESCRIPTION
138
Q

What semi-structured formats can be used with the COPY INTO <table> command?

A
  • Apache Parquet
  • Apache Avro
  • ORC
  • JSON
  • XML (preview)
139
Q

How can you create a table or external table using a derived schema from INFER_SCHEMA?

A

CREATE TABLE … USING TEMPLATE

CREATE EXTERNAL TABLE … USING TEMPLATE
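
A sketch of the pattern (hypothetical stage and named file format):

CREATE TABLE mytable USING TEMPLATE (
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
FROM TABLE(INFER_SCHEMA(
LOCATION => '@my_stage',
FILE_FORMAT => 'my_parquet_format')));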

140
Q

Can you query data in cloud storage without loading it into Snowflake tables?

A

TRUE

141
Q

How can you query data in cloud storage without loading it into Snowflake tables?

A

External tables.

142
Q

Which file formats can be used with the COPY INTO <table> command?

A
  • Delimited files
  • Semi-structured
  • Unstructured
143
Q

Does Snowflake support loading data from tar (tape archive) files?

A

FALSE

144
Q

Can you stage uncompressed and already-compressed files?

A

TRUE - compressed files are detected automatically, and uncompressed files are gzip-compressed by default when staged with PUT (AUTO_COMPRESS = TRUE).

145
Q

Are files stored on internal stages for data loading and unloading automatically encrypted?

A

TRUE - encrypted using AES-256

146
Q

How is encryption handled when loading already-encrypted files into a stage?

A

The key used to encrypt the files must be provided to Snowflake.

147
Q

Which keys are supported for unencrypted files?

A

128-bit and 256-bit

148
Q

How can you optimize the number of parallel data loads?

A

Aim to generate data files with a compressed size of approximately 100 to 250 MB (or larger). Aggregate smaller files to minimize the processing overhead for each file.

149
Q

Can the number of load operations that run in parallel exceed the number of data files loaded?

A

FALSE - the number of load operations that run in parallel cannot exceed the number of data files to be loaded.

150
Q

Is loading very large files (e.g. 100 GB or larger) recommended?

A

FALSE - it is not recommended.

151
Q

If you must load a large file, which copy option should you use?

A

ON_ERROR - Aborting or skipping a file due to a small number of errors could result in delays and wasted credits. In addition, if a data loading operation continues beyond the maximum allowed duration of 24 hours, it could be aborted without any portion of the file being committed.

152
Q

Into which column types can you not load objects larger than 16 MB?

A
  • VARCHAR
  • BINARY
  • VARIANT
  • OBJECT
  • ARRAY
  • GEOGRAPHY
  • GEOMETRY
153
Q

How can you load a large JSON file into separate rows?

A

Enable the STRIP_OUTER_ARRAY file format option for the COPY INTO <table> command to remove the outer array structure and load the records into separate table rows.
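
For example (hypothetical names):

COPY INTO mytable FROM @my_stage/data.json
FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE);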

154
Q

How can you load a large JSON object from a Parquet file?

A

Use FLATTEN

CREATE OR REPLACE TABLE mytable AS
SELECT
t1.$1:ID AS id,
t1.$1:CustomerDetails:RegistrationDate::VARCHAR AS RegistrationDate,
t1.$1:CustomerDetails:FirstName::VARCHAR AS First_Name,
t1.$1:CustomerDetails:LastName::VARCHAR AS Last_Name,
t2.value AS Event
FROM @json t1,
TABLE(FLATTEN(INPUT => $1:CustomerDetails:Events)) t2;

155
Q

How should you prepare delimited text files?

A
  • Use the ENCODING file format option to specify the character set.
  • Fields that contain delimiter characters should be enclosed in quotes (single or double).
  • If the data contains single or double quotes, then those must be escaped.
  • Fields that contain carriage returns should also be enclosed in quotes.
  • The number of columns in each row should be consistent.
156
Q

Which elements are not extracted into a column?

A
  • Elements that contain even a single “null” value are not extracted into a column.
  • Elements that contain multiple data types.
157
Q

Should you dedicate separate warehouses to load and query operations?

A

TRUE - Loading large data sets can affect query performance. It is recommended to allocate separate warehouses for loading and querying operations to optimize performance for both tasks.

158
Q

Which options are supported when loading data files from a stage when using the COPY command?

A
  • By path (internal stages) / prefix (S3 bucket).
  • Specify a list of specific files to load.
  • Using pattern matching to identify specific files.
159
Q

Which service should you use if your workload consists of highly concurrent COPY statements loading data into the same table?

A

Snowpipe.

160
Q

When does load metadata expire?

A

After 64 days.

161
Q

How can you load data whose metadata has expired?

A

To load files whose metadata has expired, set the LOAD_UNCERTAIN_FILES copy option to true. The copy option references load metadata, if available, to avoid data duplication, but also attempts to load files with expired load metadata.

Alternatively, set the FORCE option to load all files, ignoring load metadata if it exists. Note that this option reloads files, potentially duplicating data in a table.
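
A sketch of both options (hypothetical names):

COPY INTO mytable FROM @my_stage LOAD_UNCERTAIN_FILES = TRUE;
COPY INTO mytable FROM @my_stage FORCE = TRUE; // may duplicate data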

162
Q

In a VARIANT column, how are NULL values stored?

A

Stored as a string containing the word “null,” not the SQL NULL value.

163
Q

When is it recommended to remove loaded files?

A

When data from staged files is loaded successfully, consider removing the staged files to ensure the data is not inadvertently loaded again (duplicated).

Do not remove the staged files until the data has been loaded successfully. To check if the data has been loaded successfully, use the COPY_HISTORY function. Check the STATUS column to determine if the data from the file has been loaded. Note that if the status is Load in progress, removing the staged file can result in partial loads and data loss.

Cleaning up also improves load performance, because it reduces the number of files that COPY commands must scan to verify whether existing files in a stage were loaded already.

164
Q

How can staged files be deleted from a Snowflake stage (user stage, table stage, or named stage)?

A
  • Files that were loaded successfully can be deleted from the stage during a load by specifying the PURGE copy option in the COPY INTO <table> command.
  • After the load completes, use the REMOVE command to remove the files in the stage.
165
Q

Will Snowflake automatically determine the file and codec compression method for your data files?

A

TRUE - with the default COMPRESSION = AUTO, Snowflake automatically detects the compression method.

166
Q

Which view can be used to review the data loading activity that has occurred over the last 365 days for all tables?

A

The COPY_HISTORY view in the ACCOUNT_USAGE schema of the SNOWFLAKE database.

167
Q

What is the INFORMATION_SCHEMA?

A

Each database created in your account automatically includes a built-in, read-only schema named INFORMATION_SCHEMA. The schema contains the following objects:

  • Views for all the objects contained in the database, as well as views for account-level objects (i.e. non-database objects such as roles, warehouses, and databases)
  • Table functions for historical and usage data across your account.
168
Q

What is a schema?

A

A logical grouping of database objects (tables, views, etc.). Each schema belongs to a single database.

169
Q

What is the default table type?

A

Permanent.

170
Q

What are the available table types?

A
  • Permanent
  • Temporary
  • Transient
  • External
  • Dynamic
  • Hybrid
  • Iceberg
171
Q

What is a temporary table?

A

A table used for storing non-permanent, transitory data (e.g. ETL data, session-specific data).

172
Q

Do temporary tables only exist within the session in which they were created?

A

TRUE - Temporary tables only exist within the session in which they were created and persist only for the remainder of the session.

173
Q

Are temporary tables visible to other users and session?

174
Q

What happens to the data in a temporary table when the session ends?

A

Once the session ends, data stored in the table is purged completely from the system and, therefore, is not recoverable, either by the user who created the table or Snowflake.

175
Q

Can you create temporary and non-temporary tables with the same name within the same schema?

A

TRUE - the temporary table takes precedence in the session over any other table with the same name in the same schema. This can lead to potential conflicts and unexpected behavior, particularly when performing DDL on both temporary and non-temporary tables.

176
Q

How can you define a temporary table using SQL?

A

CREATE TEMPORARY TABLE mytemptable (id NUMBER, creation_date DATE);

177
Q

What is a transient table?

A

A table that persists data until explicitly dropped and is available to all users with the appropriate privileges.

178
Q

What is the key difference between permanent and transient tables?

A

Transient tables do not have a Fail-safe period. As a result, transient tables are specifically designed for transitory data that needs to be maintained beyond each session (in contrast to temporary tables), but does not need the same level of data protection and recovery provided by permanent tables.

179
Q

How is data stored when a transient table is created from a clone of a permanent table?

A

Snowflake creates a zero-copy clone. This means when the transient table is created, it utilizes no data storage because it shares all of the existing micro-partitions of the original permanent table. When rows are added, deleted, or updated in the clone, it results in new micro-partitions that belong exclusively to the clone (in this case, the transient table).

180
Q

Can you create a transient database and schema?

A

TRUE - All tables created in a transient schema, as well as all schemas created in a transient database, are transient by definition.
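
For example (hypothetical names):

CREATE TRANSIENT DATABASE mydb;
CREATE TRANSIENT SCHEMA myschema;
CREATE TRANSIENT TABLE mytranstable (id NUMBER, creation_date DATE);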

181
Q

What is an external table?

A

An external table is a Snowflake feature that allows you to query data stored in an external stage as if the data were inside a table in Snowflake. The external stage is not part of Snowflake, so Snowflake does not store or manage the stage.

182
Q

Are external tables read-only?

A

TRUE - You cannot perform data manipulation language (DML) operations on them.

183
Q

Can you use external tables for query and join operations?

A

TRUE

184
Q

Can you create views against external tables?

A

TRUE

185
Q

How can you improve the query performance of an external table?

A

Use a materialized view based on an external table.

186
Q

Which columns do all external tables include?

A
  • VALUE
  • METADATA$FILENAME
  • METADATA$FILE_ROW_NUMBER
187
Q

Can you partition an external table?

A

TRUE - automatically and manually

188
Q

What is a dynamic table?

A

Dynamic tables in Snowflake simplify data engineering by automating data transformations. Instead of manually managing tasks and schedules, you define the end result with dynamic tables, and Snowflake handles the pipeline. These tables reflect query results directly, eliminating the need for separate target tables or custom code. The content is updated automatically through scheduled refreshes, and cannot be modified via DML operations.

189
Q

When are dynamic tables best used?

A
  • You don’t want to write code to track data dependencies and manage data refresh.
  • You don’t need, or want to avoid, the complexity of streams and tasks.
  • You do need to materialize the results of a query of multiple base tables.
  • You need to build multiple tables to transform data via an ETL pipeline.
  • You don’t need fine-grained refresh schedule control and you just want to specify the target data freshness for your pipelines.
  • You don’t need to use unsupported dynamic query constructs such as stored procedures, non-deterministic functions not listed in Supported non-deterministic functions in full refresh, or external functions, or need to use sources for dynamic tables that are external tables, streams, or materialized views.
190
Q

What is a hybrid table?

A

A hybrid table is a Snowflake table type that is optimized for hybrid transactional and operational workloads that require low latency and high throughput on small random point reads and writes. You can use a hybrid table along with other Snowflake tables and features to power Unistore workloads that bring transactional and analytical data together in a single platform.

191
Q

Which type of queries benefit most from hybrid tables?

A
  • High concurrency random point reads versus large range reads.
  • High concurrency random writes versus large sequential writes (for example, bulk loading).
  • Retrieval of a small number of entire records (for example, customer object) versus narrow projections with analytical functions (for example, aggregations or GROUP BY queries).
192
Q

Can you clone a hybrid table?

A

FALSE - although cloning is not supported for hybrid tables, you can clone databases and schemas that contain hybrid tables by using the IGNORE HYBRID TABLES parameter in the CREATE <object> … CLONE statement.

193
Q

Are clustering keys supported by hybrid tables?

A

FALSE - clustering keys cannot be defined for hybrid tables; in hybrid tables, data is always ordered by primary key.

194
Q

What are Iceberg tables?

A

Apache Iceberg tables for Snowflake combine the performance and query semantics of typical Snowflake tables with external cloud storage that you manage. They are ideal for existing data lakes that you cannot, or choose not to, store in Snowflake.

195
Q

Which features are supported by Iceberg tables?

A
  • ACID (atomicity, consistency, isolation, durability) transactions
  • Schema evolution
  • Hidden partitioning
  • Table snapshots
196
Q

Where does an Iceberg table store its data and metadata?

A

In an external cloud storage location (Amazon S3, Google Cloud Storage, or Azure Storage).

197
Q

What is an external volume?

A

An external volume is a named, account-level Snowflake object that you use to connect Snowflake to your external cloud storage for Iceberg tables. An external volume stores an identity and access management (IAM) entity for your storage location.

198
Q

What types of views are supported?

A
  • Non-materialized
  • Materialized
199
Q

What is a materialized view?

A

A materialized view is a pre-computed data set derived from a query specification (the SELECT in the view definition) and stored for later use. Because the data is pre-computed, querying a materialized view is faster than executing a query against the base table of the view. This performance difference can be significant when a query is run frequently or is sufficiently complex.
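
A minimal sketch (hypothetical names):

CREATE MATERIALIZED VIEW mv1 AS
SELECT col1, col2 FROM mytable WHERE col2 > 0;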

200
Q

Do materialized views incur additional costs?

A

TRUE - this allows faster access, but requires storage space and active maintenance, both of which incur additional costs.

201
Q

What is a recursive view?

A

A view that refers to itself. Recursion only works with non-materialized views.

202
Q

Can a view definition be updated?

A

FALSE - the definition for a view cannot be updated (i.e. you cannot use ALTER VIEW or ALTER MATERIALIZED VIEW to change the definition of a view). To change a view definition, you must recreate the view with the new definition.

203
Q

Are changes to a table automatically propagated to views created on that table?

A

FALSE - for example, if you drop a column in a table, the views on that table might become invalid.

204
Q

Which view types can be defined as secure?

A
  • non-materialized
  • materialized
205
Q

When should you use a secure view?

A

Views should be defined as secure when they are specifically designated for data privacy (i.e. to limit access to sensitive data that should not be exposed to all users of the underlying table(s)).

206
Q

EXAMPLE - Using the following widgets example, consider a user who has access to only the red widgets. Suppose the user wonders if any purple widgets exist and issues the following query:

SELECT *
FROM widgets_view
WHERE 1/IFF(color = 'Purple', 0, 1) = 1;

If any purple widgets exist, then the IFF() expression returns 0. The division operation then fails due to a division-by-zero error, which allows the user to infer that at least one purple widget exists.

207
Q

Who can view the definition of a secure view?

A

The definition of a secure view is only exposed to authorized users (i.e. users who have been granted the role that owns the view).

208
Q

How can you determine if a non-materialized view is secured?

A

The IS_SECURE column in the Information Schema and Account Usage views identifies whether a view is secure.

209
Q

How can you determine if a materialized view is secured?

A

Use the SHOW MATERIALIZED VIEWS command.

210
Q

NOTE - view security can be integrated with Snowflake users and roles using the CURRENT_ROLE and CURRENT_USER context functions.

211
Q

What alternatives can be used to prevent exposing sensitive information if the surrogate key reveals important details about the underlying data distribution?

A
  • Do not expose the sequence-generated column as part of the view.
  • Use randomized identifiers (e.g. generated by UUID_STRING) instead of sequence-generated values.
  • Programmatically obfuscate the identifiers.
212
Q

What is the recommended approach to prevent the exposure of approximate information about the underlying data to users who only have access to a subset of the data when using secure views?

A

It is best to materialize data per user/role instead of exposing views on the base data to users.

213
Q

Which function should be used to authorize users from a specific account to access rows in a base table when using secure views with Secure Data Sharing?

A

CURRENT_ACCOUNT

214
Q

When is a materialized view useful?

A
  • Query results contain a small number of rows and/or columns relative to the base table (the table on which the view is defined).
  • Query results contain results that require significant processing.
  • The query is on an external table, which might have slower performance compared to querying native database tables or Apache Iceberg™ tables.
  • The view’s base table does not change frequently.
215
Q

When should you create a regular view? (if ANY of the following are true)

A
  • The results of the view change often.
  • The results are not used often (relative to the rate at which the results change).
  • The query is not resource intensive so it is not costly to re-run it.
216
Q

When should you create a materialized view? (if ALL of the following are true)

A
  • The query results from the view don’t change often.
  • The results of the view are used often.
  • The query consumes a lot of resources.
217
Q

Do materialized views require Enterprise Edition?

A

TRUE - materialized views require Enterprise Edition (or higher).