D3.1 - OUTLINE DIFFERENT COMMANDS USED TO LOAD DATA AND WHEN THEY SHOULD BE USED Flashcards
1
Q
COPY INTO
A
- COPY INTO <table> loads data from staged files into an existing table
- COPY INTO <location> unloads data from a table or query into one or more files in one of the following locations:
- Named internal stage (files can then be downloaded using the GET command)
- Named external stage (Amazon S3, Google Cloud Storage, or Microsoft Azure)
- External location (i.e. a cloud storage URL)
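For illustration, here is COPY INTO in both directions, loading a table from a stage and unloading to a stage; my_table and my_stage are hypothetical names:

```sql
-- Load staged files into a table (COPY INTO <table>)
COPY INTO my_table
  FROM @my_stage/data/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Unload query results into files on a named internal stage (COPY INTO <location>)
COPY INTO @my_stage/unload/
  FROM (SELECT * FROM my_table)
  FILE_FORMAT = (TYPE = CSV);
```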
2
Q
INSERT
A
- Updates a table by inserting one or more rows into the table.
- The values inserted into each column can be explicitly specified or the results of a query
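A short sketch of both variants, using hypothetical employees and new_hires tables:

```sql
-- Explicitly specified values
INSERT INTO employees (id, name) VALUES (1, 'Ada'), (2, 'Grace');

-- Values taken from the results of a query
INSERT INTO employees (id, name)
  SELECT id, name FROM new_hires;
```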
3
Q
PUT
A
- Uploads data files from a local directory/folder on a client machine to one of the following Snowflake stages:
- Named internal stage
- Internal stage for a specified table
- Internal stage for the current user
- Once files are staged, the data in the files can be loaded into a table using the COPY INTO command
- PUT does not support uploading files to external stages. To upload files to external stages, you must use the utilities provided by the cloud service
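Illustrative PUT commands for each internal stage type (run from SnowSQL or another client; paths and names are hypothetical):

```sql
PUT file:///tmp/data/orders.csv @my_stage;   -- named internal stage
PUT file:///tmp/data/orders.csv @%orders;    -- table stage for table ORDERS
PUT file:///tmp/data/orders.csv @~;          -- current user's stage
```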
4
Q
GET
A
- Downloads data files from Snowflake stages to a local directory/folder on a client machine
- It is typically executed after a COPY INTO <location> command has unloaded data to the Snowflake stage
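A minimal sketch, assuming files were previously unloaded to a hypothetical @my_stage/unload/ path:

```sql
GET @my_stage/unload/ file:///tmp/downloads/;
```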
5
Q
VALIDATE
A
- Validates the files loaded in a past execution of the COPY INTO <table> command and returns all the errors encountered during the load, rather than just the first error
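For example, to see all errors from the most recent COPY INTO execution in the current session (my_table is a hypothetical name):

```sql
SELECT * FROM TABLE(VALIDATE(my_table, JOB_ID => '_last'));
```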
6
Q
What is SNOWPIPE?
A
- Snowpipe loads data from files as soon as they are available in a stage.
- The data is loaded according to the COPY statement defined in a referenced pipe.
- A pipe is a named, first-class Snowflake object that contains a COPY statement used by Snowpipe
- The COPY statement identifies the source location for the data files (i.e. stage) and a target table.
- All data types are supported, including semi-structured data types such as JSON and Avro
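A minimal pipe definition sketch, with hypothetical object names:

```sql
CREATE PIPE my_pipe AS
  COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (TYPE = JSON);
```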
7
Q
When using Snowpipe, what mechanisms are available for detecting staged files?
A
- Automating Snowpipe using cloud messaging
- Calling Snowpipe REST endpoints
8
Q
Snowpipe: Automated Cloud Messaging
A
- Automated data loads leverage event notifications for cloud storage to inform Snowpipe of the arrival of new data files to load.
- Snowpipe copies the files into a queue, from which they are loaded into the target table in a continuous, serverless fashion based on parameters defined in a specified pipe object
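A sketch of an auto-ingest pipe over a hypothetical external stage (the event notification itself is configured on the cloud storage side):

```sql
CREATE PIPE my_auto_pipe
  AUTO_INGEST = TRUE
  AS COPY INTO my_table
     FROM @my_ext_stage
     FILE_FORMAT = (TYPE = JSON);
```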
9
Q
Snowpipe: Calling Snowpipe REST endpoints
A
- Your client application calls a public REST endpoint with the name of a pipe object and a list of data filenames.
- If new data files matching the list are discovered in the stage referenced by the pipe object, they are queued for loading.
- Snowflake-provided compute resources load data from the queue into a Snowflake table based on parameters defined in the pipe
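As a rough sketch, the insertFiles REST endpoint can be called as follows; the account, pipe, and file names are hypothetical, and the JWT must be generated separately from the account's key pair:

```shell
curl -X POST \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"files":[{"path":"data/orders_001.json"}]}' \
  "https://myaccount.snowflakecomputing.com/v1/data/pipes/mydb.myschema.my_pipe/insertFiles"
```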
11
Q
What are the primary differences between Snowpipe and a bulk data load workflow using the COPY command?
A
- Authentication
- Load History
- Transactions
- Compute Resources
- Cost
12
Q
Snowpipe vs Bulk Data Load: Authentication
A
- Bulk Data Load: relies on the security options supported by the client for authenticating and initiating a user session
- Snowpipe: calling the REST endpoints requires key pair authentication with a JSON Web Token (JWT). JWTs are signed using a public/private key pair with RSA encryption
13
Q
Snowpipe vs Bulk Data Load: Load History
A
- Bulk Data Load: stored in the metadata of the target table for 64 days. Available upon completion of the COPY statement as the statement output
- Snowpipe: stored in the metadata of the pipe for 14 days. Must be requested from Snowflake via a REST endpoint, SQL table function, or ACCOUNT_USAGE view.
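For illustration, recent load activity for a hypothetical table can be queried with the COPY_HISTORY table function, which covers loads from the last 14 days:

```sql
SELECT file_name, status, row_count
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'my_table',
  START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())));
```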
14
Q
Snowpipe vs Bulk Data Load: Transactions
A
- Bulk Data Load: loads are always performed in a single transaction. Data is inserted into a table alongside any other SQL statements submitted manually by users.
- Snowpipe: loads are combined or split into one or more transactions based on the number and size of the rows in each data file. Rows of partially loaded files (based on the ON_ERROR copy option setting) can also be combined or split into one or more transactions
15
Q
Snowpipe vs Bulk Data Load: Compute Resources
A
- Bulk Data Load: requires a user-specified warehouse to execute COPY statements
- Snowpipe: Uses Snowflake-supplied compute resources