2025 Data Ingestion Flashcards

1
Q

What two options are provided for loading?

A

Bulk and Continuous

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What basic transformations are supported while loading data using the COPY command?

A

Reordering columns
Omitting columns
Type casting
Data truncation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the three type of internal stages?

A

User Stage
Table Stage
Named Stage

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How do you reference a user stage

A

@~

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How are table stages referenced?

A

@%

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How are named stages referenced

A

@

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the acceptble file formats when creating a stage?

A

CSV, JSON, AVRO, ORC, PARQUET, and XML

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

If the file format is specified in multiple locations, what is the order of precedence?

A

Copy into table statement
The Stage definition
The table definition

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the difference between the DESC and LIST commands on a stage

A

Desc is metadata on stage, list is the list of files staged successfully.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How is Snowpipe costs calculated

A

Per Second/per-cpu core and then converts this into credits

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

To load BROTLI compressed files, what should COMPRESSION be set to on the COPY INTO statement?

A

BROTLI, not AUTO

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are the options for COMPRESSION on a COPY INTO statement

A

AUTO
GZIP
BZ2
BROTLI
ZSTD
DEFLATE
RAW_DEFLATE
NONE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

On the COPY INTO statement, what happens if a file is listed in FILES but does not exist

A

Snowflake follows the behavior set ON_ERROR, default is ABORT

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the upper limit for the number of file names you can list on FILES in a COPY INTO statement

A

1000

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What type of files can be set on TYPE on a COPY INTO statement

A

CSV
JSON
AVRO
ORC
PARQUET
XML

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What options are available on VALIDATION_MODE on a COPY INTO statement

A

RETURN_n_ROWS
RETURN_ERRORS
RETURN_ALL_ERRORS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is the difference between RETURN_ALL_ERRORS and RETURN_ERRORS on the VALIDATION_MODE in a COPY INTO statement?

A

They both return all errors across all the files in the COPY command, but ALL also returns files that were partially loaded during an earlier load.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Where can you get history and see issues that occurred during a load

A

LOAD_HISTORY in the information schema

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

In a COPY INTO statement, you might see
SKIP_HEADER = <n>
What is n?</n>

A

n refers to the number of lines to skip at the beginning of the file. The default value is 0.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What are the options for ON_ERROR on a COPY INTO statement

A

CONTINUE
SKIP_FILE
ABORT_STATEMENT

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is the default for ON_ERROR for COPY INTO

A

ABORT_STATEMENT

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is the default for ON_ERROR for Snowpipe

A

SKIP_FILE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is the slowest of all the ON_ERROR options

A

SKIP_FILE
The action buffers the entire file whether errors are found or not. When loading large numbers of records from files with no logical delineation, consider CONTINUE instead

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

With ON_ERROR set to ABORT_STATEMENT how are files explicitly set on FILES treated vs patterns?

A

Explicitly requested files are treated as requirements, while pattern-matched files are treated as “load what you can find.”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
If ON_ERROR is set to ABORT_STATEMENT, where do you troubleshoot issues
QUERY_HISTORY Since the data was not ingested, it will not be found in COPY_HISTORY
26
On a COPY INTO statement, what is num on SIZE_LIMIT =
Max size in bytes of data to be loaded, after which the COPY stops loading files. At least one file is loaded regardless of the value specified.
27
On a COPY INTO statement, how do you specify to get rid of a file from the stage once the load is successful
PURGE = TRUE
28
If a purge operation fails, is an error returned
No
29
How do you force a COPY INTO statement to load a file that has already been loaded?
FORCE = TRUE
30
What is the time window a duplicate file will not be loaded by default
64 days
31
When tracking a file's loaded state, what is LAST_MODIFIED
The timestamp of when the file was initially staged or when it was last modified, whichever is later
32
On a COPY INTO statement, what happens if text exceed the column length and TRUNCATECOLUMNS = FALSE
It will produce an error
33
What is the difference between TRUNCATECOLUMNS and ENFORCE_LENGTH on a COPY INTO statement
Truncate will truncate if set to true Enforce will truncate if set to false
34
When using Snowpipe, how do you get rid of a file after loading?
PURGE is not an option. To remove files, periodically execute REMOVE command
35
What two options are available to detect a file is available for Snowpipe
Cloud Messaging REST API
36
If a pipe is paused, how long are event notifications maintained
14 days. Older than that are processed on a best effort basis
37
The REST API for Snowpipe requires what for authentiction
JWT signed using RSA keys
38
How are files loaded in Snowpipe
FIFO is attempted but not guaranteed
39
What stages are supported for Snowpipe
Internal, External, and Table
40
Once data is staged, what endpoint should be used for data loading
insertFiles
41
For the REST endpoint insertFiles, what is the max number of files?
5000
42
For the REST endpoint insertFiles, what is the max file path for each file
1,024 bytes
43
For the REST endpoint insertFiles, what does a response 200 mean?
Success, files are added to the queue.
44
For the REST endpoint insertFiles, what does a response 400 mean?
Invalid format/limit exceeded
45
For the REST endpoint insertFiles, what does a response 404 mean?
Pipe name not recognized
46
For the REST endpoint insertFiles, what does a response 429 mean?
Request rate limit exceeded failure
47
For the REST endpoint insertFiles, what does a response 500 mean?
Internal error failure
48
What doe the REST endpoint insertReport do?
Returns a report of files recently ingested into Snowlake. Only shows files for a max of the last 10 minutes, and the 10,000 most recent
49
For the REST endpoint insertReport, what does a response 200 mean?
Success, list of files returned
50
For the REST endpoint insertReport, what does a response 400 mean?
Invalid file format/limit exceeded
51
For the REST endpoint insertReport, what does a response 404 mean?
Pipe name not recognized
52
For the REST endpoint insertReport, what does a response 429 mean?
Request rate limit exceeded failure
53
For the REST endpoint insertReport, what does a response 500 mean?
Internal error occured
54
What does the REST endpoint loadHistoryScan do?
Returns a report of files recently ingested into table and the view history between two points in time. Returns a max of 10,000 items, but multiple calls can be made.
55
For the REST endpoint loadHistoryScan, what does a response 200 mean?
Success, returns load history scan results
56
For Snowpipe all COPY INTO commands are support except for
FILES = (''...) ON_ERROR = ABORT_STATEMENT SIZE_LIMIT PURGE FORCE RETURN_FAILED_ONLY VALIDATION_MODE
57
In an Internal Stage, what encryption is used
AWS 256
58
When calling the Snowpipe API what is the difference between loadHistoryScan and insertReport
loadHistoryScan fetches a report about ingested files between two points in time
59
If you are trying to avoid the rate limit error code, 429, which should you use insertReport or loadHistoryScan
insertReport, calling every 8 minutes to get the last 10 minutes of history
60
For insertFiles, how long are events retained for
A maximum of 10 minutes
61
Schema on Read is good for what type of data
Semi-structure data like JSON
62
Schema on Write is good for what type of data
Structured and relational data
63
If an AWS Administrator accidentally deletes a SQS subscription to the SNS topic for a snowpipe, what will happen to the pipe which references the topic to receive event messages from Amazon S3?
The pipe will no longer be able to receive the messages. To restore the system immediately, the user needs to manually create a new SNS topic with a different name and then recreate the pipe by specifying the new SNS topic name in the pipe definition. Without the rename, you have to wait 72 hours and still recreate the pipe.
64
What happens in the COPY INTO statement if data is transformed and the validation_mode is set to return_all_errors
SQL compilation error: invalid parameter 'validation_mode' The VALIDATION_MODE parameter does not support COPY statements that transform data during a load.
65
When is the parameter INTEGRATION from Snowpipe required?
When configuring AUTO_INGEST for Google Cloud Storage or Microsoft Azure stages.
66
Does Snowpipe have the USAGE privilege
No