2025 Data Ingestion Flashcards
What two options are provided for loading?
Bulk and Continuous
What basic transformations are supported while loading data using the COPY command?
Reordering columns
Omitting columns
Type casting
Data truncation
What are the three type of internal stages?
User Stage
Table Stage
Named Stage
How do you reference a user stage
@~
How are table stages referenced?
@%
How are named stages referenced
@
What are the acceptble file formats when creating a stage?
CSV, JSON, AVRO, ORC, PARQUET, and XML
If the file format is specified in multiple locations, what is the order of precedence?
Copy into table statement
The Stage definition
The table definition
What is the difference between the DESC and LIST commands on a stage
Desc is metadata on stage, list is the list of files staged successfully.
How is Snowpipe costs calculated
Per Second/per-cpu core and then converts this into credits
To load BROTLI compressed files, what should COMPRESSION be set to on the COPY INTO statement?
BROTLI, not AUTO
What are the options for COMPRESSION on a COPY INTO statement
AUTO
GZIP
BZ2
BROTLI
ZSTD
DEFLATE
RAW_DEFLATE
NONE
On the COPY INTO statement, what happens if a file is listed in FILES but does not exist
Snowflake follows the behavior set ON_ERROR, default is ABORT
What is the upper limit for the number of file names you can list on FILES in a COPY INTO statement
1000
What type of files can be set on TYPE on a COPY INTO statement
CSV
JSON
AVRO
ORC
PARQUET
XML
What options are available on VALIDATION_MODE on a COPY INTO statement
RETURN_n_ROWS
RETURN_ERRORS
RETURN_ALL_ERRORS
What is the difference between RETURN_ALL_ERRORS and RETURN_ERRORS on the VALIDATION_MODE in a COPY INTO statement?
They both return all errors across all the files in the COPY command, but ALL also returns files that were partially loaded during an earlier load.
Where can you get history and see issues that occurred during a load
LOAD_HISTORY in the information schema
In a COPY INTO statement, you might see
SKIP_HEADER = <n>
What is n?</n>
n refers to the number of lines to skip at the beginning of the file. The default value is 0.
What are the options for ON_ERROR on a COPY INTO statement
CONTINUE
SKIP_FILE
ABORT_STATEMENT
What is the default for ON_ERROR for COPY INTO
ABORT_STATEMENT
What is the default for ON_ERROR for Snowpipe
SKIP_FILE
What is the slowest of all the ON_ERROR options
SKIP_FILE
The action buffers the entire file whether errors are found or not. When loading large numbers of records from files with no logical delineation, consider CONTINUE instead
With ON_ERROR set to ABORT_STATEMENT how are files explicitly set on FILES treated vs patterns?
Explicitly requested files are treated as requirements, while pattern-matched files are treated as “load what you can find.”