D3.3 - DEFINE BEST PRACTICES THAT SHOULD BE CONSIDERED WHEN LOADING DATA Flashcards
1
Q
File Size: Optimizing parallel operations
A
- The number of load operations that run in parallel cannot exceed the number of data files to be loaded.
- To optimize the number of parallel operations, Snowflake recommends producing data files roughly 100-250 MB (or larger) in size, compressed (see the sketch below)
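A minimal pre-flight sketch in Python, assuming gzip-compressed files sitting in a local directory before staging (the ./data_to_load path and the *.gz pattern are hypothetical): it flags files that fall outside the roughly 100-250 MB compressed range recommended above.

```python
from pathlib import Path

# Recommended compressed file size range for parallel loading,
# per the guideline above (roughly 100-250 MB).
MIN_MB, MAX_MB = 100, 250

def check_staged_files(directory: str) -> None:
    """Flag compressed files that fall outside the recommended size range."""
    for path in sorted(Path(directory).glob("*.gz")):
        size_mb = path.stat().st_size / (1024 * 1024)
        if size_mb < MIN_MB:
            print(f"{path.name}: {size_mb:.1f} MB -- consider aggregating")
        elif size_mb > MAX_MB:
            print(f"{path.name}: {size_mb:.1f} MB -- consider splitting")
        else:
            print(f"{path.name}: {size_mb:.1f} MB -- OK")

check_staged_files("./data_to_load")  # hypothetical local staging directory
```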
2
Q
File Size: Very large files
A
- Loading very large files (larger than 100 GB) is not recommended. If you must load one, consider the ON_ERROR copy option value (see the sketch after this list).
- Aborting or skipping a file due to a small number of errors could result in delays and wasted credits.
- In addition, if a data loading operation continues past the limit of 24 hours, it could be aborted without any portion of the file being committed.
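A hedged sketch of using the ON_ERROR copy option through the Snowflake Python connector; the connection parameters, table, stage, and file names are all hypothetical placeholders.

```python
import snowflake.connector

# Connection parameters are hypothetical placeholders.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="load_wh",
    database="my_db",
    schema="public",
)

try:
    cur = conn.cursor()
    # ON_ERROR = 'CONTINUE' skips bad records instead of aborting the
    # whole (very large) file; table, stage, and file names are hypothetical.
    cur.execute("""
        COPY INTO my_table
        FROM @my_stage/huge_file.csv.gz
        FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    print(cur.fetchall())  # per-file results, including error counts
finally:
    conn.close()
```

With ON_ERROR = 'CONTINUE', a handful of bad records are skipped and reported rather than aborting a multi-hour load partway through.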
3
Q
File Size: Aggregating and splitting files
A
- Aggregate smaller files to minimize the processing overhead for each file.
- Split larger files into a greater number of files to distribute the load among the compute resources in an active warehouse.
- The number of data files that are processed in parallel is determined by the amount of compute resources in a warehouse.
- We recommend splitting large files by line to avoid records that span chunks (see the sketch below).
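A minimal splitting sketch, assuming a large uncompressed CSV (huge_export.csv is a hypothetical name): it breaks the file on line boundaries into roughly 150 MB chunks and gzips each one, so no record spans two chunks.

```python
import gzip
from pathlib import Path

# Target chunk size: mid-range of the 100-250 MB recommendation (assumption).
CHUNK_BYTES = 150 * 1024 * 1024

def split_by_line(source: str, out_dir: str) -> None:
    """Split a large text file into ~150 MB gzipped chunks on line
    boundaries, so no record ever spans two chunks."""
    Path(out_dir).mkdir(exist_ok=True)
    part, written, out = 0, 0, None
    with open(source, "rb") as src:
        for line in src:  # iterating by line keeps each record intact
            if out is None or written >= CHUNK_BYTES:
                if out:
                    out.close()
                part += 1
                out = gzip.open(f"{out_dir}/part_{part:04d}.csv.gz", "wb")
                written = 0
            out.write(line)
            written += len(line)  # uncompressed bytes, a rough proxy
    if out:
        out.close()

split_by_line("huge_export.csv", "./chunks")  # hypothetical names
```

Note that in this simple sketch only the first chunk carries the CSV header; a production splitter would typically copy the header into every chunk or omit headers entirely.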
4
Q
VARIANT Data Type
A
- Has a 16 MB size limit on an individual row (see the sketch below)
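A rough client-side guard sketched in Python; the record is an illustrative assumption, and since Snowflake measures the limit internally, the byte count of the serialized JSON here is only an approximation.

```python
import json

# Hypothetical record destined for a VARIANT column.
record = {"id": 1, "payload": {"tags": ["a", "b"], "notes": "..."}}

VARIANT_LIMIT = 16 * 1024 * 1024  # the 16 MB per-row limit noted above

serialized = json.dumps(record).encode("utf-8")
status = "exceeds" if len(serialized) > VARIANT_LIMIT else "is within"
print(f"Record is {len(serialized)} bytes and {status} the VARIANT limit")
```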
5
Q
File Formats
A
- Structured
- Semi-Structured
6
Q
What types are within Structured files?
A
- Delimited (CSV, TSV, etc.)
- Any valid single-byte delimiter is supported; the default is a comma (see the sketch below)
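A small sketch of the single-byte-delimiter point using Python's csv module; the pipe character and the users.psv file name are illustrative choices (the matching Snowflake file format option would be FIELD_DELIMITER = '|').

```python
import csv

rows = [["id", "name"], ["1", "Ada"], ["2", "Grace"]]  # sample data

# Any single-byte character can serve as the field delimiter;
# here a pipe replaces the default comma.
with open("users.psv", "w", newline="") as f:
    csv.writer(f, delimiter="|").writerows(rows)

with open("users.psv", newline="") as f:
    for row in csv.reader(f, delimiter="|"):
        print(row)
```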
7
Q
What types are within Semi-structured files?
A
- JSON
- Avro: includes automatic detection and processing of compressed files (see the sketch after this list)
- ORC: includes automatic detection and processing of compressed files
- Parquet: includes automatic detection and processing of compressed files; version 2 (V2) files are not supported
- XML: supported as a preview feature
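To make the compression-detection point concrete, a minimal sketch that produces a gzip-compressed newline-delimited JSON file (the file name and records are illustrative); with Snowflake's default COMPRESSION = AUTO file format setting, such a file can be loaded without declaring the compression explicitly.

```python
import gzip
import json

# Illustrative semi-structured records.
records = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]

# Write newline-delimited JSON, gzip-compressed; with the default
# COMPRESSION = AUTO, Snowflake detects the gzip encoding on load.
with gzip.open("events.json.gz", "wt", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```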