D3.3 - DEFINE BEST PRACTICES THAT SHOULD BE CONSIDERED WHEN LOADING DATA Flashcards

1
Q

File Size: Optimizing parallel operations

A
  • The number of load operations that run in parallel cannot exceed the number of data files to be loaded.
  • To optimize the number of parallel operations for a load, Snowflake recommends producing data files of roughly 100-250 MB (or larger) in size, compressed.
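The sizing guidance above can be turned into a quick planning calculation. The following is only a rough sketch: the function names and the `available_threads` parameter are hypothetical, and the 250 MB default is simply the upper end of the range quoted on this card.

```python
import math

# Upper end of the 100-250 MB compressed range quoted on the card.
TARGET_MB = 250


def plan_files(total_compressed_mb: float, target_mb: float = TARGET_MB) -> int:
    """Return how many files to produce so each is at most about target_mb."""
    if total_compressed_mb <= 0:
        return 0
    return max(1, math.ceil(total_compressed_mb / target_mb))


def max_parallel_loads(num_files: int, available_threads: int) -> int:
    """Parallel load operations cannot exceed the number of files to load.

    available_threads is a hypothetical stand-in for warehouse capacity.
    """
    return min(num_files, available_threads)
```

For example, a 10,000 MB compressed data set would be planned as `plan_files(10_000)` files, and a load of 3 files can never run more than 3 operations in parallel, however much compute is available.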
2
Q

File Size: Very large files

A
  • Loading very large files (e.g. 100 GB or larger) is not recommended. If you must load one, carefully consider which ON_ERROR copy option value to use.
  • Aborting or skipping a file due to a small number of errors could result in delays and wasted credits.
  • In addition, if a data loading operation continues beyond the maximum allowed duration of 24 hours, it could be aborted without any portion of the file being committed.
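As an illustration of where ON_ERROR fits, here is a minimal sketch that builds a COPY INTO statement as a string. The helper name, table name, and stage name are hypothetical, and the helper covers only three of the documented ON_ERROR values (the SKIP_FILE_<num> variants are left out for brevity). It does not connect to Snowflake or run anything.

```python
# Subset of ON_ERROR values handled by this sketch.
VALID_ON_ERROR = {"CONTINUE", "SKIP_FILE", "ABORT_STATEMENT"}


def build_copy_statement(table: str, stage: str,
                         on_error: str = "ABORT_STATEMENT") -> str:
    """Build a COPY INTO statement with an explicit ON_ERROR choice.

    table and stage are placeholders supplied by the caller; no
    validation of identifiers is attempted here.
    """
    value = on_error.upper()
    if value not in VALID_ON_ERROR:
        raise ValueError(f"unsupported ON_ERROR value: {on_error}")
    return f"COPY INTO {table} FROM @{stage} ON_ERROR = {value}"


# Hypothetical names, for illustration only.
statement = build_copy_statement("my_table", "my_stage", "continue")
```

For a very large file, the choice matters because skipping or aborting on a handful of bad rows discards work that has already consumed credits.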
3
Q

File Size: Handling Very Large Files

A
  • Aggregate smaller files to minimize the processing overhead for each file.
  • Split larger files into a greater number of files to distribute the load among the compute resources in an active warehouse.
  • The number of data files that are processed in parallel is determined by the amount of compute resources in a warehouse.
  • Snowflake recommends splitting large files by line to avoid records that span chunks.
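Splitting by line can be sketched as below. This is an illustrative helper, not a Snowflake tool: `split_by_line` and `max_chunk_bytes` are hypothetical names, and the code assumes one record per line (no quoted fields containing embedded newlines).

```python
from typing import Iterable, Iterator, List


def split_by_line(lines: Iterable[bytes],
                  max_chunk_bytes: int) -> Iterator[List[bytes]]:
    """Group whole lines into chunks of at most max_chunk_bytes.

    A line is never cut in half, so no record spans two chunks. A single
    line larger than the limit is emitted as its own chunk.
    """
    chunk: List[bytes] = []
    size = 0
    for line in lines:
        # Close the current chunk before it would exceed the limit.
        if chunk and size + len(line) > max_chunk_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk
```

In practice the input would be a file opened in binary mode, e.g. `split_by_line(open(path, "rb"), 250 * 1024 * 1024)`, with each yielded chunk written out and compressed as a separate file.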
4
Q

VARIANT Data Type

A
  • Has a 16 MB size limit on individual rows.
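A pre-load check against this limit might look like the sketch below. It is an approximation only: it measures the UTF-8 length of each record's JSON encoding, which may not match exactly how Snowflake measures a row, and treating 16 MB as 16 * 1024 * 1024 bytes is an assumption. The function name is hypothetical.

```python
import json
from typing import Any, Iterable, List

# 16 MB limit from the card; the exact byte count is an assumption.
VARIANT_LIMIT_BYTES = 16 * 1024 * 1024


def oversized_rows(records: Iterable[Any],
                   limit: int = VARIANT_LIMIT_BYTES) -> List[int]:
    """Return the indexes of records whose JSON encoding exceeds limit bytes."""
    too_big = []
    for index, record in enumerate(records):
        encoded = json.dumps(record).encode("utf-8")
        if len(encoded) > limit:
            too_big.append(index)
    return too_big
```

Records flagged this way would need to be split or flattened before loading into a VARIANT column.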
5
Q

File Formats

A
  • Structured
  • Semi-structured

6
Q

What types are within Structured files?

A
  • Delimited (CSV, TSV, etc.)
  • Any valid single-byte delimiter is supported; the default is a comma.

7
Q

What types are within Semi-structured files?

A
  • JSON
  • Avro: includes automatic detection and processing of compressed files
  • ORC: includes automatic detection and processing of compressed files
  • Parquet: includes automatic detection and processing of compressed files; files produced with version 2 (V2) of the Parquet writer are not supported
  • XML: supported as a preview feature