Data Storage Flashcards
Awesome Company wants to create a folder structure in their landing zone that will simplify permission maintenance as much as possible. Which of the following formats accomplishes that?
A) \YY\MM\DD\DataSource\Landing
B) \Landing\DataSource\YY\MM\DD
C) \Landing
D) \YY\MM\DD
B) \Landing\DataSource\YY\MM\DD
Having the zone at the top level and then the data sources below that allow for permissions to be set the fewest amount of times with the greatest impact.
Which Azure Storage redundancy option creates synchronous copies across 3 availability zones in the primary region?
ZRS
Zone-redundant storage (ZRS) creates 3 synchronous copies across Azure availability zones in the primary region.
What Databricks feature allows you to skip files within a partition based on query filters?
Dynamic File Pruning
Dynamic file pruning utilizes data skipping and Z-Ordering to skip files within a partition based on filters.
Which of the following file types will allow you to store nested data in a columnar format?
A) Parquet
B) Avro
C) JSON
D) ORC
A) Parquet
Parquet uses nested data structures and columnar storage, where values of each column type are stored together.
What is the storage solution for unstructured and semi-structured files that is most often used in Azure analytics solutions?
Azure Data Lake Storage Gen2
Functional partitioning is based on identifying bounded contexts and allows you to use these for improving isolation and performance.
Which Azure Storage access tier is intended for long-term backup and regulatory compliance?
Archive
Archive access tier is for long-term storage of 180 days or more.
Which sharding strategy utilizes a map to direct queries to the appropriate shard?
The Lookup Strategy
This sharding logic uses a map to route requests to the appropriate shard based on the shard key.
What type of compression allows you to reduce the size of rowstore objects by looking for patterns in the data and making replacements with smaller values?
Page compression.
Page compression looks for patterns in the data and makes replacements with smaller values. This is especially useful for duplicate data.
Which partitioning method allows you to improve isolation and data access performance by identifying a bounded context for a distinct business area?
Functional Partitioning
Functional partitioning is based on identifying bounded contexts and allows you to use these for improving isolation and performance.
In Azure Synapse Analytics, how many underlying distributions are automatically assigned?
60
Azure Synapse Analytics automatically creates and manages 60 distributions for your data.