Azure DP-201 Flashcards
What are the tiers of Azure Blob Storage?
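- Hot: for frequently accessed data; highest storage cost, lowest access (read/write) cost
- Cool: for infrequently accessed data stored for at least 30 days; lower storage cost, higher access cost
- Archive: for rarely accessed data stored for at least 180 days; lowest storage cost, highest access cost, and data must be rehydrated before it can be read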
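As a rough illustration of how the tiers line up with data age, the sketch below suggests a tier from the number of days since the data was last accessed. The helper name `suggest_tier` is hypothetical, and using 30 and 180 days as cut-offs is an assumption taken from the figures on this card; real tiering is configured in Azure Storage itself, not in application code.

```python
def suggest_tier(days_since_last_access: int) -> str:
    """Hypothetical helper: map data age to a Blob Storage access tier."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access < 30:
        return "Hot"      # accessed recently, so cheap access matters most
    if days_since_last_access < 180:
        return "Cool"     # accessed rarely, so cheaper storage wins
    return "Archive"      # almost never accessed, cheapest storage


if __name__ == "__main__":
    for days in (3, 45, 400):
        print(days, suggest_tier(days))
```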
What are the 5 levels of consistency in Cosmos DB?
- Strong
- Bounded Staleness
- Session
- Consistent Prefix
- Eventual
What is the recommended file size for an Azure Data Lake Gen1 that requires POSIX permissions and enables diagnostics logging for auditing?
250 MB or greater
What is horizontal partitioning?
Also known as sharding.
Data is partitioned horizontally to distribute rows across a scaled out data tier. The schema is identical on all participating databases.
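A minimal sketch of the idea, assuming a hash of the shard key decides which database a row lands in. The function names, the `customer_id` key, and the use of SHA-256 are illustrative assumptions, not how any particular Azure service routes rows. Every shard holds rows with the same fields, which mirrors the identical schema on all participating databases.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard index from a stable hash of the shard key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition_rows(rows, key_field, shard_count):
    """Distribute rows across shards; each shard keeps the same row shape."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        index = shard_for(str(row[key_field]), shard_count)
        shards[index].append(row)
    return shards


if __name__ == "__main__":
    orders = [{"customer_id": i % 5, "amount": 10 * i} for i in range(20)]
    for index, shard in enumerate(partition_rows(orders, "customer_id", 4)):
        print(index, len(shard))
```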
Which data storage solution should you recommend if you need to represent data by using nodes and relationships in graph structures?
Cosmos DB
What are the table distribution types in Azure Synapse Analytics?
Hash-distributed
Round-robin
Replicated
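The three strategies can be simulated in a few lines. This is only a toy model: the function names are hypothetical, SHA-256 stands in for whatever hash function the engine really uses, and `targets` is kept small for readability (it represents distributions for hash and round-robin, and compute nodes for replicated tables).

```python
import hashlib
from itertools import cycle


def hash_distribute(rows, column, targets):
    """Rows with the same value in `column` always land in the same bucket."""
    buckets = {i: [] for i in range(targets)}
    for row in rows:
        digest = hashlib.sha256(str(row[column]).encode("utf-8")).digest()
        buckets[int.from_bytes(digest[:4], "big") % targets].append(row)
    return buckets


def round_robin_distribute(rows, targets):
    """Rows are dealt out evenly in turn, regardless of their content."""
    buckets = {i: [] for i in range(targets)}
    for row, target in zip(rows, cycle(range(targets))):
        buckets[target].append(row)
    return buckets


def replicate(rows, targets):
    """Every target receives a full copy of the table."""
    return {i: list(rows) for i in range(targets)}


if __name__ == "__main__":
    sales = [{"id": i, "region": "r%d" % (i % 3)} for i in range(12)]
    print({k: len(v) for k, v in hash_distribute(sales, "region", 4).items()})
    print({k: len(v) for k, v in round_robin_distribute(sales, 4).items()})
    print({k: len(v) for k, v in replicate(sales, 4).items()})
```

Hash distribution keeps matching keys together, which helps joins and aggregations on that key. Round-robin gives an even spread and suits staging loads. Replication suits small lookup tables that are joined often.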
What is Azure Synapse Analytics?
Formerly Azure SQL Data Warehouse.
Azure Synapse is an analytics service that brings together enterprise data warehousing and big data analytics.
In Azure Databricks, how would you keep an interactive cluster configuration even after it has been terminated for more than 30 days?
An administrator can pin the cluster to the cluster list.
What are the core storage services in the Azure Storage platform?
- Azure blobs
- Azure Files
- Azure Queues
- Azure Tables
- Azure Disks
Choosing Data Abstraction methods:
https://docs.microsoft.com/en-us/azure/hdinsight/spark/optimize-data-storage#choose-data-abstraction
What is the best data format for Spark jobs?
Parquet
DataSets vs. DataFrames
DataFrames:
Best choice in most situations.
Provides query optimization through Catalyst.
Whole-stage code generation.
Direct memory access.
Low garbage collection (GC) overhead.
Not as developer-friendly as DataSets, as there are no compile-time checks or domain object programming.
DataSets:
Good in complex ETL pipelines where the performance impact is acceptable.
Not good in aggregations where the performance impact can be considerable.
Provides query optimization through Catalyst.
Developer-friendly by providing domain object programming and compile-time checks.
Adds serialization/deserialization overhead.
High GC overhead.
Breaks whole-stage code generation.
What data models does Cosmos DB support?
Document, key-value, graph, and column-family.
You work for a transportation logistics company. You are incurring large costs in the transformation step of your big data architecture. What is a possible way to reduce this cost?
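Use PolyBase.
PolyBase allows for ELT instead of ETL: the data is loaded into the data warehouse first and transformed there, so no separate transformation tier is needed.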
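To make the ELT order of operations concrete, the sketch below uses Python's built-in `sqlite3` as a stand-in for the data warehouse. That is an assumption for illustration only; PolyBase itself is driven by T-SQL over external tables. The raw rows are loaded unchanged, and the cleanup and aggregation then run as SQL inside the database engine. Table and column names are hypothetical.

```python
import sqlite3

# Extract: raw records as they arrive, with inconsistent casing and text costs.
raw_shipments = [
    ("S1", "seattle", "120.50"),
    ("S2", "SEATTLE", "80.00"),
    ("S3", "Denver", "42.25"),
]

conn = sqlite3.connect(":memory:")

# Load: copy the raw data into a staging table without transforming it.
conn.execute("CREATE TABLE staging_shipments (id TEXT, city TEXT, cost TEXT)")
conn.executemany("INSERT INTO staging_shipments VALUES (?, ?, ?)", raw_shipments)

# Transform: normalize and aggregate inside the database engine.
conn.execute(
    """
    CREATE TABLE shipments_by_city AS
    SELECT UPPER(city) AS city, SUM(CAST(cost AS REAL)) AS total_cost
    FROM staging_shipments
    GROUP BY UPPER(city)
    """
)

totals = dict(conn.execute("SELECT city, total_cost FROM shipments_by_city"))
print(totals)
```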
What are two benefits of Databricks?
- It can utilize multiple APIs.
- It can visualize individual pieces of code.
What is Data Masking?
A way to hide sensitive data from users who should not have access to it.
Examples: Social Security number, credit card number
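A minimal sketch of what masking those two examples might look like, exposing only the last four digits. The function names and the `X` placeholder format are illustrative assumptions; in Azure SQL Database, masking is defined on the column rather than written in application code.

```python
def mask_ssn(ssn: str) -> str:
    """Hypothetical helper: hide all but the last four digits of an SSN."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit Social Security number")
    return "XXX-XX-" + "".join(digits[-4:])


def mask_credit_card(number: str) -> str:
    """Hypothetical helper: hide all but the last four digits of a card number."""
    digits = [c for c in number if c.isdigit()]
    if len(digits) < 4:
        raise ValueError("expected at least 4 digits")
    return "XXXX-XXXX-XXXX-" + "".join(digits[-4:])


if __name__ == "__main__":
    print(mask_ssn("123-45-6789"))
    print(mask_credit_card("4111 1111 1111 1234"))
```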
What are reasons to use Data Masking?
- Protect non-production data
- Protect against insider threats
- Comply with regulatory requirements
What are use cases for SQL Database Auditing?
- Retain Audit Trails (see who has accessed the service)
- Report on event activity (visualize audit trails)
- Analyze (spot trends or unusual activity)
You work for a retail sales chain. Your marketing department needs to access client data to design marketing promotions. Concerns have been raised about access to the data. What is the most appropriate solution to protect the data and allow the marketing department to function?
Data Masking
This would protect sensitive data while still granting the marketing department access.
What is defense in depth?
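A layered approach to security: several independent layers of protection surround the data, so a breach of one layer does not expose everything. It complements the Zero Trust model rather than replacing it.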
What is the difference between Blob Storage and Data Lake Gen2?
Data Lake Gen2 has a hierarchical namespace: objects and files are organized into directories and subdirectories, similar to the file explorer on your computer.
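One practical consequence shows up when renaming a directory. The toy model below is an assumption-laden sketch, not either service's API: a flat namespace is modeled as a dictionary keyed by full blob name, and a hierarchical namespace as nested dictionaries. Both functions return how many entries had to be touched.

```python
def rename_prefix_flat(blobs: dict, old: str, new: str) -> int:
    """Flat namespace: every blob whose name starts with the prefix is rewritten."""
    touched = [name for name in blobs if name.startswith(old + "/")]
    for name in touched:
        blobs[new + name[len(old):]] = blobs.pop(name)
    return len(touched)


def rename_directory_hierarchical(root: dict, old: str, new: str) -> int:
    """Hierarchical namespace: the directory entry itself is renamed once."""
    root[new] = root.pop(old)
    return 1


if __name__ == "__main__":
    flat = {"logs/2020/a.csv": b"a", "logs/2020/b.csv": b"b", "data/c.csv": b"c"}
    tree = {"logs": {"2020": {"a.csv": b"a", "b.csv": b"b"}}, "data": {"c.csv": b"c"}}
    print(rename_prefix_flat(flat, "logs", "archive"))
    print(rename_directory_hierarchical(tree, "logs", "archive"))
```

In the flat model the cost grows with the number of blobs under the prefix, while the hierarchical rename is a single operation no matter how many files the directory holds.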
What are the two options Azure offers for a relational cloud data store (RDBMS)?
- SQL Database
- Azure Synapse (formerly SQL Data Warehouse)
What Azure big data service is best for transaction processing of relational data?
SQL Database
What are advantages of SQL Database?
- Consistent data that can handle complex queries
- Suited to transactional processing
- Single source data capture
- Scales vertically
- Suited to relational data