Azure DP-201 Flashcards

1
Q

What are the tiers of Azure Blob Storage?

A
  • Hot: frequently accessed data; highest storage cost, lowest access cost
  • Cool: infrequently accessed data stored for at least 30 days; lower storage cost, higher access cost
  • Archive: rarely accessed data stored for at least 180 days; lowest storage cost, highest retrieval cost and latency
2
Q

What are the 5 levels of consistency in Cosmos DB?

A
  • Strong
  • Bounded Staleness
  • Session
  • Consistent Prefix
  • Eventual
3
Q

What is the recommended file size for an Azure Data Lake Gen1 that requires POSIX permissions and enables diagnostics logging for auditing?

A

256 MB or greater

4
Q

What is horizontal partitioning?

A

Also known as sharding.

Data is partitioned horizontally to distribute rows across a scaled out data tier. The schema is identical on all participating databases.
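The routing step can be sketched in a few lines: hash a shard key and map it to one of N databases that all share the same schema. This is an illustrative Python sketch, not part of any Azure SDK; the `shard_for` helper and the shard names are made up:

```python
import hashlib

def shard_for(shard_key: str, num_shards: int = 4) -> str:
    """Route a row to one of num_shards databases with identical schemas."""
    # A stable hash ensures the same key always routes to the same shard.
    digest = hashlib.sha256(shard_key.encode()).hexdigest()
    return f"shard_db_{int(digest, 16) % num_shards}"
```

Because the hash is stable, `shard_for("customer-42")` always returns the same database name, so reads for a given customer go to the shard that holds their rows.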

5
Q

*** Which data storage solution should you recommend, if you need to represent data by using nodes and relationships in graph structures?

A

Cosmos DB (via the Gremlin graph API)

6
Q

What are the distribution types for tables in Azure Synapse Analytics (SQL Data Warehouse)?

A

Hash-distributed
Round-robin
Replicate
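The first two placement strategies can be simulated in plain Python. This is a sketch only; a Synapse dedicated SQL pool actually spreads every table across 60 distributions, and the function names here are invented for illustration:

```python
import hashlib
from itertools import count

NUM_DISTRIBUTIONS = 60  # fixed number of distributions in a dedicated SQL pool

def hash_distribution(distribution_column_value: str) -> int:
    """Hash-distributed: the distribution column alone determines placement,
    so rows with the same value always land in the same distribution."""
    digest = hashlib.sha256(distribution_column_value.encode()).hexdigest()
    return int(digest, 16) % NUM_DISTRIBUTIONS

_next_row = count()

def round_robin_distribution() -> int:
    """Round-robin: rows are spread evenly in arrival order, regardless of content."""
    return next(_next_row) % NUM_DISTRIBUTIONS
```

Replicate needs no placement function: a full copy of the table is cached on every compute node, which suits small dimension tables.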

7
Q

What is Azure Synapse Analytics?

A

Formerly Azure SQL Data Warehouse.

Azure Synapse is an analytics service that brings together enterprise data warehousing and big data analytics.

8
Q

In Azure Databricks, how would you keep an interactive cluster configuration even after it has been terminated for more than 30 days?

A

An administrator can pin the cluster to the cluster list.

9
Q

What are the core storage services in the Azure Storage platform?

A
  1. Azure blobs
  2. Azure Files
  3. Azure Queues
  4. Azure Tables
  5. Azure Disks
10
Q

Choosing Data Abstraction methods:

A

https://docs.microsoft.com/en-us/azure/hdinsight/spark/optimize-data-storage#choose-data-abstraction

11
Q

What is the best data format for Spark jobs?

A

Parquet

12
Q

Datasets vs. DataFrames

A

DataFrames:
  • Best choice in most situations
  • Query optimization through Catalyst
  • Whole-stage code generation
  • Direct memory access
  • Low garbage collection (GC) overhead
  • Less developer-friendly than Datasets: no compile-time checks or domain object programming

Datasets:
  • Good in complex ETL pipelines where the performance impact is acceptable
  • Not good in aggregations, where the performance impact can be considerable
  • Query optimization through Catalyst
  • Developer-friendly: provides domain object programming and compile-time checks
  • Adds serialization/deserialization overhead
  • High GC overhead
  • Breaks whole-stage code generation

13
Q

What data models does Cosmos DB support?

A

document, key-value, graph, and column-family data models.

14
Q

You work for a transportation logistics company. You are incurring large costs in the transformation step of your big data architecture. What is a possible way to reduce this cost?

A

Use PolyBase.

PolyBase allows for ELT instead of ETL: data is loaded first and then transformed inside the warehouse.

15
Q

What are two benefits of Databricks?

A
  1. It can utilize multiple APIs.
  2. It can visualize individual pieces of code.

16
Q

What is Data Masking?

A

A way to hide sensitive data from users that should not have access.

Examples: Social Security number, credit card number
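The effect of a partial mask can be mimicked with a toy function. This is illustrative only: real Dynamic Data Masking is applied by the database engine at query time, and `mask_partial` is a made-up name echoing the SQL `partial()` masking function:

```python
def mask_partial(value: str, prefix: int = 0,
                 padding: str = "XXX-XX-", suffix: int = 4) -> str:
    """Keep `prefix` leading and `suffix` trailing characters; hide the middle."""
    kept_end = value[len(value) - suffix:] if suffix else ""
    return value[:prefix] + padding + kept_end

print(mask_partial("123-45-6789"))  # XXX-XX-6789
```

A user without unmask permission would see only the padded value and the last four digits, while the stored data is unchanged.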

17
Q

What are reasons to use Data Masking?

A
  • Protect non-production data
  • Protect against insider threats
  • Comply with regulatory requirements
18
Q

What are use cases for SQL Database Auditing?

A
  1. Retain Audit Trails (see who has accessed the service)
  2. Report on event activity (visualize audit trails)
  3. Analyze (spot trends or unusual activity)
19
Q

You work for a retail sales chain. Your marketing department needs to access client data to design marketing promotions. Concerns have been raised about access to the data. What is the most appropriate solution to protect the data and allow the marketing department to function?

A

Data Masking

This would protect sensitive data while still granting the marketing department access.

20
Q

What is defense in depth?

A

A layered approach to security, as opposed to relying on a single perimeter defense (an all-or-nothing model).

21
Q

What is the difference between BLOB Storage and Data Lake Gen2?

A

Data Lake Gen2 has a hierarchical namespace (the collection of objects and files are organized into directories and sub-directories). Similar to file explorer on your computer.

22
Q

What are the two options Azure offers for relational cloud data store (RDBMS)?

A
  1. SQL Database
  2. Azure Synapse Analytics (SQL Data Warehouse)

23
Q

What Azure big data service is best for transaction processing of relational data?

A

SQL Database

24
Q

What are advantages of SQL Database?

A
  • Consistent data that can handle complex queries
  • for transactional processing
  • Single source data capture
  • Scales vertically
  • for relational data
25
Q

What are advantages of SQL Data Warehouse (Synapse)?

A
  • Parallel processing
  • multiple relational source data capture
  • handles complex queries
  • scales horizontally
26
Q

What are benefits of Cosmos DB?

A
  • global replication
  • multi-model
  • for non-relational data
27
Q

What are the 5 levels of consistency for Cosmos DB?

A
  • Strong (Best consistency; Most expensive)
  • Bounded Staleness
  • Session
  • Consistent Prefix
  • Eventual (Weak consistency; Least Expensive)
28
Q

What are the options for storing non-relational data in Azure?

A
  1. Cosmos DB
  2. Data Lake Gen2
  3. BLOB storage
29
Q

What are the two types of partitioning in Cosmos DB?

A

Logical

Physical

30
Q

Logical Partitions are based on:

A

Partition Keys

31
Q

*** What are things to consider when developing a partition key?

A
  • should be a property that exists on every object
  • anticipate top queries
  • avoid fan-out (cross-partition) queries and hot partitions
  • keys are immutable; they cannot be changed
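The "anticipate top queries" point can be pictured by grouping items into logical partitions: a query that filters on the partition key reads one group, while a query without it must fan out to every group. A minimal sketch — the `userId` key and item shapes are hypothetical:

```python
from collections import defaultdict

def group_into_logical_partitions(items, key="userId"):
    """Each distinct partition-key value forms one logical partition."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key]].append(item)
    return partitions

items = [{"userId": "u1", "n": 1}, {"userId": "u2", "n": 2}, {"userId": "u1", "n": 3}]
parts = group_into_logical_partitions(items)
# A query filtered on userId == "u1" touches 1 of the 2 logical partitions;
# a query without the key must fan out to all of them.
```

This is why the key should appear in your most frequent query filters: it turns cross-partition fan-outs into single-partition reads.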
32
Q

What is PolyBase used for?

A

importing/exporting data between Azure BLOB storage and Synapse (Data Warehouse)

33
Q

T/F Data Factory can ingest both structured and unstructured data.

A

True

34
Q

What is Data Factory?

A
  • An orchestration service.
  • Primary method for ingesting data into an Azure architecture.
  • Responsible for moving and monitoring the data
35
Q

T/F Data Factory can be used for both ETL and ELT

A

True

36
Q

*** What is a cluster in Databricks?

A

a group of compute resources

37
Q

What are the languages available in Databricks?

A

R, SQL, Python, Scala, Java

38
Q

T/F Databricks can be used for streaming and batch processing.

A

True

39
Q

What is Databricks used for?

A

Exploration and visualization of data

40
Q

What are components of Databricks?

A

Cluster: compute resources
Workspace: “filing cabinet” for Databricks work
Notebooks: “folders” that contain cells
Cells: individual pieces of code
Libraries: packages that provide additional functionality
Tables: where structured data is stored

41
Q

What are ways to recover from failed queries when streaming in Databricks?

A
  • enable creating checkpoints
  • configure jobs to restart on failure
  • recover after changes
42
Q

How do you optimize Databricks jobs using scheduler pools?

A

Group jobs into pools by weight.

By default, all queries run in the same scheduler pool in FIFO (first in, first out) order. By grouping jobs into separate pools with weights, you can allow more important jobs to run first.

43
Q

How do you optimize Databricks jobs using configuration settings?

A

use “compute-optimized” instances.

44
Q

What are Watermark Policies in Databricks?

A

A way to set thresholds for late data coming in from input streams.
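Conceptually, the watermark trails the latest event time seen by the configured threshold, and anything older is discarded. A minimal Python simulation of that rule — not the actual Spark Structured Streaming `withWatermark` API:

```python
def apply_watermark(events, threshold_seconds):
    """Drop events whose timestamp falls behind the watermark.

    events: iterable of (event_time_seconds, payload) in arrival order.
    The watermark trails the max event time seen so far by threshold_seconds.
    """
    max_seen = float("-inf")
    kept = []
    for event_time, payload in events:
        max_seen = max(max_seen, event_time)
        if event_time >= max_seen - threshold_seconds:
            kept.append(payload)   # on time, or tolerably late
        # else: later than the watermark allows -> discarded
    return kept

events = [(100, "a"), (130, "b"), (90, "c"), (125, "d")]
print(apply_watermark(events, 30))  # ['a', 'b', 'd'] -- "c" arrived too late
```

With a 30-second threshold, the event stamped 90 is rejected because an event stamped 130 has already been seen, putting the watermark at 100.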

45
Q

What are methods to optimize streaming in Databricks

A
  • enable autoscaling
  • optimize configuration settings
  • group jobs into pools by weights
  • recover from query failures
46
Q

*** How are you charged for Cosmos DB?

A

Storage

Throughput

47
Q

Which are appropriate questions for determining what solution should be used for ingesting and moving data?

A
  • How cost sensitive is the project?
  • What is the end result of the data?

48
Q

What are the main concepts in Azure Data Factory?

A
  • Activity
  • Linked Service
  • Pipeline
  • Datasets
  • Pipeline execution and triggers
  • Integration runtime
49
Q

What is RBAC?

A

Role Based Access Control

A system that provides fine-grained access management of Azure resources. Using Azure RBAC, you can segregate duties within your team and grant only the amount of access to users that they need to perform their jobs.

50
Q

Name two resources associated with RBAC?

A

Scope - the set of resources that the access applies to.

Role Definition - a collection of permissions

51
Q

What is a service endpoint?

A

Allow you to secure your critical Azure service resources to only your virtual networks.

52
Q

*** What does Microsoft recommend as the redline for Stream Analytics SU% (streaming unit utilization)?

A

80%

53
Q

What are the types of activities in Data Factory?

A
  • Data Transformation
  • Data Movement
  • Control Flow
54
Q

What is TDE?

A

Transparent Data Encryption

Protects Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics against the threat of malicious offline activity by encrypting data at rest.

55
Q

What are inputs for Azure Stream Analytics?

A
  • BLOB Storage
  • IoT Hub
  • Event Hubs
56
Q

What is RPO?

A

Recovery Point Objective

Maximum amount of data that can be lost when restoring backups

57
Q

What is RTO?

A

Recovery Time Objective

Maximum time that can elapse before a system is brought online

58
Q

What is CIA?

A

Confidentiality, Integrity, Availability

59
Q

What are the three types of SAS?

A

Shared Access Signature

  1. User Delegation
  2. Account
  3. Service
60
Q

What are the APIs available in Cosmos DB?

A
  1. Gremlin
  2. MongoDB
  3. Cassandra
  4. Core (SQL)
  5. Azure Table Storage
61
Q

*** How are you charged using Data Factory?

A
  • Pipeline orchestration
  • Inactive pipelines

62
Q

We need to move some data outside of Azure to an On-Prem environment, which solution would be the most appropriate?

A

ExpressRoute

63
Q

What are Event Hubs?

A

fully managed, real-time data ingestion service

64
Q

*** What are two ways BLOB storage can be accessed?

A

HTTP

HTTPS

65
Q

*** What data storage type would be used for an SMB network share?

A

Files

66
Q

What is Azure Queue storage service used for?

A

used to store and retrieve messages

67
Q

What is Azure Table storage?

A

Service that stores structured NoSQL data. Used to store large amounts of structured data. Azure tables are ideal for storing structured, non-relational data.

68
Q

What is an Azure Application Gateway?

A

Azure Application Gateway is a web traffic load balancer that enables you to manage traffic to your web applications.

69
Q

What is a VPN gateway?

A

a specific type of virtual network gateway that is used to send encrypted traffic between an Azure virtual network and an on-premises location over the public Internet.

70
Q

*** You are designing an application that will have an Azure virtual machine. The virtual machine will access an Azure SQL database. The database will not be accessible from the Internet.
You need to recommend a solution to provide the required level of access to the database.
What should you include in the recommendation?

A

Add a virtual network rule to the firewall of the Azure SQL server that hosts the database.

71
Q

*** You are designing an application that will store petabytes of medical imaging data
When the data is first created, the data will be accessed frequently during the first week. After one month, the data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes.
You need to select a storage strategy for the data. The solution must minimize costs.
Which storage tier should you use for each time frame?

A

First week: hot
After one month: cool
After one year: cool

Archive: Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements (on the order of hours).

72
Q

*** You are designing a new application that uses Azure Cosmos DB. The application will support a variety of data patterns including log records and social media mentions. Which Cosmos DB API should you use for each data pattern?

A

Log records: SQL

Social media mentions: Gremlin

73
Q

*** What is a way to increase throughput while migrating data to Cosmos DB?

A

Increase the RUs (Request Units)

74
Q

What are the indexing modes in Cosmos DB?

A

Consistent (default mode)

None

75
Q

What actions can you take to speed up data migration to Cosmos DB?

A
  1. Increase RUs (request units)
  2. Turn off indexing

76
Q

ACL vs. RBAC

A

ACL is better suited for implementing security at the individual user level and for low-level data, while RBAC better serves a company-wide security system with an overseeing administrator. An ACL can, for example, grant write access to a specific file, but it cannot determine how a user might change the file.

77
Q

Your company is currently having issues with Databricks. Several times a day data arrives late from various sources. This is causing an issue with the accuracy of the data. You have been asked to provide a solution. What is an appropriate solution to not allow late data into the system?

A

Create a Watermark Policy

78
Q

IoT Hub vs. Event Hub

A

IoT Hub was developed to address the unique requirements of connecting IoT devices to the Azure cloud. Bi-directional communication capabilities mean that while you receive data from devices you can also send commands and policies back to devices.

Event Hubs was designed for big data streaming.

79
Q

Standard vs. Premium Storage accounts

A

Standard:

  • lowest cost per GB
  • best for apps that require bulk storage and data is accessed infrequently

Premium:

  • consistent, low-latency performance
  • best for I/O intensive apps (like databases)
80
Q

AzCopy

A

a command-line utility that you can use to copy blobs or files to or from a storage account

81
Q

What is HDInsight?

A

A managed, full-spectrum, open-source analytics service in the cloud for enterprises. A cloud distribution of Hadoop components.

82
Q

You are designing a data storage solution for a database that is expected to grow to 50 TB. The usage pattern is singleton inserts, singleton updates, and reporting.
Which storage solution should you use?

A

Azure SQL Database Hyperscale

A Hyperscale database supports up to 100 TB of data and provides high throughput and performance

83
Q

A company stores large datasets in Azure, including sales transactions and customer account information.
You must design a solution to analyze the data. You plan to create the following HDInsight clusters:

Sales: cluster must be optimized for ad hoc HIVE queries

Account: cluster must be optimized for HIVE queries that are used in batch processes

A

Sales: choose Interactive Query cluster type to optimize for ad hoc, interactive queries.

Account: Apache Hadoop cluster type to optimize for Hive queries used as a batch process.

84
Q

You are designing an Azure SQL Data Warehouse. You plan to load millions of rows of data into the data warehouse each day.

You must ensure that staging tables are optimized for data loading.

You need to design the staging tables.
What type of tables should you recommend?

A

Round-robin distributed table

85
Q

A company has an application that uses Azure SQL Database as the data store.

The application experiences a large increase in activity during the last month of each year.

You need to manually scale the Azure SQL Database instance to account for the increase in data write operations.

Which scaling method should you recommend?

A

Scale up by increasing the database throughput units.

The cost of running an Azure SQL database instance is based on the number of Database Throughput Units (DTUs) allocated for the database.

(Elastic pools are used if there are two or more databases.)

86
Q

A company installs IoT devices to monitor its fleet of delivery vehicles. Data from devices is collected from Azure Event Hub.

The data must be transmitted to Power BI for real-time data visualizations.

You need to recommend a solution.
What should you recommend?

A

Step 1: Get your IoT hub ready for data access by adding a consumer group.

Step 2: Create, configure, and run a Stream Analytics job for data transfer from your IoT hub to your Power BI account.

Step 3: Create and publish a Power BI report to visualize the data.

87
Q

You have a Windows-based solution that analyzes scientific data. You are designing a cloud-based solution that performs real-time analysis of the data.
You need to design the logical flow for the solution.

Which two actions should you recommend?

A. Send data from the application to an Azure Stream Analytics job.

B. Use an Azure Stream Analytics job on an edge device. Ingress data from an Azure Data Factory instance and build queries that output to Power BI.

C. Use an Azure Stream Analytics job in the cloud. Ingress data from the Azure Event Hub instance and build queries that output to Power BI.

D. Use an Azure Stream Analytics job in the cloud. Ingress data from an Azure Event Hub instance and build queries that output to Azure Data Lake Storage.

E. Send data from the application to Azure Data Lake Storage.

F. Send data from the application to an Azure Event Hub instance.

A

C. Use an Azure Stream Analytics job in the cloud. Ingress data from the Azure Event Hub instance and build queries that output to Power BI.

F. Send data from the application to an Azure Event Hub instance.

88
Q

What are the three pricing plans for Azure functions?

A

Consumption plan: Azure provides all of the necessary computational resources. You don’t have to worry about resource management, and only pay for the time that your code runs.

Premium plan: You specify a number of pre-warmed instances that are always online and ready to immediately respond. When your function runs, Azure provides any additional computational resources that are needed. You pay for the pre-warmed instances running continuously and any additional instances you use as Azure scales your app in and out.

App Service plan: Run your functions just like your web apps. If you use App Service for your other applications, your functions can run on the same plan at no additional cost.

89
Q

You design data engineering solutions for a company.
A project requires analytics and visualization of large set of data. The project has the following requirements:
✑ Notebook scheduling
✑ Cluster automation
✑ Power BI Visualization
You need to recommend the appropriate Azure service.

What Azure service should you recommend?

A

Azure Databricks

A Databricks job is a way of running a notebook or JAR either immediately or on a scheduled basis.
Azure Databricks has two types of clusters: interactive and job. Interactive clusters are used to analyze data collaboratively with interactive notebooks. Job clusters are used to run fast and robust automated workloads using the UI or API.
You can visualize Data with Azure Databricks and Power BI Desktop.

90
Q

A company purchases IoT devices to monitor manufacturing machinery. The company uses an IoT appliance to communicate with the IoT devices.
The company must be able to monitor the devices in real-time.

What should you recommend?

A

Azure Stream Analytics cloud job using Azure PowerShell

91
Q

You need to design a telemetry data solution that supports the analysis of log files in real time.

Which two Azure services should you include in the solution?

A

Azure Event Hubs

Azure Databricks

92
Q

You plan to use Azure SQL Database to support a line of business app.
You need to identify sensitive data that is stored in the database and monitor access to the data.
Which three actions should you recommend?

A

Enable Auditing.
Run Vulnerability Assessment.
Use Advanced Threat Protection

93
Q

You have an Azure SQL database that has columns. The columns contain sensitive Personally Identifiable Information (PII) data.
You need to design a solution that tracks and stores all the queries executed against the PII data. You must be able to review the data in Azure Monitor, and the data must be available for at least 45 days.

A
  • Add classifications to the columns that contain sensitive data
  • Turn on Auditing and set the audit log destination to use Azure Blob storage
94
Q

What is deterministic encryption?

A

Deterministic encryption always generates the same encrypted value for any given plain text value. Using deterministic encryption allows point lookups, equality joins, grouping and indexing on encrypted columns.

95
Q

What is randomized encryption?

A

Randomized encryption uses a method that encrypts data in a less predictable manner. Randomized encryption is more secure, but prevents searching, grouping, indexing, and joining on encrypted columns.
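The practical difference between the two modes can be demonstrated with a toy scheme, using an HMAC as a stand-in for deterministic encryption and a random nonce for randomized encryption. This is NOT the real Always Encrypted cryptography (an HMAC is a keyed hash, not reversible encryption, and the key here is invented); it only illustrates why determinism enables equality lookups:

```python
import hashlib
import hmac
import os

KEY = b"demo-key"  # hypothetical key, for illustration only

def deterministic_encrypt(plaintext: bytes) -> bytes:
    """Same plaintext + same key -> same output, so equality comparisons work."""
    return hmac.new(KEY, plaintext, hashlib.sha256).digest()

def randomized_encrypt(plaintext: bytes) -> bytes:
    """A fresh random nonce makes every output different, defeating lookups."""
    nonce = os.urandom(16)
    keystream = hashlib.sha256(KEY + nonce).digest()
    padded = plaintext.ljust(32, b"\0")[:32]
    return nonce + bytes(b ^ k for b, k in zip(padded, keystream))
```

A database can index, group, and join on the deterministic values because equal plaintexts produce equal ciphertexts; the randomized values never repeat, so only storage and retrieval are possible.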

96
Q

What is Always Encrypted?

A

A feature in Azure SQL Database or SQL Server databases designed to protect sensitive data.

Provides a separation between those who own the data and can view it, and those who manage the data but should have no access.

97
Q

You are designing the security for an Azure SQL database.
You have an Azure Active Directory (Azure AD) group named Group1.
You need to recommend a solution to provide Group1 with read access to the database only.

A

a contained database user

98
Q

What is the best way to authenticate and access Databricks REST APIs?

A

use personal access tokens

99
Q

What is the authentication method for Azure Data Lake Storage?

A

Azure Active Directory

100
Q

You store data in a data warehouse in Azure Synapse Analytics.
You need to design a solution to ensure that the data warehouse and the most current data is available within one hour of a datacenter failure.
Which three actions should you include in the design?

A
  • Each day, restore the data warehouse from a user-defined restore point to an available Azure region.
  • Each day, create Azure Firewall rules that allow access to the restored data warehouse.
  • If a failure occurs, update the connection strings to point to the recovered data warehouse.
101
Q

You are planning a big data solution in Azure.
You need to recommend a technology that meets the following requirements:
✑ Be optimized for batch processing.
✑ Support autoscaling.
✑ Support per-cluster scaling.
Which technology should you recommend?

A

Databricks

102
Q

What is SAS (shared access signature)?

A

a URI that grants restricted access rights to Azure Storage resources.

By distributing a shared access signature URI, you can grant clients access to a resource for a specified period of time, with a specified set of permissions.

103
Q

You need to recommend a security solution to grant anonymous users permission to access the blobs in a specific container only.
What should you include in the recommendation?

A

the public access level for the blobs service

104
Q

From Databricks, you need to access Data Lake Storage directly by using a service principal.
What should you include in the solution?

A

an application registration in Azure Active Directory (Azure AD)

105
Q

You are designing security for administrative access to Azure SQL Data Warehouse.
You need to recommend a solution to ensure that administrators use two-factor authentication when accessing the data warehouse from Microsoft SQL Server
Management Studio (SSMS).
What should you include in the recommendation?

A

Azure conditional access policies

106
Q

You are developing a solution that performs real-time analysis of IoT data in the cloud. The solution must remain available during Azure service updates.

What two action should you recommend?

A
  • Deploy an Azure Stream Analytics job to each region in a paired region.
  • Monitor jobs in both regions for failure.
107
Q

A company is developing a mission-critical line of business app that uses Azure SQL Database Managed Instance.
You must design a disaster recovery strategy for the solution.
You need to ensure that the database automatically recovers when full or partial loss of the Azure SQL Database service occurs in the primary region.
What should you recommend?

A

Failover group

Failover groups are a SQL Database feature that allows you to manage replication and failover of a group of databases on a SQL Database server, or all databases in a Managed Instance, to another region.

108
Q

A company is evaluating data storage solutions.
You need to recommend a data storage solution that meets the following requirements:
✑ Minimize costs for storing blob objects.
✑ Optimize access for data that is infrequently accessed.
✑ Data must be stored for at least 30 days.
✑ Data availability must be at least 99 percent.
What should you recommend?

A

Cool storage (the Blob storage cool access tier)

109
Q

What are SQL Database Elastic Pools?

A

a simple, cost-effective solution for managing and scaling multiple databases that have varying and unpredictable usage demands. The databases in an elastic pool are on a single Azure SQL Database server and share a set number of resources at a set price.

110
Q

You are designing an Azure Databricks cluster that runs user-defined local processes.
You need to recommend a cluster configuration that meets the following requirements:
✑ Minimize query latency
✑ Reduce overall costs
✑ Maximize the number of users that can run queries on the cluster at the same time.
Which cluster type should you recommend?

A

High Concurrency with Autoscaling

111
Q

How do you allow multiple users to run queries on a Databricks cluster at the same time?

A

with a High Concurrency cluster

112
Q

You are designing a solution for a company. The solution will use model training for objective classification.
You need to design the solution.

What should you recommend?

A

a Spark application that uses Spark MLlib.

113
Q

You plan to store delimited text files in an Azure Data Lake Storage account that will be organized into department folders.
You need to configure data access so that users see only the files in their respective department folder.

A

From the storage account, you enable a hierarchical namespace, and you use access control lists (ACLs).

114
Q

You are planning a design pattern based on the Kappa architecture.

What service do you use for the speed layer and the serving layer?

A

Speed layer: Data Factory

Serving layer: Databricks

115
Q

You have an Azure SQL database that has columns. The columns contain sensitive Personally Identifiable Information (PII) data.
You need to design a solution that tracks and stores all the queries executed against the PII data. You must be able to review the data in Azure Monitor, and the data must be available for at least 45 days.

A

add classifications to the columns that contain sensitive data. You turn on Auditing and set the audit log destination to use Azure Blob storage.

116
Q

A company is developing a solution to manage inventory data for a group of automotive repair shops. The solution will use Azure SQL Data Warehouse as the data store.

Shops will upload data every 10 days.
Data corruption checks must run each time data is uploaded. If corruption is detected, the corrupted data must be removed.

You need to ensure that upload processes and data corruption checks do not impact reporting and analytics processes that use the data warehouse.

A

Create a user-defined restore point before data is uploaded. Delete the restore point after data corruption checks complete.

117
Q

You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

A

You use an Azure Data Factory schedule trigger to execute a pipeline that copies the data to a staging table in the data warehouse, and then uses a stored procedure to execute the R script.

118
Q

You are designing a solution for the ad hoc analysis of data in Azure Databricks notebooks. The data will be stored in Azure Blob storage.
You need to ensure that Blob storage will support the recovery of the data if the data is overwritten accidentally.
What should you recommend?

A

Enable soft delete.

119
Q

You have streaming data that is received by Azure Event Hubs and stored in Azure Blob storage. The data contains social media posts that relate to a keyword of
Contoso.

You need to count how many times the Contoso keyword and a keyword of Litware appear in the same post every 30 seconds. The data must be available to
Microsoft Power BI in near real-time.

A

You create an Azure Stream Analytics job that uses an input from Event Hubs to count the posts that have the specified keywords, and then sends the data to an Azure SQL database. You consume the data in Power BI by using DirectQuery mode.
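The 30-second count behaves like a tumbling window: time is cut into fixed, non-overlapping 30-second buckets, and each post is counted in exactly one bucket. A Python simulation of that windowing — not the actual Stream Analytics query language, and the sample posts are made up:

```python
from collections import Counter

def tumbling_window_counts(posts, window_seconds=30):
    """Count posts mentioning both keywords per fixed, non-overlapping window."""
    counts = Counter()
    for epoch_seconds, text in posts:
        if "Contoso" in text and "Litware" in text:
            window_start = (epoch_seconds // window_seconds) * window_seconds
            counts[window_start] += 1
    return dict(counts)

posts = [(5, "Contoso and Litware rock"), (12, "Contoso only"),
         (31, "Litware loves Contoso")]
print(tumbling_window_counts(posts))  # {0: 1, 30: 1}
```

In Stream Analytics this grouping is expressed with `TumblingWindow(second, 30)` in the job query; the simulation only shows why each post lands in exactly one window.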