Test Flashcards

1
Q

You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source big data solution to collect, process, and maintain data.
The analytical data store performs poorly.
You must implement a solution that meets the following requirements:

Provide data warehousing
Reduce ongoing management activities
Deliver SQL query responses in less than one second
You need to create an HDInsight cluster to meet the requirements.
Which type of cluster should you create?
A. Interactive Query
B. Apache Hadoop
C. Apache HBase
D. Apache Spark
A

Apache Spark

2
Q

You develop data engineering solutions for a company. The company has on-premises Microsoft SQL Server databases at multiple locations.
The company must integrate data with Microsoft Power BI and Microsoft Azure Logic Apps. The solution must avoid single points of failure during connection and transfer to the cloud. The solution must also minimize latency.
You need to secure the transfer of data between on-premises databases and Microsoft Azure.
What should you do?
A. Install a standalone on-premises Azure data gateway at each location
B. Install an on-premises data gateway in personal mode at each location
C. Install an Azure on-premises data gateway at the primary location
D. Install an Azure on-premises data gateway as a cluster at each location

A

Install an Azure on-premises data gateway as a cluster at each location

3
Q

You are a data architect. The data engineering team needs to configure a synchronization of data from an on-premises Microsoft SQL Server database to Azure SQL Database.
Ad-hoc and reporting queries are overutilizing the on-premises production instance. The synchronization process must:
Perform an initial data synchronization to Azure SQL Database with minimal downtime
Perform bi-directional data synchronization after initial synchronization
You need to implement this synchronization solution.
Which synchronization method should you use?
A. transactional replication
B. Data Migration Assistant (DMA)
C. backup and restore
D. SQL Server Agent job
E. Azure SQL Data Sync

A

Azure SQL Data Sync

4
Q

An application will use Microsoft Azure Cosmos DB as its data solution. The application will use the Cassandra API to support a column-based database type that uses containers to store items.
You need to provision Azure Cosmos DB. Which container name and item name should you use? Each correct answer presents part of the solution.
NOTE: Each correct answer selection is worth one point.
A. collection
B. rows
C. graph
D. entities
E. table

A

rows

table
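
The naming in this card can be kept straight with a small lookup. The sketch below is only a study aid, not part of any Azure SDK: the dictionary and the `entity_names` helper are hypothetical names, and the entries follow the container and item terms given in the Azure Cosmos DB resource model documentation.

```python
# Container and item terminology per Azure Cosmos DB API.
# COSMOS_ENTITY_NAMES and entity_names are illustrative names, not SDK objects.
COSMOS_ENTITY_NAMES = {
    "SQL": ("container", "item"),
    "Cassandra": ("table", "row"),
    "MongoDB": ("collection", "document"),
    "Gremlin": ("graph", "node or edge"),
    "Table": ("table", "item"),
}


def entity_names(api):
    """Return (container name, item name) for the given Cosmos DB API."""
    return COSMOS_ENTITY_NAMES[api]


container, item = entity_names("Cassandra")
print(container, item)  # table row
```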

5
Q

A company has a SaaS solution that uses Azure SQL Database with elastic pools. The solution contains a dedicated database for each customer organization.
Customer organizations have peak usage at different periods during the year.
You need to implement the Azure SQL Database elastic pool to minimize cost.
Which option or options should you configure?
A. Number of transactions only
B. eDTUs per database only
C. Number of databases only
D. CPU usage only
E. eDTUs and max data size

A

eDTUs and max data size
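
Why a shared pool can cost less when customers peak at different times can be shown with a little arithmetic. This is a minimal sketch with made-up demand figures (the customer names and eDTU numbers are hypothetical): it compares sizing every database for its own peak against sizing one pool for the combined peak.

```python
# Hypothetical eDTU demand per customer database across four periods.
demand = {
    "customer_a": [80, 10, 10, 10],
    "customer_b": [10, 90, 10, 10],
    "customer_c": [10, 10, 70, 10],
}

# Sizing each database separately means paying for every individual peak.
sum_of_peaks = sum(max(periods) for periods in demand.values())

# A pool only has to cover the busiest combined period.
pooled_peak = max(sum(period) for period in zip(*demand.values()))

print(sum_of_peaks, pooled_peak)  # 240 110
```

Because the peaks do not coincide, the pool needs far fewer eDTUs than the separately sized databases, which is why the pool-level eDTU and storage settings are the ones that drive cost.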

6
Q

A company manages several on-premises Microsoft SQL Server databases.
You need to migrate the databases to Microsoft Azure by using a backup process of Microsoft SQL Server.
Which data technology should you use?
A. Azure SQL Database single database
B. Azure SQL Data Warehouse
C. Azure Cosmos DB
D. Azure SQL Database Managed Instance

A

Azure SQL Database Managed Instance

7
Q
The data engineering team manages Azure HDInsight clusters. The team spends a large amount of time creating and destroying clusters daily because most of the data pipeline process runs in minutes.
You need to implement a solution that deploys multiple HDInsight clusters with minimal effort.
What should you implement?
A. Azure Databricks
B. Azure Traffic Manager
C. Azure Resource Manager templates
D. Ambari web user interface
A

Azure Resource Manager templates

8
Q

You are the data engineer for your company. An application uses a NoSQL database to store data. The database uses the key-value and wide-column NoSQL database type.
Developers need to access data in the database using an API.
You need to determine which API to use for the database model and type.
Which two APIs should you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Table API
B. MongoDB API
C. Gremlin API
D. SQL API
E. Cassandra API

A

Table API

Cassandra API

9
Q

A company is designing a hybrid solution to synchronize data from an on-premises Microsoft SQL Server database to Azure SQL Database.
You must perform an assessment of databases to determine whether data will move without compatibility issues. You need to perform the assessment.
Which tool should you use?
A. SQL Server Migration Assistant (SSMA)
B. Microsoft Assessment and Planning Toolkit
C. SQL Vulnerability Assessment (VA)
D. Azure SQL Data Sync
E. Data Migration Assistant (DMA)

A

Data Migration Assistant (DMA)

10
Q

A company plans to use Azure SQL Database to support a mission-critical application.
The application must be highly available without performance degradation during maintenance windows.
You need to implement the solution.
Which three technologies should you implement? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Premium service tier
B. Virtual machine Scale Sets
C. Basic service tier
D. SQL Data Sync
E. Always On availability groups
F. Zone-redundant configuration

A

Premium service tier
Always On availability groups
Zone-redundant configuration

11
Q

A company plans to use Azure Storage for file storage purposes. Compliance rules require:
A single storage account to store all operations including reads, writes and deletes
Retention of an on-premises copy of historical operations
You need to configure the storage account.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. Configure the storage account to log read, write and delete operations for service type Blob
B. Use the AzCopy tool to download log data from $logs/blob
C. Configure the storage account to log read, write and delete operations for service type Table
D. Use the storage client to download log data from $logs/table
E. Configure the storage account to log read, write and delete operations for service type Queue

A

Configure the storage account to log read, write and delete operations for service type Blob

Use the AzCopy tool to download log data from $logs/blob

12
Q

You are creating a new notebook in Azure Databricks that will support R as the primary language but will also support Scala and SQL.
Which switch should you use to switch between languages?
A. %
B. \[]
C. \()
D. @

A

%

13
Q
You manage a solution that uses Azure HDInsight clusters.
You need to implement a solution to monitor cluster performance and status.
Which technology should you use?
A. Azure HDInsight .NET SDK
B. Azure HDInsight REST API
C. Ambari REST API
D. Azure Log Analytics
E. Ambari Web UI
A

Ambari Web UI

14
Q

You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account.
You need to output the count of tweets during the last five minutes every five minutes. Each tweet must only be counted once.
Which windowing function should you use?
A. a five-minute Session window
B. a five-minute Sliding window
C. a five-minute Tumbling window
D. a five-minute Hopping window that has a one-minute hop

A

a five-minute Tumbling window
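
The difference between the window types in this card can be seen in a small simulation. This is a minimal sketch, not Stream Analytics itself: the function names are hypothetical, timestamps are plain seconds, and the boundary handling (an event exactly on a boundary goes to the later window) is a simplifying assumption. It shows that tumbling windows count each event once, while a five-minute hopping window with a one-minute hop counts each event in five overlapping windows.

```python
from collections import Counter

WINDOW_SECONDS = 5 * 60  # five-minute window


def tumbling_window_counts(timestamps, window=WINDOW_SECONDS):
    """Count events per non-overlapping window, keyed by window end time."""
    counts = Counter()
    for ts in timestamps:
        # Each timestamp maps to exactly one window end.
        window_end = (ts // window + 1) * window
        counts[window_end] += 1
    return dict(counts)


def hopping_window_counts(timestamps, window=WINDOW_SECONDS, hop=60):
    """Count events per overlapping window that ends every `hop` seconds."""
    counts = Counter()
    for ts in timestamps:
        first_end = (ts // hop + 1) * hop  # earliest window end after ts
        for k in range(window // hop):
            # The event falls into window // hop consecutive windows.
            counts[first_end + k * hop] += 1
    return dict(counts)


tweets = [10, 70, 299, 300, 610]  # hypothetical arrival times in seconds
print(tumbling_window_counts(tweets))               # {300: 3, 600: 1, 900: 1}
print(sum(hopping_window_counts(tweets).values()))  # 25
```

The tumbling counts add up to the five input events, whereas the hopping counts add up to 25, which is why only the tumbling window satisfies "each tweet must only be counted once".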

15
Q

You are developing a solution that will stream to Azure Stream Analytics. The solution will have both streaming data and reference data.
Which input type should you use for the reference data?
A. Azure Cosmos DB
B. Azure Event Hubs
C. Azure Blob storage
D. Azure IoT Hub

A

Azure Blob storage

16
Q

You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South region.
You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory.
The solution must meet the following requirements:
Ensure that the data remains in the UK South region at all times.
Minimize administrative effort.
Which type of integration runtime should you use?
A. Azure integration runtime
B. Self-hosted integration runtime
C. Azure-SSIS integration runtime

A

Azure integration runtime

17
Q

You must integrate the company’s on-premises Microsoft SQL Server data with Microsoft Azure SQL Database. Data must be transformed incrementally.
You need to implement the data integration solution.
Which tool should you use to configure a pipeline to copy data?
A. Use the Copy Data tool with Blob storage linked service as the source
B. Use Azure PowerShell with SQL Server linked service as a source
C. Use Azure Data Factory UI with Blob storage linked service as a source
D. Use the .NET Data Factory API with Blob storage linked service as the source

A

Use Azure Data Factory UI with Blob storage linked service as a source

18
Q

You have an Azure Stream Analytics query. The query returns a result set that contains 10,000 distinct values for a column named clusterID.
You monitor the Stream Analytics job and discover high latency.
You need to reduce the latency.
Which two actions should you perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Add a pass-through query.
B. Add a temporal analytic function.
C. Scale out the query by using PARTITION BY.
D. Convert the query to a reference query.
E. Increase the number of streaming units.

A

Scale out the query by using PARTITION BY.

Increase the number of streaming units.
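
The idea behind scaling out with PARTITION BY can be sketched outside Stream Analytics. The example below is an illustration only: the helper names are hypothetical, clusterID is assumed to be an integer, and routing by `clusterID % partition_count` is a stand-in for whatever partition key the real input uses. Each partition aggregates independently, so the work can be spread across more compute, and the per-partition results merge into the same totals.

```python
from collections import Counter, defaultdict


def partition_events(events, partition_count):
    """Route events so that every clusterID lands in exactly one partition."""
    partitions = defaultdict(list)
    for event in events:
        partitions[event["clusterID"] % partition_count].append(event)
    return partitions


def count_per_cluster(partition):
    """Aggregate one partition on its own, with no cross-partition state."""
    return Counter(event["clusterID"] for event in partition)


# Hypothetical input: 100 events spread over 10 cluster IDs.
events = [{"clusterID": i % 10} for i in range(100)]
partitions = partition_events(events, partition_count=4)

# Merging the independent partial results gives the overall counts.
merged = Counter()
for partition in partitions.values():
    merged.update(count_per_cluster(partition))

print(merged[3])  # 10
```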

19
Q

Each day, a company plans to store hundreds of files in Azure Blob Storage and Azure Data Lake Storage. The company uses the Parquet format.
You must develop a pipeline that meets the following requirements:
Process data every six hours
Offer interactive data analysis capabilities
Offer the ability to process data using solid-state drive (SSD) caching
Use Directed Acyclic Graph (DAG) processing mechanisms
Provide support for REST API calls to monitor processes
Provide native support for Python
Integrate with Microsoft Power BI
You need to select the appropriate data technology to implement the pipeline.
Which data technology should you implement?

A. Azure SQL Data Warehouse
B. HDInsight Apache Storm cluster
C. Azure Stream Analytics
D. HDInsight Apache Hadoop cluster using MapReduce
E. HDInsight Spark cluster
A

HDInsight Spark cluster

20
Q

You need to develop a pipeline for processing data. The pipeline must meet the following requirements:
Scale up and down resources for cost reduction
Use an in-memory data processing engine to speed up ETL and machine learning operations.
Use streaming capabilities
Provide the ability to code in SQL, Python, Scala, and R
Integrate workspace collaboration with Git
What should you use?

A. HDInsight Spark Cluster
B. Azure Stream Analytics
C. HDInsight Hadoop Cluster
D. Azure SQL Data Warehouse
E. HDInsight Kafka Cluster
F. HDInsight Storm Cluster
A

HDInsight Spark Cluster