Data Analytics in Azure Flashcards

Question 1

Q

Batch vs Stream Processing

Answer

A

Batch processing: multiple data records are collected and stored before being processed together in a single operation.

Stream processing: a source of data is constantly monitored, and new data events are processed in real time as they occur.
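
The contrast can be sketched in a few lines of Python. This is only an illustration, not Azure code: the function names batch_total and stream_totals and the sample readings are made up for the example.

```python
from typing import Iterable, Iterator


def batch_total(records: list[float]) -> float:
    """Batch: all records are collected first, then processed in one operation."""
    return sum(records)


def stream_totals(events: Iterable[float]) -> Iterator[float]:
    """Stream: each event is processed as it arrives, yielding an updated result."""
    running = 0.0
    for value in events:
        running += value
        yield running


# Hypothetical sample data used only for this illustration.
readings = [3.0, 5.0, 2.0]

print(batch_total(readings))          # a single result once all data is available
print(list(stream_totals(readings)))  # one updated result per incoming event
```

The batch function needs the full list before it can return anything, while the stream function produces a result after every event.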

Question 2

Q

Advantages of Batch Processing

Answer

A

Large volumes of data can be processed at a convenient time.

It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight, or during off-peak hours.

Question 3

Q

Disadvantages of Batch Processing

Answer

A

The time delay between ingesting the data and getting the results.

Dependencies between data observations: all of a batch job’s input data must be ready before the batch can be processed. Problems with data, errors, and program crashes that occur during a batch job bring the whole process to a halt.

Question 4

Q

General four-step architecture for stream processing

Answer

A

An event generates data
Data is captured at a streaming source
Data is processed
Results are written to an output (a.k.a. a sink)
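
The four steps above can be sketched as a toy Python pipeline. Everything here is an assumption made for illustration: the sensor events, the in-memory queue standing in for a streaming source, the temperature threshold, and the list standing in for a sink. A real solution would use a service such as those listed on the following cards.

```python
from collections import deque


def generate_events() -> list[dict]:
    """Step 1: events generate data (hypothetical sensor readings)."""
    return [
        {"device": "sensor-a", "temperature": 21},
        {"device": "sensor-b", "temperature": 35},
        {"device": "sensor-a", "temperature": 38},
    ]


def run_pipeline(threshold: int = 30) -> list[dict]:
    source = deque()  # Step 2: in-memory queue standing in for a streaming source
    sink = []         # Step 4: list standing in for an output (sink)

    for event in generate_events():
        source.append(event)  # data is captured at the source

    while source:
        event = source.popleft()
        # Step 3: process the data (here, keep only readings above the threshold)
        if event["temperature"] > threshold:
            sink.append({**event, "alert": True})  # Step 4: write the result

    return sink


print(run_pipeline())
```

With the sample events, only the two readings above the threshold reach the sink.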

Question 5

Q

Stream processing sources in Azure

Answer

A

Azure Event Hubs
Azure IoT Hub
Azure Data Lake Storage Gen2
Apache Kafka

Question 6

Q

Stream processing sinks in Azure

Answer

A

Azure Event Hubs
Azure Data Lake Storage Gen2
Azure Blob Storage
Azure SQL Database
Azure Synapse Analytics
Azure Databricks
Microsoft Power BI

Question 7

Q

Azure Stream Analytics

Answer

A

Azure Stream Analytics is a service for complex event processing and analysis of streaming data.

Stream Analytics is used to:

Ingest data from an input
Process the data by using a query to select, project, and aggregate data values
Write the results to an output
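
The select, project, and aggregate steps can be illustrated with a small Python stand-in. This is not the Stream Analytics query language: the function name, the event fields (timestamp, value, device), and the 10-second fixed window are all assumptions chosen for the sketch.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds=10, min_value=0.0):
    """Average event values per fixed, non-overlapping time window."""
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [total, count]
    for event in events:
        # Select: skip events that fail the filter condition.
        if event["value"] < min_value:
            continue
        # Project: keep only the fields the aggregation needs.
        ts, value = event["timestamp"], event["value"]
        # Aggregate: group by the window the timestamp falls into.
        window_start = (ts // window_seconds) * window_seconds
        sums[window_start][0] += value
        sums[window_start][1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


# Hypothetical input events; timestamps are seconds since an arbitrary start.
events = [
    {"timestamp": 1, "value": 10.0, "device": "a"},
    {"timestamp": 4, "value": 20.0, "device": "b"},
    {"timestamp": 12, "value": 30.0, "device": "a"},
    {"timestamp": 15, "value": -1.0, "device": "b"},
]

print(tumbling_window_average(events))
```

The returned dictionary plays the role of the output: one aggregated value per time window, ready to be written to a sink.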

Question 8

Q

Apache Spark on Microsoft Azure

Answer

A

Apache Spark is a distributed processing framework for large-scale data analytics. You can use Spark on Microsoft Azure in the following services:

Azure Synapse Analytics
Azure Databricks
Azure HDInsight

Question 9

Q

Delta Lake

Answer

A

Delta Lake is an open-source storage layer that adds support for transactional consistency, schema enforcement, and other common data warehousing features to data lake storage.

Question 10

Q

Azure Data Explorer

Answer

A

Azure Data Explorer is a standalone Azure service for efficiently analyzing data. You can use the service as the output for analyzing large volumes of diverse data from data sources such as websites, applications, IoT devices, and more.

Question 11

Q

Kusto Query Language (KQL)

Answer

A

Kusto Query Language (KQL) is a query language that is specifically optimized for fast read performance, particularly with telemetry data that includes a timestamp attribute.

(11 cards)