Data Analytics in Azure Flashcards

1
Q

Batch vs Stream Processing

A

Batch processing: multiple data records are collected and stored before being processed together in a single operation.

Stream processing: a source of data is constantly monitored and processed in real time as new data events occur.
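A minimal sketch of the difference, assuming a hypothetical list of sensor readings; the function names are illustrative and not part of any Azure SDK. The batch function waits for the full collection, while the stream function produces an updated result for each incoming record.

```python
from typing import Iterable, Iterator, List


def batch_total(readings: List[float]) -> float:
    """Batch: all records are collected first, then processed in one operation."""
    return sum(readings)


def stream_totals(readings: Iterable[float]) -> Iterator[float]:
    """Stream: each record is processed as it arrives, yielding an updated total."""
    running = 0.0
    for value in readings:
        running += value
        yield running


readings = [1.0, 2.0, 3.0]  # hypothetical sensor values
print(batch_total(readings))          # one result, after all data is available
print(list(stream_totals(readings)))  # one result per incoming record
```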

2
Q

Advantages of Batch Processing

A

Large volumes of data can be processed at a convenient time.

It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight, or during off-peak hours.

3
Q

Disadvantages of Batch Processing

A

The time delay between ingesting the data and getting the results.

Dependence on complete input: all of a batch job’s input data must be ready before the batch can be processed. Problems with data, errors, and program crashes that occur during a batch job bring the whole process to a halt.
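As a rough illustration of the second point, the sketch below shows one malformed record aborting an entire batch. The record format and both function names are assumptions made for this example; checking the input before the run is one possible mitigation, not something the card prescribes.

```python
from typing import List


def run_batch(records: List[str]) -> List[int]:
    """Parse every record; a single malformed value raises and aborts the whole batch."""
    return [int(r) for r in records]


def find_bad_records(records: List[str]) -> List[str]:
    """Return the records that would fail parsing, so they can be fixed before the run."""
    bad = []
    for r in records:
        try:
            int(r)
        except ValueError:
            bad.append(r)
    return bad


batch = ["10", "20", "oops", "40"]  # hypothetical input with one bad record
bad = find_bad_records(batch)
if bad:
    print("fix before running:", bad)
else:
    print(run_batch(batch))
```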

4
Q

General four-step architecture for stream processing

A
  1. An event generates data
  2. The data is captured at a streaming source
  3. The data is processed
  4. The results are written to an output (a.k.a. a sink)
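The four steps can be sketched in plain Python. This is a toy model under stated assumptions: an in-memory queue stands in for the streaming source, a list stands in for the sink, and the event fields (`device`, `temp_c`) and the 30-degree threshold are hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, List

Event = Dict[str, Any]


def capture(source: Deque[Event], event: Event) -> None:
    """Step 2: the event's data is captured at a streaming source (an in-memory queue here)."""
    source.append(event)


def process(event: Event) -> Event:
    """Step 3: the data is processed (flagged when the reading exceeds a threshold)."""
    return {**event, "alert": event["temp_c"] > 30.0}


def run(source: Deque[Event], sink: List[Event]) -> None:
    """Step 4: results are written to an output, the sink (a list here)."""
    while source:
        sink.append(process(source.popleft()))


source: Deque[Event] = deque()
sink: List[Event] = []
for temp in [20.5, 31.0, 19.0]:
    # Step 1: an event generates data.
    capture(source, {"device": "sensor-1", "temp_c": temp})
run(source, sink)
print([e["alert"] for e in sink])  # [False, True, False]
```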
5
Q

Stream processing sources in Azure

A

Azure Event Hubs
Azure IoT Hub
Azure Data Lake Storage Gen2
Apache Kafka

6
Q

Stream processing sinks in Azure

A
Azure Event Hubs
Azure Data Lake Storage Gen2
Azure Blob Storage
Azure SQL Database
Azure Synapse Analytics
Azure Databricks
Microsoft Power BI
7
Q

Azure Stream Analytics

A

Azure Stream Analytics is a service for complex event processing and analysis of streaming data.

Stream Analytics is used to:

  • Ingest data from an input
  • Process the data by using a query to select, project and aggregate data values
  • Write the results to an output
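The select, project, and aggregate steps can be mimicked in plain Python. This is not the Stream Analytics query language (real jobs are written in a SQL-like language and usually aggregate over time windows); the event fields, the fixed-size window, and the function name are assumptions for illustration, and window boundary rules in the real service may differ.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def average_per_window(
    events: Iterable[Dict[str, Any]], window_seconds: int
) -> Dict[Tuple[str, int], float]:
    """Average temp_c per (device, window start) over fixed, non-overlapping windows."""
    totals: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for e in events:
        if e["temp_c"] is None:          # select: keep only events with a reading
            continue
        device, ts, temp = e["device"], e["ts"], e["temp_c"]  # project: drop other fields
        key = (device, ts - ts % window_seconds)
        totals[key][0] += temp           # aggregate: running sum and count
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


events = [  # hypothetical input; "ts" is seconds since the start of the stream
    {"device": "a", "ts": 0, "temp_c": 20.0, "fw": "1.0"},
    {"device": "a", "ts": 30, "temp_c": 22.0, "fw": "1.0"},
    {"device": "a", "ts": 70, "temp_c": 30.0, "fw": "1.0"},
    {"device": "b", "ts": 10, "temp_c": None, "fw": "1.1"},
]
print(average_per_window(events, 60))  # {('a', 0): 21.0, ('a', 60): 30.0}
```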
8
Q

Apache Spark on Microsoft Azure

A

Apache Spark is a distributed processing framework for large-scale data analytics. You can use Spark on Microsoft Azure in the following services:

Azure Synapse Analytics
Azure Databricks
Azure HDInsight

9
Q

Delta Lake

A

Delta Lake is an open-source storage layer that adds support for transactional consistency, schema enforcement, and other common data warehousing features to data lake storage.
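To make "schema enforcement" and "transactional consistency" concrete, here is a toy in-memory table. It is not the Delta Lake API and says nothing about how Delta Lake is implemented; the class, its schema format, and the column names are hypothetical. The idea shown is only that a write is validated against a declared schema and is applied either completely or not at all.

```python
from typing import Any, Dict, List


class ToyTable:
    """In-memory illustration only; not the Delta Lake API."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = schema
        self.rows: List[Dict[str, Any]] = []

    def append(self, batch: List[Dict[str, Any]]) -> None:
        # Schema enforcement: validate every row before anything is written.
        for row in batch:
            if set(row) != set(self.schema):
                raise ValueError(f"columns do not match schema: {sorted(row)}")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise ValueError(f"{column!r} must be {expected.__name__}")
        # All-or-nothing write: rows are added only after the whole batch passed.
        self.rows.extend(batch)


table = ToyTable({"device": str, "temp_c": float})
table.append([{"device": "a", "temp_c": 20.5}])
try:
    # The second row has the wrong type, so the first row is not written either.
    table.append([{"device": "b", "temp_c": 21.0}, {"device": "c", "temp_c": "hot"}])
except ValueError as err:
    print("rejected:", err)
print(len(table.rows))  # 1
```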

10
Q

Azure Data Explorer

A

Azure Data Explorer is a standalone Azure service for efficiently analyzing data. You can use the service as the output for analyzing large volumes of diverse data from data sources such as websites, applications, IoT devices, and more.

11
Q

Kusto Query Language (KQL)

A

Kusto Query Language (KQL) is a query language that is specifically optimized for fast read performance, particularly with telemetry data that includes a timestamp attribute.
