Describe considerations for real-time data analytics Flashcards

1
Q

Batch Processing

A

In batch processing, multiple data records are collected and stored before being processed together in a single operation.

Advantages of batch processing include:

-Large volumes of data can be processed at a convenient time.
-It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight, or during off-peak hours.

Disadvantages of batch processing include:

-The time delay between ingesting the data and getting the results.
-All of a batch job’s input data must be ready before a batch can be processed. This means data must be carefully checked. Problems with data, errors, and program crashes that occur during batch jobs bring the whole process to a halt.
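The trade-off above can be illustrated with a minimal Python sketch. The function name `run_batch` and the `amount` field are hypothetical, chosen only for illustration. The sketch validates every collected record before doing any work, so a single bad record stops the whole job.

```python
from statistics import mean

def run_batch(records):
    """Validate every record first, then process the whole batch in one pass."""
    # All input must be ready and valid before the job starts;
    # one bad record halts the entire batch.
    for i, rec in enumerate(records):
        if not isinstance(rec.get("amount"), (int, float)):
            raise ValueError(f"record {i} has an invalid amount: {rec!r}")
    amounts = [rec["amount"] for rec in records]
    return {"count": len(amounts), "total": sum(amounts), "average": mean(amounts)}

# Records collected and stored during the day, processed together later
# (for example, overnight).
collected = [{"amount": 10.0}, {"amount": 25.5}, {"amount": 4.5}]
summary = run_batch(collected)
print(summary)
```

Calling `run_batch` with a record such as `{"amount": "oops"}` raises `ValueError` before any totals are computed, which mirrors the "whole process comes to a halt" disadvantage.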

2
Q

Stream Processing

A

In stream processing, a source of data is constantly monitored and processed in real time as new data events occur.

In stream processing, each new piece of data is processed when it arrives. Unlike batch processing, there’s no waiting until the next batch processing interval: data is processed as individual units in real time rather than a batch at a time. Stream data processing is beneficial in scenarios where new, dynamic data is generated on a continual basis.

-Stream processing is ideal for time-critical operations that require an instant real-time response.

Real-world examples of streaming data include:

-A financial institution tracks changes in the stock market in real time, computes value-at-risk, and automatically rebalances portfolios based on stock price movements.
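As a rough illustration of per-event processing, the Python sketch below reacts to each price tick as it arrives instead of waiting for a batch. The event source `price_events`, the symbol `ABC`, and the 2% threshold are all assumptions made for the example. A real solution would read from a message queue or socket and would apply its own risk rules.

```python
def price_events():
    """Hypothetical event source; a real one would read from a queue or socket."""
    yield from [("ABC", 100.0), ("ABC", 100.5), ("ABC", 97.0), ("ABC", 97.2)]

def process_stream(events, threshold=0.02):
    """Handle each event as it arrives and yield an alert on large price moves."""
    last_price = {}  # most recent price seen per symbol
    for symbol, price in events:
        previous = last_price.get(symbol)
        if previous is not None:
            change = (price - previous) / previous
            if abs(change) >= threshold:
                # In a real system this is where a rebalance might be triggered.
                yield (symbol, previous, price)
        last_price[symbol] = price

alerts = list(process_stream(price_events()))
print(alerts)
```

Because `process_stream` is a generator, each alert is produced as soon as the triggering event is seen. It only keeps the latest price per symbol, not the full history.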

3
Q

Differences between batch and streaming data

A

-Data scope: Batch processing can process all the data in the dataset. Stream processing typically has access only to the most recent data received, or to data within a rolling time window (the last 30 seconds, for example).

-Data size: Batch processing is suitable for handling large datasets efficiently. Stream processing is intended for individual records or micro batches consisting of a few records.

-Performance: Latency is the time taken for the data to be received and processed. The latency for batch processing is typically a few hours. Stream processing typically occurs immediately, with latency on the order of seconds or milliseconds.

-Analysis: You typically use batch processing to perform complex analytics. Stream processing is used for simple response functions, aggregates, or calculations such as rolling averages.
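The rolling time window and rolling average mentioned above can be sketched in Python as follows. The class name `RollingAverage` is hypothetical, timestamps are plain seconds, and events are assumed to arrive in timestamp order. Unlike a batch job, the sketch only ever holds the events inside the window.

```python
from collections import deque

class RollingAverage:
    """Average of the values seen within the last `window_seconds`."""

    def __init__(self, window_seconds=30.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        """Record a new event and return the current windowed average."""
        self._events.append((timestamp, value))
        self._total += value
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)

window = RollingAverage(window_seconds=30.0)
readings = [(0, 10.0), (10, 20.0), (35, 30.0)]
averages = [window.add(t, v) for t, v in readings]
print(averages)
```

By the time the third reading arrives, the first one is more than 30 seconds old, so it is dropped and no longer affects the average.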

4
Q

Combine batch and stream processing

A

Many large-scale analytics solutions include a mix of batch and stream processing, enabling both historical and real-time data analysis. It’s common for stream processing solutions to capture real-time data, process it by filtering or aggregating it, and present it through real-time dashboards and visualizations (for example, showing the running total of cars that have passed along a road within the current hour), while also persisting the processed results in a data store for historical analysis alongside batch processed data (for example, to enable analysis of traffic volumes over the past year).
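The traffic example can be sketched in Python under some stated assumptions. The class `TrafficCounter`, the table `hourly_counts`, and the use of an in-memory SQLite database as the data store are all illustrative choices, and car events are assumed to arrive in time order. The stream side keeps a running total for the current hour. Each completed hour is persisted so that it can be queried later alongside other historical data.

```python
import sqlite3
from datetime import datetime

class TrafficCounter:
    """Running car count for the current hour, persisted per completed hour."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hourly_counts (hour TEXT PRIMARY KEY, cars INTEGER)"
        )
        self.current_hour = None
        self.running_total = 0

    def on_car(self, ts):
        """Stream side: update and return the live total for the current hour."""
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour != self.current_hour:
            self.flush()  # persist the hour that just ended
            self.current_hour = hour
            self.running_total = 0
        self.running_total += 1
        return self.running_total

    def flush(self):
        """Persist the current hour's total for later historical analysis."""
        if self.current_hour is not None:
            self.conn.execute(
                "INSERT OR REPLACE INTO hourly_counts VALUES (?, ?)",
                (self.current_hour.isoformat(), self.running_total),
            )
            self.conn.commit()

conn = sqlite3.connect(":memory:")
counter = TrafficCounter(conn)
sightings = [datetime(2024, 1, 1, 8, 5), datetime(2024, 1, 1, 8, 40), datetime(2024, 1, 1, 9, 10)]
live_totals = [counter.on_car(ts) for ts in sightings]  # values a dashboard would show
counter.flush()

# Batch side: query the persisted results for historical analysis.
history = conn.execute("SELECT hour, cars FROM hourly_counts ORDER BY hour").fetchall()
print(live_totals, history)
```

The values in `live_totals` stand in for what a real-time dashboard would display, while `history` represents the stored results that a batch job could analyze over a longer period.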

5
Q

Azure Stream Analytics

A

Azure Stream Analytics is a real-time stream processing engine that captures a stream of data from an input, applies a query to extract and manipulate data from the input stream, and writes the results to an output for analysis or further processing.

Data engineers can incorporate Azure Stream Analytics into data analytics architectures that capture streaming data for ingestion into an analytical data store or for real-time visualization.

6
Q

Azure Synapse Data Explorer

A

Azure Data Explorer is a standalone service that offers the same high-performance querying of log and telemetry data as the Azure Synapse Data Explorer runtime in Azure Synapse Analytics.

Data analysts can use Azure Data Explorer to query and analyze data that includes a timestamp attribute, such as is typically found in log files and Internet of Things (IoT) telemetry data.

7
Q

Spark Structured Streaming

A

Spark Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Apache Spark SQL engine. It extends the capabilities of Spark SQL to support streaming data processing. Structured Streaming provides a high-level, declarative API for building real-time applications and analytics on streaming data.
