Chapter 6, Stream-Processing Patterns: Flashcards
What do event-driven architecture patterns focus on compared to stream-processing patterns?
Event-driven architecture patterns revolve around event delivery and orchestration, whereas stream-processing patterns focus on how such events can be processed on the fly to extract meaningful information and take actions in real time.
Page 330
How can messages be transformed in a system?
Messages can be transformed either in code, using traditional programming languages, or through specialized applications that perform data mapping. These include service buses and stream-processing systems that can run in the cloud, such as Apache Camel, KSQL, Amazon Kinesis, and Azure Stream Analytics.
Page 333
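For illustration, here is a minimal data-mapping transformation written in plain Python; the event schema and field names are hypothetical, not from the source.

```python
# A minimal data-mapping sketch: reshape an incoming event into the
# schema a downstream consumer expects. Field names are illustrative.
def transform(event: dict) -> dict:
    return {
        "orderId": event["order_id"],
        "totalCents": int(round(event["amount"] * 100)),
        "skus": [item["sku"] for item in event.get("line_items", [])],
    }

print(transform({"order_id": "A42", "amount": 19.5,
                 "line_items": [{"sku": "B-1"}, {"sku": "B-7"}]}))
# -> {'orderId': 'A42', 'totalCents': 1950, 'skus': ['B-1', 'B-7']}
```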
What is protocol switching and when is it necessary?
Protocol switching is needed when different teams use different, incompatible message brokers. For example, one team might use Kafka for its message processing, while another uses Apache ActiveMQ, which speaks AMQP. An intermediate application consumes the events over AMQP, deserializes them, serializes them as Kafka events, and publishes them to Kafka.
Page 335
Protocol switching: When working with partners and third-party teams, sometimes different teams will use different, noncompatible message brokers. One team might use Kafka for its message processing, while another uses Apache ActiveMQ, for instance. We cannot simply send events from one to another without some kind of conversion. Here, we use an intermediate application that consumes events from AMQP and deserializes them. Then it serializes those events as Kafka events and publishes them to Kafka.
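A minimal sketch of such a bridge, assuming an AMQP 1.0 client (python-qpid-proton) on the consuming side and kafka-python on the publishing side; hosts, ports, and topic names are hypothetical.

```python
# Sketch of a protocol-switching bridge: consume over AMQP, publish to Kafka.
import json
from kafka import KafkaProducer
from proton.handlers import MessagingHandler
from proton.reactor import Container

class Bridge(MessagingHandler):
    def __init__(self, amqp_url):
        super().__init__()
        self.amqp_url = amqp_url
        self.producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def on_start(self, event):
        event.container.create_receiver(self.amqp_url)

    def on_message(self, event):
        payload = json.loads(event.message.body)   # deserialize the AMQP event
        self.producer.send("orders",               # re-serialize for Kafka
                           json.dumps(payload).encode("utf-8"))

Container(Bridge("activemq-host:5672/orders")).run()
```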
Can protocol switching be implemented without data mapping?
Yes, protocol switching alone does not require data mapping, so it can be implemented via a simple cloud native application by using the appropriate protocol libraries for both event consumption and publishing.
Page 335
When is the Transformation Pattern especially useful?
The Transformation Pattern is especially useful when working with applications managed by partner teams, and transformations are needed to allow cloud native applications to interoperate.
Page 335
How can stateless transformations be scaled in cloud native applications?
Stateless transformations can be scaled horizontally without issues, and serverless compute options such as AWS Lambda or Azure Functions can be used for these use cases.
Page 335
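As a sketch, here is a stateless transformation packaged as an AWS Lambda handler; because the function keeps no state between invocations, the platform can run as many copies in parallel as the load requires. The field names are hypothetical.

```python
# Minimal AWS Lambda handler performing a stateless transformation.
# No state is kept between invocations, so instances scale horizontally.
import json

def handler(event, context):
    transformed = {
        "deviceId": event.get("device_id"),
        "temperatureC": (event.get("temperature_f", 32) - 32) * 5 / 9,
    }
    return {"statusCode": 200, "body": json.dumps(transformed)}
```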
What pattern should be used for stateful transformations like calculating the average temperature over the last hour?
For stateful transformations, such as calculating the average temperature over the last hour, systems cannot be simply scaled horizontally. The Sequential Convoy pattern should be used to partition and scale these applications.
Page 335
When these transformations are stateful—for example, when we need the Windowed Aggregation pattern to calculate the average temperature over the last hour—these systems cannot be simply scaled horizontally. The Sequential Convoy pattern will show us how to partition and scale these applications.
How should events be filtered by category in an e-commerce platform?
Use subscription filters provided by message brokers to filter only the relevant type of data for processing. If that is not possible, implement an intermediate microservice or serverless function to filter and publish only the relevant events. This improves security and eliminates potential misuse of data.
Page 337
Filter events by category: Often we are interested in only certain types of events for processing. Take, for example, handling asynchronously published local and international shipment events distinctly in an ecommerce platform. In this case, when possible, use subscription filters provided by message brokers to filter only the relevant type of data for processing. But when that is not possible, we recommend implementing an intermediate microservice or serverless function to filter and publish only the relevant events. This also improves security and eliminates potential misuse of data, especially when the data is published to third parties.
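A sketch of such an intermediate filter as a small Kafka consumer/producer loop; the topic names and the shipment_type field are assumptions, not from the source.

```python
# Intermediate filter: republish only international shipment events.
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("shipments", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

for message in consumer:
    event = json.loads(message.value)
    if event.get("shipment_type") == "international":
        producer.send("international-shipments", message.value)
```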
Why is it essential to filter only the most critical data based on a threshold in some scenarios?
It is essential to filter only the most critical data based on a threshold when processing everything at all times is not computationally feasible. For example, in a banking use case with hundreds of transactions performed every minute, performing human verification on all events to detect fraud is not possible.
Page 337
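A minimal sketch of threshold-based filtering; the amount field and the limit are illustrative assumptions.

```python
# Flag only transactions above a threshold for human fraud review.
REVIEW_THRESHOLD = 10_000  # illustrative limit, in currency units

def needs_human_review(transaction: dict) -> bool:
    return transaction.get("amount", 0) >= REVIEW_THRESHOLD

events = [{"id": 1, "amount": 50}, {"id": 2, "amount": 25_000}]
print([t for t in events if needs_human_review(t)])  # only the large one
```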
What benefits do the Filters and Thresholds Pattern provide to cloud native applications?
The Filters and Thresholds Pattern allows cloud native applications to extract relevant events for processing and reduces their load by dropping irrelevant or lower-priority events.
Page 338
How can modern message brokers like Kafka support the Filters and Thresholds Pattern?
Modern message brokers like Kafka natively support subscription to topics with a filter condition, allowing cloud native applications to avoid running additional containers just for filtering.
Page 338
It is important to note that modern message brokers such as Kafka now natively support this functionality, allowing cloud native applications to subscribe to their topics with a filter condition. This also avoids running additional containers just for filtering. This option is not always available, especially when publishing events to third-party systems.
How can filters be implemented and deployed in cloud native applications?
Filters can be implemented as stateless microservices and deployed in front of any other cloud native application to filter and pass only the relevant events. Serverless compute options such as AWS Lambda and Azure Functions can also be used to implement the Filters and Thresholds Pattern.
Page 338
What are the types of windowed aggregation operations?
The types of windowed aggregation operations are length sliding, length batch, time sliding, and time batch. Aggregation operations are performed on top of these windows, and the aggregation output is emitted as a stream for further processing.
Page 339
Length sliding, length batch, time sliding, and time batch. The aggregation operations are performed on top of these windows, as windows limit the number of events that need to be considered for aggregation, and the aggregation output is emitted as a stream for further processing.
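For illustration, here is a length-sliding window of size 3 that emits a running average per event; a length-batch window would instead emit once per three events and then clear.

```python
# Length-sliding window of size 3 emitting an average on every event.
from collections import deque

window = deque(maxlen=3)  # the oldest event is evicted automatically

def on_event(value):
    window.append(value)
    print(f"window={list(window)} avg={sum(window) / len(window):.2f}")

for reading in [20, 22, 21, 25, 30]:
    on_event(reading)
```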
What pattern should be applied to ensure accurate aggregation calculations during system downtime?
The Two-Node Failover pattern should be applied to ensure accurate aggregation calculations are continuously emitted for decision making during system downtime.
Page 344
System downtime may cause business impact. Therefore, we have to apply reliability patterns, such as the Two-Node Failover pattern, to make sure that accurate aggregation calculations are continuously emitted for decision making.
When is it not necessary to use reliability patterns for aggregation services?
If the service is not critical for the business, system downtime may not cause a business impact. In this case, there is no need to worry about preserving the window state, and therefore no need to use reliability patterns.
Page 344
Aggregate events over length: Sometimes the number of events is an important aspect of the aggregation, and such cases cannot be modeled with time.
If the service is not critical for the business, system downtime may not cause business impact. Therefore, we don’t need to worry about preserving the window state, and so there is no need to use reliability patterns.
Why is the Windowed Aggregation Pattern considered stateful and what are the implications during system failures?
The Windowed Aggregation Pattern is stateful because windows rely on multiple events, and a system failure or restart can cause those events to get lost, resulting in inconsistent aggregation results. When aggregations are critical, reliability patterns are needed to rebuild or recover the state after a failure or restart.
Page 345
Considerations: The most important aspect of the Windowed Aggregation Pattern is that it is stateful. Windows rely on multiple events, and a system failure or restart can cause those events to get lost, causing the aggregations to emit inconsistent results. When aggregations are not used for mission-critical use cases, it may be acceptable to lose those events during system failures or restarts. In this case, some aggregation outputs will not be published or will be inaccurate. But when the aggregation outputs are critical, we can apply reliability patterns (discussed later in this chapter) to make sure that we are appropriately rebuilding or recovering the state after a failure or restart.
Why can’t all types of aggregations be implemented with high accuracy and efficiency?
Not all types of aggregations can be implemented with high accuracy and efficiency because some, like calculating the median, require iterating through all events in a window, adding latency and requiring more space. In contrast, calculating the mean only needs the sum and count of events, enabling rapid computation without iterating through all events.
Page 345
It is also important to consider that we cannot implement all types of aggregations with high accuracy and efficiency. For example, we can use windows to model the mean (average), but not the median. The mean needs only the sum and the count of events in the window, and techniques can be used to progressively alter these values as events are added and removed from the window. This enables us to rapidly compute the average (sum/count) by not iterating through all the events in that window. But on the other hand, to calculate the median, we need to iterate through all the events in the window. This will not only add latency to the calculation, but persisting all events requires more space, which becomes more problematic as windows get larger.
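A sketch of why the mean is cheap while the median is not: the mean needs only a running sum and count, updated in constant time as events enter and leave the window, whereas the median must examine every retained event.

```python
# Incremental mean (O(1) per event) versus median (scans the whole window).
from collections import deque
import statistics

window = deque(maxlen=1000)
running_sum = 0.0

def on_event(value):
    global running_sum
    if len(window) == window.maxlen:
        running_sum -= window[0]           # account for the evicted event
    window.append(value)
    running_sum += value
    mean = running_sum / len(window)       # constant-time update
    median = statistics.median(window)     # must iterate over all events
    return mean, median
```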
How should systems be designed to withstand high load and scale on demand when using windowed aggregations?
Systems should be designed to withstand high load and scale on demand by sharding the collections of events in windows, allowing effective scaling of these operators.
Page 345
What is the Scatter and Gather pattern and what is a common use case for it?
The Scatter and Gather pattern involves processing the same event in parallel with different operations and then combining the results to emit a single event. A common use case is a loan application process where operations like credit check, address verification, and identity verification are performed in parallel and then combined for a loan decision.
Page 349
Scatter and gather: In scatter and gather, we process the same event in parallel, performing different operations, and finally combine the results so all event outputs can be emitted as a single event. This is one of the most common scenarios for using the Stream Join Pattern.
For example, let’s consider a loan application process. The loan application can be initiated by an event that contains a unique loan application ID. Operations for this event—credit check, address verification, and identity verification—can be processed in parallel. But at the end, the outputs of all three operations need to be joined in order for the bank to make a decision on whether the applicant should be granted a loan.
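A sketch of the gather step for the loan example: partial results are collected per application ID, and a combined event is emitted once all three checks have reported. The field names are illustrative.

```python
# Gather step: combine three parallel check results per loan application.
REQUIRED = {"credit_check", "address_verification", "identity_verification"}
pending = {}  # application_id -> {check_type: outcome}

def on_result(event):
    app_id = event["application_id"]
    partial = pending.setdefault(app_id, {})
    partial[event["check_type"]] = event["outcome"]
    if REQUIRED <= partial.keys():           # all three results have arrived
        del pending[app_id]
        print("joined event:", {"application_id": app_id, **partial})
```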
What is the Stream Join pattern used for?
The Stream Join pattern is used to join various types of events based on a defined condition and a window.
Page 350
Why is the Join operation considered stateful and what should be done when event loss cannot be tolerated?
The Join operation is stateful because it needs to wait for all matching events to arrive before making a valid join. When event loss cannot be tolerated, reliability patterns should be used to ensure events are preserved across system failures and restarts.
Page 351
How can simple scatter and gather scenarios avoid event loss upon system failure or restart?
In simple scatter and gather scenarios, events can be read directly from message brokers, and acknowledgment can be deferred until events are successfully joined. This ensures events are not lost upon system failure or restart, as the message broker will republish those events.
Page 351
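A sketch of deferred acknowledgment with kafka-python: auto-commit is disabled, and the offset is committed only after a successful join, so a crash before the commit causes the broker to redeliver the events. The topic, group name, and join step are illustrative.

```python
# Defer acknowledgment (offset commit) until the event is successfully joined.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "check-results",
    bootstrap_servers="localhost:9092",
    group_id="gatherer",
    enable_auto_commit=False,           # no automatic acknowledgment
)

def try_join(message):
    ...  # hypothetical join step; returns the joined event or None

for message in consumer:
    if try_join(message) is not None:
        consumer.commit()               # acknowledge only after the join succeeds
```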
What challenges arise from joining many events during a long time period and how can these be addressed?
Joining many events during a long time period can increase space requirements and processing times. The Sequential Convoy pattern can be used to shard events based on joining attributes, parallelizing the joining process and ensuring related events fall into the same shard for successful joining.
Page 351
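A sketch of sharding on the joining attribute: publishing with the loan application ID as the Kafka partition key routes all events for one application to the same partition, and therefore to the same processing shard. The topic name is illustrative.

```python
# Shard by the joining attribute: use it as the message key so related
# events always land in the same partition.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

def publish(event_bytes: bytes, application_id: str):
    producer.send("check-results",
                  key=application_id.encode("utf-8"),
                  value=event_bytes)
```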
What is a common use of the Temporal Event Ordering Pattern?
A common use of the Temporal Event Ordering Pattern is to identify an incident by having a sequence of events occur in a prescribed order.
Page 354
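A minimal sketch of sequence detection: a per-key cursor advances through the expected event sequence and reports a match when the final step is reached. The event names are hypothetical, and non-matching events are simply ignored in this simple variant.

```python
# Detect a prescribed sequence of events per key with a simple state machine.
EXPECTED = ["door_opened", "motion_detected", "door_closed"]
progress = {}  # key -> index of the next expected event

def on_event(key, event_type):
    step = progress.get(key, 0)
    if event_type == EXPECTED[step]:
        step += 1
        if step == len(EXPECTED):
            print(f"sequence matched for {key}")
            progress.pop(key, None)
            return
    progress[key] = step
```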
How can the Temporal Event Ordering Pattern be used to detect the nonoccurrence of an expected event?
The Temporal Event Ordering Pattern can detect the nonoccurrence of an expected event to identify an erroneous situation. For example, notifying a homeowner if their garage door is left open for one minute after the car drives out.
Page 355
Detect nonoccurrence of event: Now, let’s say we want to identify an incident by an expected event not occurring. These are commonly used for detecting erroneous situations, such as notifying a homeowner that their garage door has been left open.
The user needs to receive a notification if the garage door is left open for one minute after the car drives out. The Temporal Event Ordering Pattern expects the door-close action to take place within one minute of the car leaving and notifies the user if the door does not close within that time frame (the nonoccurrence of the event).
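A sketch of the garage-door example using one timer per door: a car-left event arms a 60-second timer that a door-closed event cancels; if the timer fires, the expected event did not occur. Event and field names are illustrative.

```python
# Nonoccurrence detection: alert if no door-closed event within 60 seconds.
import threading

TIMEOUT_SECONDS = 60
timers = {}  # door_id -> pending alert timer

def alert(door_id):
    print(f"alert: door {door_id} left open for {TIMEOUT_SECONDS}s")

def on_event(event):
    door_id = event["door_id"]
    if event["type"] == "car_left":
        timer = threading.Timer(TIMEOUT_SECONDS, alert, args=(door_id,))
        timers[door_id] = timer
        timer.start()
    elif event["type"] == "door_closed" and door_id in timers:
        timers.pop(door_id).cancel()   # expected event arrived in time
```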
What considerations are needed for state machines in cloud native applications using the Temporal Event Ordering Pattern?
State machines are inherently stateful, requiring applications to rely on reliability patterns to persist their states across system failures and restarts. Each application should have enough in-memory space to maintain the state machines, and the Sequential Convoy pattern should be applied to distribute events to various nodes for scalable and parallelized sequence matching.
Page 356
Considerations: As state machines are inherently stateful, applications must rely on reliability patterns to persist their state across system failures and restarts. Also, we should ensure that each cloud native application has enough in-memory space to maintain the state machines. In addition, we should apply the Sequential Convoy pattern to distribute events to various nodes so that the sequence matching can be scaled and parallelized, while making sure all relevant events for a successful match are still routed to the same node.
How can event ordering be guaranteed in the Temporal Event Ordering Pattern?
Event ordering can be guaranteed using the Buffered Event Ordering pattern to overcome out-of-order events that occur during transmission. Correlating and ordering events based on event-generation time can help manage relative ordering.
Page 356
One of the other important aspects of the Temporal Event Ordering Pattern is that it requires events to be processed in the order they are generated. Though it is not always possible to determine the relative ordering of events, correlating and ordering events based on event-generation time can still help overcome out-of-order arrival during transmission. We recommend using the Buffered Event Ordering pattern to guarantee ordering of events if they can become out of order during transmission.
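A sketch of buffered ordering: arriving events are held in a min-heap keyed by generation time and released only once they are older than a grace period, so events that arrive out of order within that bound are still processed in generation order. The grace period and field names are assumptions.

```python
# Buffer events and release them in generation-time order after a grace period.
import heapq
import itertools
import time

GRACE_SECONDS = 5.0
counter = itertools.count()   # tiebreaker for events with equal timestamps
buffer = []

def on_arrival(event):
    heapq.heappush(buffer, (event["generated_at"], next(counter), event))
    flush()

def flush():
    cutoff = time.time() - GRACE_SECONDS
    while buffer and buffer[0][0] <= cutoff:
        _, _, event = heapq.heappop(buffer)
        print("in-order event:", event)
```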
How are prebuilt machine learning models generated and used?
Prebuilt machine learning models are generated by data scientists using data processing tools and machine learning frameworks like Apache Spark, TensorFlow, or Python. These models can be imported into running applications via technologies such as Predictive Model Markup Language (PMML) and queried on the fly to generate predictions. They can also run as separate cloud native applications and be called via APIs.
Page 358
Prebuilt machine learning models: These models can be generated by a data scientist using data processing tools and machine learning frameworks such as Apache Spark, TensorFlow, or even Python. Some of these models can be imported into running applications via technologies such as Predictive Model Markup Language (PMML), and we can query them on the fly to generate predictions. We can also run them as separate cloud native applications and call them via APIs. Because these models are prebuilt and cannot adapt based on new incoming events, we need to update them periodically to maintain and improve their prediction accuracy.
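A sketch of calling a prebuilt model that runs as a separate cloud native application behind an API; the endpoint, payload shape, and the use of the requests package are assumptions.

```python
# Query a prebuilt model served behind an HTTP API for a prediction.
import requests

def predict(features: dict) -> dict:
    response = requests.post(
        "http://model-service:8080/predict",   # hypothetical model endpoint
        json=features,
        timeout=2,
    )
    response.raise_for_status()
    return response.json()

print(predict({"amount": 250.0, "merchant_category": "electronics"}))
```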
How do online machine learning models differ from prebuilt models?
Online machine learning models tune themselves based on the information they receive as they produce predictions and may require a feedback loop with the results from their previous predictions. These models can be embedded into applications or run as separate microservices.
Page 358
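A minimal sketch of a self-tuning model: a single-feature linear predictor updated by a stochastic-gradient step each time the feedback loop delivers the true outcome of a previous prediction. The learning rate and data are illustrative.

```python
# Online model: each feedback observation nudges the weights immediately.
class OnlineLinearModel:
    def __init__(self, lr=0.01):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, x):
        return self.w * x + self.b

    def feedback(self, x, y_true):
        error = self.predict(x) - y_true   # compare prediction with outcome
        self.w -= self.lr * error * x      # gradient step on the weight
        self.b -= self.lr * error          # gradient step on the bias

model = OnlineLinearModel()
for x, y in [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]:
    print(f"predicted {model.predict(x):.2f}, actual {y}")
    model.feedback(x, y)
```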