Chapter 6, Stream-Processing Patterns Flashcards
What is a stream?
A stream can be defined as a continuous sequence of events ordered by time. The stream consists of a name and version that uniquely identify it, such as StockStream 1.0. All events in a stream have a common message format and structure. For example, StockStream has a JSON format and contains symbol, price, and volume in its structure. Having a consistent format and structure allows events in the stream to be processed in an automated manner, using stream-processing systems. The stream version provides a way to safely modify the structure and evolve the stream over time.
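As a minimal sketch (field names taken from the StockStream example above), such an event and its serialization might look like this in Python:

```python
import json

# A hypothetical StockStream 1.0 event: every event in the stream
# shares this JSON format and structure (symbol, price, volume).
event = {"symbol": "ABC", "price": 25.50, "volume": 1000}

# A consistent format lets stream-processing systems serialize and
# parse events automatically.
payload = json.dumps(event)
parsed = json.loads(payload)
```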
What Is Stream Processing?
Stream processing is performing operations on events in motion. It can be as simple as a stateless service consuming events and transforming its event format, or as complex as storing and processing stateful data in memory with low latency and reliability.
In contrast to simple event processing, stream processing supports use cases in which events need to be handled in the order they are generated. Stream-processing patterns can also remember and use previous events when making a decision. For example, detecting whether a stock price has continuously increased over the last five minutes requires remembering previous events and processing them in order, in real time.
What are Streaming Data Processing Patterns and what do they focus on?
Streaming data processing patterns focus on how we can generate useful output by processing real-time events through transformation, filtering, aggregation, and detecting meaningful sequences of events. These capabilities enable cloud native applications to process events on the fly with low latency.
A key performance consideration is avoiding heavy use of persistent data stores. In a cloud native application, the round-trip time of accessing the data store, and the potential for contention, can add significant processing latency. Some use cases do require a persistent store, but as a general rule of thumb, it should be avoided.
Describe the Transformation Pattern
The Transformation pattern helps transform events from an event source and publish them to another system with a different format, structure, or protocol.
How does the Transformation Pattern work?
This pattern maps the data of one event to another.
These transformations are often achieved purely with the information contained in the incoming event. But at times these transformations need other patterns, such as the Windowed Aggregation pattern.
Figure 6-1. XML-to-JSON transformation
For example, say we are to publish weather events to a third-party system that expects the events in JSON format with a particular structure (Figure 6-1). The relevant data from the incoming event can be extracted and mapped to the new event format. We can achieve this by using JSON and XML libraries, or by using a graphical interface or SQL-based data-mapping approaches provided by stream-processing technologies.
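A minimal sketch of such a transformation in Python, using the standard XML and JSON libraries (the field names and target structure here are illustrative, not taken from the book):

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical incoming weather event in XML.
xml_event = """
<weather>
  <city>London</city>
  <temperature>18.5</temperature>
  <humidity>72</humidity>
</weather>
"""

def transform(xml_payload: str) -> str:
    """Extract the relevant fields from the XML event and map them
    to the JSON structure the third-party system expects."""
    root = ET.fromstring(xml_payload)
    return json.dumps({
        "location": root.findtext("city"),
        "tempCelsius": float(root.findtext("temperature")),
        "humidityPct": int(root.findtext("humidity")),
    })

print(transform(xml_event))
```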
What are some related patterns to the Transformation Pattern?
The Transformation pattern can be combined with other stream data processing patterns, as data transformations can be required for incorporating results of those patterns, such as enriching events with aggregated data.
Describe the Filters and Thresholds Pattern
Sometimes we need to filter events based on given conditions, or allow only events with values that fit within a given threshold range. The Filters and Thresholds pattern is useful for extracting only the relevant events we need.
How does the Filters and Thresholds Pattern work?
Users provide conditions that match against the incoming events. These conditions can include exact string matches, substring matches, regular expressions, or, for numeric values, threshold ranges with comparison operations such as <, <=, >, >=, and ==. Often more than a single condition is required, so conditions are combined by using the AND, OR, and NOT logical operations and parentheses to build more-complex filter conditions.
This pattern extracts and processes the relevant data from the input event stream by using data-mapping techniques.
Figure 6-3. Filtering car events based on brand and year
If we are processing a real-time stream of car sales and are interested in only 2010 or newer Toyota vehicles, we can define a filtering condition as shown in Figure 6-3 to emit only events that satisfy the condition.
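A sketch of this compound filter in Python (the event fields are illustrative):

```python
# Hypothetical car-sale events; field names are illustrative.
events = [
    {"brand": "Toyota", "model": "Corolla", "year": 2012},
    {"brand": "Honda",  "model": "Civic",   "year": 2015},
    {"brand": "Toyota", "model": "Camry",   "year": 2008},
]

def matches(event):
    # Compound filter condition: brand == "Toyota" AND year >= 2010.
    return event["brand"] == "Toyota" and event["year"] >= 2010

filtered = [e for e in events if matches(e)]
print(filtered)  # only the 2012 Corolla satisfies both conditions
```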
What are some related patterns to the Filters and Thresholds Pattern?
The Filters and Thresholds pattern can be applied with all the other stream data processing patterns, as we often need to filter events for those patterns (for example, to aggregate only a particular type of event).
Describe the Windowed Aggregation Pattern
The Windowed Aggregation pattern enables us to analyze a collection of events based on a condition. Here, aggregation analysis can include operations like summation, minimum, maximum, average, standard deviation, and count, and the window defines the collection of events used for aggregation.
These windows can be based on time or event count, such as the last five minutes or the last 100 events. Windows may also have behaviors such as sliding or batching, which define when events are added to and removed from the window.
This pattern enables us to aggregate data on the fly and make time-critical business decisions within milliseconds.
What are some of the most common windows?
- Length sliding
- Length batch
- Time sliding
- Time batch
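As an illustration, a length-sliding window over the last three events can be sketched with a bounded deque (a simplified, in-memory-only model):

```python
from collections import deque

class LengthSlidingWindow:
    """Keeps the last `length` events; each new event slides the
    window forward by one, evicting the oldest event."""
    def __init__(self, length):
        self.events = deque(maxlen=length)

    def add(self, value):
        self.events.append(value)  # oldest event drops out automatically
        # Aggregation over the current window: a running average.
        return sum(self.events) / len(self.events)

window = LengthSlidingWindow(3)
averages = [window.add(v) for v in [10, 20, 30, 40]]
print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

A length-batch window would instead emit one aggregate per three events and then clear the window, rather than sliding on every event.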
How is the Windowed Aggregation Pattern used in practice?
The Windowed Aggregation pattern is stateful, meaning it stores data related to the events in memory. For noncritical use cases that can tolerate data loss, such as monitoring, we can therefore implement this pattern in any cloud native application. But when the use case requires reliable event processing, we need to combine this pattern with reliability patterns.
What are some related patterns to the Windowed Aggregation Pattern?
- Transformation pattern
Appropriately maps the aggregation to the output.
- Reliability patterns
Help make the window and aggregation state survive system failures.
- Sequential Convoy pattern
Allows aggregations to be performed in parallel based on shard keys. This not only helps scale aggregation processing, but also allows us to aggregate different types of events in isolation and produce aggregations per event type.
- Service Orchestration pattern
Splits the events by different shard keys for processing. This pattern is described in Chapter 3.
- Stream Join pattern
Aggregates results from different shards.
Describe the Stream Join Pattern
The Stream Join pattern resembles the join of SQL tables and enables us to join events from multiple streams with different schemas.
How does the Stream Join Pattern work?
This pattern works by defining a condition to identify the joining events. This condition will pick attributes from each joining event stream and define the condition under which they should be joined. This can be a simple equality condition, like joining events from all event streams having the same ID, or it can be more complex. The join should also define a buffer that determines how long events should wait for corresponding events to arrive from other event streams. This buffer period can be common across all streams or can vary among streams. Most stream-processing systems define this buffer period via windows.
Finally, as in the Windowed Aggregation pattern, it is important for this pattern to use the Transformation pattern to map the joining events and their attributes to the output.
Figure 6-5. Stream Join based on events that have arrived during the last minute
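A simplified in-memory sketch of such a join in Python, assuming hypothetical `id`, `ts`, `price`, and `volume` attributes and a one-minute buffer:

```python
from collections import defaultdict

BUFFER = 60.0  # hypothetical buffer period: events wait up to one minute

def join_streams(stream_a, stream_b):
    """Join events from two streams on a shared `id`, keeping only
    pairs whose timestamps fall within the buffer period."""
    by_id = defaultdict(list)
    for event in stream_b:
        by_id[event["id"]].append(event)
    joined = []
    for a in stream_a:
        for b in by_id[a["id"]]:
            if abs(a["ts"] - b["ts"]) <= BUFFER:
                # Transformation step: map attributes of both
                # joining events to the output event.
                joined.append({"id": a["id"],
                               "price": a["price"],
                               "volume": b["volume"]})
    return joined

orders  = [{"id": 1, "ts": 0.0, "price": 9.5}]
volumes = [{"id": 1, "ts": 30.0, "volume": 200},
           {"id": 1, "ts": 300.0, "volume": 50}]
print(join_streams(orders, volumes))  # only the event within the buffer joins
```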
How is the Stream Join Pattern used in practice?
The Stream Join pattern is stateful, as it buffers events for the join. Like the Windowed Aggregation pattern, this one can be implemented in any cloud native application as long as the use case is not business critical and can tolerate event loss. But when event loss is not acceptable, this pattern should be applied along with reliability patterns, so the application can withstand system failures and restarts without event loss.
What are some patterns related to the Stream Join Pattern?
- Transformation pattern
Appropriately maps joining event attributes to build the output.
- Reliability patterns
Help the join state survive system failures.
- Sequential Convoy pattern
Scales joins by performing them in parallel, allowing relevant joining events to fall into the same shard.
Describe the Temporal Event Ordering Pattern
The Temporal Event Ordering pattern is unique to stream processing. It detects interesting occurrences of complex events by identifying patterns in the order in which events arrive. The pattern can also detect the occurrence and nonoccurrence of incidents based on events emitted by various systems.
How does the Temporal Event Ordering Pattern work?
This pattern works on the concept of nondeterministic finite-state machines: the application state changes based on the input event and the current application state. The possible state transitions can be represented as a state graph that traverses from one state to another until it reaches either a success or fail state. Upon reaching the success state, the user is notified, as it means the expected events have occurred in order.
This pattern can also be used to identify sequences of events that are immediately followed by one another or scattered randomly among other events. We can also use this to detect the nonoccurrence of events by combining state transitions with time-outs.
Use cases such as stock monitoring most often require the event sequence to be detected repeatedly. To achieve this, a new state machine instance should be initiated upon each event arrival that triggers the initial state of the state machine.
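The stock scenario in Figure 6-7 can be sketched as a simple state machine in Python (the `min_rises` threshold is an illustrative parameter, not from the book):

```python
def detect_rise_then_drop(prices, min_rises=3):
    """State machine sketch: report a match whenever at least
    `min_rises` consecutive price increases are followed by a drop."""
    rises = 0          # current state: number of consecutive increases
    matches = []
    for prev, curr in zip(prices, prices[1:]):
        if curr > prev:
            rises += 1             # stay on the "rising" path
        else:
            if rises >= min_rises:
                matches.append(curr)  # success state: drop after a rise
            rises = 0              # reset; a new instance starts matching
    return matches

# Prices rise 10 -> 11 -> 12 -> 13, then drop to 12: one match.
print(detect_rise_then_drop([10, 11, 12, 13, 12]))  # [12]
```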
Figure 6-7. Using the Temporal Event Ordering pattern to detect a continuous stock price increase followed by a single drop
How is the Temporal Event Ordering Pattern used in practice?
Like the Windowed Aggregation and Stream Join patterns, this pattern should be combined with reliability patterns to prevent data loss during system failures and restarts. Furthermore, as event arrival order is critical to the success of this pattern, we recommend using patterns like Buffered Event Ordering to guarantee ordering of events before processing them.
What are some patterns related to the Temporal Event Ordering Pattern?
- Transformation pattern
Appropriately maps the matched events in the sequence to generate a meaningful output.
- Reliability patterns
Help state machines survive system failures.
- Sequential Convoy pattern
Scales sequence matching by performing it in parallel, allowing relevant events to fall into the same shard.
- Buffered Event Ordering pattern
Orders events based on event-generation time to facilitate correct behavior of this pattern.
Describe the Machine Learner Pattern
We can use machine learning models in real time to generate predictions and automate decision making. Prebuilt machine learning models produce predictions without updating themselves based on new input events. Online machine learning models produce predictions while continuously learning from new incoming events. Either way, machine learning models make our cloud native application much more intelligent.
How does the Machine Learner Pattern work?
We can generate predictions in cloud native applications in two ways: by executing prebuilt machine learning models and by using online machine learning models.
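As a toy illustration of the online approach, the following model updates itself with every incoming event while remaining able to produce a prediction at any time (a prebuilt model would skip the `learn` step):

```python
class OnlineMeanPredictor:
    """A minimal online model: predicts the running mean of observed
    values and updates itself with every incoming event."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def predict(self):
        return self.mean

    def learn(self, value):
        self.count += 1
        # Incremental mean update: no need to store past events.
        self.mean += (value - self.mean) / self.count

model = OnlineMeanPredictor()
for event in [10.0, 20.0, 30.0]:
    model.learn(event)
print(model.predict())  # 20.0
```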
How is the Machine Learner Pattern used in practice?
Machine learning has now become an integral part of many applications, and cloud native applications should also be well equipped to incorporate them. One common way of integrating machine learning models is to deploy them as individual microservices and make service calls. Alternatively, machine learning models can be embedded into the applications, which can continuously produce predictions based on incoming events. Some scenarios using this pattern are described next.
What are some patterns related to the Machine Learner Pattern?
- Transformation pattern
Appropriately maps the predictions of the machine learning model to generate a meaningful output.
- Reliability patterns
Store and restore online machine learning algorithm state.
When to use the Transformation pattern?
To transform the event format, structure, or protocol.
To add or remove partial data to or from the event.
Third-party systems do not support the current event format.
When not to use the Transformation pattern?
The consuming system has the ability to understand the event.
What are the benefits of using the Transformation pattern?
Allows incompatible systems to communicate with one another.
Reduces event size by containing only relevant information.
When to use the Filters and Thresholds pattern?
Only a subset of events is relevant for processing.
When not to use the Filters and Thresholds pattern?
All events are needed for decision making.
What are the benefits of using the Filters and Thresholds pattern?
Reduces the load on the system by selecting only events that can produce the most value to the use case.
When to use the Windowed Aggregation pattern?
To aggregate events over time or length.
To perform operations such as summation, minimum, maximum, average, standard deviation, and count on the events.
When not to use the Windowed Aggregation pattern?
For operations that cannot be performed with fixed memory, such as detecting the median of the events.
High accuracy is needed without the use of reliability patterns.
What are the benefits of using the Windowed Aggregation pattern?
Reduces the load on the system by aggregating events.
Provides data summary to better understand the behavior as a whole.
When to use the Stream Join pattern?
To join events from two or more event streams.
To collect events that were previously split to parallelize processing.
When not to use the Stream Join pattern?
Joining events do not arrive in relatively close proximity.
High accuracy is needed without the use of reliability patterns.
What are the benefits of using the Stream Join pattern?
Allows events to be correlated.
Enables synchronous processing of events.
When to use the Temporal Event Ordering pattern?
To detect the sequence of event occurrences.
To detect the nonoccurrence of events.
When not to use the Temporal Event Ordering pattern?
Event sequencing cannot be defined as a finite-state machine.
High accuracy is needed without the use of reliability patterns.
Incoming events arrive out of order.
What are the benefits of using the Temporal Event Ordering pattern?
Allows detecting complex conditions based on event arrival order.
When to use the Machine Learner pattern?
To perform predictions in real time.
To perform classification, clustering, or regression analysis on the events.
When not to use the Machine Learner pattern?
We cannot use a model to accurately predict the values.
Historical data is not available for building machine learning models.
What are the benefits of using the Machine Learner pattern?
Automates decision making.
Provides reasonable estimates.
Describe Scaling and Performance Optimization Patterns
Cloud native applications that perform stream processing have unique scalability and performance requirements. For instance, these applications require event ordering to be maintained while processing events. Furthermore, as most of these applications have in-memory state, they also need a strategy to scale so they can process more events without compromising their accuracy.
Describe Sequential Convoy Pattern
The Sequential Convoy pattern scales cloud native stream-processing applications by separating events into various categories and processing them in parallel. It also preserves event ordering, so events can be recombined at a later time in their original order.
How does the Sequential Convoy Pattern work?
As the name suggests, this pattern sees events as items moving along a conveyor belt. It groups the events into categories based on their characteristics and processes them in parallel.
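A minimal sketch of the grouping step in Python, assuming a hypothetical `symbol` attribute as the shard key:

```python
def shard_by_key(events, key_field):
    """Group events into substreams by shard key, preserving the
    original event order within each substream; substreams can then
    be processed in parallel."""
    shards = {}
    for event in events:
        shards.setdefault(event[key_field], []).append(event)
    return shards

events = [
    {"symbol": "ABC", "price": 10},
    {"symbol": "XYZ", "price": 5},
    {"symbol": "ABC", "price": 11},
]
shards = shard_by_key(events, "symbol")
print([e["price"] for e in shards["ABC"]])  # [10, 11] -- per-shard order preserved
```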
How is the Sequential Convoy Pattern used in practice?
This pattern is used for scaling event processing so we can process more events with cloud native applications that have limited memory capacity, and for partitioning events so that each substream is processed differently. Let’s look at how this pattern can be used in various scenarios.