Chapter 6, Stream-Processing Patterns Flashcards

1
Q

What do event-driven architecture patterns focus on compared to stream-processing patterns?

A

Event-driven architecture patterns revolve around event delivery and orchestration, whereas stream-processing patterns focus on how such events can be processed on the fly to extract meaningful information and take actions in real time.
Page 330

2
Q

How can messages be transformed in a system?

A

Messages can be transformed by using various techniques, such as via code with traditional programming languages and through specialized applications that perform data mapping. These applications include service buses and stream-processing systems that can run on the cloud, such as Apache Camel, KSQL, Amazon Kinesis, and Azure Stream Analytics.
Page 333

3
Q

What is protocol switching and when is it necessary?

A

Protocol switching is needed when different teams use different, noncompatible message brokers. For example, one team might use Kafka for its message processing, while another uses Apache ActiveMQ. An intermediate application is used to consume events from AMQP, deserialize them, serialize them as Kafka events, and publish them to Kafka.
Page 335

Protocol switching: When working with partners and third-party teams, sometimes different teams will use different, noncompatible message brokers. One team might use Kafka for its message processing, while another uses Apache ActiveMQ, for instance. We cannot simply send events from one to another without some kind of conversion. Here, we use an intermediate application that consumes events from AMQP and deserializes them. Then it serializes those events as Kafka events and publishes them to Kafka.
335
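
A minimal sketch of the protocol-switching bridge described above, assuming an AMQP 0-9-1 broker reachable through the pika client and kafka-python on the Kafka side; the broker addresses, queue, and topic names are placeholders, not values from the book.

# Hypothetical bridge: consume AMQP events, deserialize them, and republish to Kafka.
import json

import pika                      # AMQP client (assumed available)
from kafka import KafkaProducer  # kafka-python (assumed available)

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_message(channel, method, properties, body):
    event = json.loads(body)              # deserialize the AMQP payload
    producer.send("orders", value=event)  # serialize and publish as a Kafka event
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="amqp-broker"))
channel = connection.channel()
channel.basic_consume(queue="orders", on_message_callback=on_message)
channel.start_consuming()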

4
Q

Can protocol switching be implemented without data mapping?

A

Yes, protocol switching alone does not require data mapping, so it can be implemented via a simple cloud native application by using the appropriate protocol libraries for both event consumption and publishing.
Page 335

5
Q

When is the Transformation Pattern especially useful?

A

The Transformation Pattern is especially useful when working with applications managed by partner teams, and transformations are needed to allow cloud native applications to interoperate.
Page 335

Considerations: the Transformation Pattern is especially useful when we are working with applications that are managed by partner teams, and we need to perform transformations to allow our cloud native applications to interoperate.
335

6
Q

How can stateless transformations be scaled in cloud native applications?

A

Stateless transformations can be scaled horizontally without issues, and serverless compute options such as AWS Lambda or Azure Functions can be used for these use cases.
Page 335

For stateless transformations, the cloud native applications can be scaled horizontally without any issues. We can use serverless compute options such as AWS Lambda or Azure Functions for these use cases.
335

7
Q

What pattern should be used for stateful transformations like calculating the average temperature over the last hour?

A

For stateful transformations, such as calculating the average temperature over the last hour, systems cannot be simply scaled horizontally. The Sequential Convoy pattern should be used to partition and scale these applications.
Page 335

When these transformations are stateful—for example, when we need the Windowed Aggregation pattern to calculate the average temperature over the last hour—these systems cannot be simply scaled horizontally. The Sequential Convoy pattern will show us how to partition and scale these applications.
335

8
Q

How should events be filtered by category in an e-commerce platform?

A

Use subscription filters provided by message brokers to filter only the relevant type of data for processing. If that is not possible, implement an intermediate microservice or serverless function to filter and publish only the relevant events. This improves security and eliminates potential misuse of data.
Page 337

Filter events by category: Often we are interested in only certain types of events for processing. Take, for example, handling asynchronously published local and international shipment events distinctly in an ecommerce platform. In this case, when possible, use subscription filters provided by message brokers to filter only the relevant type of data for processing. But when that is not possible, we recommend implementing an intermediate microservice or serverless function to filter and publish only the relevant events. This also improves security and eliminates potential misuse of data, especially when the data is published to third parties.
337

9
Q

Why is it essential to filter only the most critical data based on a threshold in some scenarios?

A

It is essential to filter only the most critical data based on a threshold when processing everything at all times is not computationally feasible. For example, in a banking use case with hundreds of transactions performed every minute, performing human verification on all events to detect fraud is not possible.
Page 337

Scenario: Apply a threshold for alerting: Sometimes we’re not interested in certain events, and processing everything at all times is not computationally feasible. In this case, it is essential to filter only the most critical data based on a threshold. For example, in a banking use case with hundreds of transactions performed every minute, performing human verification on all events to detect fraud is not possible.
337

10
Q

What benefits do the Filters and Thresholds Pattern provide to cloud native applications?

A

The Filters and Thresholds Pattern allows cloud native applications to extract relevant events for processing and reduces their load by dropping irrelevant or lower-priority events.
Page 338

Considerations: the Filters and Thresholds Pattern not only allows cloud native applications to extract relevant events for processing but also reduces their load by dropping events that are irrelevant or lower priority.
338

11
Q

How can modern message brokers like Kafka support the Filters and Thresholds Pattern?

A

Modern message brokers like Kafka natively support subscription to topics with a filter condition, allowing cloud native applications to avoid running additional containers just for filtering.
Page 338

It is important to note that modern message brokers such as Kafka now natively support this functionality, allowing cloud native applications to subscribe to their topics with a filter condition. This also avoids running additional containers just for filtering. This option is not always available, especially when publishing events to third-party systems.
338

12
Q

How can filters be implemented and deployed in cloud native applications?

A

Filters can be implemented as stateless microservices and deployed in front of any other cloud native application to filter and pass only the relevant events. Serverless compute options such as AWS Lambda and Azure Functions can also be used to implement the Filters and Thresholds Pattern.
Page 338

Filters can be implemented as stateless microservices and deployed in front of any other cloud native application to filter and pass only the events that are relevant. We can also readily leverage serverless compute options such as AWS Lambda and Azure Functions to implement the Filters and Thresholds Pattern.
338
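
A minimal sketch of such a stateless filter running in front of another service, assuming kafka-python; the topic names and the transaction-amount threshold are illustrative, not values from the book.

# Hypothetical threshold filter: forward only high-value transactions for review.
import json

from kafka import KafkaConsumer, KafkaProducer  # kafka-python (assumed available)

THRESHOLD = 10_000  # illustrative review threshold

consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    if event.get("amount", 0) >= THRESHOLD:   # keep only the most critical events
        producer.send("transactions-for-review", value=event)
    # events below the threshold are dropped, reducing downstream load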

13
Q

What are the types of windowed aggregation operations?

A

The types of windowed aggregation operations are length sliding, length batch, time sliding, and time batch. Aggregation operations are performed on top of these windows, and the aggregation output is emitted as a stream for further processing.
Page 339

Length sliding, length batch, time sliding, and time batch. The aggregation operations are performed on top of these windows, as windows limit the number of events that need to be considered for aggregation, and the aggregation output is emitted as a stream for further processing.
339
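
As an illustration of a time-sliding window, here is a minimal stand-alone sketch (not the book's implementation) that keeps the last hour of temperature readings and emits the running average as each event arrives.

# Hypothetical time-sliding window: average temperature over the last hour.
import time
from collections import deque

WINDOW_SECONDS = 3600
window = deque()   # (timestamp, temperature) pairs currently inside the window
total = 0.0        # running sum so the average is computed without rescanning

def on_event(timestamp, temperature):
    global total
    window.append((timestamp, temperature))
    total += temperature
    # expire readings that have slid out of the one-hour window
    while window and window[0][0] <= timestamp - WINDOW_SECONDS:
        _, old_temperature = window.popleft()
        total -= old_temperature
    return total / len(window)   # aggregation output emitted downstream

print(on_event(time.time(), 21.5))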

14
Q

What pattern should be applied to ensure accurate aggregation calculations during system downtime?

A

The Two-Node Failover pattern should be applied to ensure accurate aggregation calculations are continuously emitted for decision making during system downtime.
Page 344

System downtime may cause business impact. Therefore, we have to apply reliability patterns such as the Two-Node Failover pattern, to make sure that accurate aggregation calculations are continuously emitted for decision making.
344

15
Q

When is it not necessary to use reliability patterns for aggregation services?

A

If the service is not critical for the business, system downtime may not cause a business impact. In this case, there is no need to worry about preserving the window state, and therefore no need to use reliability patterns.
Page 344

Aggregate events over length: Sometimes the number of events is an important aspect of the aggregation, and those cannot be modeled with time.
If the service is not critical for the business, system downtime may not cause business impact. Therefore, we don’t need to worry about preserving the window state, and so there is no need to use reliability patterns.
344

16
Q

Why is the Windowed Aggregation Pattern considered stateful and what are the implications during system failures?

A

The Windowed Aggregation Pattern is stateful because windows rely on multiple events, and a system failure or restart can cause those events to get lost, resulting in inconsistent aggregation results. When aggregations are critical, reliability patterns are needed to rebuild or recover the state after a failure or restart.
Page 345

Considerations: The most important aspect of the Windowed Aggregation Pattern is that it is stateful. Windows rely on multiple events, and a system failure or restart can cause those events to get lost, causing the aggregations to emit inconsistent results. When aggregations are not used for mission-critical use cases, it may be acceptable to lose those events during system failures or restarts. In this case, some aggregation outputs will not be published or will be inaccurate. But when the aggregation outputs are critical, we can apply reliability patterns (discussed later in this chapter) to make sure that we are appropriately rebuilding or recovering the state after a failure or restart.
345

17
Q

Why can’t all types of aggregations be implemented with high accuracy and efficiency?

A

Not all types of aggregations can be implemented with high accuracy and efficiency because some, like calculating the median, require iterating through all events in a window, adding latency and requiring more space. In contrast, calculating the mean only needs the sum and count of events, enabling rapid computation without iterating through all events.
Page 345

It is also important to consider that we cannot implement all types of aggregations with high accuracy and efficiency. For example, we can use windows to model the mean (average), but not the median. The mean needs only the sum and the count of events in the window, and techniques can be used to progressively alter these values as events are added and removed from the window. This enables us to rapidly compute the average (sum/count) by not iterating through all the events in that window. But on the other hand, to calculate the median, we need to iterate through all the events in the window. This will not only add latency to the calculation, but persisting all events requires more space, which becomes more problematic as windows get larger.
345
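
The difference is easy to see in code. In this sketch (with an illustrative window size), the mean is maintained from a running sum and count as events enter and leave the window, while the median has to look at every buffered event.

# Constant-space mean vs. full-window median over the same length-sliding window.
from collections import deque
from statistics import median

window = deque(maxlen=1000)   # bounded window of readings
running_sum = 0.0

def add_reading(value):
    global running_sum
    if len(window) == window.maxlen:   # the oldest value is about to slide out
        running_sum -= window[0]
    window.append(value)
    running_sum += value
    mean = running_sum / len(window)   # O(1): only the sum and count are needed
    med = median(window)               # must iterate over all buffered events
    return mean, med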

18
Q

How should systems be designed to withstand high load and scale on demand when using windowed aggregations?

A

Systems should be designed to withstand high load and scale on demand by sharding the collections of events in windows, allowing effective scaling of these operators.
Page 345

This now brings us to scaling of these operators. It is vital that we design the system to withstand high load and scale on demand. Because windows are collections of events, the most effective way of scaling them is by sharding.
345

19
Q

What is the Scatter and Gather pattern and what is a common use case for it?

A

The Scatter and Gather pattern involves processing the same event in parallel with different operations and then combining the results to emit a single event. A common use case is a loan application process where operations like credit check, address verification, and identity verification are performed in parallel and then combined for a loan decision.
Page 349

Scatter and gather: In scatter and gather, we process the same event in parallel, performing different operations, and finally combine the results so all event outputs can be emitted as a single event. This is one of the most common scenarios for using the Stream Join Pattern.
For example, let’s consider a loan application process. The loan application can be initiated by an event that contains a unique loan application ID. Operations for this event—credit check, address verification, and identity verification—can be processed in parallel. But at the end, the outputs of all three operations need to be joined in order for the bank to make a decision on whether the applicant should be granted a loan.
349
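
A minimal sketch of the gather step for that loan example, assuming each parallel service emits a partial event tagged with the check it performed; the field names are illustrative.

# Hypothetical gather: join credit, address, and identity results per loan application ID.
REQUIRED_CHECKS = {"credit_check", "address_verification", "identity_verification"}
pending = {}   # loan_application_id -> {check_name: result}

def on_partial_result(event):
    results = pending.setdefault(event["loan_application_id"], {})
    results[event["check"]] = event["result"]
    if REQUIRED_CHECKS <= results.keys():   # all parallel branches have arrived
        joined = {"loan_application_id": event["loan_application_id"], **results}
        del pending[event["loan_application_id"]]
        emit(joined)   # single combined event for the loan decision

def emit(event):
    print("loan decision input:", event)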

20
Q

What is the Stream Join pattern used for?

A

The Stream Join pattern is used to join various types of events based on a defined condition and a window.
Page 350

Join various types of events: The Stream Join pattern can also be used to join various types of events based on a defined condition and a window
350

21
Q

Why is the Join operation considered stateful and what should be done when event loss cannot be tolerated?

A

The Join operation is stateful because it needs to wait for all matching events to arrive before making a valid join. When event loss cannot be tolerated, reliability patterns should be used to ensure events are preserved across system failures and restarts.
Page 351

Considerations: Join is a stateful operation; it needs to wait for all matching events to arrive before it makes a valid join. When event loss cannot be tolerated, we use reliability patterns to ensure that events are preserved across system failures and restarts.
351

22
Q

How can simple scatter and gather scenarios avoid event loss upon system failure or restart?

A

In simple scatter and gather scenarios, events can be read directly from message brokers, and acknowledgment can be deferred until events are successfully joined. This ensures events are not lost upon system failure or restart, as the message broker will republish those events.
Page 351

But for simple scenarios such as scatter and gather, we can directly read events from message brokers and defer acknowledgment until those events are successfully joined. With this approach, we do not lose those events upon a system failure or restart, as the message broker will republish those.
351

23
Q

What challenges arise from joining many events during a long time period and how can these be addressed?

A

Joining many events during a long time period can increase space requirements and processing times. The Sequential Convoy pattern can be used to shard events based on joining attributes, parallelizing the joining process and ensuring related events fall into the same shard for successful joining.
Page 351

Joining many events during a long time period can be challenging, as systems may suffer from increased space requirements and increased processing times. In this case, we recommend the Sequential Convoy pattern to shard events based on the joining attributes. This will parallelize joining and ensure that related events fall into the same shard so they can be joined successfully.
351

24
Q

What is a common use of the Temporal Event Ordering Pattern?

A

A common use of the Temporal Event Ordering Pattern is to identify an incident by having a sequence of events occur in a prescribed order.
Page 354

Detect sequence of event occurrence: The most common use of the Temporal Event Ordering Pattern is for identifying an incident by having a sequence of events happen in a prescribed order.
354

25
Q

How can the Temporal Event Ordering Pattern be used to detect the nonoccurrence of an expected event?

A

The Temporal Event Ordering Pattern can detect the nonoccurrence of an expected event to identify an erroneous situation. For example, notifying a homeowner if their garage door is left open for one minute after the car drives out.
Page 355

Detect nonoccurrence of event: Now, let’s say we want to identify an incident by an expected event not occurring. These are commonly used for detecting erroneous situations, such as notifying a homeowner that their garage door has been left open.
The user needs to receive a notification if the garage door is left open for one minute after the car drives out. The Temporal Event Ordering Pattern expects the door-close action to take place within one minute of the car leaving and notifies the user if the door does not close within that time frame (the nonoccurrence of the event).
355
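
A minimal sketch of the garage-door example, assuming each event carries a type and a timestamp in seconds; the alert fires when no door-closed event follows a car-out event within one minute. In a real service a periodic timer would also call check_timeout, since the alert must fire even if no further events arrive.

# Hypothetical nonoccurrence detection for the garage-door scenario.
TIMEOUT_SECONDS = 60
waiting_since = None   # time of the last car-out event still awaiting door-closed

def on_event(event_type, timestamp):
    global waiting_since
    if event_type == "car_out":
        waiting_since = timestamp      # start expecting a door-closed event
    elif event_type == "door_closed":
        waiting_since = None           # the expected event occurred in time
    check_timeout(timestamp)

def check_timeout(now):
    global waiting_since
    if waiting_since is not None and now - waiting_since > TIMEOUT_SECONDS:
        notify("Garage door left open for more than a minute")   # nonoccurrence detected
        waiting_since = None

def notify(message):
    print(message)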

26
Q

What considerations are needed for state machines in cloud native applications using the Temporal Event Ordering Pattern?

A

State machines are inherently stateful, requiring applications to rely on reliability patterns to persist their states across system failures and restarts. Each application should have enough in-memory space to maintain the state machines, and the Sequential Convoy pattern should be applied to distribute events to various nodes for scalable and parallelized sequence matching.
Page 356

Considerations
As state machines are inherently stateful, this requires applications to rely on reliability patterns to persist their states across system failure and restarts. Also, we should ensure that each cloud native application has enough in-memory space to maintain the state machines. In addition, we should apply the Sequential Convoy pattern to distribute events to various nodes so that the sequence matching can be scaled and parallelized, while making sure all relevant events for a successful match are still routed to the same node.
356

27
Q

How can event ordering be guaranteed in the Temporal Event Ordering Pattern?

A

Event ordering can be guaranteed using the Buffered Event Ordering pattern to overcome out-of-order events that occur during transmission. Correlating and ordering events based on event-generation time can help manage relative ordering.
Page 356

Another important aspect of the Temporal Event Ordering Pattern is that it requires events to be processed in the order they are generated. Though it is not always possible to determine the relative ordering of events, correlating and ordering events based on event-generation time can still help overcome out-of-order events that happened during transmission. We recommend you use the Buffered Event Ordering pattern to guarantee ordering of events if they can become out of order during transmission.
356

28
Q

How are prebuilt machine learning models generated and used?

A

Prebuilt machine learning models are generated by data scientists using data processing tools and machine learning frameworks like Apache Spark, TensorFlow, or Python. These models can be imported into running applications via technologies such as Predictive Model Markup Language (PMML) and queried on the fly to generate predictions. They can also run as separate cloud native applications and be called via APIs.
Page 358

Prebuilt machine learning models: These models can be generated by a data scientist using data processing tools and machine learning frameworks such as Apache Spark, TensorFlow, or even Python. Some of these models can be imported into running applications via technologies such as Predictive Model Markup Language (PMML), and we can query them on the fly to generate predictions. We can also run them as separate cloud native applications and call them via APIs. Because these models are prebuilt and cannot adapt based on new incoming events, we need to update them periodically to maintain and improve their prediction accuracy.
358

29
Q

How do online machine learning models differ from prebuilt models?

A

Online machine learning models tune themselves based on the information they receive as they produce predictions and may require a feedback loop with the results from their previous predictions. These models can be embedded into applications or run as separate microservices.
Page 358

Online machine learning models: These are models that tune themselves based on the information they receive as they produce predictions. In some cases, the models require a feedback loop with the results from their previous predictions so that they dynamically train themselves. These models can be embedded into applications or run as separate microservices.
358
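
As an illustration only (not from the book), an online model can be as simple as an estimator that keeps adjusting itself from the feedback it receives after each prediction:

# Hypothetical online model: tunes itself with every observed outcome.
class OnlineAverageModel:
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.estimate = 0.0

    def predict(self):
        return self.estimate

    def learn(self, observed):
        # feedback loop: move the estimate toward the newly observed value
        self.estimate += self.learning_rate * (observed - self.estimate)

model = OnlineAverageModel()
for reading in [10.0, 12.0, 11.0, 30.0]:
    print(model.predict())
    model.learn(reading)   # the model keeps training itself on new events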

30
Q

When is using a prebuilt machine learning model ideal?

A

Using a prebuilt machine learning model is ideal when there is abundant historical data, and the prediction pattern does not change with new events. For example, automating detection of defective parts in a controlled production line environment.
Page 359

Predict based on prebuilt machine learning models: Using a prebuilt machine learning model is ideal when we have abundant historical data, and when the prediction pattern does not change with new events. Let’s consider an example of automating detection of defective parts in a production line. Detecting defects early in a production line can reduce cost.
A microservice can be deployed with a prebuilt linear regression model to examine the parameters of parts and detect any defects, and to allow the parts that do not have defects to progress along the pipeline. If the manufacturing happens in a controlled environment with the same input materials, temperature, and machinery, the prediction will be accurate for a longer period, and we don’t need to update the prebuilt models as often.
359

31
Q

When should continuously learning online machine learning models be used?

A

Continuously learning online machine learning models should be used if it is expected that the models will need to learn new behaviors as they receive new data in the future.
Page 360

Continuous learning with data: We recommend using continuously learning online machine learning models if we expect them to learn new behaviors (as we receive new data) in the future.
360

32
Q

What are the advantages and disadvantages of prebuilt and online machine learning models?

A

Prebuilt machine learning models are simpler to implement and generally have high accuracy with enough data but cannot adapt to new trends. Online machine learning algorithms perform better in adapting to new data but can fluctuate in accuracy due to memory constraints and data limitations. Combining both types of models can provide higher accuracy by overriding prebuilt model predictions with more accurate online model predictions.
Page 361

Considerations: Prebuilt machine learning models are much simpler to implement and generally have high accuracy when we have enough data. At the same time, they cannot adapt to new trends. For these cases, online machine learning algorithms perform much better. But these algorithms can fluctuate in accuracy, because most are constrained by the amount of data they can store in memory and therefore have limitations on how much data they can learn. This compromises their ability to produce data with high accuracy. Therefore, we recommend combining prebuilt and online machine learning algorithms where possible, so you can override the predictions of the prebuilt models when you have higher accuracy from the online machine learning models.
361

33
Q

Why is it important to update prebuilt machine learning models periodically?

A

It is important to update prebuilt machine learning models periodically because their accuracy can degrade over time due to changes in circumstances. In cloud native applications, these models can be embedded, making it easy to update the application version along with a newer model.
Page 361

When using prebuilt machine learning models, it is important to update them periodically. Over time, the model accuracy can degrade because of changes in circumstances. In cloud native applications, these models can be embedded, so updating the application version along with a newer model provides an easy deployment path.
361

34
Q

How should online machine learning models handle state recovery in cloud native applications?

A

Online machine learning models store what they’ve learned in memory, so it is recommended to use reliability patterns on those cloud native applications to ensure they can recover state across system failures and restarts.
Page 361

Online machine learning models store what they’ve learned in memory. We recommend using reliability patterns on those cloud native applications so they recover state across system failures and restarts.
361

35
Q

Is it possible to perform aggregations without a window and what considerations must be taken into account?

A

Yes, it is possible to perform aggregations without a window, where the window spans from the service startup to the current time. These aggregations must operate with constant space complexity to avoid running out of memory.
Page 346

Things to note: It is also possible to perform aggregations without a window. Consider that the window spans from the service startup to the current time. Aggregations would reflect all the events that occurred during that period. But it is important that we implement these window aggregations to operate with constant space complexity (consumption of memory not depending on the number of events processed); otherwise, the system could run out of memory.
346

36
Q

How does the Sequential Convoy Pattern process events?

A

The Sequential Convoy Pattern groups events into categories based on their characteristics and processes them in parallel. For example, an ecommerce application can categorize purchase events by customer type and order size to provide various delivery guarantees.
Page 364

How it works: As the name suggests, the Sequential Convoy Pattern sees events as items moving along a conveyor belt. It groups the events into categories based on their characteristics and processes them in parallel. One example is an ecommerce application that provides different product delivery time guarantees based on type of customer and order size. This application can categorize purchase events by the type of customer (such as premium, gold, or not a member) and can categorize gold member purchases by their order size (such as $50 or less as small, $50 to $500 as medium, and more than $500 as high). This allows events to be partitioned and processed in parallel to provide various types of delivery guarantees, such as the same day, in two days, and in one week.
364

37
Q

How can event order be maintained in the Sequential Convoy Pattern?

A

Each event can be labeled with a sequence number before separation, allowing substreams to maintain event order during processing. This sequencing can also be used to join parallel streams later based on the original event order using a merge sort.
Page 365

Each event also can be labeled with a sequence number before separation. This allows each substream to maintain the event order during processing by applications that require guaranteed ordering, such as when using patterns like Windowed Aggregations or Temporal Event Ordering.
This sequencing can also be used to join the parallel streams together at a later time, based on the original event order. We use a merge sort by selecting the smallest sequence number among all the substreams. This enables us to group the events back in their original order and emit them for more processing.
For example, if the event with Seq#432 has not yet arrived, later events remain buffered in their windows; once it arrives, it is emitted, the event with Seq#433 follows, and so on until the next missing sequence number is detected.
Message brokers and event queues play a vital role in realizing the Sequential Convoy Pattern. They allow us to split and transfer events to substreams and to buffer events for grouping back into a single stream.
365
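
A minimal sketch of that regrouping step: output from the parallel substreams is buffered, and an event is emitted only once every smaller sequence number has already been seen. The starting sequence number is illustrative.

# Hypothetical merge step: regroup parallel substreams by sequence number.
import heapq

buffer = []        # min-heap of (sequence_number, event)
next_expected = 1  # next sequence number allowed to be emitted

def on_substream_event(seq, event):
    global next_expected
    heapq.heappush(buffer, (seq, event))
    # emit while the smallest buffered sequence number is the next expected one
    while buffer and buffer[0][0] == next_expected:
        _, ready = heapq.heappop(buffer)
        emit(ready)
        next_expected += 1

def emit(event):
    print("in order:", event)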

38
Q

How does the Sequential Convoy Pattern help overcome cloud native stream-processing application limitations?

A

The Sequential Convoy Pattern helps overcome limitations such as CPU, memory, and bandwidth by partitioning events into multiple substreams and parallelizing their processing, eliminating latency addition and event buildup in queues.
Page 367

Scale stream-processing applications: The Sequential Convoy pattern helps overcome cloud native stream-processing application limitations such as CPU, memory, and bandwidth, and allows us to process events with high throughput and low latency. For example, say we’re processing an event stream that transfers large amounts of confidential data. The events are encrypted and compressed to transfer the data much faster. Performing CPU-intensive operations such as uncompressing, decrypting, and transforming the data in real time is time-consuming, and performing such complex operations not only slows events, but also adds latency for the following events in the stream. This can lead to backlog buildup in the queues and cause bottlenecks for the whole system.
By simply partitioning the events into multiple substreams and parallelizing their processing, we can eliminate the latency addition and the event buildup in queues. We can use a simple round-robin strategy to distribute events to multiple substreams, which will process events much faster.
Let’s say the event-processing microservice also needs to enrich the event based on customer ID from a data store lookup. Retrieving data from data stores can be time-consuming and can potentially introduce latency to the overall processing time. One way to improve the performance is by caching the data in the microservice. But we cannot cache all the data because of the limited storage capacity of the microservice, and cache misses will still add latency.
We can use the Sequential Convoy pattern to separate events into different substreams based on hash values of customer IDs. This will process all events belonging to the same customer ID on the same node, improving the chance of cache hits. By reducing the number of customer IDs processed by a single microservice, we further increase cache hits, thereby meeting our performance goals.
367
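
A minimal sketch of that customer-ID sharding, assuming Kafka topic partitions act as the substreams and kafka-python as the client; keying each record by customer ID makes the broker hash all of one customer's events to the same partition, so the same consumer instance keeps serving (and caching) that customer.

# Hypothetical convoy split: route all events for one customer to one substream.
import json

from kafka import KafkaProducer   # kafka-python (assumed available)

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish(event):
    # records with the same key always land on the same partition (substream)
    producer.send("purchases", key=event["customer_id"], value=event)

publish({"customer_id": "C-1042", "amount": 75.0})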

39
Q

How can the Sequential Convoy Pattern partition different event types into parallel streams?

A

The Sequential Convoy Pattern enables partitioning different event types into parallel streams to execute different use cases against the same event stream, such as categorizing customers and providing different discounts in an ecommerce platform.
Page 368

Partition the stream processing: The Sequential Convoy pattern enables us to execute different use cases against the same event stream by partitioning different event types into parallel streams. For example, imagine an ecommerce platform providing extra discounts to premium customers. It can use customer data to categorize customers at different levels, such as premium and standard, and provide extra discounts for premium customers. By using the Sequential Convoy pattern, it can split the events into multiple substreams based on customer attributes, and then use different types of microservices to process those substreams through two pipelines to provide relevant discounts.
368

40
Q

What should be considered when categorizing events for meaningful stream processing?

A

The ability to categorize events based on attributes like customer ID or product ID is crucial. Sometimes it may require combining various attributes to better process related events together and in parallel. Also, be mindful of noncontiguous sequence numbers when regrouping events.
Page 369

Considerations: The ability to categorize events based on event attributes is crucial for meaningful stream processing. Sometimes events can be categorized based on a single attribute such as customer ID or product ID, but in other cases we might need to combine various attributes such as order value or place of order to better process related events together and in parallel.
When events are separated in multiple substreams, they can go through various cloud native applications, and during this process some events get filtered and dropped. This can make the sequence numbers noncontiguous. Therefore, we have to be mindful when events are regrouped as we might not find all of them. We can assume events have been dropped based on the next emitted events. But when we cannot reliably determine missing events, we employ a time-out to determine that the events were dropped.
369

41
Q

What is an effective way to regroup events and handle noncontiguous sequence numbers?

A

An effective way to regroup events is by using an end-of-sequence message emitted periodically with the last processed message ID. This helps identify dropped messages and unblock the processing of later events.
Page 369

A better approach to regrouping events is an end-of-sequence message. This can be emitted by the processing applications at periodic intervals with the last processed message ID. This tells us that all IDs before the given end-of-sequence ID have been processed by the upstream applications and that the missing IDs smaller than the last ID are dropped messages. This unblocks the processing of later events.
369

42
Q

What should be done when grouping based on sequence numbers is not possible?

A

When grouping based on sequence numbers is not possible, collect events and publish them to a single topic. Use the Buffered Event Ordering pattern to buffer and sort events based on sequence numbers or event timestamps.
Page 369

When grouping based on sequence numbers is not possible, we can simply collect events and publish them to a single topic, and then use the Buffered Event Ordering pattern (discussed later in this chapter), to buffer and sort events based on sequence numbers or event timestamps.
369

43
Q

How can stream-processing microservices be scaled when they become a bottleneck?

A

To scale stream-processing microservices when they become a bottleneck, reshard by altering the relevant categories to divide the saturated substream into multiple substreams. Migrate the application state across all substreams to ensure efficient scaling.
Page 369

It is also important to plan for scaling the number of streams when we detect that the stream-processing microservice is becoming a bottleneck. One approach is to reshard by altering the relevant categories to divide the saturated substream into multiple other substreams for processing events. We also need to migrate the application state across all substreams. Therefore, it is important to store the application state in such a way that we can separate those streams when we need to scale.
369

44
Q

What should be evaluated when using the Sequential Convoy Pattern for high throughput and low latency?

A

Evaluate whether adding sequence numbers to events and rejoining them based on those numbers can create a bottleneck. Reevaluate if ordering is essential for use cases that require extremely high throughput and low latency.
Page 370

Things to note: It is important to evaluate the throughput when using the Sequential Convoy Pattern, and whether it meets expectations. For use cases that require extremely high throughput and low latency, adding sequence numbers to events and rejoining events based on those numbers can create a bottleneck, and in these cases we need to reevaluate whether ordering is really essential.
370

45
Q

How can events generated by distributed sources be ordered?

A

Events generated by distributed sources can be ordered through timestamps. Use the Buffered Event Ordering pattern to fetch and reorder events based on timestamps, ensuring the sensors have synchronized times for accurate ordering.
Page 372

Order events generated on distributed event sources: Events generated by distributed sources usually become out of order because of data transmission latency added by network and intermediate systems. Consider an example of distributed surveillance sensors emitting events. These events can reach the processing system at various times because of transmission latency, so they will be out of order when we combine all of them into a single stream for processing.
As the sensors are distributed, the only way to order the events is through timestamps. The out-of-order events can be sent to a single topic in a message broker, and by using a microservice, those events can be fetched, reordered through the Buffered Event Ordering pattern, and sent downstream for further processing. We can use the Buffered Event Ordering Pattern only when the sensors have their times synced and the ordering based on the timestamps is reasonably accurate for processing.
372
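
A minimal sketch of that buffering step, holding sensor events for a short grace period and releasing them in timestamp order; the five-second grace period is an assumption, not a value from the book.

# Hypothetical buffered ordering: emit sensor events in timestamp order after a grace period.
import heapq
import itertools

GRACE_SECONDS = 5.0        # assumed tolerance for transmission delay
_tiebreak = itertools.count()
buffer = []                # min-heap of (event_timestamp, tiebreak, event)

def on_event(event, now):
    heapq.heappush(buffer, (event["timestamp"], next(_tiebreak), event))
    flush(now)

def flush(now):
    # emit every buffered event old enough that no earlier event should still arrive
    while buffer and buffer[0][0] <= now - GRACE_SECONDS:
        _, _, ready = heapq.heappop(buffer)
        emit(ready)

def emit(event):
    print("ordered:", event)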

46
Q

How can events from the same event sources be reordered after parallel processing?

A

Events from the same event sources can be reordered after parallel processing by adding sequence numbers along with the timestamp. This allows the events to be grouped back together efficiently.
Page 373

Reorder events generated from the same event sources: Often we need to parallelize event processing to achieve performance, and then reorder events into their original sequence for further processing. For example, say we want to parallel-process user interaction on a browser and merge those events back in order.
Because all the events needed to track user behavior are generated from the same browser, we recommend adding sequence numbers to those events along with the timestamp. This will not only allow us to process the events in parallel, to improve efficiency, but also group them back together, as we discussed in the Sequential Convoy pattern. We can feed the processed events to a topic in a message broker, and use a cloud native application to fetch and order the events based on their sequence numbers.
373

47
Q

When is the Buffered Event Ordering Pattern useful and what are its limitations?

A

The Buffered Event Ordering Pattern is useful for aggregating events over time or detecting a sequence of actions. However, it can add latency or introduce a bottleneck and should not be used when events are generated from distributed sources with unsynchronized timestamps.
Page 373

Considerations: the Buffered Event Ordering Pattern is useful when we need to aggregate events over time or when a sequence of actions needs to be detected. In all other cases, we do not recommend using the Buffered Event Ordering Pattern, as it can add latency or introduce a bottleneck to the system.
Events can be reordered with high accuracy, but only when those events are generated from a single source. We can’t ensure that ordering by event timestamps will produce true ordering, as the distributed sources that generate events will not have their timestamps synchronized to the millisecond.
373

48
Q

Why should sequence numbers be added to events generated from a single source?

A

Sequence numbers should be added to events generated from a single source because ordering events based on sequence numbers is more efficient and does not add as much latency as ordering by timestamp.
Page 373

When events are generated from a single source, always try to add sequence numbers to the events along with the timestamp, because ordering events based on sequence number is much more efficient. It also does not add the same amount of latency as ordering events by timestamp.
373

49
Q

What decision should be made when a late-arriving event is detected and what factors influence this decision?

A

The decision to send the out-of-order event forward for processing or drop it depends on the use case. For example, dropping an old event may be acceptable for reporting current status but not for tracking credit card transactions to monitor fraud.
Page 373

When we have detected a late-arriving event, we have to decide whether to send the out-of-order event forward for processing or drop it. This decision depends on the use case. For example, if the events are reporting a current status (like temperature of the furnace in an industrial setup), dropping an old event will not cause issues because we have more-recent data for processing. But if the events are credit card transactions that we track to monitor fraud, dropping events can cause issues. This can lead to detecting invalid sequences if the processing application is using patterns such as Temporal Event Ordering.
373

50
Q

Why must the Buffered Event Ordering Pattern’s microservice employ reliability patterns?

A

The Buffered Event Ordering Pattern’s microservice must employ reliability patterns because it stores events in the buffer while waiting for older events, meaning it has state that needs to be recovered across failures and restarts.
Page 373

The microservice implementing the Buffered Event Ordering pattern needs to store some events in the buffer while it is waiting for older events to arrive; this means that the microservice has state. It is important to employ reliability patterns like Periodic Snapshot State Persistence or Replay so the service can recover its state across failures and restarts.
373

51
Q

What is necessary for the Course Correction Pattern to work in downstream applications?

A

For the Course Correction Pattern to work, downstream applications should know that events can be partial updates and adapt based on more-accurate updates arriving later.
Page 375

Things to note: For the Course Correction Pattern to work, downstream applications should be able to know that these events can be partial updates and to adapt based on more-accurate updates that will arrive later.
375

52
Q

How is the Course Correction Pattern used to update results with new information?

A

The Course Correction Pattern holds events for more time and alters decisions based on late arrivals, useful for displaying real-time results on a screen. For example, creating buckets for each minute to count orders and sending updates when late events arrive.
Page 376

Update results with new information: the Course Correction Pattern is commonly used when users are eager to obtain aggregated results quickly. This can especially be useful when we are displaying results in real time on a screen. In these cases, we simply need to hold the events for more time, and alter the decision based on late event arrivals.
For example, let’s say we want our application to calculate the number of orders that arrive per minute and report it on a per-minute basis. We simply need to create buckets for each time period, such as 2021.05.03-07:30 and 2021.05.03-07:31, denoting the time periods in minutes, and then keep a counter within each bucket to continuously count the values arriving during that period. We will be able to emit the events when the time period ends, as well as send an update if events arrive later, such as sending an update for 2021.05.03-07:30 along with the output of 2021.05.03-07:31.
Be careful about when to purge the buckets, as having multiple buckets can use large amounts of memory. Purging them early can cause calculation errors, as events arriving late won’t have their respective bucket with previous aggregations.
376
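
A minimal sketch of those per-minute buckets: a total is emitted when a minute closes, and a corrected total is re-emitted if an order arrives late for an already-reported minute. Purging old buckets is left out for brevity, per the caution above.

# Hypothetical course correction: per-minute order counts with late updates.
buckets = {}      # minute key (e.g., "2021.05.03-07:30") -> order count
reported = set()  # minute keys whose totals have already been emitted

def on_order(minute_key):
    buckets[minute_key] = buckets.get(minute_key, 0) + 1
    if minute_key in reported:
        # late arrival: re-emit a corrected total for the already-reported minute
        emit(minute_key, buckets[minute_key], correction=True)

def close_minute(minute_key):
    emit(minute_key, buckets.get(minute_key, 0), correction=False)
    reported.add(minute_key)

def emit(minute_key, count, correction):
    label = "correction" if correction else "result"
    print(label, minute_key, count, "orders")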

53
Q

How does the Course Correction Pattern handle early decisions and corrective actions?

A

The Course Correction Pattern allows early decisions to be made and sends compensation events for corrective actions when the situation changes. For example, broadcasting a message to all taxis for a ride request and sending a correction event to cancel other taxi assignments if multiple taxis accept the ride.
Page 377

Correct previous decisions: Sometimes we need to make an early decision, and when the situation changes, we send compensation events so that corrective actions can be taken. Let’s say we want to dispatch a taxi as soon as a user requests one.
When a user requests a taxi, we broadcast that message to all taxis in the region, and when we know a taxi has accepted the ride, we send another broadcast message to all taxis to inform them that the ride is accepted. During the initial request, there is a chance of multiple taxis accepting it, but we discover that only later because of network delays. Because we could mistakenly dispatch more than one taxi for the ride, we send a correction event to cancel the other taxi assignments.
377

54
Q

When should the Course Correction Pattern be used and what are its limitations?

A

The Course Correction Pattern should be used when early estimates are useful, and the use case allows for compensation based on an update. It should not be used for use cases that do not support course correction, as it may require high memory usage to wait for late events.
Page 378

Considerations: the Course Correction Pattern can be used only when early estimates are useful and the use case allows for compensation or course correction based on an update. For use cases that do not support course correction, we have to delay the decision making by using patterns like Buffered Event Ordering.
In most cases, events will be stored in memory while we are waiting for late event arrivals. This can cause high memory usage, so we need to find a balance between how long the system can wait for late events without running out of memory.
Since course correction also needs to remember previous values or previously emitted results, we need to apply reliability patterns to ensure that their state is preserved across system failures and restarts.
378

55
Q

How can watermarks be used to synchronize events generated from time-synchronized sources?

A

Watermarks can be used to generate synchronized event groups that produce accurate aggregation results by emitting watermark events at given intervals. This ensures accurate aggregation unaffected by network delays or other external factors.
Page 381

Synchronize events generated from event sources that are time synchronized: Watermarks can be used to generate synchronized event groups that can produce accurate aggregation results. Consider aggregate readings from multiple servers in a server farm that are in sync and emit events. We simply need to emit the watermark events at given time intervals to the input streams generated by those servers. This helps us collect the events between those watermarks and perform aggregation operations. This also ensures that the given aggregation is accurate and not affected by any network delays or other external factors.
381

56
Q

How can events from nonsynchronized sources be synchronized using watermarks?

A

Each sensor client can periodically fetch a sequence number from a global counter on a central server and inject it with the sensor reading. Events can be synchronized based on the sequence number of the watermark events, using the timestamps of those watermark events to determine the relative time of events.
Page 382

Synchronize events generated from nonsynchronized sources: Let’s say we want to detect interesting incidents by using Temporal Event Ordering from multiple surveillance sensors deployed across the neighborhood. As events are emitted from distributed sensors, some sensors can be emitting them with a delay, or their events may arrive later because of network latency. To enforce synchronization, each sensor client can periodically fetch a sequence number from the global counter deployed on a central server and inject it along with the sensor reading. We can then synchronize the events based on the sequence number of the watermark events, and use the timestamps of those watermark events to determine the relative time of events, thereby determining the true order of the events and detecting event occurrence patterns.
382

57
Q

What are the considerations for using the Watermark Pattern?

A

The Watermark Pattern is useful for significant differences in event arrival times due to network latency or unsynchronized input systems. It should not be used unless necessary due to its complexity. It can reduce time synchronization issues but cannot guarantee accurate results for all use cases.
Page 383

Considerations:
The Watermark Pattern is useful only when we know that significant differences exist in event arrival times because of network latency or when the input systems do not have their times in sync. In other cases, the Watermark Pattern does not bring us many advantages. For example, when systems are already synchronized on time, we can use the Buffered Event Ordering pattern to sort events by their timestamps.
We also do not recommend using the Watermark Pattern unless you have a strong reason to process events on the actual time they are generated and need relative ordering of those events. Avoid using the Watermark Pattern when it is not truly necessary because of the architectural and technological complexity it adds to the whole infrastructure.

Things to note: The Watermark Pattern can only reduce the time synchronization issues among streams; we cannot guarantee that it can produce accurate results for all use cases. Even with periodic time syncs, we can have issues resolving which event of one stream arrived before an event from another stream.
383

58
Q

Why is reliability key for cloud native stream-processing applications?

A

Reliability is key because most cloud native stream-processing applications store their state in memory for low latency and high throughput. Reliability patterns ensure the state of applications is preserved across failures and restarts.
Page 386

Most cloud native stream-processing applications store their state in memory so they can process events with low latency and high throughput. For this reason, reliability is key for stream processing applications. As we’ve mentioned throughout the chapter, reliability patterns guarantee that the state of our applications can be preserved across failures and restarts.
386

59
Q

How do reliability patterns support event delivery and processing in cloud native applications?

A

Reliability patterns help achieve at-least-once event delivery and allow exactly-once event processing, ensuring events are processed once and only once, even during system failures.
Page 386

Reliability patterns also help us achieve at-least-once event delivery—sending an event one or multiple times—and allow exactly-once event processing—processing events once and only once, even during system failures. This section dives deep into various reliability patterns that are used to preserve cloud native application state during critical situations.
386

60
Q

How can events be replayed when the system state is not persisted?

A

Events can be replayed by retrieving events from a durable topic subscription and deferring acknowledgment until successful processing. This ensures events are deleted from the message broker only when successfully processed and output is produced.
Page 388

Replay events when system state is not persisted: What if we want to generate one-minute aggregations of purchase orders? We are processing data in one-minute batches, and at the end of each minute, the microservice generates the aggregation and clears its state. If we are retrieving events from a durable topic subscription, we can defer the acknowledgment for the retrieved events until the end of the minute; this will ensure that we are deleting the events from the message broker only when we have successfully processed them and produced the output.
388

61
Q

How can events be replayed when the system persists its state?

A

Events can be replayed by storing the aggregation state to a data store periodically. During failure, the last state can be retrieved from the data store, and all events from that point can be replayed.
Page 388

Replay events when the system persists its state: What if we want to aggregate the average temperature over the last hour? Let’s assume the processing microservice consumes events from Kafka and sends updates every minute to its dependents. By storing its aggregation state to a data store every minute when sending the updates, during failure, it can retrieve the last state from the data store, and replay all events from that point.
388
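
A minimal sketch of the deferred-acknowledgment variant, assuming kafka-python: auto-commit is disabled and offsets are committed only after the one-minute aggregate is produced, so a restart replays the uncommitted events. The topic, group, and batch-boundary helper are illustrative.

# Hypothetical replay setup: commit Kafka offsets only after the batch is processed.
import json

from kafka import KafkaConsumer   # kafka-python (assumed available)

consumer = KafkaConsumer(
    "purchases",
    bootstrap_servers="kafka:9092",
    group_id="order-aggregator",
    enable_auto_commit=False,   # defer acknowledgment
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

def end_of_minute(message):
    # illustrative boundary check; a real service would track wall-clock minutes
    return message.timestamp % 60_000 < 1_000

def publish_aggregate(total):
    print("one-minute total:", total)

batch_total = 0.0
for message in consumer:
    batch_total += message.value["amount"]
    if end_of_minute(message):
        publish_aggregate(batch_total)
        consumer.commit()       # only now are the processed events acknowledged
        batch_total = 0.0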

62
Q

What are the considerations when using the Replay Pattern to recreate application state?

A

The Replay Pattern can produce duplicate events, requiring dependent systems to be idempotent. It should not be used if duplicate events can cause confusion. Buffer events for a longer period to prevent loss during extended failures, and use it with Periodic Snapshot State Persistence for critical applications.
Page 389

Considerations: Though the Replay pattern helps re-create the application state, it can produce duplicate events as a result of the replay. In this case, the dependent systems should be idempotent. We should not use the Replay Pattern when duplicate events can cause confusion on the part of dependent systems.
Sometimes events get lost even with the Replay pattern. For example, say we buffer events at the source for an extra two minutes so we can republish on demand. If the processing application encounters a failure, and it takes more than two minutes to start up, the source might have dropped events to accommodate new events. Therefore, we recommend using the Replay Pattern when consuming from event sources that can store events for a longer time period. We also recommend using the Replay Pattern in conjunction with the Periodic Snapshot State Persistence pattern, especially when the processing application needs to persist state.
389

63
Q

How should snapshots be persisted in a data store to avoid blocking event processing?

A

Use threads or similar methods to persist state, ensuring that the snapshot process does not block the processing of more events.
Page 396

When performing snapshots and persisting them in a data store, we recommend using threads, or something similar, so persisting state doesn’t block the processing of more events.
396
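
A minimal sketch of this idea: the event-handling path only updates in-memory state, while a single background thread writes a copy of that state to the store, so snapshotting never blocks event processing. The state shape and the store are placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncSnapshotter {
  private final ExecutorService snapshotWriter = Executors.newSingleThreadExecutor();
  private final Map<String, Double> state = new HashMap<>();

  public void onEvent(String key, double value) {
    state.merge(key, value, Double::sum); // normal, latency-sensitive processing path
  }

  public void snapshot() {
    Map<String, Double> copy = new HashMap<>(state); // copy so the writer sees a stable view
    snapshotWriter.submit(() -> writeToStore(copy)); // persisting happens off the event loop
  }

  private void writeToStore(Map<String, Double> snapshot) {
    // Placeholder: write the snapshot to a durable data store.
    System.out.println("persisted " + snapshot.size() + " entries");
  }
}
```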

64
Q

How can the Sequential Convoy pattern help scale microservices in the Periodic Snapshot State Persistence Pattern?

A

With the Periodic Snapshot State Persistence Pattern alone, the secondary microservice acts only as a failover and does not add processing capacity. To scale, apply the Sequential Convoy pattern to partition the workload, and replace each microservice in that pipeline with a primary/secondary pair.
Page 400

Though we are using two nodes, we will not be able to process more data with the Periodic Snapshot State Persistence Pattern, as the second microservice is used only as a failover. If scalability is needed, we must use the Sequential Convoy pattern and replace each microservice in that pattern with primary and secondary pairs.
400

65
Q

When should stream-processing logic be implemented within a microservice versus using a stream-processing system?

A

Simple patterns such as Transformation and Filters and Thresholds can be implemented within a microservice. More complex patterns should use a stream-processing technology for better outcomes.
Page 402

Stream processing can be designed as part of a microservice, or deployed entirely by using a stream-processing system. When it comes to simple patterns such as Transformation, and Filters and Thresholds, we can implement the logic within our microservice. But for more-complex patterns, using a stream-processing technology will provide better outcomes.
402
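
As an illustration of the simple case that can stay inside the microservice, here is a small sketch combining a Transformation (unit conversion) with a Filters and Thresholds check; the event fields and the threshold value are assumptions, not examples from the book.

```java
import java.util.Optional;

public class TemperatureFilter {
  record Reading(String sensorId, double celsius) {}
  record Alert(String sensorId, double fahrenheit) {}

  private static final double THRESHOLD_F = 100.0; // assumed alerting threshold

  // Transformation: Celsius to Fahrenheit; Filter/Threshold: emit only readings above the limit.
  public static Optional<Alert> process(Reading reading) {
    double fahrenheit = reading.celsius() * 9 / 5 + 32;
    return fahrenheit > THRESHOLD_F
        ? Optional.of(new Alert(reading.sensorId(), fahrenheit))
        : Optional.empty();
  }
}
```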

66
Q

What key aspects need to be tested in stream-processing applications?

A

Key aspects include testing the ability to handle state, running chaos testing for reliability patterns, and using the Watermark pattern for deterministic state assertions. For time-bounded patterns, process events based on event timestamps generated at the source, which eliminates the effect of network and application latency and makes results reproducible.
Page 407

In this section, we cover the most important aspects of testing stream-processing cloud native applications. When testing stream-processing applications, we need to follow conventional approaches of writing unit and integration tests. Because stream-processing applications are asynchronous, we recommend you follow all testing suggestions provided for event-driven architecture in “Testing”, along with the suggestions provided here.
One of the key aspects that we need to test in stream-processing applications is their ability to handle state. When we use reliability patterns, we should run chaos testing to test whether the application is continuously producing correct results despite system failures. To assert the application state deterministically, we can use the Watermark pattern to publish watermark events at the end of each test for the application to persist its state and to ensure that it has completed event processing.
When testing time-bounded patterns such as Windowed Aggregations or Temporal Event Ordering patterns, the application could produce different results for each test cycle, due to fluctuations of network and application latency. Instead of asserting the results with a margin of error, we recommend updating the stream-processing applications to process events based on event timestamps generated at the source. With this approach, we can eliminate the network and intermediate application latency and generate reproducible results.
407
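
A minimal sketch of processing on event timestamps: the window an event falls into is computed from the timestamp assigned at the source rather than from the processing machine's clock, so replaying the same events in a test always yields the same aggregates. The one-minute window size is an assumed value.

```java
import java.util.HashMap;
import java.util.Map;

public class EventTimeWindowCounter {
  private static final long WINDOW_MILLIS = 60_000;
  private final Map<Long, Long> countsPerWindow = new HashMap<>();

  public void onEvent(long eventTimestampMillis) {
    // The window is derived from the source-generated timestamp, not System.currentTimeMillis().
    long windowStart = (eventTimestampMillis / WINDOW_MILLIS) * WINDOW_MILLIS;
    countsPerWindow.merge(windowStart, 1L, Long::sum);
  }

  public Map<Long, Long> results() {
    return countsPerWindow; // deterministic for a fixed set of input events
  }
}
```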

67
Q

How can security be enforced for cloud native stream-processing applications?

A

Security can be enforced by connecting to message brokers and other systems via secured protocols, encrypting data in transit and at rest, and using a bounded context fronted by an API or secured message broker.
Page 408

How can we enforce security for cloud native stream-processing applications? As discussed in Chapter 5, stream-processing applications can enforce security by connecting to message brokers and other systems via secured protocols, and by encrypting data in transit and at rest.
If enforcing security at the application level is not possible, we recommend using a bounded context, fronted by an API or a secured message broker to consume the events, and build the whole stream-processing system within that context. We also recommend you apply all the general security best practices discussed in Chapters 2 and 5.
408
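
As a sketch of the secured-protocol part, the following uses Kafka's standard TLS client settings to encrypt data in transit; the broker address, keystore/truststore paths, and passwords are placeholders.

```java
import java.util.Properties;

public class SecureConsumerConfig {
  public static Properties tlsProperties() {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker.example.com:9093");     // hypothetical TLS listener
    props.put("security.protocol", "SSL");                         // encrypt data in transit
    props.put("ssl.truststore.location", "/etc/certs/truststore.jks");
    props.put("ssl.truststore.password", "changeit");
    props.put("ssl.keystore.location", "/etc/certs/keystore.jks"); // client cert for mutual TLS
    props.put("ssl.keystore.password", "changeit");
    props.put("ssl.key.password", "changeit");
    return props;
  }
}
```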

68
Q

Why is monitoring memory consumption critical for stateful stream-processing applications?

A

Monitoring memory consumption is critical because storing events in memory during event spikes can cause the system to run out of memory and fail. Monitoring and load-shedding mechanisms, along with partitioning or remodeling the stream-processing pipeline, can mitigate risks.
Page 409

Because stream-processing applications contain state, monitoring their memory consumption is critical. Say a system is running a time-bounded query, such as aggregating values over a five-minute window. If an event spike occurs as the system is storing events in memory, the system could run out of memory, resulting in a failure. Having monitoring and load-shedding mechanisms can help mitigate the risk of abnormal conditions. But if the spike is consistent, we should consider partitioning or remodeling the stream-processing pipeline to cater to higher loads.
409
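
One possible load-shedding sketch: bound the in-memory buffer used by a time-bounded query and drop (and count) the oldest events once the limit is exceeded, so a spike degrades accuracy instead of crashing the process. The limit here is an assumed figure and the structure is illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BoundedWindowBuffer<E> {
  private static final int MAX_BUFFERED_EVENTS = 100_000; // assumed safety limit
  private final Deque<E> buffer = new ArrayDeque<>();
  private long shedCount = 0;

  public void add(E event) {
    buffer.addLast(event);
    while (buffer.size() > MAX_BUFFERED_EVENTS) {
      buffer.removeFirst(); // shed the oldest events under a sustained spike
      shedCount++;          // expose as a metric so the spike is visible in monitoring
    }
  }

  public long shedCount() {
    return shedCount;
  }
}
```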

69
Q

How can the Periodic Snapshot State Persistence pattern help stateful stream-processing systems recover after failure?

A

The Periodic Snapshot State Persistence pattern lets stateful stream-processing systems recover their state after a failure. To make this reliable, monitor that snapshots are properly saved and garbage-collected once outdated, and track snapshot size, write time, network bandwidth, CPU, and memory so state can be stored and restored dependably.
Page 409

Stateful stream-processing systems can utilize the Periodic Snapshot State Persistence pattern to recover their state after failure. Monitoring that these snapshots are properly saved and garbage-collected once outdated is critical. Furthermore, if the application state is large (for example, 500 MB or more), writing the snapshot takes a long time and negatively impacts the available network bandwidth for new events. Therefore, monitoring the size of the snapshots and the time taken to write them, as well as monitoring the network bandwidth, CPU, and memory of the applications when storing the snapshots, will help you properly architect the application to reliably store and restore its state.
409
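
A small sketch of the monitoring side: measure each snapshot's size and write time and publish them as metrics, so alerts can fire when snapshots grow large or slow. The store write and the metrics client are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;

public class SnapshotMetrics {
  public void persist(String serializedState) {
    byte[] bytes = serializedState.getBytes(StandardCharsets.UTF_8);
    Instant start = Instant.now();
    writeToStore(bytes); // placeholder for the actual durable write
    Duration elapsed = Duration.between(start, Instant.now());

    recordGauge("snapshot.size.bytes", bytes.length);
    recordGauge("snapshot.write.millis", elapsed.toMillis());
  }

  private void writeToStore(byte[] bytes) { /* durable store write */ }

  private void recordGauge(String name, long value) {
    System.out.println(name + "=" + value); // stand-in for a real metrics client
  }
}
```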

70
Q

Why is monitoring event timestamps important in stream-processing applications?

A

Monitoring event timestamps helps identify discrepancies caused by network latency and complex stream-processing logic; the Buffered Event Ordering, Course Correction, or Watermark pattern can then be applied to synchronize the applications and mitigate the error.
Page 409

The complexity of the stream-processing logic, and network latency, can make events in one stream arrive later than events in other streams. This can cause errors in the final output. Therefore, monitoring what event timestamp is being processed at each streaming application can help identify such discrepancies. We can apply the Buffered Event Ordering, Course Correction, or Watermark pattern to synchronize the applications and mitigate the error.
409
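
One way to surface such discrepancies is to track the latest event timestamp seen per input stream and report the skew between the fastest and slowest stream, as in this hypothetical sketch; a large or growing skew is the cue to apply Buffered Event Ordering, Course Correction, or a Watermark.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StreamSkewMonitor {
  private final Map<String, Long> latestTimestampByStream = new ConcurrentHashMap<>();

  public void onEvent(String streamName, long eventTimestampMillis) {
    // Keep the most advanced event timestamp observed on each stream.
    latestTimestampByStream.merge(streamName, eventTimestampMillis, Math::max);
  }

  /** Difference in milliseconds between the most and least advanced streams. */
  public long currentSkewMillis() {
    if (latestTimestampByStream.size() < 2) {
      return 0;
    }
    long max = latestTimestampByStream.values().stream().mapToLong(Long::longValue).max().orElse(0);
    long min = latestTimestampByStream.values().stream().mapToLong(Long::longValue).min().orElse(0);
    return max - min;
  }
}
```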

71
Q

What is the first step in applying DevOps to stateful stream-processing applications?

A

The first step is selecting the appropriate reliability pattern based on the system’s availability and scalability requirements to achieve reliable stream processing.
Page 410

In this section, we focus on how DevOps applies to stateful applications such as stream-processing applications. The first DevOps step is selecting the appropriate reliability pattern for achieving reliable stream processing. This should be influenced by the system’s availability and scalability requirements.
410

72
Q

How should persistence stores be selected for stream-processing applications in DevOps?

A

Select persistence stores that enable rapid state persistence and restoration and that store state durably. Empirically determine the optimal snapshot size and frequency to minimize latency, monitor and garbage-collect redundant snapshots, and encrypt snapshots to keep sensitive data secure.
Page 410

When a pattern is identified, an appropriate persistence store should be selected to enable the rapid persistence and restoration of state, and store the state durably. We also need to empirically determine the optimal snapshot size and frequency. This should minimize the latency introduced by snapshotting. As part of the DevOps process, we should also monitor and garbage-collect redundant snapshots. Encrypting the snapshots so the sensitive data remains secure is also important.
410

73
Q

What is recommended for dealing with sensitive data in stream-processing applications?

A

Encrypt events, purge events and snapshots as soon as they are no longer needed, and use a bounded context to protect applications from external threats via an API or a message broker.
Page 410

As discussed in Chapter 5, we recommend encrypting events when dealing with sensitive data, and purging events and snapshots as soon as they are no longer needed for processing. We also recommend using a bounded context when possible, so that all the applications are protected from external threats and are reachable only via an API or a topic in the message broker.
410

74
Q

Why is observability and monitoring important for asynchronous applications in DevOps?

A

Observability and monitoring using distributed tracing, logging, and monitoring systems are important due to the difficulty in troubleshooting failures in asynchronous applications. Continuous delivery and backward compatibility of event schema and snapshots ensure smooth deployments.
Page 410

Because failures in asynchronous applications are difficult to troubleshoot, we also recommend setting up observability and monitoring by using distributed tracing, logging, and monitoring systems. In addition, continuous delivery is crucial for modern DevOps. For smooth deployments, we recommend maintaining backward compatibility of event schema and snapshots at all times. When major changes occur, we recommend running both versions of the applications in parallel until the new system rebuilds its state. Finally, we recommend using multiple deployment environments, such as development and staging/preproduction, to reduce the impact of changes to the application, and to validate the application before moving it to production.
410