Chapter 4, Data Management Patterns Flashcards

1
Q

What are Data Sources?

A

Here, data sources are cloud native applications that feed data such as user inputs and sensor readings. They sometimes feed data into data-ingestion systems such as message brokers or, when possible, write directly to data stores. Data-ingestion systems can transfer data as events/messages to other applications or data stores.

160 Figure 4-1. Data architecture for cloud native applications

2
Q

What do batch-processing systems do?

A

Batch-processing systems process data from data sources in batches, and write the processed output back to the data stores so it can be used for reporting or exposed via APIs.

161 Figure 4-1. Data architecture for cloud native applications

3
Q

What are the three main types of data that influence application behavior?

A
  • Input data
    Sent as part of the input message by the user or client. Most commonly, this data is either JSON or XML messages, though binary formats such as gRPC and Thrift are getting some traction.
  • Configuration data
    Provided by the environment as variables. XML has been used as the configuration language for a long time, and now YAML configs have become the de facto standard for cloud native applications.
  • State data
    The data stored by the application itself, regarding its status, based on all messages and events that occurred before the current time. By persisting the state data and loading it on startup, the application will be able to seamlessly resume its functionality upon restart.

162

4
Q

What are the three categories of data that Cloud native applications use?

A
  • Structured data
    Can fit a predefined schema. For example, the data on a typical user registration form can be comfortably stored in a relational database.
  • Semi-structured data
    Has some form of structure. For example, each field in a data entry may have a corresponding key or name that we can use to refer to it, but when we take all the entries, there is no guarantee that each entry will have the same number of fields or even common keys. This data can be easily represented through JSON, XML, and YAML formats.
  • Unstructured data
    Does not contain any meaningful fields. Images, videos, and raw text content are examples. Usually, this data is stored without any understanding of its content.

164

5
Q

What are ACID properties?

A
  • Atomicity
  • Consistency
  • Isolation
  • Durability

165

6
Q

Define Atomicity from ACID

A

Atomicity guarantees that all operations within a transaction are executed as a single unit.

165

7
Q

Define Consistency from ACID

A

Consistency ensures that the data is consistent before and after the transaction.

165

8
Q

Define Isolation from ACID

A

Isolation makes the intermediate state of a transaction invisible to other transactions

165

9
Q

Define Durability from ACID

A

Durability guarantees that after a successful transaction, the data is persistent even in the event of a system failure

165

10
Q

What does the CAP in the CAP theorem stand for?

A

CAP stands for consistency, availability, and partition tolerance. This theorem states that a distributed application can provide either full availability or consistency; we cannot achieve both while providing network partition tolerance. Here, availability means that the system is fully functional when some of its nodes are down, consistency means an update/change in one node is immediately propagated to other nodes, and partition tolerance means that the system can continue to work even when some nodes cannot connect to each other.

169

11
Q

What are the three types of data stores?

A
  • Relational
  • NoSQL
  • Filesystem

172

12
Q

What are the three techniques in which data can be managed?

A
  • Centralized
  • Decentralized
  • Hybrid

172

13
Q

Describe the Data Service Pattern

A

The Data Service pattern exposes data in the database as a service, referred to as a data service. The data service becomes the owner, responsible for adding and removing data from the data store. The service may perform simple lookups or even encapsulate complex operations when constructing responses for data requests.
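As a rough illustration, here is a minimal sketch of a data service in Python, assuming Flask and an in-memory SQLite store; the products table, routes, and fields are invented for illustration, not taken from the book. The service is the only component that touches the store, and consumers read or modify data solely through its API.

```python
# Minimal Data Service sketch: the service owns its store and exposes it
# via an API; no other component reads or writes the database directly.
# Assumes Flask is installed; table and routes are illustrative.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

@app.get("/products/<int:product_id>")
def get_product(product_id):
    row = db.execute(
        "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": row[0], "name": row[1], "price": row[2]})

@app.post("/products")
def add_product():
    body = request.get_json()
    cur = db.execute(
        "INSERT INTO products (name, price) VALUES (?, ?)",
        (body["name"], body["price"]),
    )
    db.commit()
    return jsonify({"id": cur.lastrowid}), 201

if __name__ == "__main__":
    app.run(port=8080)
```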

180

14
Q

How is the Data Service Pattern used?

A

This pattern can be used when we need to allow access to data that does not belong to a single microservice, or when we need to abstract legacy/proprietary data stores to other cloud native applications.

181

15
Q

What are some related patterns to the Data Service pattern?

A
  • Caching pattern
    Provides an opportunity to optimize the efficiency of data retrieval by using local or distributed caching when exposing data via a service.
  • Performance optimization patterns
    Apart from caching data, these execute complex queries such as table joins and running stored procedures directly in the database to improve performance.
  • Materialized View pattern
    Accessing data via an API can still be performance-intensive. For use cases that need joins to be performed with data that resides in stores belonging to other services, having that data replicated in its local store and building a materialized view can help improve query performance.
  • Vault Key pattern
    Along with API security, knowing who is accessing the data can help identify the caller and enforce adequate security and data protection.
183

16
Q

Describe the Composite Data Services Pattern

A

The Composite Data Services pattern performs data composition by combining data from more than one data service and, when needed, performs fairly complex aggregation to provide a richer and more concise response. This pattern is also called the Server-Side Mashup pattern, as data composition happens at the service and not at the data consumer.

185

17
Q

How does the Composite Data Services Pattern work?

A

The Composite Data Services Pattern combines data from various services and its own data store into one composite data service. This pattern not only eliminates the need for multiple microservices to perform data composition operations, but also allows the combined data to be cached for improving performance (Figure 4-11).
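A minimal sketch of the idea in Python, assuming the requests library: the composite service fans out to two hypothetical downstream data services (profile-service and order-service), merges their responses, and caches the combined result. The URLs, field names, and 30-second TTL are made up for illustration.

```python
# Composite Data Service sketch: fan out to downstream data services,
# aggregate the responses, and cache the composite result.
import time
import requests

CACHE: dict[str, tuple[float, dict]] = {}
CACHE_TTL_SECONDS = 30

def get_customer_summary(customer_id: str) -> dict:
    cached = CACHE.get(customer_id)
    if cached and time.time() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]                    # serve the cached composition

    profile = requests.get(f"http://profile-service/customers/{customer_id}").json()
    orders = requests.get(f"http://order-service/customers/{customer_id}/orders").json()

    summary = {
        "id": customer_id,
        "name": profile.get("name"),
        "recent_orders": orders[:5],        # fairly simple aggregation
        "total_orders": len(orders),
    }
    CACHE[customer_id] = (time.time(), summary)
    return summary
```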

185 Figure 4-11. Composite Data Services pattern

18
Q

How is the Composite Data Services Pattern used in practice?

A

This pattern can be used when we need to eliminate multiple microservices repeating the same data composition. Data services that are fine-grained force clients to query multiple services to build their desired data. We can use this pattern to reduce duplicate work done by the clients and consolidate it into a common service.

187

19
Q

What are some considerations when using the Composite Data Services Pattern?

A

Use this pattern only when the consolidation is generic enough and other microservices will be able to reuse the consolidated data. We do not recommend introducing unnecessary layers of services if they do not provide meaningful data compositions that can be reused. Weigh the benefits of reusability and simplicity of the clients against the additional latency and management complexity added by the service layers.

187

20
Q

What are some patterns related to the Composite Data Services pattern?

A
  • Caching pattern
    Provides an opportunity to optimize the efficiency of data retrieval and helps achieve resiliency by serving data from the cache when backends are not available.
  • Client-Side Mashup pattern
    Allows the data mashup to happen at the client side, such as in the user’s browser. This can be a good solution when asynchronous data loading is feasible and when meaningful data composition can be performed with partial data.

187

21
Q

Describe the Client-Side Mashup Pattern

A

In the Client-Side Mashup pattern, data is retrieved from various services and consolidated at the client side. The client is usually a browser loading data via asynchronous Ajax calls.

188

22
Q

How does the Client-Side Mashup Pattern work?

A

This pattern utilizes asynchronous data loading, as shown in Figure 4-12. For example, when a browser using this pattern is loading a web page, it loads and renders part of the web page first, while loading the rest of the web page. This pattern uses client-side scripts such as JavaScript to asynchronously load the content in the web browser.
Rather than letting the user wait for a longer time by loading all content on the website at once, this pattern uses multiple asynchronous calls to fetch different parts of the website and renders each fragment when it arrives. These applications are also referred to as rich internet applications (RIAs).
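In a real browser this is done with JavaScript/Ajax; the following is only a language-neutral sketch of the underlying idea using Python's asyncio, where independent page fragments are fetched concurrently and each is "rendered" as soon as it arrives. Fragment names and delays are invented.

```python
# Sketch of asynchronous, fragment-by-fragment loading (the Client-Side
# Mashup idea). Each fetch stands in for an HTTP call; render() stands in
# for inserting the fragment into the page.
import asyncio

async def fetch_fragment(name: str, delay: float) -> str:
    await asyncio.sleep(delay)          # simulated network latency
    return f"<div id='{name}'>content for {name}</div>"

def render(fragment: str) -> None:
    print("rendered:", fragment)        # stands in for DOM insertion

async def load_page() -> None:
    tasks = [
        asyncio.create_task(fetch_fragment("header", 0.1)),
        asyncio.create_task(fetch_fragment("recommendations", 0.8)),
        asyncio.create_task(fetch_fragment("reviews", 0.4)),
    ]
    # Render each fragment the moment it completes, not all at once.
    for finished in asyncio.as_completed(tasks):
        render(await finished)

asyncio.run(load_page())
```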

188 Figure 4-12. Client-Side Mashup at a web browser

23
Q

How is the Client-Side Mashup Pattern used in practice?

A

This pattern can be used when we need to present available data as soon as possible, while providing more detail later, or when we want to give a perception that the web page is loading much faster.

190

24
Q

What are some considerations for the Client-Side Mashup Pattern?

A

Use this pattern only when the partial data loaded first can be presented to the user or used in a meaningful way. We do not advise using this pattern when the retrieved data needs to be combined and transformed with later data via some sort of a join before it can be presented to the user.

191

25
Q

What are some of the related patterns to the Client-Side Mashup Pattern?

A
  • Composite Data Services pattern
    This is useful when content needs to be mashed synchronously and the composite data is common enough to be used by multiple services.
  • Caching pattern
    Provides an opportunity to cache data to improve the overall latency.

191

26
Q

When to use the Data Service pattern?

A

Data is not owned by a single microservice, yet multiple microservices depend on the data for their operation.

192

27
Q

When not to use the Data Service pattern?

A

Data can clearly be associated with an existing microservice, as introducing unnecessary microservices can also cause management complexity.

192

28
Q

What are the benefits of using the Data Service pattern?

A

Reduces the coupling between services.
Provides more control/security on the operations that can be performed on the shared data.

192

29
Q

When to use the Composite Data Services pattern?

A

Many clients query multiple services to consolidate their desired data, and this consolidation is generic enough to be reused among the clients.

192

30
Q

When not to use the Composite Data Services pattern?

A

Only one client needs the consolidation.
Operations performed by clients cannot be generalized to be reused by many clients.

192

31
Q

What are the benefits of using the Composite Data Services pattern?

A

Reduces duplicate work done by the clients and consolidates it into a common service.
Provides more data resiliency by using caches or static data.

192

32
Q

When to use the Client-Side Mashup pattern?

A

Some meaningful operations can be performed with partial data; for example, rendering nondependent data in web browsers.

192

33
Q

When not to use the Client-Side Mashup pattern?

A

Processing, such as a join, is required on the independently retrieved data before sending the response.

192

34
Q

What are the benefits of using the Client-Side Mashup pattern?

A

Results in more-responsive applications.
Reduces the wait time.

192

35
Q

Describe the Data Sharding Pattern

A

In the Data Sharding pattern, the data store is divided into shards, which allows it to be easily stored and retrieved at scale. The data is partitioned by one or more of its attributes so we can easily identify the shard in which it resides.

193

36
Q

In what ways can you shard data?

A

To shard the data, we can use horizontal, vertical, or functional approaches.

193

37
Q

Describe Horizontal data sharding

A

Each shard has the same schema, but contains distinct data records based on its sharding key. A table in a database is split across multiple nodes based on these sharding keys. For example, user orders can be sharded by hashing the order ID into three shards, as depicted in Figure 4-13.
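A minimal Python sketch of the idea in Figure 4-13, hashing an order ID into one of three shards. The shards are plain dicts standing in for separate database nodes, and a stable hash is used so the key-to-shard mapping survives restarts.

```python
# Horizontal sharding with a hash-based shard key: order IDs are hashed
# into one of three shards that share the same schema.
import hashlib

SHARDS = [dict(), dict(), dict()]   # three shards, same schema

def shard_for(order_id: str) -> dict:
    # A stable hash (not Python's randomized hash()) keeps the mapping
    # consistent across processes and restarts.
    digest = hashlib.sha256(order_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def save_order(order_id: str, order: dict) -> None:
    shard_for(order_id)[order_id] = order

def load_order(order_id: str) -> dict | None:
    return shard_for(order_id).get(order_id)

save_order("order-1001", {"item": "book", "qty": 2})
print(load_order("order-1001"))
```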

193 Figure 4-13. Horizontal data sharding using hashing

38
Q

Describe Vertical data sharding

A

Each shard does not need to have an identical schema and can contain various data fields. Each shard can contain a set of tables that do not need to be in another shard. This is useful when we need to partition the data based on the frequency of data access; we can put the most frequently accessed data in one shard and move the rest into a different shard. Figure 4-14 depicts how frequently accessed user data is sharded from the other data.

194 Figure 4-14. Vertical data sharding based on frequency of data access

39
Q

Describe Functional data sharding

A

Data is partitioned by functional use cases. Rather than keeping all the data together, the data can be segregated in different shards based on different functionalities. This also aligns with the process of segregating functions into separate functional services in the cloud native application architecture. Figure 4-15 shows how product details and reviews are sharded into two data stores.

196 Figure 4-15. Functional data sharding by segregating product details and reviews into two data stores

40
Q

When using horizontal data sharding, what techniques can we use to locate where data is stored?

A
  • Lookup-based data sharding
  • Range-based data sharding
  • Hash-based data sharding

197

41
Q

Describe Lookup-based data sharding

A

A lookup service or distributed cache is used to store the mapping of the shard key and the actual location of the physical data. When retrieving the data, the client application will first check the lookup service to resolve the actual physical location for the intended shard key, and then access the data from that location. If the data gets rebalanced or resharded later, the client has to again look up the updated data location.
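A minimal sketch of lookup-based sharding in Python: a mapping (standing in for a lookup service or distributed cache) resolves each shard key to the physical location of its data, and resharding only changes that mapping. Keys and node addresses are illustrative.

```python
# Lookup-based sharding sketch: clients resolve the shard key to a physical
# location first, then query that node. Rebalancing only updates the lookup.
SHARD_LOCATIONS = {
    "customer-eu": "db-node-1.internal:5432",
    "customer-us": "db-node-2.internal:5432",
    "customer-apac": "db-node-3.internal:5432",
}

def locate(shard_key: str) -> str:
    return SHARD_LOCATIONS[shard_key]

def rebalance(shard_key: str, new_location: str) -> None:
    # After resharding, clients that look the key up again are routed
    # to the new node.
    SHARD_LOCATIONS[shard_key] = new_location

print(locate("customer-eu"))
rebalance("customer-eu", "db-node-4.internal:5432")
print(locate("customer-eu"))
```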

197

42
Q

Describe Range-based data sharding

A

This special type of sharding can be applied when the sharding key has sequential characteristics. The data is sharded in ranges, and as in lookup-based sharding, a lookup service can be used to determine where the given data range is available. This approach yields the best results for sharding keys based on date and time. A data range of a month, for example, may reside in the same shard, allowing the service to retrieve all the data in one go rather than querying multiple shards.
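A small Python sketch of range-based sharding on a date attribute, assuming one shard per month; the month-to-shard mapping is illustrative.

```python
# Range-based sharding sketch: each month of orders lives in its own shard,
# so a whole month can be served by a single node.
from datetime import date

MONTH_SHARDS = {
    (2023, 1): "orders-shard-2023-01",
    (2023, 2): "orders-shard-2023-02",
    (2023, 3): "orders-shard-2023-03",
}

def shard_for(order_date: date) -> str:
    return MONTH_SHARDS[(order_date.year, order_date.month)]

# All February 2023 orders resolve to the same shard, so a monthly report
# needs to query only that one node.
print(shard_for(date(2023, 2, 14)))
print(shard_for(date(2023, 2, 27)))
```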

197

43
Q

Describe Hash-based data sharding

A

Constructing a shard key based on the data fields or dividing the data by date range may not always result in balanced shards. At times we need to distribute the data randomly to generate better-balanced shards. This can be done by using hash-based data sharding, which creates hashes based on the shard key and uses them to determine the shard data location. This approach is not the best when data is queried in ranges, but is ideal when individual records are queried. Here, we can also use a lookup service to store the hash key and the shard location mapping, to facilitate data loading.

197

44
Q

How is the Data Sharding Pattern used in practice?

A

This pattern can be used when we can no longer store data in a single node, or when we need data to be distributed so we can access it with lower latency.

198

45
Q

What are some patterns that are related to the Data Sharding Pattern?

A
  • Materialized View pattern
    This can be used to replicate the dependent data of each shard to the local stores of the service, to improve data-querying performance and eliminate multiple lookup calls to data stores or services. This data can be replicated with only eventual consistency, so this approach is useful only if consistency on the dependent data is not business-critical for the applications.
  • Data Locality pattern
    Having all the relevant data at the shard will allow the creation of indexes and execution of stored procedures for efficient data retrieval.

202

46
Q

Describe the Command and Query Responsibility Segregation Pattern

A

The Command and Query Responsibility Segregation (CQRS) pattern separates updates and query operations of a data set, and allows them to run on different data stores. This results in faster data update and retrieval. It also facilitates modeling data to handle multiple use cases, achieves high scalability and security, and allows update and query models to evolve independently with minimal interactions.
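A minimal Python sketch of the separation: commands go through a command handler that writes to a write-optimized store, and queries are served from a separate, denormalized read model. Here the stores are plain dicts and the projection is a direct call; in practice the read model is usually updated asynchronously (for example via events), so the names and data shapes below are illustrative.

```python
# CQRS sketch: the command side validates and persists the authoritative
# record; a projection reshapes it into a read-optimized model that the
# query side serves without joins.
write_store: dict[str, dict] = {}           # normalized command model
read_store: dict[str, str] = {}             # denormalized query model

def handle_create_product_command(product_id: str, name: str, price: float) -> None:
    if price < 0:
        raise ValueError("price must be non-negative")
    write_store[product_id] = {"name": name, "price": price}
    _project_to_read_model(product_id)

def _project_to_read_model(product_id: str) -> None:
    # Projection: reshape the data for the query use case.
    p = write_store[product_id]
    read_store[product_id] = f'{p["name"]} (${p["price"]:.2f})'

def handle_get_product_query(product_id: str) -> str:
    # Query side: cheap lookup against the read model.
    return read_store[product_id]

handle_create_product_command("p-1", "Espresso machine", 199.0)
print(handle_get_product_query("p-1"))
```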

202

47
Q

How is the Command and Query Responsibility Segregation Pattern used in practice?

A

We can use this pattern when we want to use different domain models for commands and queries, and when we need to separate updates and data retrieval for performance and security reasons.

204

48
Q

What are some related patterns to the Command and Query Responsibility Segregation Pattern?

A
  • Event Sourcing pattern
    Allows command services to communicate updates to query services, and allows both command and query models to reside on different data stores. This provides only eventual consistency between command and query models and adds complexity to the system architecture. Chapter 5 covers this pattern in detail.
  • Materialized View pattern
    Recommended over the CQRS pattern to achieve scalability, when command and query models are simple enough; Materialized View is covered in the next section.
  • Data Sharding pattern
    Helps scale commands by partitioning the data (as covered previously in this chapter). As query operations can simply be replicated, applying this pattern for queries may not produce any performance benefit.
  • API security
    Can be applied to enforce security for both command and query services.

206

49
Q

When to use the Data Sharding pattern?

A

Data contains one or a collection of fields that uniquely identify the data or meaningfully group the data into subsets.

207

50
Q

When not to use the Data Sharding pattern?

A

Shard key cannot produce evenly balanced shards.
The operations performed on the data require the whole data set to be processed; for example, obtaining a median from the data set.

207

51
Q

Benefits of using the Data Sharding pattern

A

Groups shards based on the preferred set of fields that produce the shard key.
Creates geographically optimized shards that can be moved closer to the clients.
Builds hierarchical shards or time-range-based shards to optimize the search time.
Uses secondary indexes to query data by using nonshard keys.

207

52
Q

When to use the Command and Query Responsibility Segregation (CQRS) pattern?

A

Applications have performance-intensive update operations with:

  • Data validations
  • Security validations
  • Message transformations

For performance-intensive query operations such as complex joins or data mapping.

207

53
Q

When not to use the Command and Query Responsibility Segregation (CQRS) pattern?

A

High consistency is required between command (update) and query (read).
Command and query models are closer to each other.

207

54
Q

Benefits of using the Command and Query Responsibility Segregation (CQRS) pattern

A

Reduces the impact between command and query operations.
Stores command and query data in two different data stores that suit their use cases.
Enforces separated command/query security policies.
Enables different teams to own applications that are responsible for command and query operations.
Provides high availability.

207

55
Q

Describe the Materialized View Pattern

A

The Materialized View pattern provides the ability to retrieve data efficiently upon querying, by moving data closer to the execution and prepopulating materialized views. This pattern stores all relevant data of a service in its local data store and formats the data optimally to serve the queries, rather than letting that service call dependent services for data when required.
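A minimal Python sketch of the idea: an order service keeps a local, pre-joined copy of the product data it needs, so reads are served entirely from the local view instead of calling the product service. The data shapes and refresh step are illustrative.

```python
# Materialized View sketch: the order service maintains a local, read-
# optimized view that already contains the product names it needs, so
# the read path makes no remote calls.
product_service_data = {"p-1": {"name": "Espresso machine"}}   # remote source
orders = [{"order_id": "o-1", "product_id": "p-1", "qty": 2}]  # local data

order_materialized_view: dict[str, dict] = {}

def refresh_view() -> None:
    # Rebuild (or incrementally update) the local, pre-joined view.
    for order in orders:
        product = product_service_data[order["product_id"]]  # remote lookup
        order_materialized_view[order["order_id"]] = {
            "order_id": order["order_id"],
            "product_name": product["name"],   # denormalized copy
            "qty": order["qty"],
        }

def get_order(order_id: str) -> dict:
    # Served entirely from the local view: no remote call at read time.
    return order_materialized_view[order_id]

refresh_view()
print(get_order("o-1"))
```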

209

56
Q

How is the Materialized View Pattern used in practice?

A

We can use this pattern when we want to improve data-retrieval efficiency by eliminating complex joins and to reduce coupling with dependent services.

211

57
Q

What are some patterns that are related to the Materialized View Pattern?

A
  • Data Locality pattern
    Enables efficient data retrieval by moving the execution closer to the data.
  • Composite Data Services pattern
    This can be used instead of the Materialized View pattern when data compositions can be done at the service level, or when dependent services have static data that can be cached locally at the service.
  • Command and Query Responsibility Segregation (CQRS) pattern
    The Materialized View pattern can be used to serve query responses in the CQRS pattern. The command—the modifications to the data—will be done through the dependent service, and the query—the serving of the read requests—can be performed by query services constructing the materialized views.
  • Event Sourcing pattern
    Provides an approach to replicate data from one source to another. Changes on dependent data are pushed as events through event streams, which are stored sequentially at a reliable log-based event queue such as Kafka, and then the services that serve the data read those event streams and constantly update their local storage to serve updated information. Chapter 5 covers this pattern.

213

57
Q

When is the Materialized View Pattern used?

A

This pattern is used when part of the data is available locally and the rest needs to be fetched from external sources that incur high latency.

211

58
Q

Describe the Data Locality Pattern

A

The goal of the Data Locality pattern is to move execution closer to the data. This is done by colocating the services with the data or by performing the execution in the data store itself. This allows the execution to access data with fewer limitations, helps speed up execution, and reduces bandwidth usage by sending back only aggregated results.
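A small sketch of pushing execution to the store, using SQLite as a stand-in for any SQL data store: the aggregation runs inside the database and only the small result set crosses the network. Table and column names are illustrative.

```python
# Data Locality sketch: run the aggregation next to the data instead of
# pulling every row into the service and aggregating there.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("alice", 12.5), ("bob", 99.0)],
)

# Aggregation executed in the store; only one row per customer is returned.
totals = db.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
).fetchall()
print(totals)   # e.g. [('alice', 42.5), ('bob', 99.0)]
```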

214

59
Q

How is the Data Locality Pattern used in practice?

A

This pattern encourages coupling execution with data to reduce latency and save bandwidth, enabling distributed cloud native applications to operate efficiently over the network.

216

60
Q

What are some related patterns to the Data Locality Pattern?

A
  • Materialized View pattern
    Provides an alternative approach for this pattern, by moving data closer to the place of execution. This pattern is ideal when the data is small or when CPU-intensive operations such as complex joins and data transformations are needed during reads.
  • Caching pattern
    Complements this pattern by storing preprocessed data and serving it during repeated queries.

218

61
Q

Define the Caching Pattern

A

The Caching pattern stores previously processed or retrieved data in memory, and serves this data for similar queries issued in the future. This not only reduces repeated data processing at the services, but also eliminates calls to dependent services when the response is already stored in the service.
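A minimal cache-aside sketch in Python: check the cache first, fall back to the data store on a miss, then populate the cache for subsequent queries. The TTL and the in-memory "database" are illustrative.

```python
# Cache-aside sketch for the Caching pattern.
import time

database = {"user-1": {"name": "Alice"}}          # stand-in for a real store
cache: dict[str, tuple[float, dict]] = {}
TTL_SECONDS = 60

def get_user(user_id: str) -> dict:
    entry = cache.get(user_id)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]                            # cache hit
    value = database[user_id]                      # cache miss: hit the store
    cache[user_id] = (time.time(), value)          # populate for next time
    return value

print(get_user("user-1"))   # miss, reads the store
print(get_user("user-1"))   # hit, served from memory
```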

218

62
Q

How is the Caching Pattern used in practice?

A

This pattern is usually applied when the same query can be repeatedly called multiple times by one or more clients, especially when we don’t have enough knowledge about what data will be queried next.

220

63
Q

What are some related patterns to the Caching Pattern?

A
  • Data Sharding pattern
    Enables the cache to be scaled similarly to the way we can scale data stores. This also enables distributing data geographically so relevant data in the cache can be closer to the services that operate them.
  • Resilient Connectivity pattern
    Provides a mechanism to serve requests from the data sources when data is not available in the cache. Chapter 3 discusses this pattern.
  • Data Service pattern
    Along with API security, can be used to provide a service layer for distributed caches, providing more business-centric APIs for data consumers.
  • Vault Key pattern
    Provides the capability to secure the caches by using access tokens enabling third parties to access the data directly from caches. This can be used only if the caching systems support this functionality. Otherwise, we need to fall back on using the Data Service pattern with API security.
  • Event Sourcing pattern
    Propagates cache-invalidation requests to all local caches. This enables eventual consistency of cache data and reduces the chance of data being obsolete as data sources are updated by multiple services. Chapter 5 details this pattern.

229

64
Q

Describe the Static Content Hosting Pattern

A

The Static Content Hosting pattern deploys static content in data stores that are closer to clients so content can be delivered directly to the client with low latency and without consuming excess computational resources.

230

65
Q

How is the Static Content Hosting Pattern used in practice?

A

This pattern is used when we need to quickly deliver static content to clients with low response time, and when we need to reduce the load on rendering services.

232

66
Q

What are some related patterns to the Static Content Hosting Pattern?

A
  • Data Sharding pattern
    Can be used to shard data when you have a lot of static data.
  • Caching pattern
    Caches content for faster data access. The cache expiration based on time-out is not necessary, as static data will not become outdated.
  • Vault Key pattern
    Provides security to systems hosting static content.
  • Data Service pattern
    Along with API security, provides a service layer on top of the content to control data access.

233

67
Q

When to use the Materialized View pattern?

A

Part of the data is available locally, and the rest of the data needs to be fetched from external sources that incur high latency.
The data that needs to be moved is small and rarely updated.
Provides access to nonsensitive data that is hosted in secure systems.

234

68
Q

When not to use the Materialized View pattern?

A

Data can be retrieved from dependent services with low latency.
Data in the dependent services is changing quickly.
Consistency of the data is considered important for the response.

234

69
Q

Benefits of using the Materialized View pattern

A

Can store the data in any database that is suitable for the application.
Increases resiliency of the service by replicating the data to local stores.

234

70
Q

When to use the Data Locality pattern?

A

To read data from multiple data sources and perform a join or data aggregation in memory.
The data stores are huge, and the clients are geographically distributed.

234

71
Q

When not to use the Data Locality pattern?

A

Queries output most of their input.
Additional execution cost incurred at the data nodes is higher than the cost of data transfer over the network.

234

72
Q

Benefits of using the Data Locality pattern

A

Reduces network bandwidth utilization and data-retrieval latency.
Better utilizes CPU resources and optimizes overall performance.
Caches results and serves requests more efficiently.

234

73
Q

When to use the Caching pattern?

A

Best for static data or data that is read more frequently than it is updated.
Application has the same query that can be repeatedly called multiple times by one or more clients, especially when we do not have enough knowledge about what data will be queried next.
The data store is subject to a high level of contention or cannot handle the number of concurrent requests it is receiving from multiple clients.

234

74
Q

When not to use the Caching pattern?

A

The data is updated frequently.
As the means of storing state, as it should not be considered the single source of truth.
The data is critical, and the system cannot tolerate data inconsistencies.

234

75
Q

Benefits of using the Caching pattern

A

Can choose which part of the data to cache to improve performance.
Using a cache aside improves performance by reducing redundant computations.
Can preload static data into the cache.
Combined with eviction policy, the cache can hold the recent/required data.

234

76
Q

When to use the Static Content Hosting pattern?

A

All or some of the data requested by the client is static.
The static data needs to be available in multiple environments or geographic locations.

234

77
Q

When not to use the Static Content Hosting pattern?

A

The static content needs to be updated before delivering to the clients, such as adding the access time and location.
The amount of data that needs to be served is small.
Clients cannot retrieve and combine static and dynamic content together.

234

78
Q

Benefits of using the Static Content Hosting pattern

A

Geographically partitioning and storing closer to clients provides shorter response times and faster access/download speed.
Reduces resource utilization on rendering services.

234

79
Q

Describe the Transaction Pattern

A

The Transaction pattern uses transactions to perform a set of operations as a single unit of work, so all operations are completed or undone as a unit. This helps maintain the integrity of the data, and error-proofs execution of services. This is critical for the successful execution of financial applications.

236

80
Q

How does the Transaction pattern work?

A

This pattern wraps multiple individual operations into a single large operation, providing a guarantee that either all operations or no operation will succeed. All transactions follow these steps:

1 - System initiates a transaction.

2 - Various data manipulation operations are executed.

3 - Commit is used to indicate the end of the transaction.

4 - If there are no errors, the commit will succeed, the transaction will finish successfully, and the changes will be reflected in the data stores.
If there are errors, all the operations in the transaction will be rolled back, and the transaction will fail. No changes will be reflected in the data stores.
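A small sketch of these steps using Python's built-in sqlite3 module: two account updates either both commit or both roll back. The table and amounts are illustrative.

```python
# Transaction sketch: both updates succeed or neither does.
import sqlite3

# isolation_level=None disables implicit transactions so BEGIN/COMMIT/ROLLBACK
# can be issued explicitly, mirroring the steps above.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100.0), ("B", 0.0)])

def transfer(src: str, dst: str, amount: float) -> None:
    db.execute("BEGIN")                                            # 1. initiate
    try:
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                   (amount, src))                                  # 2. operations
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                   (amount, dst))
        db.execute("COMMIT")                                       # 3./4. commit
    except Exception:
        db.execute("ROLLBACK")                                     # 4. roll back
        raise

transfer("A", "B", 40.0)
print(db.execute("SELECT id, balance FROM accounts").fetchall())
```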

236

81
Q

What are the ACID properties?

A
  • Atomic
    All operations must occur at once, or none should occur.
  • Consistent
    Before and after the transaction, the system will be in a valid state.
  • Isolation
    The results produced by concurrent transactions will be identical to such transactions being executed in sequential order.
  • Durable
    When the transaction is finished, the committed changes will remain committed even during system failures.

237

82
Q

How is the Transaction pattern used in practice?

A

Transactions can be used to combine multiple operations as a single unit of work, and to coordinate the operations of multiple systems.

238

83
Q

What are some related patterns to the Transaction pattern?

A

The Transaction pattern has one related pattern, the Saga pattern. This pattern, covered in Chapter 3, reliably coordinates execution of multiple systems.

241

84
Q

When to use the Transaction pattern?

A

An operation contains multiple steps, and all the steps should be processed atomically to consider the operation valid.

241

85
Q

When not to use the Transaction pattern?

A

The application has only a single step in the operation.
The application has multiple steps, and failure of some steps is considered acceptable.

241

86
Q

What are the benefits of using the Transaction pattern?

A

Adheres to ACID properties.
Processes multiple independent transactions.

241

87
Q

Describe the Vault Key Pattern

A

The Vault Key pattern provides direct access to data stores via a trusted token, commonly named the vault key. Some of the popular cloud data stores support this functionality.

242

88
Q

How does the Vault Key Pattern work?

A

The Vault Key pattern is based on a trusted token being presented by the client and being validated by the data store. In this pattern, the application determines who can access which part of the data.
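A rough sketch of the validation step, assuming the token is a JWT and the PyJWT library (with RSA support) is available: the data store holds only the identity provider's public key, so it can verify the vault key locally without calling the identity provider. The claim names and scope check are invented for illustration.

```python
# Vault Key validation sketch: the data store verifies the token locally
# with the identity provider's public key and checks an illustrative
# "allowed_resources" claim.
import jwt   # PyJWT

def authorize_request(vault_key: str, idp_public_key: str, resource: str) -> bool:
    try:
        claims = jwt.decode(vault_key, idp_public_key, algorithms=["RS256"])
    except jwt.InvalidTokenError:
        return False                      # bad signature or expired token
    # Only the claims are inspected; the identity provider is never called.
    return resource in claims.get("allowed_resources", [])
```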

242 Figure 4-25. Actions performed by clients to retrieve data in the Vault Key pattern

89
Q

How is the Vault Key Pattern used in practice?

A

This pattern can be used when the data store cannot reach the identity provider to authenticate and authorize the client upon data access. In this pattern, the data store will contain the certificate of the identity provider, so it will be able to decrypt the token and validate its authenticity without calling the identity provider. Because it does not need to make remote service calls for validation, it can also perform authentication operations with minimal latency.

243

90
Q

What are some related patterns to the Vault Key Pattern?

A

The Vault Key pattern has one related pattern, Data Service (covered at the start of this chapter). Along with API security, the Data Service pattern provides an alternative approach for providing security when the Vault Key pattern is not feasible.

244

91
Q

When to use the Vault Key Pattern?

A

To securely access remote data with minimal latency.
The store has a limited computing capability to perform service calls for authentication and authorization.

244

92
Q

When not to use the Vault Key Pattern?

A

Need fine-grained data protection.
Need to restrict what queries should be executed on the data store with high precision.
The exposed data store cannot validate access based on keys.

244

93
Q

What are the benefits of using the Vault Key Pattern?

A

Accesses data stores directly by using a trusted token, a vault key
Has minimal operational costs compared to calling the central identity service for validation

244

94
Q

When to use a relational database management system (RDBMS)?

A
  • Need transactions and ACID properties.
  • Interrelationship with data is required to be maintained.
  • Working with small to medium amounts of data.

252

95
Q

When not to use a relational database management system (RDBMS)?

A
  • Data needs to be highly scalable, such as IoT data.
  • Working with XML, JSON, and binary data format.
  • Solution cannot tolerate some level of unavailability.

252

96
Q

When to use Apache Cassandra?

A
  • Need high availability.
  • Need scalability.
  • Need a decentralized solution.
  • Need faster writes than reads.
  • Read access can be mostly performed by partition key.

252

97
Q

When not to use Apache Cassandra?

A
  • Existing data is updated frequently.
  • Need to access data by columns that are not part of the partition key.
  • Require relational features, such as transactions, complex joins, and ACID properties.

252

98
Q

When to use Apache HBase?

A
  • Need consistency.
  • Need scalability.
  • Need a decentralized solution.
  • Need high read performance.
  • Need both random and real-time access to data.
  • Need to store petabytes of data.

252

99
Q

When not to use Apache HBase?

A
  • Solution cannot tolerate some level of unavailability.
  • Existing data is updated very frequently.
  • Require relational features, such as transactions, complex joins, and ACID properties.

252

100
Q

When to use MongoDB?

A
  • Need consistency.
  • Need a decentralized solution.
  • Need a document store.
  • Need data lookup based on multiple keys.
  • Need high write performance.

252

101
Q

When not to use MongoDB?

A
  • Solution cannot tolerate some level of unavailability.
  • Require relational features, such as transactions, complex joins, and ACID properties.

252

102
Q

When to use Redis?

A
  • Need scalability.
  • Need an in-memory database.
  • Need a persistent option to restore the data.
  • As a cache, queue, and real-time storage.

252

103
Q

When not to use Redis?

A
  • As a typical database to store and query with complex operations.

252

104
Q

When to use Amazon DynamoDB?

A
  • Need a highly scalable solution.
  • Need a document store.
  • Need a key-value store.
  • Need high write performance.
  • Fine-grained access control.

252

105
Q

When not to use Amazon DynamoDB?

A
  • Use in platforms other than AWS.
  • Require relational features, such as complex joins and foreign keys.

252

106
Q

When to use Apache HDFS?

A
  • Need a filesystem.
  • Store large files.
  • Store data once and read it multiple times.
  • Perform MapReduce operations on files.
  • Need scalability.
  • Need data resiliency.

252

107
Q

When not to use Apache HDFS?

A
  • Store small files.
  • Need to update files.
  • Need to perform random data reads.

252

108
Q

When to use Amazon S3?

A
  • Need an object store.
  • Perform MapReduce operations on objects.
  • Need a highly scalable solution.
  • Read part of the object data.
  • Fine-grained access control.

252

109
Q

When not to use Amazon S3?

A
  • Use in platforms other than AWS.
  • Need to run complex queries.

252

110
Q

When to use Azure Cosmos DB?

A
  • Need a highly scalable solution.
  • Need a document store.
  • Need a key-value store.
  • Need a graph store.
  • Need a column store.
  • Fine-grained access control.
  • Connectivity via MongoDB and Cassandra clients.

252

111
Q

When not to use Azure Cosmos DB?

A
  • Use in platforms other than Azure.
  • Perform transactions across data partitions.

252

112
Q

When to use Google Cloud Spanner?

A
  • Need a highly scalable solution.
  • Need a relational store.
  • Need support for SQL query processing.
  • Need transaction support across all nodes in the cluster.

252

113
Q

When not to use Google Cloud Spanner?

A
  • Use in platforms other than Google Cloud.
  • Need support for the full ANSI SQL spec.

252

114
Q

We can use test data stores to test data-service interactions. Though data services can have complex or simple logic, they can still cause bottlenecks in production. What are useful recommendations for overcoming these issues?

A
  • Tests should be performed with both clean and prepopulated data stores, as the former will test for data initialization code and the latter will test for data consistency during operation.
  • Test all data store types and versions that will be used in production to eliminate any surprises. We can implement test data stores as Docker instances that will help run tests in multiple environments with quick startup and proper cleanup after the test.
  • Test data mapping and make sure all fields are properly mapped when calling the data store.
  • Validate whether the service is performing inserts, writes, deletion, and updates on the data stores in an expected manner by checking the state of the data store via test clients that can access the database directly.
  • Validate that relational constraints, triggers, and stored procedures are producing correct results.

In addition, it is important to do a load test on the data service along with the data store in a production-like environment with multiple clients. This will help identify any database lock, data consistency, or other performance-related bottlenecks present in the cloud native application. It will also show how much load the application can handle and how that will be affected when various data scaling patterns and techniques are deployed.
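As a sketch of the first recommendation above, the same pytest test can be parameterized to run against both a clean and a prepopulated store; SQLite stands in here for the real data store (which in practice could be a Docker-based instance of the actual database). Schema and names are illustrative.

```python
# Run the same data-service test against a clean and a prepopulated store.
import sqlite3
import pytest

def make_store(prepopulated: bool) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    if prepopulated:
        db.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    db.commit()
    return db

@pytest.fixture(params=[False, True], ids=["clean", "prepopulated"])
def store(request):
    return make_store(prepopulated=request.param)

def test_insert_and_read_back(store):
    store.execute("INSERT INTO users (name) VALUES (?)", ("carol",))
    store.commit()
    names = {row[0] for row in store.execute("SELECT name FROM users")}
    assert "carol" in names
```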

254

115
Q

Describe Observability and Monitoring

A

Observability and monitoring help us identify the performance of data stores and take corrective actions when they deviate because of load or changes to the application. In most applications, incoming requests interact with the data stores. Any performance or availability issues in the data store will resonate across all layers of the system, affecting the overall user experience.

257

116
Q

What are some key metrics to observe in a data store?

A
  • Application metrics
    – Data store uptime/health: To identify whether each node in the data store is up and running.
    – Query execution time: Five types of issues can cause high query execution times:
      --- Inefficient query: Use of nonoptimized queries, including multiple complex joins and tables not being indexed properly.
      --- Data growth in the data store: The data store containing more data than it can handle.
      --- Concurrency: Concurrent operations on the same table/row, locking data stores and impacting their performance.
      --- Lack of system resources such as CPU/memory/disk space: Data store nodes not having enough resources to efficiently serve requests.
      --- Unavailability of dependent system or replica: In distributed data stores, when a replica or another dependent system such as a lookup service is not available, requests may take more time, as the store needs to provision a new instance or discover and route the request to another instance.
    – Query execution response: Whether the query execution is successful. If the query is failing, we may need to look at the logs for more detail (depending on the failure).
    – Audit of the query operations: Malicious queries or user operations can result in an unexpected reduction in data store performance. We can use audit logs to identify and mitigate them.
  • System metrics: To identify a lack of system resources for efficient processing, via CPU consumption, memory consumption, availability of disk space, network utilization, and disk I/O speed.
  • Data store logs
  • Time taken and throughput when communicating with primary and replicas: Helps to understand networking issues and bad data store nodes.

When analyzing metrics, we can use percentiles to compare historical and current behaviors. This can identify anomalies and deviations, so we can quickly identify the root cause of the problem.
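A small Python sketch of that percentile comparison, using the standard library's statistics module: current query latencies are compared against a historical baseline at p50/p95/p99, and any deviation beyond an arbitrary 20% threshold is flagged. The latency samples and threshold are made up.

```python
# Compare current latency percentiles against a historical baseline.
from statistics import quantiles

def percentile(samples: list[float], p: int) -> float:
    return quantiles(samples, n=100)[p - 1]

baseline_ms = [12, 14, 13, 15, 12, 16, 14, 13, 15, 40]
current_ms  = [13, 15, 14, 60, 15, 75, 14, 16, 80, 90]

for p in (50, 95, 99):
    base, curr = percentile(baseline_ms, p), percentile(current_ms, p)
    if curr > base * 1.2:
        print(f"p{p} degraded: {base:.1f}ms -> {curr:.1f}ms")
```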

257

117
Q

What are some steps and key considerations for deploying and managing data stores?

A

1 - Select data store types. Select the data store type (relational, NoSQL, or filesystem) and its vendor to match our use case.

2 - Configure the deployment pattern. This can be influenced by the patterns applied in the cloud native application and the type of data store we have selected. Based on this selection, high availability and scalability should be determined by answering the following questions:

  • Who are the clients?
  • How many nodes?
  • Are we going to use a data store managed by the cloud vendor or deploy our own?
  • How does the replication work?
  • How do we back up the data?
  • How does it handle disaster recovery?
  • How do we secure the data store?
  • How do we monitor the data store?
  • How much does the data store/management cost?

3 - Enforce security. Data stores should be protected because they contain business-critical information. This can be enforced by applying relevant physical and software security as discussed in the preceding section. This may include enabling strict access control, data encryption, and use of audit logs.

4 - Set up observability and monitoring. Like microservices, data stores should be configured with observability and monitoring tools to guarantee continuous operation. This can provide early insights on possible scaling problems, such as a requirement to rebalance data shards, or to apply a different design pattern altogether to improve scalability and performance of the application.

5 - Automate continuous delivery. When it comes to data stores, automation and continuous delivery are not straightforward. Although we can easily come up with an initial data store schema, maintaining backward compatibility is difficult as the application evolves. Backward compatibility is critical; without it, we will not be able to achieve smooth application updates and rollbacks during failures. To improve productivity, we should always use proper automation tools such as scripts to automate continuous delivery. We also recommend having guardrails and using multiple deployment environments, such as development, and staging/preproduction, to reduce the impact of the changes and to validate the application before moving it to production.

By following these steps, we can safely deploy and maintain cloud native applications while allowing rapid innovation and adoption to other systems.

260