Chapter 4, Data Management Patterns Flashcards
What are Data Sources?
Here, data sources are cloud native applications that feed data such as user inputs and sensor readings. They sometimes feed data into data-ingestion systems such as message brokers or, when possible, write directly to data stores. Data-ingestion systems can transfer data as events/messages to other applications or data stores.
160 Figure 4-1. Data architecture for cloud native applications
What do batch-processing systems do?
Batch-processing systems process data from data sources in batches, and write the processed output back to the data stores so it can be used for reporting or exposed via APIs.
161 Figure 4-1. Data architecture for cloud native applications
What are the three main types of data that influence Application behavior?
- Input data
Sent as part of the input message by the user or client. Most commonly, this data is either JSON or XML messages, though binary formats such as gRPC and Thrift are getting some traction.
- Configuration data
Provided by the environment as variables. XML has been used as the configuration language for a long time, and now YAML configs have become the de facto standard for cloud native applications.
- State data
The data stored by the application itself, regarding its status, based on all messages and events that occurred before the current time. By persisting the state data and loading it on startup, the application will be able to seamlessly resume its functionality upon restart.
162
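The idea of persisting state data and loading it on startup can be sketched as follows. This is a minimal illustration, not the book's implementation: the `VisitCounter` class, the JSON file layout, and the `demo` helper are all hypothetical names chosen for the example.

```python
import json
import os
import tempfile


class VisitCounter:
    """Hypothetical application whose only state is an event count."""

    def __init__(self, path):
        self.path = path
        self.count = 0
        # On startup, reload whatever state a previous run persisted.
        if os.path.exists(path):
            with open(path) as f:
                self.count = json.load(f)["count"]

    def handle_event(self):
        # Update the state, then persist it so a restart can resume from here.
        self.count += 1
        with open(self.path, "w") as f:
            json.dump({"count": self.count}, f)


def demo():
    """Process two events, simulate a restart, and return the restored count."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "state.json")
        first_run = VisitCounter(path)
        first_run.handle_event()
        first_run.handle_event()
        restarted = VisitCounter(path)  # a new instance stands in for a restart
        return restarted.count
```

Here a local file stands in for the state store; a real cloud native application would more likely persist to an external data store so that state survives the loss of the node.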
What are the three categories of data that Cloud native applications use?
- Structured data
Can fit a predefined schema. For example, the data on a typical user registration form can be comfortably stored in a relational database.
- Semi-structured data
Has some form of structure. For example, each field in a data entry may have a corresponding key or name that we can use to refer to it, but when we take all the entries, there is no guarantee that each entry will have the same number of fields or even common keys. This data can be easily represented through JSON, XML, and YAML formats.
- Unstructured data
Does not contain any meaningful fields. Images, videos, and raw text content are examples. Usually, this data is stored without any understanding of its content.
164
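To make the semi-structured case concrete, the sketch below parses two JSON entries whose fields differ. The sample records are invented for illustration only.

```python
import json

# Two hypothetical entries: both are keyed, but they do not share a schema.
raw = """
[
  {"name": "Ann", "email": "ann@example.com"},
  {"name": "Raj", "phone": "555-0100", "tags": ["vip"]}
]
"""

entries = json.loads(raw)

# Every key that appears in at least one entry.
all_keys = set().union(*(entry.keys() for entry in entries))

# Keys that are guaranteed to appear in every entry.
common_keys = set.intersection(*(set(entry) for entry in entries))
```

Each field can still be addressed by its key, yet only `name` is present in both entries, which is why such data does not fit a fixed relational schema comfortably.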
What are ACID properties?
- Atomicity
- Consistency
- Isolation
- Durability
165
Define Atomicity from ACID
Atomicity guarantees that all operations within a transaction are executed as a single unit: either all of them take effect or none of them do.
165
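Atomicity can be seen in a small experiment. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, the `transfer` function, and the `balances` helper are assumptions made for the example. The connection is used as a context manager, which commits when the block succeeds and rolls back when it raises.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one all-or-nothing unit."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint if funds are short,
            # which also undoes the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict of account id to balance."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A transfer of 500 from `alice` fails on the debit, and the credit to `bob` that ran first is rolled back with it, so neither balance changes.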
Define Consistency from ACID
Consistency ensures that the data is in a valid state, satisfying all defined rules and constraints, both before and after the transaction.
165
Define Isolation from ACID
Isolation makes the intermediate state of a transaction invisible to other transactions.
165
Define Durability from ACID
Durability guarantees that after a successful transaction, the data is persistent even in the event of a system failure.
165
What does the CAP in CAP theorem stand for?
CAP stands for consistency, availability, and partition tolerance. This theorem states that a distributed application can provide either full availability or consistency; we cannot achieve both while providing network partition tolerance. Here, availability means that the system is fully functional when some of its nodes are down, consistency means an update/change in one node is immediately propagated to other nodes, and partition tolerance means that the system can continue to work even when some nodes cannot connect to each other.
169
What are three types of data store?
- Relational
- NoSQL
- Filesystem
172
What are the three techniques in which data can be managed?
- Centralized
- Decentralized
- Hybrid
172
Describe the Data Service Pattern
The Data Service pattern exposes data in the database as a service, referred to as a data service. The data service becomes the owner, responsible for adding and removing data from the data store. The service may perform simple lookups or even encapsulate complex operations when constructing responses for data requests.
180
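The Data Service pattern can be sketched as a class that owns its store and exposes only the operations it chooses to. Everything here is hypothetical: the `CustomerDataService` name, its methods, and the dictionary that stands in for the real database. A real data service would expose these operations over an API rather than as in-process calls.

```python
class CustomerDataService:
    """Hypothetical data service that is the sole owner of the customer store."""

    def __init__(self):
        self._store = {}  # stand-in for the underlying database

    def add(self, customer_id, name, city):
        self._store[customer_id] = {"name": name, "city": city}

    def remove(self, customer_id):
        self._store.pop(customer_id, None)

    def get(self, customer_id):
        """Simple lookup; returns a copy so callers cannot mutate the store."""
        record = self._store.get(customer_id)
        return dict(record) if record else None

    def count_by_city(self):
        """A more complex operation encapsulated behind the service."""
        counts = {}
        for record in self._store.values():
            counts[record["city"]] = counts.get(record["city"], 0) + 1
        return counts
```

Because callers never touch `_store` directly, the service can change how the data is stored, or restrict which operations are allowed, without affecting its consumers.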
How is the Data Service Pattern used?
This pattern can be used when we need to allow access to data that does not belong to a single microservice, or when we need to abstract legacy/proprietary data stores to other cloud native applications.
181
What are some related patterns to the Data Service pattern?
- Caching pattern
Provides an opportunity to optimize the efficiency of data retrieval by using local or distributed caching when exposing data via a service.
- Performance optimization patterns
Apart from caching data, these execute complex queries such as table joins and running stored procedures directly in the database to improve performance.
- Materialized View pattern
Accessing data via an API can still be performance-intensive. For use cases that need joins to be performed with data that resides in stores belonging to other services, having that data replicated in its local store and building a materialized view can help improve query performance.
- Vault Key pattern
Along with API security, knowing who is accessing the data can help identify the caller and enforce adequate security and data protection.
183
Describe the Composite Data Services Pattern
The Composite Data Services pattern performs data composition by combining data from more than one data service and, when needed, performs fairly complex aggregation to provide a richer and more concise response. This pattern is also called the Server-Side Mashup pattern, as data composition happens at the service and not at the data consumer.
185
How does the Composite Data Services Pattern work?
The Composite Data Services Pattern combines data from various services and its own data store into one composite data service. This pattern not only eliminates the need for multiple microservices to perform data composition operations, but also allows the combined data to be cached for improving performance (Figure 4-11).
185 Figure 4-11. Composite Data Services pattern
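A rough sketch of server-side composition with caching follows. The `ProductPageService` class and the two fetch callables are hypothetical stand-ins for calls to separate data services; the cache is a plain dictionary with no expiry, which a real deployment would need to add.

```python
class ProductPageService:
    """Hypothetical composite service combining product and review data."""

    def __init__(self, fetch_product, fetch_reviews):
        # The two callables stand in for requests to separate data services.
        self._fetch_product = fetch_product
        self._fetch_reviews = fetch_reviews
        self._cache = {}

    def get(self, product_id):
        # Serve the composed result from the cache when it is available.
        if product_id in self._cache:
            return self._cache[product_id]

        product = self._fetch_product(product_id)
        reviews = self._fetch_reviews(product_id)
        ratings = [review["rating"] for review in reviews]

        # Aggregate on the server so clients receive one concise response.
        composite = {
            "id": product_id,
            "name": product["name"],
            "review_count": len(ratings),
            "avg_rating": sum(ratings) / len(ratings) if ratings else None,
        }
        self._cache[product_id] = composite
        return composite
```

Clients make one call and receive the aggregated view, and repeated requests for the same product do not reach the underlying services again.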
How is the Composite Data Services Pattern used in practice?
This pattern can be used when we need to eliminate multiple microservices repeating the same data composition. Data services that are fine-grained force clients to query multiple services to build their desired data. We can use this pattern to reduce duplicate work done by the clients and consolidate it into a common service.
187
What are some considerations when using the Composite Data Services Pattern?
Use this pattern only when the consolidation is generic enough and other microservices will be able to reuse the consolidated data. We do not recommend introducing unnecessary layers of services if they do not provide meaningful data compositions that can be reused. Weigh the benefits of reusability and simplicity of the clients against the additional latency and management complexity added by the service layers.
187
What are some patterns related to The Composite Data Services pattern?
- Caching pattern
`Provides an opportunity to optimize the efficiency of data retrieval and helps achieve resiliency by serving data from the cache when backends are not available. - Client-Side Mashup pattern
`Allows the data mashup to happen at the client side, such as in the user’s browser. This can be a good solution when asynchronous data loading is feasible and when meaningful data composition can be performed with partial data.
187
Describe the Client-Side Mashup Pattern
In the Client-Side Mashup pattern, data is retrieved from various services and consolidated at the client side. The client is usually a browser loading data via asynchronous Ajax calls.
188
How does the Client-Side Mashup Pattern work?
This pattern utilizes asynchronous data loading, as shown in Figure 4-12. For example, when a browser using this pattern is loading a web page, it loads and renders part of the web page first, while loading the rest of the web page. This pattern uses client-side scripts such as JavaScript to asynchronously load the content in the web browser.
Rather than letting the user wait for a longer time by loading all content on the website at once, this pattern uses multiple asynchronous calls to fetch different parts of the website and renders each fragment when it arrives. These applications are also referred to as rich internet applications (RIAs).
188 Figure 4-12. Client-Side Mashup at a web browser
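The render-each-fragment-as-it-arrives behavior can be approximated outside a browser. The sketch below uses Python's `asyncio` as a stand-in for client-side JavaScript and Ajax calls; `fetch_fragment`, `load_page`, the fragment names, and the artificial delays are all assumptions made for the illustration.

```python
import asyncio


async def fetch_fragment(name, delay):
    """Stand-in for an asynchronous call to one backend service."""
    await asyncio.sleep(delay)  # simulated network latency
    return name, f"<div>{name}</div>"


async def load_page(fragments, render):
    """Request all fragments at once and render each one when it arrives."""
    tasks = [asyncio.create_task(fetch_fragment(name, delay))
             for name, delay in fragments]
    for finished in asyncio.as_completed(tasks):
        name, html = await finished
        render(name, html)


def demo():
    """Return fragment names in the order they were rendered."""
    rendered = []
    fragments = [("reviews", 0.1), ("header", 0.0), ("product", 0.05)]
    asyncio.run(load_page(fragments, lambda name, html: rendered.append(name)))
    return rendered
```

Fragments are rendered in order of arrival rather than in request order, so the fast `header` fragment appears without waiting for the slower `reviews` fragment.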
How is the Client-Side Mashup Pattern used in practice?
This pattern can be used when we need to present available data as soon as possible, while providing more detail later, or when we want to give a perception that the web page is loading much faster.
190
What are some considerations for the Client-Side Mashup Pattern?
Use this pattern only when the partial data loaded first can be presented to the user or used in a meaningful way. We do not advise using this pattern when the retrieved data needs to be combined and transformed with later data via some sort of a join before it can be presented to the user.
191
What are some of the related patterns to the Client-Side Mashup Pattern?
- Composite Data Services pattern
This is useful when content needs to be mashed synchronously and the composite data is common enough to be used by multiple services.
- Caching pattern
Provides an opportunity to cache data to improve the overall latency.
191
When to use the Data Service pattern?
Data is not owned by a single microservice, yet multiple microservices are depending on the data for their operation.
192
When not to use the Data Service pattern?
Data can clearly be associated with an existing microservice, as introducing unnecessary microservices can also cause management complexity.
192
What are the benefits of using the Data Service pattern?
Reduces the coupling between services.
Provides more control/security on the operations that can be performed on the shared data.
192
When to use the Composite Data Services pattern?
Many clients query multiple services to consolidate their desired data, and this consolidation is generic enough to be reused among the clients.
192
When not to use the Composite Data Services pattern?
Only one client needs the consolidation.
Operations performed by clients cannot be generalized to be reused by many clients.
192
What are the benefits of using the Composite Data Services pattern?
Reduces duplicate work done by the clients and consolidates it into a common service.
Provides more data resiliency by using caches or static data.
192
When to use the Client-Side Mashup pattern?
Some meaningful operations can be performed with partial data; for example, rendering nondependent data in web browsers.
192
When not to use the Client-Side Mashup pattern?
Processing, such as a join, is required on the independently retrieved data before sending the response.
192
What are the benefits of using the Client-Side Mashup pattern?
Results in more-responsive applications.
Reduces the wait time.
192
Describe the Data Sharding Pattern
In the Data Sharding pattern, the data store is divided into shards, which allows it to be easily stored and retrieved at scale. The data is partitioned by one or more of its attributes so we can easily identify the shard in which it resides.
193
In what ways can you shard data?
To shard the data, we can use horizontal, vertical, or functional approaches.
193
Describe Horizontal data sharding
Each shard has the same schema, but contains distinct data records based on its sharding key. A table in a database is split across multiple nodes based on these sharding keys. For example, user orders can be sharded by hashing the order ID into three shards, as depicted in Figure 4-13.
193 Figure 4-13. Horizontal data sharding using hashing
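Hashing an order ID into three shards might look like the sketch below. The choice of SHA-256, the modulo step, and the in-memory dictionaries that stand in for database nodes are assumptions for the example, not details given by the source.

```python
import hashlib

NUM_SHARDS = 3

# Each dictionary stands in for one database node with the same schema.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(order_id: str) -> int:
    """Map an order ID to a shard index using a stable hash."""
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put_order(order_id: str, order: dict) -> None:
    shards[shard_for(order_id)][order_id] = order


def get_order(order_id: str):
    # The same hash locates the record again, so no scan across shards is needed.
    return shards[shard_for(order_id)].get(order_id)
```

A cryptographic hash is used here instead of Python's built-in `hash()` because the built-in is randomized per process for strings, which would send the same order ID to different shards after a restart.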
Describe Vertical data sharding
Each shard does not need to have an identical schema and can contain various data fields. Each shard can contain a set of tables that do not need to be in another shard. This is useful when we need to partition the data based on the frequency of data access; we can put the most frequently accessed data in one shard and move the rest into a different shard. Figure 4-14 depicts how frequently accessed user data is sharded from the other data.
194 Figure 4-14. Vertical data sharding based on frequency of data access
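Splitting by access frequency can be sketched as two stores holding different fields of the same user. The field names in `HOT_FIELDS`, the two dictionaries, and the function names are hypothetical.

```python
# Fields assumed to be read on almost every request.
HOT_FIELDS = {"name", "email"}

hot_store = {}   # frequently accessed shard
cold_store = {}  # rarely accessed shard with a different set of fields


def save_user(user_id, record):
    """Split one user record across the two shards by field."""
    hot_store[user_id] = {k: v for k, v in record.items() if k in HOT_FIELDS}
    cold_store[user_id] = {k: v for k, v in record.items() if k not in HOT_FIELDS}


def get_login_view(user_id):
    # The common path touches only the hot shard.
    return hot_store[user_id]


def get_full_profile(user_id):
    # The rare path pays for a second lookup and merges the result.
    return {**hot_store[user_id], **cold_store[user_id]}
```

The two shards intentionally do not share a schema, which is the distinguishing feature of vertical sharding compared with the horizontal approach.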
Describe Functional data sharding
Data is partitioned by functional use cases. Rather than keeping all the data together, the data can be segregated in different shards based on different functionalities. This also aligns with the process of segregating functions into separate functional services in the cloud native application architecture. Figure 4-15 shows how product details and reviews are sharded into two data stores.
196 Figure 4-15. Functional data sharding by segregating product details and reviews into two data stores
When using horizontal data sharding, what are the techniques we can deploy to locate where we have stored data?
- Lookup-based data sharding
- Range-based data sharding
- Hash-based data sharding
197
Describe Lookup-based data sharding
A lookup service or distributed cache is used to store the mapping of the shard key and the actual location of the physical data. When retrieving the data, the client application will first check the lookup service to resolve the actual physical location for the intended shard key, and then access the data from that location. If the data gets rebalanced or resharded later, the client has to again look up the updated data location.
197
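The lookup step can be sketched with a small mapping service. `ShardLookup`, the shard names, and the `write`, `read`, and `move` helpers are hypothetical, and an in-memory dictionary stands in for the lookup service or distributed cache.

```python
class ShardLookup:
    """Hypothetical lookup service mapping shard keys to shard locations."""

    def __init__(self):
        self._location = {}

    def assign(self, key, shard_name):
        self._location[key] = shard_name

    def locate(self, key):
        return self._location[key]


# Dictionaries standing in for two physical data stores.
shards = {"shard-a": {}, "shard-b": {}}


def write(lookup, key, value, shard_name):
    shards[shard_name][key] = value
    lookup.assign(key, shard_name)


def read(lookup, key):
    # Resolve the physical location first, then fetch from that shard.
    return shards[lookup.locate(key)][key]


def move(lookup, key, new_shard):
    """Rebalance one key; later reads pick up the new location via the lookup."""
    old_shard = lookup.locate(key)
    shards[new_shard][key] = shards[old_shard].pop(key)
    lookup.assign(key, new_shard)
```

Because every read goes through `locate`, a key can be moved between shards without changing client code, at the cost of an extra hop on each access.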
Describe Range-based data sharding
This type of sharding approach can be applied when the sharding key has sequential values. The data is sharded in ranges, and as in lookup-based sharding, a lookup service can be used to determine where the given data range is available. This approach yields the best results for sharding keys based on date and time. A data range of a month, for example, may reside in the same shard, allowing the service to retrieve all the data in one go, rather than querying multiple shards.
197
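A month-per-shard layout can be sketched with a sorted table of range boundaries. The dates, shard names, and the assumption that the last shard is open-ended are all invented for the example; `bisect_right` from the standard library finds the range a date falls into.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical range table: each shard holds dates from its start date up to
# the next start date. The last shard is treated as open-ended.
RANGE_STARTS = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]
SHARD_NAMES = ["shard-2024-01", "shard-2024-02", "shard-2024-03"]


def _range_index(day):
    index = bisect_right(RANGE_STARTS, day) - 1
    if index < 0:
        raise KeyError(f"no shard holds {day}")
    return index


def shard_for(day):
    """Return the shard that stores records for the given date."""
    return SHARD_NAMES[_range_index(day)]


def shards_for_range(start, end):
    """Return every shard a query over [start, end] has to touch."""
    return SHARD_NAMES[_range_index(start):_range_index(end) + 1]
```

A query for one calendar month resolves to a single shard, while a query spanning several months touches only the shards for those months.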
Describe Hash-based data sharding
Constructing a shard key based on the data fields or dividing the data by date range may not always result in balanced shards. At times we need to distribute the data randomly to generate better-balanced shards. This can be done by using hash-based data sharding, which creates hashes based on the shard key and uses them to determine the shard data location. This approach is not the best when data is queried in ranges, but is ideal when individual records are queried. Here, we can also use a lookup service to store the hash key and the shard location mapping, to facilitate data loading.
197
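The trade-off described here, balanced shards but poor range queries, can be checked with a short experiment. The key formats, the shard count, and the use of SHA-256 are assumptions for the sketch.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3


def hash_shard(key: str) -> int:
    """Pick a shard from a stable hash of the shard key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Balance: how many of 3,000 hypothetical order keys land on each shard.
balance = Counter(hash_shard(f"order-{i}") for i in range(3000))

# Range query: which shards hold the 31 daily keys of one month.
january_shards = {hash_shard(f"2024-01-{day:02d}") for day in range(1, 32)}
```

The counts in `balance` come out roughly even, while `january_shards` contains more than one shard, so a query for a single month has to fan out across shards instead of reading from one.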
How is the Data Sharding Pattern used in practice?
This pattern can be used when we can no longer store data in a single node, or when we need data to be distributed so we can access it with lower latency.
198
What are some patterns that are related to the Data Sharding Pattern?
- Materialized View pattern
This can be used to replicate the dependent data of each shard to the local stores of the service, to improve data-querying performance and eliminate multiple lookup calls to data stores or services. This data can be replicated with only eventual consistency, so this approach is useful only if consistency on the dependent data is not business-critical for the applications.
- Data Locality pattern
Having all the relevant data at the shard will allow the creation of indexes and execution of stored procedures for efficient data retrieval.
202
Describe the Command and Query Responsibility Segregation Pattern
The Command and Query Responsibility Segregation (CQRS) pattern separates updates and query operations of a data set, and allows them to run on different data stores. This results in faster data update and retrieval. It also facilitates modeling data to handle multiple use cases, achieves high scalability and security, and allows update and query models to evolve independently with minimal interactions.
202
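The separation of command and query sides can be sketched with two small classes, each with its own store. The class names, the `OrderPlaced` event shape, and the synchronous wiring are assumptions for the example; in practice the query store is often updated asynchronously and is only eventually consistent with the write store.

```python
class OrderCommands:
    """Command side: validates updates and applies them to the write store."""

    def __init__(self, write_store, publish):
        self._write_store = write_store
        self._publish = publish  # callback that forwards events to the query side

    def place_order(self, order_id, customer, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._write_store[order_id] = {"customer": customer, "amount": amount}
        self._publish({"type": "OrderPlaced", "customer": customer, "amount": amount})


class OrderQueries:
    """Query side: keeps a read-optimized view in a separate store."""

    def __init__(self):
        self._spend_by_customer = {}

    def apply(self, event):
        if event["type"] == "OrderPlaced":
            customer = event["customer"]
            self._spend_by_customer[customer] = (
                self._spend_by_customer.get(customer, 0) + event["amount"]
            )

    def total_spend(self, customer):
        # Answered from the precomputed view, without scanning the orders.
        return self._spend_by_customer.get(customer, 0)


write_store = {}
queries = OrderQueries()
commands = OrderCommands(write_store, publish=queries.apply)
```

The write model stores individual orders, while the read model stores per-customer totals, so each side can use the data shape that suits it and evolve independently.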
How is the Command and Query Responsibility Segregation Pattern used in practice?
We can use this pattern when we want to use different domain models for commands and queries, and when we need to separate updates and data retrieval for performance and security reasons.
204