Data management Flashcards

Question 1

Q

Data management patterns

Answer

A

Database per Service
Shared database
Saga
API Composition
CQRS
Event sourcing
Application events

Question 2

Q

Data management patterns: context

Answer

A

Let’s imagine you are developing an online store application using the Microservice architecture pattern. Most services need to persist data in some kind of database. For example, the Order Service stores information about orders and the Customer Service stores information about customers.

Question 3

Q

Data management patterns: problem

Answer

A

What’s the database architecture in a microservices application?

Question 4

Q

Data management patterns: forces

Answer

A

Services must be loosely coupled so that they can be developed, deployed and scaled independently

Some business transactions must enforce invariants that span multiple services. For example, the Place Order use case must verify that a new Order will not exceed the customer’s credit limit. Other business transactions must update data owned by multiple services.

Some business transactions need to query data that is owned by multiple services. For example, the View Available Credit use case must query the Customer to find the creditLimit and the Orders to calculate the total amount of the open orders.

Some queries must join data that is owned by multiple services. For example, finding customers in a particular region and their recent orders requires a join between customers and orders.

Databases must sometimes be replicated and sharded in order to scale. See the Scale Cube.

Different services have different data storage requirements. For some services, a relational database is the best choice. Other services might need a NoSQL database such as MongoDB, which is good at storing complex, unstructured data, or Neo4j, which is designed to efficiently store and query graph data.

Question 5

Q

Data management patterns: solution

Answer

A

Keep each microservice’s persistent data private to that service and accessible only via its API. A service’s transactions only involve its database.

The service’s database is effectively part of the implementation of that service. It cannot be accessed directly by other services.

There are a few different ways to keep a service’s persistent data private. You do not need to provision a database server for each service. For example, if you are using a relational database then the options are:

Private-tables-per-service – each service owns a set of tables that must only be accessed by that service
Schema-per-service – each service has a database schema that’s private to that service
Database-server-per-service – each service has its own database server.

Private-tables-per-service and schema-per-service have the lowest overhead. Using a schema per service is appealing since it makes ownership clearer. Some high throughput services might need their own database server.

It is a good idea to create barriers that enforce this modularity. You could, for example, assign a different database user id to each service and use a database access control mechanism such as grants. Without some kind of barrier to enforce encapsulation, developers will always be tempted to bypass a service’s API and access its data directly.
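A minimal Python sketch of the idea, assuming each service holds a private in-memory SQLite connection as a stand-in for its own schema or database server. The class and method names (CustomerService, OrderService, get_credit_limit, place_order) are hypothetical; the point is that OrderService never touches the customers table and reads customer data only through CustomerService's API.

```python
import sqlite3


class CustomerService:
    """Owns the customers table; no other service is given this connection."""

    def __init__(self) -> None:
        # Private database: a stand-in for a schema or server owned only by this service.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, credit_limit INTEGER NOT NULL)"
        )

    def create_customer(self, customer_id: int, credit_limit: int) -> None:
        with self._db:  # commit on success, roll back on exception
            self._db.execute(
                "INSERT INTO customers (id, credit_limit) VALUES (?, ?)",
                (customer_id, credit_limit),
            )

    def get_credit_limit(self, customer_id: int) -> int:
        """Public API: the only way other services can read customer data."""
        row = self._db.execute(
            "SELECT credit_limit FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            raise KeyError(customer_id)
        return row[0]


class OrderService:
    """Owns the orders table; reads customer data only through CustomerService's API."""

    def __init__(self, customers: CustomerService) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL,"
            " total INTEGER NOT NULL)"
        )
        self._customers = customers

    def place_order(self, order_id: int, customer_id: int, total: int) -> bool:
        # The open-order total comes from this service's own database.
        (open_total,) = self._db.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        # The credit limit comes from CustomerService's API, never from its tables.
        if open_total + total > self._customers.get_credit_limit(customer_id):
            return False
        with self._db:
            self._db.execute(
                "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
                (order_id, customer_id, total),
            )
        return True
```

The check and the insert are not atomic across the two services, so two concurrent orders could both pass the check; keeping such an invariant across services is the problem the Saga pattern addresses.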

Question 6

Q

Data management patterns: result benefits

Answer

A

Helps ensure that the services are loosely coupled. Changes to one service’s database do not impact any other services.

Each service can use the type of database that is best suited to its needs. For example, a service that does text searches could use Elasticsearch. A service that manipulates a social graph could use Neo4j.

Question 7

Q

Data management patterns: result drawbacks

Answer

A

Implementing business transactions that span multiple services is not straightforward. Distributed transactions are best avoided because of the CAP theorem. Moreover, many modern (NoSQL) databases don’t support them. The best solution is to use the Saga pattern. Services publish events when they update data. Other services subscribe to events and update their data in response.

Implementing queries that join data that is now in multiple databases is challenging.

Complexity of managing multiple SQL and NoSQL databases

Question 8

Q

Data management patterns: issues

Answer

A

API Composition - the application performs the join rather than the database. For example, a service (or the API gateway) could retrieve a customer and their orders by first retrieving the customer from the customer service and then querying the order service to return the customer’s most recent orders.

Command Query Responsibility Segregation (CQRS) - maintain one or more materialized views that contain data from multiple services. The views are maintained by services that subscribe to the events that each service publishes when it updates its data. For example, the online store could implement a query that finds customers in a particular region and their recent orders by maintaining a view that joins customers and orders. The view is updated by a service that subscribes to customer and order events.

Question 9

Q

Data management patterns: related patterns

Answer

A

Microservice architecture pattern creates the need for this pattern
Saga pattern is a useful way to implement eventually consistent transactions
The API Composition and Command Query Responsibility Segregation (CQRS) patterns are useful ways to implement queries
The Shared Database anti-pattern describes the problems that result from microservices sharing a database

Question 10

Q

Shared database: context

Answer

A

Let’s imagine you are developing an online store application using the Microservice architecture pattern. Most services need to persist data in some kind of database. For example, the Order Service stores information about orders and the Customer Service stores information about customers.

Question 11

Q

Shared database: problem

Answer

A

What’s the database architecture in a microservices application?

Question 12

Q

Shared database: forces

Answer

A

Services must be loosely coupled so that they can be developed, deployed and scaled independently

Some business transactions must enforce invariants that span multiple services. For example, the Place Order use case must verify that a new Order will not exceed the customer’s credit limit. Other business transactions must update data owned by multiple services.

Some business transactions need to query data that is owned by multiple services. For example, the View Available Credit use case must query the Customer to find the creditLimit and the Orders to calculate the total amount of the open orders.

Some queries must join data that is owned by multiple services. For example, finding customers in a particular region and their recent orders requires a join between customers and orders.

Databases must sometimes be replicated and sharded in order to scale. See the Scale Cube.

Different services have different data storage requirements. For some services, a relational database is the best choice. Other services might need a NoSQL database such as MongoDB, which is good at storing complex, unstructured data, or Neo4j, which is designed to efficiently store and query graph data.

Question 13

Q

Shared database: solution

Answer

A

Use a (single) database that is shared by multiple services. Each service freely accesses data owned by other services using local ACID transactions.
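A minimal sketch of the Place Order credit check under this pattern, assuming a single SQLite database that holds both the customers and orders tables (the table and function names are hypothetical). Because both tables live in one database, the check and the insert run inside one local ACID transaction.

```python
import sqlite3

# One database shared by the customer and order code paths.
# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
db = sqlite3.connect(":memory:", isolation_level=None)
db.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, credit_limit INTEGER NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL,
                         total INTEGER NOT NULL);
    INSERT INTO customers VALUES (1, 100);
    """
)


def place_order(order_id: int, customer_id: int, total: int) -> bool:
    """Insert the order only if it keeps the customer within the credit limit."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        # Reads "CustomerService" and "OrderService" tables in a single query.
        row = db.execute(
            """
            SELECT c.credit_limit, COALESCE(SUM(o.total), 0)
            FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
            WHERE c.id = ?
            GROUP BY c.id
            """,
            (customer_id,),
        ).fetchone()
        if row is None:
            raise KeyError(customer_id)
        limit, open_total = row
        accepted = open_total + total <= limit
        if accepted:
            db.execute(
                "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
                (order_id, customer_id, total),
            )
        db.execute("COMMIT")
        return accepted
    except BaseException:
        db.execute("ROLLBACK")
        raise
```

Starting from an empty orders table, place_order(1, 1, 60) succeeds and a second 60-unit order for customer 1 is refused, because 120 would exceed the limit of 100.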

Question 14

Q

Shared database: result benefits

Answer

A

A developer uses familiar and straightforward ACID transactions to enforce data consistency
A single database is simpler to operate

Question 15

Q

Shared database: result drawbacks

Answer

A

Development time coupling - a developer working on, for example, the OrderService will need to coordinate schema changes with the developers of other services that access the same tables. This coupling and additional coordination will slow down development.

Runtime coupling - because all services access the same database they can potentially interfere with one another. For example, if a long-running CustomerService transaction holds a lock on the ORDER table then the OrderService will be blocked.
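Runtime coupling can be sketched with SQLite. SQLite locks the whole database file rather than a single table, so it only approximates a server database's table locking, but the effect is the same: while one connection (standing in for a long-running CustomerService transaction) holds the write lock, a second connection (standing in for OrderService) cannot write. The file name and table are hypothetical.

```python
import os
import sqlite3
import tempfile

# A database file shared by both "services".
path = os.path.join(tempfile.mkdtemp(), "shared.db")

# isolation_level=None: BEGIN/COMMIT are issued manually; timeout=0: fail instead of waiting.
customer_service = sqlite3.connect(path, isolation_level=None, timeout=0)
order_service = sqlite3.connect(path, isolation_level=None, timeout=0)

customer_service.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")

# CustomerService starts a long-running transaction that writes to the orders table.
customer_service.execute("BEGIN IMMEDIATE")
customer_service.execute("INSERT INTO orders VALUES (1, 10)")

# OrderService now tries to write to "its own" table and is blocked.
try:
    order_service.execute("INSERT INTO orders VALUES (2, 20)")
    blocked = False
except sqlite3.OperationalError as exc:
    blocked = "locked" in str(exc)  # SQLite reports "database is locked"

customer_service.execute("COMMIT")
print(blocked)  # True
```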

A single database might not satisfy the data storage and access requirements of all services.

Question 16

Q

Shared database: related

Answer

A

Database per Service is an alternative approach

Question 17

Q

Saga: context

Answer

A

You have applied the Database per Service pattern. Each service has its own database. Some business transactions, however, span multiple services so you need a mechanism to ensure data consistency across services. For example, let’s imagine that you are building an e-commerce store where customers have a credit limit. The application must ensure that a new order will not exceed the customer’s credit limit. Since Orders and Customers are in different databases the application cannot simply use a local ACID transaction.

Question 18

Q

Saga: problem

Answer

A

How to maintain data consistency across services?

Question 19

Q

Saga: forces

Answer

A

2PC is not an option

Question 20

Q

Saga: solution

Answer

A

Implement each business transaction that spans multiple services as a saga. A saga is a sequence of local transactions. Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails because it violates a business rule then the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.

There are two ways of coordinating sagas:

Choreography - each local transaction publishes domain events that trigger local transactions in other services
Orchestration - an orchestrator (object) tells the participants what local transactions to execute
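A minimal, in-process sketch of an orchestration-based saga for the Place Order example, assuming each step is a local transaction paired with a compensating transaction. SagaStep, run_saga, BusinessRuleViolation and the step functions are hypothetical; a real saga would exchange messages with the participating services and persist its own progress.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class BusinessRuleViolation(Exception):
    """Raised by a local transaction that violates a business rule."""


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # local transaction in one participant
    compensation: Callable[[], None]  # undoes the action if a later step fails


def run_saga(steps: List[SagaStep]) -> bool:
    """Orchestrator: run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except BusinessRuleViolation:
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


# Hypothetical in-memory state owned by two different services.
orders: Dict[str, str] = {}
credit: Dict[str, int] = {"customer-1": 100}  # remaining credit per customer


def place_order_saga(order_id: str, customer_id: str, total: int) -> bool:
    def create_order() -> None:
        orders[order_id] = "PENDING"

    def reject_order() -> None:  # compensates create_order
        orders[order_id] = "REJECTED"

    def reserve_credit() -> None:
        if credit[customer_id] < total:
            raise BusinessRuleViolation("credit limit exceeded")
        credit[customer_id] -= total

    def release_credit() -> None:  # compensates reserve_credit
        credit[customer_id] += total

    def approve_order() -> None:
        orders[order_id] = "APPROVED"

    def no_compensation() -> None:
        pass

    return run_saga([
        SagaStep("create order", create_order, reject_order),
        SagaStep("reserve credit", reserve_credit, release_credit),
        SagaStep("approve order", approve_order, no_compensation),
    ])
```

In a choreography-based saga there would be no run_saga: each service would instead react to the event published by the previous step.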

Question 21

Q

Saga: result benefits

Answer

A

It enables an application to maintain data consistency across multiple services without using distributed transactions

Question 22

Q

Saga: result drawbacks

Answer

A

The programming model is more complex. For example, a developer must design compensating transactions that explicitly undo changes made earlier in a saga.

Question 23

Q

Saga: issues

Answer

A

In order to be reliable, a service must atomically update its database and publish a message/event. It cannot use the traditional mechanism of a distributed transaction that spans the database and the message broker. Instead, it must use a pattern that atomically updates state and publishes messages/events, such as Event sourcing or Application events.
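One common approach, often called a transactional outbox, inserts the event into an outbox table in the same local transaction as the business data and lets a separate relay publish it afterwards. This sketch assumes SQLite, a hypothetical outbox table, and a plain callback standing in for the message broker.

```python
import json
import sqlite3
from typing import Callable

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE orders (id TEXT PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0);
    """
)


def create_order(order_id: str) -> None:
    # The order row and its event are written in ONE local transaction,
    # so either both are stored or neither is.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'PENDING')", (order_id,))
        event = {"type": "OrderCreated", "order_id": order_id}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(publish: Callable[[dict], None]) -> int:
    """Publish unpublished events in order, then mark them; returns how many were sent."""
    rows = db.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        # A crash here means the event is sent again later, so consumers must be idempotent.
        publish(json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```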

Question 24

Q

Saga: related

Answer

A

The Database per Service pattern creates the need for this pattern
The following patterns are ways to atomically update state and publish messages/events:
- Event sourcing
- Application events
A choreography-based saga can publish events using Aggregates and Domain Events

Question 25

Q

API composition: context

Answer

A

You have applied the Microservices architecture pattern and the Database per service pattern. As a result, it is no longer straightforward to implement queries that join data from multiple services.

Question 26

Q

API composition: problem

Answer

A

How to implement queries in a microservice architecture?

Question 27

Q

API composition: solution

Answer

A

Implement a query by defining an API Composer, which invokes the services that own the data and performs an in-memory join of the results.
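A minimal sketch of an API Composer for the "customers in a region and their recent orders" query, assuming two hypothetical service clients that return plain dictionaries. The composer calls each owning service and joins the results in memory on the customer id.

```python
from typing import Dict, List, Protocol


class CustomerClient(Protocol):
    def find_by_region(self, region: str) -> List[Dict]: ...


class OrderClient(Protocol):
    def recent_orders(self, customer_ids: List[str]) -> List[Dict]: ...


def customers_with_recent_orders(
    region: str, customers: CustomerClient, orders: OrderClient
) -> List[Dict]:
    """API Composer: query each owning service, then join the results in memory."""
    found = customers.find_by_region(region)
    ids = [c["id"] for c in found]
    # One batched call to the order service instead of one call per customer.
    by_customer: Dict[str, List[Dict]] = {cid: [] for cid in ids}
    for order in orders.recent_orders(ids):
        by_customer[order["customer_id"]].append(order)
    return [{**c, "orders": by_customer[c["id"]]} for c in found]


# Hypothetical in-memory stand-ins for the two services' APIs.
class FakeCustomers:
    def find_by_region(self, region: str) -> List[Dict]:
        data = [{"id": "c1", "region": "EU"}, {"id": "c2", "region": "US"}]
        return [c for c in data if c["region"] == region]


class FakeOrders:
    def recent_orders(self, customer_ids: List[str]) -> List[Dict]:
        data = [{"id": "o1", "customer_id": "c1"}, {"id": "o2", "customer_id": "c2"}]
        return [o for o in data if o["customer_id"] in customer_ids]
```

If a region contains many customers with many orders, this in-memory join is exactly where the approach becomes inefficient.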

Question 28

Q

API composition: example

Answer

A

An API Gateway often does API composition.

Question 29

Q

API composition: result benefits

Answer

A

It is a simple way to query data in a microservice architecture

Question 30

Q

API composition: result drawbacks

Answer

A

Some queries would result in inefficient, in-memory joins of large datasets.

Question 31

Q

API composition: related

Answer

A

The Database per Service pattern creates the need for this pattern
The CQRS pattern is an alternative solution

Question 32

Q

CQRS: context

Answer

A

You have applied the Microservices architecture pattern and the Database per service pattern. As a result, it is no longer straightforward to implement queries that join data from multiple services. Also, if you have applied the Event sourcing pattern then the data is no longer easily queried.

Question 33

Q

CQRS: problem

Answer

A

How to implement a query that retrieves data from multiple services in a microservice architecture?

Question 34

Q

CQRS: solution

Answer

A

Define a view database, which is a read-only replica that is designed to support that query. The application keeps the replica up to date by subscribing to Domain events published by the services that own the data.
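A minimal sketch of such a view, assuming hypothetical CustomerCreated and OrderCreated events delivered as dictionaries. The event handler updates a denormalized read model, and the query method answers "customers in a region with their orders" without calling either service.

```python
from typing import Dict, List


class CustomerOrdersView:
    """Denormalized read model kept up to date by subscribing to domain events."""

    def __init__(self) -> None:
        self._region_by_customer: Dict[str, str] = {}
        self._orders_by_customer: Dict[str, List[str]] = {}

    def on_event(self, event: Dict) -> None:
        """Event handler; a real system would call this from a broker subscription."""
        if event["type"] == "CustomerCreated":
            self._region_by_customer[event["customer_id"]] = event["region"]
        elif event["type"] == "OrderCreated":
            self._orders_by_customer.setdefault(event["customer_id"], []).append(
                event["order_id"]
            )

    def customers_in_region(self, region: str) -> List[Dict]:
        """Query side: reads only the view, no cross-service join at query time."""
        return [
            {"customer_id": cid, "orders": list(self._orders_by_customer.get(cid, []))}
            for cid, r in self._region_by_customer.items()
            if r == region
        ]
```

Because events arrive asynchronously, the view can lag behind the services that own the data, which is the eventual-consistency drawback of this pattern.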

Question 35

Q

CQRS: result benefits

Answer

A

Supports multiple denormalized views that are scalable and performant
Improved separation of concerns, resulting in simpler command and query models
Necessary in an event sourced architecture

Question 36

Q

CQRS: result drawbacks

Answer

A

Increased complexity
Potential code duplication
Replication lag/eventually consistent views

Question 37

Q

CQRS: related

Answer

A

The Database per Service pattern creates the need for this pattern
The API Composition pattern is an alternative solution
The Domain event pattern generates the events
CQRS is often used with Event sourcing

Question 38

Q

Event sourcing: context

Answer

A

A service typically needs to atomically update the database and publish messages/events. For example, perhaps it uses the Saga pattern. In order to be reliable, each step of a saga must atomically update the database and publish messages/events. Alternatively, it might use the Domain event pattern, perhaps to implement CQRS. In either case, it is not viable to use a distributed transaction that spans the database and the message broker to atomically update the database and publish messages/events.

Question 39

Q

Event sourcing: problem

Answer

A

How to reliably/atomically update the database and publish messages/events?

Question 40

Q

Event sourcing: forces

Answer

A

2PC is not an option

Question 41

Q

Event sourcing: solution

Answer

A

A good solution to this problem is to use event sourcing. Event sourcing persists the state of a business entity such as an Order or a Customer as a sequence of state-changing events. Whenever the state of a business entity changes, a new event is appended to the list of events. Since saving an event is a single operation, it is inherently atomic. The application reconstructs an entity’s current state by replaying the events.

Applications persist events in an event store, which is a database of events. The store has an API for adding and retrieving an entity’s events. The event store also behaves like a message broker. It provides an API that enables services to subscribe to events. When a service saves an event in the event store, it is delivered to all interested subscribers.
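A minimal in-memory sketch of an event store and of rebuilding state by replaying events, assuming a hypothetical Customer whose reserved credit changes through CustomerCreated, CreditReserved and CreditReleased events. A real event store would persist events durably and usually deliver them to subscribers asynchronously.

```python
from typing import Callable, Dict, List

Event = Dict  # e.g. {"type": "CreditReserved", "amount": 30}


class EventStore:
    """Append-only log of events per entity that also notifies subscribers."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Event]] = {}
        self._subscribers: List[Callable[[str, Event], None]] = []

    def append(self, entity_id: str, event: Event) -> None:
        # The appended event is the source of truth; subscribers are notified from it.
        self._streams.setdefault(entity_id, []).append(event)
        for subscriber in self._subscribers:
            subscriber(entity_id, event)

    def load(self, entity_id: str) -> List[Event]:
        return list(self._streams.get(entity_id, []))

    def subscribe(self, handler: Callable[[str, Event], None]) -> None:
        self._subscribers.append(handler)


def apply(state: Dict, event: Event) -> Dict:
    """Return the Customer state after one event (a pure function)."""
    if event["type"] == "CustomerCreated":
        return {"credit_limit": event["credit_limit"], "reserved": 0}
    if event["type"] == "CreditReserved":
        return {**state, "reserved": state["reserved"] + event["amount"]}
    if event["type"] == "CreditReleased":
        return {**state, "reserved": state["reserved"] - event["amount"]}
    return state


def current_state(store: EventStore, entity_id: str) -> Dict:
    """Reconstruct the entity by replaying all of its events from the beginning."""
    state: Dict = {}
    for event in store.load(entity_id):
        state = apply(state, event)
    return state
```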

Some entities, such as a Customer, can have a large number of events. In order to optimize loading, an application can periodically save a snapshot of an entity’s current state. To reconstruct the current state, the application finds the most recent snapshot and the events that have occurred since that snapshot. As a result, there are fewer events to replay.
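Snapshotting can be sketched in the same style. This assumes the same hypothetical credit events and a snapshot that records the state together with the number of events it already covers (its version); loading starts from the snapshot and replays only the newer events.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Event = Dict


def apply(state: Dict, event: Event) -> Dict:
    """Hypothetical Customer state transition for one event."""
    if event["type"] == "CustomerCreated":
        return {"credit_limit": event["credit_limit"], "reserved": 0}
    if event["type"] == "CreditReserved":
        return {**state, "reserved": state["reserved"] + event["amount"]}
    if event["type"] == "CreditReleased":
        return {**state, "reserved": state["reserved"] - event["amount"]}
    return state


@dataclass
class Snapshot:
    state: Dict
    version: int  # number of events already folded into `state`


def take_snapshot(events: List[Event]) -> Snapshot:
    state: Dict = {}
    for event in events:
        state = apply(state, event)
    return Snapshot(state=state, version=len(events))


def load(events: List[Event], snapshot: Optional[Snapshot]) -> Dict:
    """Start from the most recent snapshot (if any) and replay only later events."""
    state = dict(snapshot.state) if snapshot else {}
    start = snapshot.version if snapshot else 0
    for event in events[start:]:
        state = apply(state, event)
    return state
```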

Question 42

Q

Event sourcing: result benefits

Answer

A

It solves one of the key problems in implementing an event-driven architecture and makes it possible to reliably publish events whenever state changes.

Because it persists events rather than domain objects, it mostly avoids the object‑relational impedance mismatch problem.

It provides a 100% reliable audit log of the changes made to a business entity

It makes it possible to implement temporal queries that determine the state of an entity at any point in time.
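A temporal query follows directly from replay: fold only the events recorded at or before the requested time. This sketch assumes hypothetical Order events that each carry an integer "at" timestamp and are stored in append (time) order.

```python
from typing import Dict, List


def state_as_of(events: List[Dict], at: int) -> Dict:
    """Reconstruct an Order as it was at time `at` (events must be in time order)."""
    state: Dict = {}
    for event in events:
        if event["at"] > at:
            break  # this and all later events had not happened yet
        if event["type"] == "OrderCreated":
            state = {"status": "PENDING", "total": event["total"]}
        elif event["type"] == "OrderApproved":
            state = {**state, "status": "APPROVED"}
        elif event["type"] == "OrderCancelled":
            state = {**state, "status": "CANCELLED"}
    return state
```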

Event sourcing-based business logic consists of loosely coupled business entities that exchange events. This makes it a lot easier to migrate from a monolithic application to a microservice architecture.

Question 43

Q

Event sourcing: result drawbacks

Answer

A

It is a different and unfamiliar style of programming and so there is a learning curve.

The event store is difficult to query since it requires typical queries to reconstruct the state of the business entities. That is likely to be complex and inefficient. As a result, the application must use Command Query Responsibility Segregation (CQRS) to implement queries. This in turn means that applications must handle eventually consistent data.

Question 44

Q

Event sourcing: related

Answer

A

The Saga and Domain event patterns create the need for this pattern.
CQRS must often be used with event sourcing.
Event sourcing implements the Audit logging pattern.