Data Management Patterns Flashcards

1
Q

What is the Cache-Aside pattern?

A

Applications implement a cache to improve repeated access to information held in a data store. If the data isn’t in the cache, the application fetches it from the data store and places it in the cache.
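A minimal sketch of the read and update paths, assuming plain dictionaries stand in for both the cache and the data store. The names `get_item`, `update_item`, and `stats` are hypothetical and exist only for illustration. Invalidating the cached entry on update is one common choice, not the only one.

```python
from typing import Dict, Optional

data_store: Dict[str, str] = {"user:1": "Ada"}  # stand-in for the real data store
cache: Dict[str, str] = {}                      # stand-in for the cache
stats = {"store_reads": 0}                      # counts trips to the data store


def get_item(key: str) -> Optional[str]:
    """The application checks the cache first and falls back to the store."""
    if key in cache:
        return cache[key]
    stats["store_reads"] += 1
    value = data_store.get(key)
    if value is not None:
        cache[key] = value  # place the fetched value in the cache
    return value


def update_item(key: str, value: str) -> None:
    """Write to the store, then invalidate the cached copy."""
    data_store[key] = value
    cache.pop(key, None)
```

The second call to `get_item` for the same key is served from the cache, so the store is read only once until the item is updated.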

2
Q

What is read through caching?

A

Data is read from a cache by a client. If the data isn’t present in the cache, it’s first loaded by a loader that knows how to access the system of record.
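A minimal sketch of the idea, assuming the loader is just a function passed to the cache. `ReadThroughCache` is a hypothetical class, not the API of any particular caching product. The client only ever calls the cache.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """The cache, not the client, calls the loader on a miss."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._entries: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self._entries:
            # Miss: the loader reads from the system of record.
            self._entries[key] = self._loader(key)
        return self._entries[key]
```

Compared with cache-aside, the fallback logic lives inside the cache component rather than in application code.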

3
Q

What is write through caching?

A

The cache is configured with a writer component that knows how to write to the system of record. When a value is written to the cache, it’s stored in the cache and also written to the system of record by the writer.
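A minimal sketch, assuming the writer is a function that takes a key and a value. `WriteThroughCache` is a hypothetical class used only for illustration. Writing to the system of record before updating the cache is an assumption made here so that a failed write doesn't leave a value only in the cache.

```python
from typing import Callable, Dict, Optional


class WriteThroughCache:
    """Every put goes to the system of record and to the cache."""

    def __init__(self, writer: Callable[[str, str], None]) -> None:
        self._writer = writer
        self._entries: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._writer(key, value)    # write to the system of record first
        self._entries[key] = value  # then keep the cached copy in step

    def get(self, key: str) -> Optional[str]:
        return self._entries.get(key)
```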

4
Q

What are considerations when implementing the cache aside pattern?

A
  • Lifetime of cached data - how long to keep data in the cache if it’s not used
  • Evicting data - decide what data to remove from a fixed-size cache
  • Priming the cache - prepopulate the cache with data
  • Consistency - not guaranteed between the cache and the data store
  • Local (in-memory) cache - multiple copies of the data in different application instances lead to inconsistency
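The lifetime and eviction considerations above can be sketched together, assuming a least-recently-used eviction policy and a fixed time-to-live measured from the last write. Both policies are assumptions chosen for illustration, and `LruTtlCache` is a hypothetical class. The clock is injected so the behavior is easy to test.

```python
from collections import OrderedDict
from typing import Any, Callable, Optional


class LruTtlCache:
    """Fixed-size cache: expires old entries, evicts the least recently used."""

    def __init__(self, max_size: int, ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # drop least recently used

    def get(self, key: str) -> Optional[Any]:
        item = self._entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._entries[key]  # lifetime exceeded
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return value
```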
5
Q

What is the CQRS pattern?

A

It’s the Command and Query Responsibility Segregation pattern.

6
Q

What are the characteristics of the CQRS pattern?

A

Reads and writes for a data store are separated. Read and write workloads are asymmetric and typically have different PSR requirements.

The goal is to have separate models: the write model carries the validation logic, while the read model may serve many different queries, each with its own data transfer objects.

7
Q

What are the common implementation characteristics of the CQRS pattern?

A

Writes are treated as commands that are task-based (e.g., “book hotel room” rather than “set reservation status to reserved”).

Queries never modify the data store. No domain knowledge is encapsulated in the data transfer objects they return.
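A minimal sketch of these characteristics, assuming a single in-memory dictionary shared by both sides to keep the example short (a real system may use separate read and write stores). All class and field names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BookHotelRoom:
    """Task-based command: names the intent, not a field to overwrite."""
    room_id: str
    guest: str


@dataclass(frozen=True)
class RoomAvailability:
    """Data transfer object for queries: plain data, no domain logic."""
    room_id: str
    available: bool


class BookingCommandHandler:
    """Write side: owns the validation logic."""

    def __init__(self, bookings: Dict[str, str], room_ids: List[str]) -> None:
        self._bookings = bookings
        self._room_ids = room_ids

    def handle(self, command: BookHotelRoom) -> None:
        if command.room_id not in self._room_ids:
            raise ValueError("unknown room")
        if command.room_id in self._bookings:
            raise ValueError("room already booked")
        self._bookings[command.room_id] = command.guest


class AvailabilityQueries:
    """Read side: never modifies the store."""

    def __init__(self, bookings: Dict[str, str], room_ids: List[str]) -> None:
        self._bookings = bookings
        self._room_ids = room_ids

    def all_rooms(self) -> List[RoomAvailability]:
        return [RoomAvailability(r, r not in self._bookings)
                for r in self._room_ids]
```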

8
Q

What are the 5 benefits of the CQRS pattern?

A
  • Independent scaling - read/write workloads can scale independently
  • Optimized data schemas - reads use a schema optimized for reading. Writes use a schema optimized for writing
  • Security - easier to ensure that only the right domain entities are making changes
  • Separation of concerns - separating reads/writes can result in simpler models which are easier to maintain. The read model is usually simple while the write model contains the more complex validation logic.
  • Simpler queries - a materialized view in the database reduces the need for complex joins
9
Q

What are 3 implementation considerations to be aware of for CQRS?

A

Complexity, Messaging, Eventual consistency

  • Complexity - more complex application design
  • Messaging - it’s common to use a messaging stack. Need to deal with failures and duplicate messages
  • Eventual consistency - if the databases for reads and writes are separate, the read database must be synchronized when writes occur. It’s difficult to detect stale data on reads
10
Q

When is using the CQRS pattern appropriate?

A
  • Collaborative domains where many users access the same data in parallel. Define granular commands to minimize merge conflicts at the domain level
  • Task-based interfaces guide users through complex steps, and/or the domain model is complex. Writes are complex in that they maintain consistency and validate user input. The read model has no business logic and returns highly denormalized views
  • Read and write performance need to be tuned independently
  • Separate teams work on the read and write models
  • The system evolves over time. Multiple versions of the model exist. Business rules change frequently.
  • Integration with systems where the temporary failure of one subsystem shouldn’t affect the availability of others
11
Q

What is the Event Sourcing pattern?

A

Handles operations on data that are driven by a sequence of events, each of which is recorded in an append-only store. Each event represents a set of changes to the data.

12
Q

What are the advantages of the Event Sourcing pattern?

A
  • UI/workflow isn’t blocked. It posts an event and continues.
  • Less likely to encounter contention during processing of transactions
  • Promotes extensibility and flexibility through decoupling events from tasks
  • Helps prevent conflicts from concurrent updates
  • Lends itself to acting as an audit trail for actions (updates, compensating events) taken with an external data store

13
Q

What are the characteristics of Events in the Event Sourcing Pattern?

A
  • Events are immutable.
  • Events are simple and don’t directly update the data store
  • Events are recorded for processing at a later time
  • Events have meaning for a domain expert
  • Event processing is decoupled from the event source.
  • Consistency of events in the event store is vital, as is the order of events that affect a specific entity

14
Q

What are the consumers of events in the Event Sourcing pattern?

A

Tasks consume events.

15
Q

What are the characteristics of event consumers in the Event Sourcing pattern?

A
  • They know about the event and event data.
  • They don’t know about the operation that raised the event.
  • Event publication might be at least once, and so consumers of the events must ensure idempotent behavior.
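One way to get idempotent behavior, sketched under the assumption that every event carries a unique ID: the consumer remembers which IDs it has already handled and ignores repeats. `IdempotentConsumer` and its in-memory set are hypothetical; a real consumer would need to persist the seen IDs.

```python
class IdempotentConsumer:
    """Applies each event at most once, even if it is delivered repeatedly."""

    def __init__(self) -> None:
        self.total = 0
        self._seen_ids = set()

    def handle(self, event_id: str, amount: int) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event_id in self._seen_ids:
            return False
        self._seen_ids.add(event_id)
        self.total += amount
        return True
```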

16
Q

What are the considerations when using the Event Sourcing pattern?

A
  • The system will only be eventually consistent when creating materialized views or generating projections of data by replaying events.
  • There’s some delay between an application adding events to the event store, the events being published, and consumers of the events handling them
  • The application must still be able to deal with inconsistencies that result from eventual consistency and the lack of transactions

17
Q

When using the Event Sourcing pattern how do you deal with large streams of events?

A

If the streams are large, consider creating snapshots at specific intervals such as a specified number of events
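A minimal sketch of snapshotting, assuming each event is an integer delta and the entity state is a running total. `Stream` and `SNAPSHOT_EVERY` are hypothetical names. With a snapshot in place, only the events recorded after it need to be replayed.

```python
from typing import List, Tuple

SNAPSHOT_EVERY = 3  # assumed interval: snapshot after every 3 events


class Stream:
    """Append-only event stream with periodic snapshots of the state."""

    def __init__(self) -> None:
        self.events: List[int] = []
        # (number of events covered by the snapshot, state at that point)
        self.snapshot: Tuple[int, int] = (0, 0)

    def append(self, delta: int) -> None:
        self.events.append(delta)
        if len(self.events) % SNAPSHOT_EVERY == 0:
            self.snapshot = (len(self.events), self.current())

    def current(self) -> int:
        version, state = self.snapshot
        # Replay only the events that came after the snapshot.
        return state + sum(self.events[version:])
```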

18
Q

When using the Event Sourcing pattern how do you get the “current” state for an entity?

A

Need to replay all of the events that relate to it against the original state of that entity
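Replaying can be sketched as a fold over the event list, assuming a bank-account style entity with two hypothetical event types, `Deposited` and `Withdrawn`, and an original state of zero.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


Event = Union[Deposited, Withdrawn]


def apply_event(balance: int, event: Event) -> int:
    """Return the new state after applying one event."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event type: {type(event).__name__}")


def current_balance(events: Iterable[Event], initial: int = 0) -> int:
    """Replay every event, in order, against the original state."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance
```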

19
Q

When using the Event Sourcing pattern for updating an entity how is a change to that entity made?

A

Events are immutable, so the only way to update an entity or undo a change is to add a compensating event to the event store.

20
Q

From a migration perspective, what is important to consider when using the Event Sourcing pattern?

A

Use versioning on messages so old messages can be properly identified and converted to the new format when they’re played back as part of recreating the event history.
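A minimal sketch of converting old messages during playback, assuming events are dictionaries with a `version` field and that version 2 split a single `name` field into two. The schema change and the `upcast` function are hypothetical, made up purely for illustration.

```python
from typing import Any, Dict


def upcast(event: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an event of any known version to the current (v2) format."""
    version = event.get("version", 1)  # assume unversioned events are v1
    if version == 1:
        # Hypothetical v1 -> v2 change: "name" became first/last name.
        first, _, last = event["name"].partition(" ")
        return {"version": 2, "first_name": first, "last_name": last}
    if version == 2:
        return event
    raise ValueError(f"unsupported event version: {version}")
```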

21
Q

What is the Index Table pattern?

A

Create indexes over the fields in data stores that are frequently referenced by queries.

22
Q

What problem is the Index Table Pattern solving?

A

Secondary indexes are common in relational databases, but many NoSQL databases don’t provide this capability. This pattern suggests ways to implement your own secondary indexes on top of such a data store.

23
Q

When should the Index Table pattern not be used?

A
  • Data is volatile, making the index table costly to maintain
  • A field selected as the secondary key is non-discriminating and can only have a small set of values
  • Data values for a secondary index are highly skewed, e.g., 90% of values are the same and are frequently accessed
24
Q

What are the primary considerations when using the Index Table pattern?

A
  • Understand queries used to access data before designing secondary indices.
  • Overhead of maintaining secondary indices can be significant
  • Duplicating data in the index table can add significant storage overhead costs
  • Implementing a normalized index table that references the primary table requires two lookups to find the data
  • Maintaining consistency of index tables which reference very large data sets can be difficult
  • Index tables might be partitioned or sharded
25
Q

What are the strategies for implementing the Index Table pattern?

A
  • Complete denormalization - duplicate the data in each index table and organize it by different keys
  • Create normalized index tables organized by different keys derived from the frequently accessed fields and reference the original data by primary key rather than duplicating it
  • Create partially normalized index tables that are organized by different keys and duplicate only the frequently accessed fields
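The normalized strategy can be sketched as follows, assuming a dictionary keyed by primary key stands in for the data store. The `customers` data, `build_index`, and `find_by_town` are hypothetical. Note the two lookups: one in the index table, then one per primary key in the original data.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Stand-in for a key-value store that only supports lookup by primary key.
customers: Dict[int, Dict[str, Any]] = {
    1: {"name": "Ada", "town": "London"},
    2: {"name": "Blaise", "town": "Paris"},
    3: {"name": "Alan", "town": "London"},
}


def build_index(rows: Dict[int, Dict[str, Any]], field: str) -> Dict[Any, List[int]]:
    """Normalized index table: secondary key -> list of primary keys."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for primary_key, row in rows.items():
        index[row[field]].append(primary_key)
    return dict(index)


town_index = build_index(customers, "town")


def find_by_town(town: str) -> List[Dict[str, Any]]:
    """First lookup in the index table, second lookup in the original data."""
    return [customers[pk] for pk in town_index.get(town, [])]
```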
26
Q

Materialized View Pattern

What is the Materialized View pattern?

A
  • Populate views over data in one or more data stores when the data isn’t ideally formatted for required query operations
27
Q

Materialized View Pattern

What problem does the Materialized View pattern solve?

A

When storing data, the primary focus is on the format of the data, how to store it, and its integrity. Priority consideration isn’t given to how the data will be read, and this negatively affects query performance.

28
Q

Materialized View Pattern

What are the key characteristics of a Materialized View?

A
  • Data is formatted to suit the result set of the query
  • Contains only the data needed for the query
  • Includes current values of data or values of calculated columns
  • Is completely disposable and can be rebuilt from the data store
  • It’s a specialized cache that is never updated directly by an application
  • When data changes the view must be updated
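A minimal sketch of these characteristics, assuming a list of order records is the source data and the query of interest is total sales per customer. The `orders` data and `build_sales_view` are hypothetical. The view holds only the calculated totals and is rebuilt from the source rather than edited in place.

```python
from collections import defaultdict
from typing import Dict, List

# Stand-in for the source data store.
orders: List[dict] = [
    {"customer": "ada", "total": 30},
    {"customer": "alan", "total": 20},
    {"customer": "ada", "total": 15},
]


def build_sales_view(source: List[dict]) -> Dict[str, int]:
    """Build the view from scratch: total sales per customer."""
    view: Dict[str, int] = defaultdict(int)
    for order in source:
        view[order["customer"]] += order["total"]
    return dict(view)


sales_by_customer = build_sales_view(orders)
```

When the source data changes, the view is regenerated by calling `build_sales_view` again; nothing writes to `sales_by_customer` directly.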
29
Q

Materialized View Pattern

What are key considerations when using the Materialized View?

A
  • How and when to update the data
  • A primary use case is with the Event Sourcing pattern, where a view of the current state can only be generated by replaying events
  • Tend to be used where a few queries exist
  • With large numbers of queries storage costs will increase
  • Lack of data consistency between the view and the data in the data store
  • Data is transient and only used to improve query performance or scalability by reflecting current state of the data
  • View can be rebuilt so can be stored in a less reliable location
30
Q

Materialized View Pattern

When is the Materialized View Useful?

A
  • Provides view for data that is difficult to query directly
  • Where improved query performance is needed
  • Connection to data store isn’t always available
  • Abstract away how data is stored when data from different sources must be combined to retrieve the relevant info
  • Provide access to subsets of data that shouldn’t be readily accessible due to privacy or security concerns
  • Bridge data stores to take advantage of their unique capabilities
  • Provide consolidated views from data retrieved from different microservices
31
Q

Materialized View Pattern

When should the Materialized View Pattern not be used?

A
  • Data source is simple and easy to query
  • Data changes very quickly
  • Data consistency between source and view is high priority
32
Q

Sharding Pattern

What is the Sharding Pattern?

A

Dividing a data store into a set of horizontal partitions

33
Q

Sharding Pattern

What problems is the Sharding pattern used to solve?

A
  • When there’s a need to avoid the vertical scaling limits of a single storage node by scaling out horizontally
  • When there’s a need to split data across multiple data stores to avoid hitting the upper bound on how much data a single data store can hold
  • When large numbers of concurrent users need access but the network and computing resources of a single data store are limited
  • When there are data compliance regulations that may require storing info in the region where the data was entered
  • When there’s a need to reduce latency to data that’s accessed from different regions

34
Q

Sharding Pattern

When should the Sharding pattern be used?

A
  • When a data store will likely need to scale beyond the resources available to a single storage node
  • When you need to improve performance by reducing contention in a data store
  • A byproduct is higher availability due to data being separated into separate partitions

35
Q

Sharding Pattern

How is the Sharding Pattern implemented?

A
  • Subsets of data are partitioned across multiple shards.
  • Each Shard represents a data store.
  • All Shard data stores have the same schema

36
Q

Sharding Pattern

What are the characteristics of data that exists in a shard?

A
  • The data contains items that fall within a specified range determined by attributes of the data

37
Q

Sharding Pattern

What is a Shard Key?

A
  • It’s a static value, based on one or more attributes of the data, that identifies the range of data stored in the data store associated with a particular Shard.

38
Q

Sharding Pattern

What are the 3 Sharding Strategies?

A
  • Range
  • Lookup
  • Hash - logic implements a hash based on attributes of the data store to route data requests to the appropriate Shard
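The three strategies differ mainly in how a shard key is routed to a shard. A minimal sketch, assuming three shards with made-up names, a range split on the first letter of the key, and a hand-maintained tenant map; all names and boundaries here are hypothetical. `hashlib` is used instead of Python's built-in `hash` because the built-in is randomized per process for strings.

```python
import bisect
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]

# Range: first letter before "h" -> shard-0, "h" to "p" -> shard-1, rest -> shard-2.
RANGE_BOUNDARIES = ["h", "q"]


def range_shard(key: str) -> str:
    return SHARDS[bisect.bisect_right(RANGE_BOUNDARIES, key[0].lower())]


# Lookup: an explicit map from shard key to shard.
TENANT_MAP = {"tenant-a": "shard-0", "tenant-b": "shard-2"}


def lookup_shard(tenant: str) -> str:
    return TENANT_MAP[tenant]


# Hash: a stable hash of the key picks the shard, so no map is needed.
def hash_shard(key: str) -> str:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```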

39
Q

Sharding Pattern

What are the characteristics of the Hash Sharding Strategy?

A
  • Logic implements a hash based on attributes of the data store to route data requests to the appropriate Shard

40
Q

Sharding Pattern

What are the characteristics of the Lookup Sharding Strategy?

A
  • Logic implements a map that routes a request for data to the Shard that contains the data

41
Q

Sharding Pattern

What are the advantages of the Lookup Sharding Strategy?

A
  • More control over the way Shards are configured and used
  • Impact is reduced when rebalancing data. Can add physical partitions to even out the workload

42
Q

Sharding Pattern

What are the disadvantages of the Lookup Sharding Strategy?

A
  • Looking up Shard locations can impose an additional overhead
  • Requires state to be highly cacheable and replica friendly

43
Q

Sharding Pattern

What are the characteristics of the Range Sharding Strategy?

A
  • Related items are grouped together and ordered by Shard key

44
Q

Sharding Pattern

What are the advantages of using Range Sharding strategy?

A
  • Easy to implement
  • Works well with range queries

45
Q

Sharding Pattern

What are the disadvantages of using Range Sharding strategy?

A
  • Optimal balancing across Shards is hard to achieve
  • Hard to rebalance Shards and this may not solve performance problems
  • Data movement or scaling must be done when data is all or partly offline
  • State may have to be maintained that maps ranges to partitions

46
Q

Sharding Pattern

What are the advantages of using Hash Sharding strategy?

A
  • Offers a better chance of distributing load and data evenly
  • Routing can be achieved directly using the hash. Don’t need a map

47
Q

Sharding Pattern

What are the disadvantages of using the Hash Sharding strategy?

A
  • Difficult to rebalance Shards
  • Computing hash may impose additional overhead

48
Q

Sharding Pattern

Why is it best to minimize operations that affect data in multiple Shards?

A
  • It’s difficult to maintain referential integrity and consistency across Shards

49
Q

Static Content Hosting Pattern

What is the Static Content Hosting Pattern?

A
  • Deploy static content to a 3rd party cloud-based storage service

50
Q

Static Content Hosting Pattern

Why use the Static Content Hosting Pattern?

A
  • Allows you to offload the serving of static content from the application server.
  • Cloud storage is much cheaper than compute resources

51
Q

Static Content Hosting Pattern

What are the main considerations when using the Static Content Hosting Pattern?

A
  • Deployment of the application
  • Securing access to data that isn’t meant to be available for anonymous user usage

52
Q

Static Content Hosting Pattern

How to achieve the best performance and availability using the Static Content Hosting Pattern?

A
  • Use a CDN (content delivery network) to cache the storage contents in data centers around the world
  • Securing access to data that isn’t meant to be available for anonymous user usage

53
Q

Static Content Hosting Pattern

How should a Storage Account be referenced when using the Static Content Hosting Pattern?

A
  • Use a URL as opposed to an IP address. The IP address may change due to availability of Storage

54
Q

Static Content Hosting Pattern

Why is it more challenging to update an application when using the Static Content Hosting Pattern?

A
  • You might have to perform separate deployments
  • You might have to version the application and content to manage them more easily. This is especially important when scripts are involved

55
Q

Static Content Hosting Pattern

What are some key considerations regarding access to Storage when using the Static Content Hosting Pattern?

A
  • Storage may not support custom domain names
  • Application maintenance is more difficult: if custom domain names aren’t supported, you’ll need to provide the full URL (which will be in a different domain) to each resource.
  • Storage must be configured for public Read access and it’s vital to ensure that write access is disabled to prevent unauthorized uploads.
