Reliable, Scalable Maintainable Systems Flashcards by David Stojanovski

What are five common data processing options for data-intensive applications, and what, broadly, are they used for?

Databases: Persistent data storage
Caches: Speed up reads to frequently-read data
Search Indexes: Allow users to efficiently filter and search data
Stream Processing: Asynchronous messaging between processes
Batch Processing: Periodically processing large volumes of accumulated data

What two factors changed over the past decade in terms of how data processing systems are categorised?

The distinction between the various types of data processing tools has become blurred, and modern solutions often fit into multiple categories, e.g. Redis can function both as a data store and as a message queue
Rather than a single tool being used for every purpose, systems tend to be composed of multiple disparate tools, each meeting a specific need, stitched together by application code

What is meant by the reliability of a system?

A reliable system continues to work correctly (that is, performs the correct function at the desired level of performance) even in the face of hardware faults, software faults and human error.

What is meant by the scalability of a system?

As a system's load and data requirements grow (in volume or in inherent complexity), scalability describes how well the system keeps meeting those requirements, as judged by some measure of its performance, without growing in incidental complexity.

What is meant by the maintainability of a system?

As the number of people working on a system grows or changes over time, maintainability describes how well the system continues to perform its existing functions and how easily it can be adapted to new use cases as they arise.

What are some things that can broadly be expected of a reliable system?

The system performs its expected function
It can tolerate users making mistakes or using it in unexpected ways
It is sufficiently performant for its use case under the expected load and data volume, and remains so as these grow
It prevents any unauthorised access from malicious users

What is the distinction between a fault and a failure?

A fault is an unexpected, adverse event affecting one component of a system, e.g. a software bug or hardware failure in that component. A failure, on the other hand, is when the system as a whole (as a result of one or more faults) no longer serves its intended function.

How do you ensure a system is fault tolerant?

By deliberately inducing faults in the system (e.g. simulating hardware failures, attacks or service outages) and monitoring its behaviour, you can build a profile of the types of faults the system can tolerate. No system is impervious to every type of fault, so it is critical to identify which faults the system tolerates and under what constraints.
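
A minimal sketch of deliberate fault injection, assuming a toy service made of identical replicas behind a simple failover loop. The names `Replica`, `serve`, `inject_faults` and `tolerates` are hypothetical, and the "faults" are simulated crashes rather than real hardware failures; the point is only to show how repeated experiments build a profile of what the system tolerates.

```python
import random


class Replica:
    """A hypothetical replica of a back-end service."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def serve(replicas: list[Replica], request: str) -> str:
    """Try each replica in turn; the system fails only if every replica fails."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue  # a fault in one replica: fail over to the next
    raise RuntimeError("system failure: no replica could serve the request")


def inject_faults(replicas: list[Replica], kill: int, rng: random.Random) -> None:
    """Deliberately crash `kill` randomly chosen replicas (simulated faults)."""
    for replica in rng.sample(replicas, kill):
        replica.alive = False


def tolerates(num_replicas: int, kill: int, seed: int = 0) -> bool:
    """One experiment: inject faults, then check whether the system still works."""
    rng = random.Random(seed)
    replicas = [Replica(f"replica-{i}") for i in range(num_replicas)]
    inject_faults(replicas, kill, rng)
    try:
        serve(replicas, "GET /home")
        return True
    except RuntimeError:
        return False


# Profile: how many simultaneous replica crashes can 3 replicas tolerate?
profile = {kill: tolerates(3, kill) for kill in range(4)}
print(profile)  # {0: True, 1: True, 2: True, 3: False}
```

Here the profile shows that three replicas survive up to two simultaneous crashes but not three; a real exercise would inject many more kinds of fault, such as network partitions or dependency outages.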

What are two reasons that the trend moved away from ensuring individual hardware components are resilient and toward systems becoming more resilient to individual hardware failures?

As data volumes and computing demands grow, systems need more machines to operate robustly and performantly, which increases the likelihood that at least one component is failing at any given time, so a single hardware failure should not cause a system failure
On shared cloud infrastructure such as AWS, virtual machine instances can become unavailable without warning, because such platforms prioritise flexibility and elasticity over the reliability of individual machines
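
The first point can be made concrete with back-of-the-envelope arithmetic. This sketch assumes disk failures are independent and occur at a constant rate given by a mean time to failure (MTTF); the 30-year MTTF is an illustrative assumption, not a measured figure.

```python
DAYS_PER_YEAR = 365.25


def expected_failures_per_day(num_disks: int, mttf_years: float) -> float:
    """Expected disk failures per day, assuming independent failures at a
    constant rate of one per MTTF per disk."""
    return num_disks / (mttf_years * DAYS_PER_YEAR)


def p_any_failure_per_day(num_disks: int, mttf_years: float) -> float:
    """Probability that at least one disk fails on a given day, treating each
    disk's daily failure probability as 1 / MTTF (in days), independently."""
    p_single = 1 / (mttf_years * DAYS_PER_YEAR)
    return 1 - (1 - p_single) ** num_disks


for n in (1, 100, 10_000):
    print(f"{n:>6} disks: {expected_failures_per_day(n, 30):.4f} failures/day, "
          f"P(at least one) = {p_any_failure_per_day(n, 30):.3f}")
```

With the assumed 30-year MTTF, 10,000 disks give roughly 0.91 expected failures per day, so at that scale hardware faults are a routine event that the software must tolerate.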

Why are software faults more likely to cause system failures than hardware faults?

Hardware faults tend to be uncorrelated except in unusual circumstances (e.g. a common cause such as the temperature in a server rack), whereas software faults tend to be correlated across the redundant replicas of a service: a fault that affects one node serving a particular function is likely to affect all of its replicas, causing an outage in a critical part of the system and hence a failure of the whole system
Software faults tend to be less predictable, as some are inherent to the design and can lie dormant until triggered by an unusual set of circumstances
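
A small sketch contrasting the two cases, assuming a fault probability `p_fault` per replica and, for the software case, a hypothetical bug that every replica shares because they all run the same code. Failover across replicas helps against independent faults but not against the shared bug.

```python
def p_outage_independent(p_fault: float, replicas: int) -> float:
    """Uncorrelated faults (typical of hardware): an outage requires every
    replica to fail on its own, so the risk shrinks with each extra replica."""
    return p_fault ** replicas


def p_outage_correlated(p_fault: float, replicas: int) -> float:
    """Fully correlated faults (e.g. the same bug on every replica): a single
    trigger takes down all replicas at once, so `replicas` does not matter."""
    return p_fault


def handle(request: dict) -> str:
    # Hypothetical bug shared by every replica: a request without "user"
    # raises KeyError, because all replicas run the same code.
    return f"hello {request['user']}"


def serve(request: dict, replicas: int = 3) -> str:
    """Fail over across identical replicas; this cannot mask a shared bug."""
    for _ in range(replicas):
        try:
            return handle(request)
        except KeyError:
            continue  # try the next replica, which has the same bug
    raise RuntimeError("every replica crashed on the same input")


print(p_outage_independent(0.01, 3))  # about one in a million
print(p_outage_correlated(0.01, 3))   # still 0.01
print(serve({"user": "ann"}))         # hello ann
```

Calling `serve({})` raises `RuntimeError`: the bad input crashes every replica in turn, which is exactly the correlated failure described above.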

What are some ways that a software fault can cause an outage in a distributed, redundant system?

Bad inputs result in an unhandled exception on all replicas of a back-end service
A memory leak or bug causes one service to consume all of a shared hardware resource, causing other services sharing the same hardware to fail
An operating system service fails, and services that depend on it can fail or hang as a result
Cascading failures, i.e. one fault causing another fault and so on until the system fails overall

What is a load parameter in the context of measuring the data load of a system?

Any measurable, independent variable quantity that describes operations that occur within a system. Some examples include:

Requests per second to a web server
Read/write ratio for a database
Cache hit rate
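
As an illustration, the three load parameters above can be computed from a request log. The `LogEntry` shape and the log contents are hypothetical; a real system would usually derive these figures from its monitoring or metrics pipeline.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    timestamp: float  # seconds since the start of the measurement window
    operation: str    # "read" or "write"
    cache_hit: bool   # whether a read was served from the cache


def load_parameters(log: list[LogEntry], window_seconds: float) -> dict[str, float]:
    """Compute requests/second, read/write ratio and cache hit rate."""
    reads = [e for e in log if e.operation == "read"]
    writes = [e for e in log if e.operation == "write"]
    hits = sum(1 for e in reads if e.cache_hit)
    return {
        "requests_per_second": len(log) / window_seconds,
        "read_write_ratio": len(reads) / len(writes) if writes else float("inf"),
        "cache_hit_rate": hits / len(reads) if reads else 0.0,
    }


log = [
    LogEntry(0.1, "read", True),
    LogEntry(0.4, "read", False),
    LogEntry(0.9, "write", False),
    LogEntry(1.2, "read", True),
    LogEntry(1.7, "read", True),
    LogEntry(1.8, "write", False),
]
print(load_parameters(log, window_seconds=2.0))
# {'requests_per_second': 3.0, 'read_write_ratio': 2.0, 'cache_hit_rate': 0.75}
```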

How do you describe performance in relation to the load parameters of a system?

When a load parameter increases and the system's resources (CPU, memory, etc.) are kept the same, how is the performance of the system affected?
When you increase a load parameter, how much do you need to increase the resources available to the system in order for it to have the same performance?

What is one major difference between how performance is measured for batch processing systems and for real-time (online) processing systems?

In batch processing systems, the number of records that can be processed within a given timeframe (the throughput) is a measure of its performance
In real-time systems, the more important metric is the response time, i.e. the time it takes for a user interaction to have a corresponding response

What is the difference between latency and response time?

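The response time is what the client observes: the total time from sending the request to receiving the response, including the time spent actually processing the request (the service time), network delays and queueing delays. Latency is the duration during which the request is waiting to be handled, i.e. latent, awaiting service.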
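
A sketch of the distinction using per-request timestamps, assuming response time is everything the client observes and latency is the time the request spends waiting before it is handled (network transit to the server plus queueing). The `RequestTrace` class and the millisecond figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical timestamps (in ms) recorded for one request."""
    sent: int           # client sends the request
    service_start: int  # server begins processing it
    service_end: int    # server finishes processing it
    received: int       # client receives the response

    @property
    def response_time(self) -> int:
        """What the client sees: from sending the request to receiving the response."""
        return self.received - self.sent

    @property
    def latency(self) -> int:
        """Time the request is latent, waiting to be handled
        (network transit to the server plus any queueing delay)."""
        return self.service_start - self.sent

    @property
    def service_time(self) -> int:
        """Time actually spent processing the request."""
        return self.service_end - self.service_start


trace = RequestTrace(sent=0, service_start=30, service_end=50, received=60)
print(trace.response_time, trace.latency, trace.service_time)  # 60 30 20
```

In this example the request spends 30 ms latent, 20 ms being processed and 10 ms on the return trip, giving a 60 ms response time.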

What is a common method of measuring the performance of a real-time system statistically?

By collecting samples of response times over time, you can compute statistics from them, e.g. sorting the samples from fastest to slowest to determine the median, 95th percentile and 99th percentile response times.
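
A minimal sketch of computing percentiles from response-time samples, using the nearest-rank method (one of several common percentile definitions; many libraries interpolate between samples instead). The sample values are made up for illustration.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)                 # fastest to slowest
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times (ms) from 20 requests.
response_times_ms = [12, 15, 11, 14, 13, 200, 16, 12, 18, 14,
                     13, 15, 17, 11, 19, 14, 1500, 16, 13, 12]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(response_times_ms, p)} ms")
```

For these samples the median is 14 ms while the 95th and 99th percentiles are 200 ms and 1500 ms; the mean (97.75 ms) is dominated by the two outliers and describes neither typical nor worst-case requests.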

What are some examples of things that can cause random additional latency in back-end systems?

Packet loss
Garbage collection delays
Page faults
Physical effects on hardware (vibration, heat)

What are tail latencies?

Tail latencies are the response times that sit in the very high percentile end of the response time distribution, e.g. 99.9th percentile response times.

What is one major cause of long response times?

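Queueing delays often account for a large part of long response times, because a server can only process a limited number of requests in parallel (e.g. limited by its number of CPU cores), so other requests must wait their turn.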

What is head-of-line blocking?

Head-of-line blocking is the phenomenon where a small number of slow requests at the front of a queue hold up the requests behind them, causing queueing delays that increase the response times of requests that would otherwise have been processed quickly.
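
A toy simulation of head-of-line blocking, assuming a single worker that processes requests strictly in arrival order. Times are in milliseconds and the workloads are hypothetical.

```python
def fifo_response_times(arrivals: list[int], service_times: list[int]) -> list[int]:
    """Response times (ms) on a single worker that handles requests strictly in
    arrival order: each request waits for everything queued ahead of it."""
    worker_free_at = 0
    response_times = []
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, worker_free_at)  # queueing delay if the worker is busy
        worker_free_at = start + service
        response_times.append(worker_free_at - arrival)
    return response_times


arrivals = [0, 1, 2, 3, 4]     # one request per millisecond
all_fast = [1, 1, 1, 1, 1]     # every request needs 1 ms of work
slow_first = [50, 1, 1, 1, 1]  # ...except a slow one at the head of the line

print(fifo_response_times(arrivals, all_fast))    # [1, 1, 1, 1, 1]
print(fifo_response_times(arrivals, slow_first))  # [50, 50, 50, 50, 50]
```

Every fast request needs just 1 ms of work, yet once a 50 ms request sits at the front of the queue, each request behind it sees a 50 ms response time.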

If a given operation in a distributed system involves calls to multiple back-end components in a system, what does this mean for the overall response time of the original operation?

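Assuming the calls to each back-end service are made in parallel, the slowest of those calls determines the overall response time of the operation. As a result, the more back-end calls an operation makes, the more likely it is to be slow, even if only a small percentage of individual calls are slow (known as tail latency amplification).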
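
A sketch of fan-out arithmetic, assuming the back-end calls run fully in parallel and that each call is independently slow with probability `p_slow_call`. The response times passed in are hypothetical.

```python
def fan_out_response_time(backend_times_ms: list[float]) -> float:
    """Parallel fan-out: the operation completes when its slowest call does."""
    return max(backend_times_ms)


def p_slow_overall(p_slow_call: float, num_calls: int) -> float:
    """Probability that at least one of num_calls independent parallel calls
    is slow, which makes the whole operation slow (tail latency amplification)."""
    return 1 - (1 - p_slow_call) ** num_calls


print(fan_out_response_time([12, 9, 250, 15]))  # 250
for n in (1, 10, 100):
    print(n, round(p_slow_overall(0.01, n), 3))
```

If 1% of back-end calls are slow, a request that fans out to 100 calls is slow roughly 63% of the time, which is why the tail latencies of back-end services matter so much for end users.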

In terms of vertical and horizontal scaling, what is the most practical approach when dealing with the majority of systems?

A combination of both vertical and horizontal scaling is typically most practical.

What is one benefit and one drawback of an elastic system?

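Elastic systems, which automatically add computing resources when they detect a load increase, cope well with unpredictable loads because they scale more dynamically than manually scaled systems. On the other hand, manually scaled systems are simpler and more predictable from an operations standpoint.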
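
One common way an elastic system decides how far to scale is a proportional rule: adjust the replica count so that a per-replica metric moves back toward a target. This sketch is modelled on the formula documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`, but omits the tolerances and stabilisation windows a real autoscaler applies; the function name and replica bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_pct: float,
                     target_cpu_pct: float, min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Proportional scaling rule: pick the replica count that would bring the
    average per-replica CPU utilisation back to the target, within bounds."""
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_cpu_pct=90, target_cpu_pct=60))  # 6 (scale out)
print(desired_replicas(4, current_cpu_pct=30, target_cpu_pct=60))  # 2 (scale in)
```

A rule like this reacts to load spikes without human intervention, but it also illustrates the drawback: the replica count now changes on its own, which is harder to predict than a manually chosen capacity.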

What is typical of a maintainable system?

It is simple for an operations team to run
The code has low incidental complexity relative to inherent complexity
The system can evolve to meet new use cases and changing constraints

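What are the main driving forces behind using a NoSQL database?

1. Specialised query operations that the relational model does not support well 2. Frustration with the restrictiveness of relational schemas, and a desire for a more dynamic, expressive data model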

What is a key problem with the relational data model for applications that are commonly written today, and what is a way to mitigate this problem?

Most business applications are written in object-oriented languages, which require an awkward translation layer between an object's in-memory representation and its representation as tables, rows and columns (often called the impedance mismatch). ORM frameworks reduce the boilerplate this translation requires, but only partially abstract away the mismatch between the two models.

What are some of the inherent advantages to a JSON data model and when are these advantages realised?

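1. Documents better match the structure of application objects 2. Better locality: related data is stored together, so a single query can fetch it without joins 3. Schema-on-read, so the data model can evolve more flexibly. These advantages are realised when the data has a document-like structure, i.e. a tree of one-to-many relationships that is typically loaded all at once.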
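
A sketch of the first and third advantages, assuming a hypothetical résumé-like profile: the one-to-many tree of positions serialises into a single JSON document that mirrors the object structure, and a reader can tolerate documents written before a field existed (schema-on-read). The class and field names are illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Position:
    job_title: str
    organization: str


@dataclass
class Profile:
    user_id: int
    name: str
    positions: list[Position] = field(default_factory=list)  # one-to-many


profile = Profile(
    user_id=1,
    name="Ada Example",
    positions=[Position("Engineer", "Acme Corp"), Position("Intern", "Globex")],
)

# The whole one-to-many tree becomes one self-contained document whose shape
# mirrors the object, so it can be stored and fetched without any joins.
document = json.dumps(asdict(profile))
loaded = json.loads(document)
print(loaded["positions"][0]["job_title"])  # Engineer

# Schema-on-read: a document written before "positions" existed is handled
# by the reading code rather than rejected by a schema.
old_document = json.loads('{"user_id": 2, "name": "Bo Example"}')
old_positions = old_document.get("positions", [])
print(old_positions)  # []
```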

What is the primary idea behind normalisation in databases?

Removing duplication of meaningful data that is shared across multiple records: the data is stored once in a separate table of standardised values, and each record refers to it by an ID (a foreign key).
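
A minimal normalisation sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical; the point is that the region name is stored once and referenced by ID, so renaming it is a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE regions (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE  -- each standardised value stored once
    );
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES regions(id)  -- an ID, not free text
    );
    INSERT INTO regions (id, name) VALUES (1, 'Greater Seattle Area');
    INSERT INTO users (name, region_id) VALUES ('Ann', 1), ('Bo', 1), ('Cy', 1);
""")

# Renaming the region is a single-row update...
conn.execute("UPDATE regions SET name = 'Seattle Metropolitan Area' WHERE id = 1")

# ...and every user sees the new name through a join on the foreign key.
rows = conn.execute("""
    SELECT users.name, regions.name
    FROM users JOIN regions ON users.region_id = regions.id
    ORDER BY users.name
""").fetchall()
print(rows)
```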

What are the main benefits of normalising a database?

1. Single point of update 2. Better data consistency 3. Better semantics than arbitrary text 4. Easier localisation support

What are the major drawbacks of a document-based model?

1. In a many-to-one relationship, where many records reference one common record, that record must be stored separately and referenced by ID in each document to avoid duplication, so multiple queries may be required to retrieve the related data 2. Even if a model did not originally require many-to-one or many-to-many relationships or joins, it may evolve over time into a more interconnected structure that does, and document databases have limited support for joins 3. Denormalising data or emulating joins in application code can make the data model less maintainable, less reliable and slower
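
A sketch of the first drawback, assuming a hypothetical in-memory stand-in for a document database that counts queries. Because many profiles reference one organization by ID, showing a profile together with its organization requires a second query, i.e. a join emulated in application code.

```python
from typing import Any


class DocumentStore:
    """A toy in-memory stand-in for a document database (hypothetical)."""

    def __init__(self) -> None:
        self.collections: dict[str, dict[str, dict[str, Any]]] = {}
        self.queries = 0  # counts round trips to the "database"

    def put(self, collection: str, doc_id: str, doc: dict[str, Any]) -> None:
        self.collections.setdefault(collection, {})[doc_id] = doc

    def get(self, collection: str, doc_id: str) -> dict[str, Any]:
        self.queries += 1
        return self.collections[collection][doc_id]


store = DocumentStore()
store.put("organizations", "org-1", {"name": "Acme Corp"})
# Many profiles reference one organization by ID (many-to-one) instead of
# duplicating its details in every profile.
store.put("profiles", "user-1", {"name": "Ann", "organization_id": "org-1"})
store.put("profiles", "user-2", {"name": "Bo", "organization_id": "org-1"})


def profile_with_organization(user_id: str) -> dict[str, Any]:
    """Emulate a join in application code: one query for the profile, then a
    second query for the organization it references."""
    profile = store.get("profiles", user_id)
    organization = store.get("organizations", profile["organization_id"])
    return {"name": profile["name"], "organization": organization["name"]}


print(profile_with_organization("user-1"))  # {'name': 'Ann', 'organization': 'Acme Corp'}
print(store.queries)                        # 2
```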

(30 cards)