Data Intensive Ch1 - Reliable, Scalable, and Maintainable Applications Flashcards
Pillars of reliability
System should continue to work CORRECTLY (correct function at desired performance) even in the face of ADVERSITY
Tolerating:
Hardware faults
Software faults
Human error
Pillars of scalability
As the system GROWS in data volume, traffic volume, or complexity, there should be reasonable ways of dealing with that growth
Measuring load
Measuring performance
Latency percentiles
Throughput
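A minimal sketch of computing latency percentiles from a batch of measured response times, using the nearest-rank method; the timings below are made-up sample data, not from the book:

```python
import math

# Sketch: latency percentiles over a batch of response times.
# The timings are made-up sample data for illustration.
response_times_ms = sorted([12, 15, 18, 21, 30, 35, 45, 80, 120, 900])

def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest value such that at
    least p% of the samples are <= it."""
    k = math.ceil(p / 100 * len(sorted_values)) - 1
    return sorted_values[max(k, 0)]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(response_times_ms, p)} ms")
# p50 (the median) describes the typical request; p95/p99 capture
# tail latency, which a plain average would hide.
```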
Maintainability
Over time, many people will work on the system (engineering and operations), and they should be able to do so PRODUCTIVELY
Operability
Simplicity
Evolvability
What is the difference between data-intensive and compute-intensive apps?
Data-intensive apps are rarely limited by CPU power. The challenges are:
Amount of data
Complexity of data
Speed at which data is changing
How are data-intensive apps typically built?
From standard building blocks providing commonly needed functionality like:
Database for storing data
Caches for storing results of expensive operations or to speed up reads
Search indexes to allow looking up data by keyword or filtering it in various ways
Stream processing to send an async message to another process
Batch processing to periodically crunch large amounts of accumulated data
These blocks are such obvious abstractions that nobody thinks about writing them from scratch
BUT each block comes in many variants with different characteristics, and different apps have different requirements
Combining tools can be difficult when the requirement is to do something no single tool can do alone
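For example, even combining just two of the blocks above, a cache in front of a database, makes the application code responsible for glue logic. A minimal cache-aside sketch; the plain dict and db_lookup() are hypothetical stand-ins, not a real cache or database API:

```python
# Sketch: cache-aside read path combining a cache and a database.
# `cache` (a plain dict) and db_lookup() are hypothetical stand-ins.
cache = {}

def db_lookup(key):
    # Placeholder for an expensive database query.
    return f"value-for-{key}"

def get(key):
    if key in cache:            # hit: serve the cheap, fast copy
        return cache[key]
    value = db_lookup(key)      # miss: pay for the expensive read
    cache[key] = value          # remember it for next time
    return value
```

The glue is now the app's problem: on writes it must invalidate or update the cached copy, or readers will see stale data.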
A database and a message queue have some superficial similarity - both store data for some time. So what is different?
Access patterns to data -> different performance characteristics -> different underlying implementation.
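A toy contrast of the two access patterns, using plain Python stand-ins rather than a real database or message broker:

```python
from collections import deque

# Database-style access: random reads by key; data persists
# until explicitly changed or deleted.
store = {"user:1": "alice", "user:2": "bob"}
print(store["user:2"])    # can be read again and again
print(store["user:2"])

# Queue-style access: FIFO delivery; each message is handed to
# a consumer once and is then gone.
queue = deque(["msg-1", "msg-2"])
print(queue.popleft())    # "msg-1" is consumed
print(queue.popleft())    # "msg-2" is consumed; queue is now empty
```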
Context map
Fig 1-1, p. 5 (one possible architecture for a data system that combines several components)
Factors that influence the design of data systems
Skill & Exp of people involved Legacy system dependencies Time scale for delivery Org's tolerance of diff kinds of risk Regulatory constraints
What does working correctly mean in the context of reliability?
App performs the function user expects
It tolerates user making mistakes or using software in unexpected ways
Performance is good enough for the use case, under expected load
System prevents any unauthorized access and abuse
Things that can go wrong are called…
A system that anticipates faults and copes with them is called…
Faults
fault-tolerant or resilient
Fault-tolerant does not mean the system can tolerate EVERY possible fault
Fault vs failure
Fault - one component of the system deviates from its spec
Failure - the system as a whole stops providing the required service to the user
Hardware errors
Usually thought of as random and independent from each other
Failure of a disk on one machine usually does not imply failure on another machine (though failures can be correlated, e.g. if the server rack's temperature goes up)
Redundancy of disks (hardware components) was enough until recently
Single machine failure was rare so multi-machine redundancy was not needed
As data volume grows, apps began using more machines which increases probability of hardware faults
Cloud platforms commonly do not guarantee single-machine reliability
Hence the movement towards systems that tolerate the loss of entire machines, in addition to hardware redundancy
Examples:
Hard disk crashes
Faulty RAM
Power grid blackout
Cable unplugged
Software errors
Bug which causes app instance to crash on given input
Runaway process that eats up a shared resource like RAM or network bandwidth
External service dependency slows down, crashes, or returns corrupted responses (SAM JWKS hello hello!)
Cascading failures - a small fault in one component triggers a fault in another component, which triggers further faults, and so on
Usually lie dormant until triggered by unusual circumstances
Usually reveal some assumption about the app's environment that is USUALLY true (until one day it isn't)
Remedies: analysis, testing, process isolation, monitoring and alerts
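One remedy worth sketching: bounding every call to an external dependency with a timeout, so a slow or hung service cannot tie up the caller's resources indefinitely and turn into a cascading failure. A minimal sketch using Python's standard library; the URL in the usage note is a made-up placeholder:

```python
# Sketch: guarding a call to an external dependency with a timeout
# so a slow or hung service fails fast instead of stalling the caller.
import urllib.request
import urllib.error

def fetch_with_timeout(url, timeout_seconds=2.0):
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return resp.read()
    except (urllib.error.URLError, TimeoutError) as exc:
        # Fail fast so the caller can degrade gracefully (fallback,
        # cached value, error response) instead of hanging.
        raise RuntimeError(f"dependency call failed: {exc}") from exc

# Usage (hypothetical endpoint):
# data = fetch_with_timeout("https://example.com/api/status")
```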