INTRO Flashcards
Why are caches necessary in data-intensive applications?
They are usually used to speed up reads or to remember the result of an expensive operation.
What is stream processing?
Stream processing involves sending a message to another process, to be handled asynchronously.
What does RELIABILITY in the context of building data intensive systems mean?
In the context of DIS, reliability simply means that even in the face of human, hardware or software errors, a system should still be able to function “correctly” at a desired level of performance.
Simply put: the system should continue to work correctly even when things go wrong.
What does SCALABILITY in the context of building data intensive systems mean?
In the context of DIS, scalability simply means that as the software system grows (in data volume, traffic volume, etc.), the system should be able to accommodate that growth, or there should be reasonable ways of dealing with it.
What does MAINTAINABILITY in the context of building data intensive systems mean?
In the context of DIS, maintainability simply means that over time, as more people work on a system (improving its existing functionality or implementing new functionality), they should be able to work on it productively.
What is a FAULT?
We say a fault occurs in a system when one component of the system deviates from its specification (i.e., that component stops working as specified).
What is a FAILURE?
A failure occurs in a system when the entire system stops working and hence doesn’t provide the required service to the user
How to mitigate faults
Trigger faults deliberately (e.g., shutting down a server out of the blue). Doing this exposes cases where error handling is poor.
In general we want to tolerate faults rather than prevent them, because most faults are not preventable. (Security is the notable exception: for security faults, prevention is better than cure.)
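The "trigger faults deliberately" idea (in the spirit of Netflix's Chaos Monkey) can be sketched as a wrapper that randomly injects failures so error-handling paths actually get exercised. The class name, failure rate, and retry loop below are all invented for illustration:

```python
import random

class FaultInjectingClient:
    """Wraps a real call and randomly fails it, to test error handling."""
    def __init__(self, real_call, failure_rate=0.2, seed=None):
        self.real_call = real_call
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded for reproducible test runs

    def call(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected fault: simulated server crash")
        return self.real_call(*args, **kwargs)

# The caller must *tolerate* the injected fault, e.g. by retrying:
client = FaultInjectingClient(lambda x: x + 1, failure_rate=0.5, seed=42)

def resilient_call(x, attempts=5):
    for _ in range(attempts):
        try:
            return client.call(x)
        except ConnectionError:
            continue  # code with poor error handling would crash here instead
    raise RuntimeError("all attempts failed")

print(resilient_call(41))  # → 42, despite injected faults
```

Running a system with such a wrapper enabled surfaces code paths that silently assumed the call could never fail.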
What three kinds of errors can occur in a system?
Hardware errors: They have weak correlations, it is unlikely that one hard disk crashing will affect another hard disk
Software errors: They have strong correlations and can pull down an entire system (cause failure)
Human errors: Humans design, build, and operate these systems, and humans are known to be unreliable.
How to reduce the occurrence of human errors
- Design systems in a manner that reduces the likelihood of making an error. For example, well-designed abstractions, APIs, and admin interfaces make it easy to do "the right thing" and discourage "the wrong thing." However, if the interfaces are too restrictive, people will work around them, negating their benefit, so this is a tricky balance to get right.
- Create a separate environment where people can make mistakes, decoupled from the environment where those mistakes can cause actual failures. A good example of this is a sandbox.
- Carry out tests at all levels: unit, integration, and end-to-end.
- Set up detailed and clear monitoring and logging.
- Make recovery from human errors easy and quick, so as to reduce the impact in the event of a failure.
What is one common cause of degradation of systems?
Increased load. For instance, a system that was handling 10,000 requests per second could find itself handling 100,000. The question then becomes how to handle this increase in load.
What are some scalability questions that can be asked?
If my system has grown in X kind of way, how do I handle or cope with such growth?
What computing resources can I add to cope with the additional load?
What is response time?
How long it takes to get a response to a request (usually sent by a user or client).
What is latency?
Latency is the amount of time that a request is waiting to be handled during which it is latent.
How should response time be thought of?
Not as a single value, but as a distribution of values that can be measured. Why? Because in practice, in a system handling a variety of requests, the response time can vary a lot. Even if you send the same request over and over again, you will notice that the response time differs slightly each time.
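Treating response time as a distribution usually means reporting percentiles (p50, p95, p99) rather than an average, since a mean hides the slow outliers that users actually notice. A minimal sketch using the nearest-rank method; the sample data is invented:

```python
def percentile(samples, p):
    """Return the value below which roughly p% of the samples fall
    (nearest-rank method)."""
    ordered = sorted(samples)
    # nearest-rank: index of the p-th percentile in the sorted list
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# Hypothetical response times in milliseconds for the *same* request:
response_times_ms = [32, 30, 35, 31, 33, 30, 29, 250, 34, 31]

median = percentile(response_times_ms, 50)  # typical user experience
p99 = percentile(response_times_ms, 99)     # tail latency: the worst 1%

print(f"p50={median}ms p99={p99}ms")  # → p50=31ms p99=250ms
```

Note how the single 250 ms outlier dominates p99 while leaving the median untouched: that is exactly the information a single average would hide.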