INTRO Flashcards

1
Q

Why are caches necessary in data-intensive applications?

A

They are usually used to speed up reads or to remember the result of an expensive operation.
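
A minimal sketch of the "remember the result of an expensive operation" idea, using Python's standard `functools.lru_cache`. The function `expensive_lookup` and the key `"user:42"` are hypothetical stand-ins for a slow read such as a database query.

```python
import time
from functools import lru_cache

CALLS = 0  # counts how often the expensive function body actually runs


@lru_cache(maxsize=128)
def expensive_lookup(key: str) -> str:
    """Hypothetical stand-in for a slow read (e.g. a database query)."""
    global CALLS
    CALLS += 1
    time.sleep(0.01)  # simulate the expensive work
    return key.upper()


first = expensive_lookup("user:42")   # computed, then stored in the cache
second = expensive_lookup("user:42")  # served from the cache, body not re-run
```

The second call returns the stored result, so `CALLS` stays at 1. A real cache also needs an invalidation strategy, which this sketch leaves out.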

2
Q

What is stream processing?

A

Stream processing involves sending a message to another process so that it can be handled asynchronously.

3
Q

What does RELIABILITY in the context of building data intensive systems mean?

A

In the context of DIS, reliability simply means that even in the face of human, hardware or software errors, a system should still be able to function “correctly” at a desired level of performance.

Simply put: the system should continue to work correctly even when things go wrong.

4
Q

What does SCALABILITY in the context of building data intensive systems mean?

A

In the context of DIS, scalability means that as the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.

5
Q

What does MAINTAINABILITY in the context of building data intensive systems mean?

A

In the context of DIS, maintainability means that over time, as more people work on a system (improving its existing functionality or implementing new features), they should all be able to work on it productively.

6
Q

What is a FAULT?

A

A fault occurs when one component of the system deviates from its specification (for example, it stops working or misbehaves).

7
Q

What is a FAILURE?

A

A failure occurs when the system as a whole stops providing the required service to the user.

8
Q

How can faults be mitigated?

A

Trigger faults deliberately (e.g. by shutting down a server without warning). Doing this exposes cases where error handling is poor.
In general we prefer tolerating faults over preventing them, because many faults cannot be prevented. Security is the exception: a breach cannot be undone, so prevention is better there.
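
As a rough illustration of deliberate fault injection, the sketch below makes a dependency fail on purpose and checks that the caller tolerates it. `FlakyService` and `fetch_with_retry` are hypothetical names invented for this example, and retrying is only one of several ways to tolerate a fault.

```python
class FlakyService:
    """Hypothetical dependency that fails its first `failures` calls on purpose."""

    def __init__(self, failures: int) -> None:
        self.remaining_failures = failures
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("injected fault")
        return "ok"


def fetch_with_retry(service: FlakyService, attempts: int = 3) -> str:
    """Tolerate transient faults by retrying a bounded number of times."""
    last_error = None
    for _ in range(attempts):
        try:
            return service.fetch()
        except ConnectionError as err:
            last_error = err  # remember the fault and try again
    raise RuntimeError("service unavailable") from last_error


service = FlakyService(failures=2)  # inject two faults
result = fetch_with_retry(service)  # the third attempt succeeds
```

If the injected faults outnumber the allowed attempts, the fault turns into a failure that the caller sees, which is exactly the kind of weak spot this testing is meant to expose.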

9
Q

What three kinds of errors can occur in a system?

A

Hardware errors: These are weakly correlated. One hard disk crashing is unlikely to affect another hard disk.

Software errors: These are strongly correlated across nodes and can bring down an entire system (cause a failure).

Human errors: Humans design, build, and operate these systems, and humans make mistakes.

10
Q

How can the occurrence of human errors be reduced?

A
  1. Design systems in a way that reduces the likelihood of making an error.
    For example, well-designed abstractions, APIs, and admin interfaces make it easy to do “the
    right thing” and discourage “the wrong thing.” However, if the interfaces are too
    restrictive, people will work around them, negating their benefit, so this is a tricky
    balance to get right.
  2. Separate the places where people make the most mistakes from the places where mistakes can cause failures. A good example is a sandbox environment.
  3. Test thoroughly at all levels: unit, integration, and end-to-end.
  4. Set up detailed and clear monitoring and logging.
  5. Make recovery from human errors quick and easy, to reduce the impact when a failure does occur.
11
Q

What is one common cause of degradation of systems?

A

Increased load. For instance, a system that was handling 10,000 requests per second could find itself handling 100,000. The question then becomes how to handle this increase in load.

12
Q

What are some scalability questions that can be asked?

A

If the system grows in a particular way, what are our options for coping with that growth?

What computing resources can I add to cope with the additional load?

13
Q

What is response time?

A

The time it takes to get a response to a request, as seen by the client (usually a user). It includes the service time plus network and queueing delays.

14
Q

What is latency?

A

Latency is the amount of time a request spends waiting to be handled, during which it is latent.

15
Q

How should response time be thought of?

A

Not as a single value, but as a distribution of values that can be measured. Why?
Because in practice, in a system handling a variety of requests, the response time can vary a lot. Even if you send the same request over and over again, the response time differs from one try to the next.

16
Q

What is a good measure for response time of requests?

A

Percentiles, including the median (50th percentile).

Sort the response times from fastest to slowest and take the median (the middle value): half of the requests are served in less than that time, and half take longer.

The median is a good indication of how long users typically have to wait for a response. Higher percentiles (95th, 99th, 99.9th) show how bad the outliers are.
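
A small sketch of how percentiles could be computed from measured response times. It uses the nearest-rank method, which is one of several common percentile definitions; the sample values in `response_times_ms` are invented for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)  # fastest to slowest
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Invented response times in milliseconds, including two slow outliers.
response_times_ms = [12, 15, 11, 14, 250, 13, 16, 12, 18, 900]

p50 = percentile(response_times_ms, 50)  # median: 14
p95 = percentile(response_times_ms, 95)  # tail: 900
```

Here the median is 14 ms while the 95th percentile is 900 ms, which shows how a typical value can hide slow outliers.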

17
Q

How fast is a single client-side request that requires multiple backend calls?

A

It is only as fast as the slowest of the parallel backend calls.
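
The effect can be sketched with `asyncio`, assuming the backend calls run in parallel. The call names and delays below are hypothetical, and `asyncio.sleep` stands in for real network calls.

```python
import asyncio
import time


async def backend_call(name: str, delay_s: float) -> str:
    """Hypothetical backend call that takes `delay_s` seconds to answer."""
    await asyncio.sleep(delay_s)
    return name


async def handle_request():
    # Invented delays: the slowest call takes 0.20 s.
    delays = {"profile": 0.05, "orders": 0.10, "recommendations": 0.20}
    start = time.perf_counter()
    # All calls run concurrently; gather returns once the last one finishes.
    results = await asyncio.gather(
        *(backend_call(name, delay) for name, delay in delays.items())
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(handle_request())
```

The total time is close to the slowest call (about 0.20 s) rather than the sum of all three (0.35 s), so one slow backend call is enough to slow the whole request.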

18
Q

What are three design principles for maintainable software systems?

A

Operability: Design the system so that the operations team can easily keep it running smoothly.

Simplicity: Write code for your fellow humans. Remove as much complexity as possible so that new engineers can understand the system.

Evolvability: Design the system so that it can easily be changed, fixed, and adapted to new requirements.