Big Data Lecture 10 Performance at Large Scales Flashcards

1
Q

With thousands of nodes in a system, what will the performance distribution over them look like?

A

Most nodes will finish in around the mean time, but some will take extremely long: the phenomenon of tail latency.

2
Q

CPU, memory, disk, bandwidth: which one is usually the bottleneck?

A

That depends, but it is only one of them at a time! Almost never two or more!

3
Q

When should we be using systems such as Spark or MapReduce?

A

When disk I/O is the bottleneck of the system! On a single machine the disk limits us, so we need many machines reading and writing in parallel.

4
Q

What are two different definitions of latency?

A

Some refer to latency as the time until the data starts arriving, others as the time until all of the data has arrived (i.e., the former plus the delivery time).

5
Q

Prefix 0.001 in words?

A

Milli (m).

6
Q

Prefix 0.000 001 in words?

A

Micro (µ).

7
Q

Prefix 0.000 000 001 in words?

A

Nano (n).

8
Q

Prefix 0.000 000 000 001 in words?

A

Pico (p).

9
Q

What is the latency of executing one instruction on a CPU?

A

1 ns

10
Q

What is the latency of a fetch from main memory?

A

100 ns

11
Q

What is the latency of fetching from a new disk location?

A

8 ms

12
Q

What is the latency of sending an Internet packet to the US and back?

A

150 ms

13
Q

What is the throughput of standard Fast Ethernet?

A

100 Mbit/s

14
Q

What are the write/read throughputs of an SSD?

A

Write ~240 MB/s, read ~270 MB/s.

15
Q

What is the definition of total response time?

A

Latency plus transfer time: the time until the first data arrives, plus the amount of data divided by the throughput.

16
Q

What is speedup?

A

Old latency / new latency.
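For example, if a job that used to take 10 s now takes 2 s, the speedup is 10 / 2 = 5×.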

17
Q

What is Amdahl’s law for speedup?

A

For constant problem size:

speedup = 1 / (1 - p + p/s)

where p is the fraction of the work that can be parallelized and s is the speedup on the parallelizable part.
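As a worked example, here is a minimal Python sketch (not from the lecture; the function name is ours) that evaluates Amdahl's law:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Amdahl's law: overall speedup for a constant problem size."""
    return 1.0 / (1.0 - p + p / s)

# Example: 90% of the work parallelizes, with a 100x speedup on that part.
# The 10% serial fraction caps the overall speedup.
print(amdahl_speedup(p=0.9, s=100))  # ~9.17
```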

18
Q

What is Gustafson’s law for speedup?

A

For constant computing power:

speedup = 1 - p + s·p

where p is the fraction of the work that can be parallelized and s is the speedup on the parallelizable part.
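For comparison, a minimal Python sketch (same conventions as the Amdahl example above) that evaluates Gustafson's law:

```python
def gustafson_speedup(p: float, s: float) -> float:
    """Gustafson's law: overall speedup for constant computing time."""
    return 1.0 - p + s * p

# Same numbers as the Amdahl example: because the problem size scales
# with the resources, the overall speedup is far larger.
print(gustafson_speedup(p=0.9, s=100))  # 90.1
```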

19
Q

How to avoid having to scale up memory?

A

Optimize classes that are instantiated billions of times: remove redundancy, inheritance, and unused fields, and use efficient data structures.

20
Q

How to avoid having to scale up disk I/O?

A

Use efficient formats and compression!

21
Q

How to avoid having to scale up network I/O?

A

Push computation down as close to the data as possible.

Process in batches (send larger packets when possible).

22
Q

What is a good rule of thumb for how many nodes we should have in a system? Why?

A

Usually around the square root of the number of available cores and of the memory capacity.

We want liquidity of tasks in slots, but at the same time we do not want to overload the network with an extreme amount of communication.

23
Q

Describe the phenomenon of tail latency.

A

As the number of tasks rises, the probability of some task taking longer than expected goes to 1.

24
Q

What are the reasons for the tail latency phenomenon?

A

Queues, power limits, shared CPU resources (hyperthreading), garbage collection, and energy management.

25
Q

What formula describes the probability of some node taking extremely long?

A

The probability of one node taking long is p.

The probability of that node not taking long is 1 - p.

Hence the probability of none of the n nodes taking long is (1 - p)^n.

Hence the probability of at least one node taking long is 1 - (1 - p)^n.

For any p > 0, this goes to 1 as n goes to infinity.
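A minimal Python sketch (illustrative numbers, not from the lecture) showing how quickly this probability approaches 1:

```python
def prob_some_straggler(p: float, n: int) -> float:
    """Probability that at least one of n nodes takes extremely long."""
    return 1.0 - (1.0 - p) ** n

# Even with a 0.1% chance per node, stragglers are near-certain at scale.
for n in (1, 10, 100, 1000, 10000):
    print(n, round(prob_some_straggler(0.001, n), 5))
# 1 0.001, 10 ~0.00996, 100 ~0.09521, 1000 ~0.6323, 10000 ~0.99995
```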

26
Q

How to mitigate tail latency?

A

- Hedged requests: duplicate the call and terminate when one copy finishes (though this doubles the load).
- Deferred hedged requests: if a task takes longer than the 95th percentile of typical task durations, restart it (in practice this increases load by only about 2% and improves the tail about 10-fold); see the sketch below.
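A minimal Python asyncio sketch of a deferred hedged request (illustrative, not from the lecture; `hedged` and `make_request` are our names, and the deadline would typically be the observed 95th-percentile latency):

```python
import asyncio

async def hedged(make_request, hedge_after: float):
    """Run a request; if it is still running after `hedge_after` seconds,
    start a duplicate. Return the first result and cancel the loser."""
    first = asyncio.ensure_future(make_request())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()  # finished before the hedge deadline
    second = asyncio.ensure_future(make_request())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # terminate the slower copy
    return done.pop().result()
```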

27
Q

How to avoid having to scale up CPU?

A

Remove gigantic loops; do not override methods, do not use classes, do not cast, and do not use exceptions.