Big Data Lecture 10: Performance at Large Scales (Flashcards)
With thousands of nodes in a system, how will their completion times be distributed?
Most nodes will finish close to the mean time, but a few will take far longer: the phenomenon of <b>tail latency</b>.
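To make the tail concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that a typical node takes about 100 ms (Gaussian, sd 10 ms) and that each node has a 1% chance of being a straggler that takes 10x longer. The names `simulate_node_times` and `percentile` are hypothetical helpers, not part of any framework.

```python
import math
import random


def simulate_node_times(num_nodes, straggler_prob=0.01, seed=0):
    """Hypothetical per-node task times in ms.

    Assumption: a typical node takes about 100 ms (Gaussian, sd 10 ms),
    but with probability straggler_prob it is a straggler and takes 10x longer.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(num_nodes):
        t = max(rng.gauss(100.0, 10.0), 0.0)  # typical node
        if rng.random() < straggler_prob:      # rare slow node
            t *= 10
        times.append(t)
    return times


def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    times = simulate_node_times(1000)
    # A parallel job finishes only when its slowest node finishes.
    job_time = max(times)
    print(f"median node:     {percentile(times, 50):.0f} ms")
    print(f"99th percentile: {percentile(times, 99):.0f} ms")
    print(f"whole job:       {job_time:.0f} ms")
```

The point of the last line: a job that waits for all 1000 nodes is governed by the maximum, not the mean, so even rare stragglers set the overall time.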
CPU, memory, disk, bandwidth: which one is usually the bottleneck?
It depends, but it is usually only one of them, almost never two or more at the same time.
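One way to see why a single resource dominates: estimate, for each resource, work divided by capacity, and take the largest. This is a sketch under the assumption that the resources operate concurrently, so the slowest one sets the total time. The job and the CPU, memory, and network figures below are hypothetical; the disk and network rates reuse the SSD and Fast Ethernet numbers from later cards.

```python
def find_bottleneck(demands, capacities):
    """Return (resource, seconds) for the resource that needs the most time.

    demands:    amount of work per resource (e.g. bytes to read)
    capacities: rate per resource, in the same units per second
    Assumption: resources are used concurrently, so the slowest one
    dictates the total running time.
    """
    seconds = {r: demands[r] / capacities[r] for r in demands}
    worst = max(seconds, key=seconds.get)
    return worst, seconds[worst]


# Hypothetical job: scan 1 TB from a single disk, touching each byte once.
demands = {
    "cpu": 1e12,        # instructions (assumed: about 1 per byte)
    "memory": 1e12,     # bytes moved through RAM
    "disk": 1e12,       # bytes read
    "network": 1e10,    # bytes of results shipped out (assumed)
}
capacities = {
    "cpu": 1e9,         # instructions/s (about 1 ns per instruction)
    "memory": 10e9,     # bytes/s (assumed RAM bandwidth)
    "disk": 270e6,      # bytes/s (SSD read throughput)
    "network": 12.5e6,  # bytes/s (Fast Ethernet, 100 Mbit/s)
}

resource, secs = find_bottleneck(demands, capacities)
print(f"bottleneck: {resource} ({secs:.0f} s)")
```

With these assumed numbers the disk needs roughly 3.7 times longer than the next slowest resource (the CPU), which is typical: one resource stands out clearly.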
When should we use systems such as Spark or MapReduce?
When I/O is the bottleneck. On a single machine, disk throughput limits us, so we need to spread reads and writes over many machines working in parallel.
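A back-of-the-envelope sketch of why parallel I/O helps: reading 1 TB at the SSD read rate from a later card, first on one disk and then split over many. It is idealized on purpose (assumed: perfectly even partitioning, no network, coordination, or straggler overhead); `scan_time_s` is a hypothetical helper.

```python
def scan_time_s(total_bytes, disk_bytes_per_s, num_disks=1):
    """Time to read total_bytes when the data is split evenly over num_disks.

    Idealized: assumes perfect partitioning and no coordination,
    network, or straggler overhead.
    """
    return total_bytes / (disk_bytes_per_s * num_disks)


ONE_TB = 1e12
SSD_READ = 270e6  # bytes/s

for disks in (1, 10, 1000):
    print(f"{disks:>5} disk(s): {scan_time_s(ONE_TB, SSD_READ, disks):8.1f} s")
```

One disk needs about an hour; 1000 disks bring it down to a few seconds in this ideal model. In practice tail latency keeps the real speedup below this bound.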
What are two different definitions of latency?
Some define latency as the time until data starts arriving; others as the time until all of the data has arrived (i.e., the first definition plus the delivery time).
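Given timestamps for the chunks of a reply, both definitions can be read off directly. This is a minimal sketch; the function name and the four arrival times are hypothetical.

```python
def latency_both_ways(request_sent_at, chunk_arrival_times):
    """Return (time until data starts arriving, time until all data has arrived)."""
    first = min(chunk_arrival_times) - request_sent_at
    last = max(chunk_arrival_times) - request_sent_at
    return first, last


# Hypothetical timestamps (seconds) for a reply delivered in four chunks.
start, end = latency_both_ways(0.0, [0.150, 0.170, 0.190, 0.230])
print(f"first byte after {start * 1000:.0f} ms, last byte after {end * 1000:.0f} ms")
print(f"delivery time: {(end - start) * 1000:.0f} ms")
```

The gap between the two values (80 ms here) is the delivery time that separates the two definitions.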
Which prefix stands for 0.001 (10^-3)?
Milli (m).
Which prefix stands for 0.000 001 (10^-6)?
Micro (µ).
Which prefix stands for 0.000 000 001 (10^-9)?
Nano (n).
Which prefix stands for 0.000 000 000 001 (10^-12)?
Pico (p).
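A small sketch that applies these prefixes to durations, picking the largest prefix that keeps the number at least 1, plus a lookup from factor to prefix name. The helper names `format_seconds` and `prefix_name` are hypothetical.

```python
import math

# (factor, name, symbol), largest first.
PREFIXES = [
    (1e-3, "milli", "m"),
    (1e-6, "micro", "µ"),
    (1e-9, "nano", "n"),
    (1e-12, "pico", "p"),
]


def prefix_name(factor):
    """Return the prefix name for one of the factors above, e.g. 0.001 -> 'milli'."""
    for f, name, _symbol in PREFIXES:
        if math.isclose(f, factor):
            return name
    raise ValueError(f"no prefix for {factor}")


def format_seconds(seconds):
    """Format a duration with the largest prefix that keeps the number >= 1."""
    if seconds >= 1:
        return f"{seconds:g} s"
    for factor, _name, symbol in PREFIXES:
        if seconds >= factor:
            return f"{seconds / factor:g} {symbol}s"
    # Smaller than 1 ps: fall back to picoseconds.
    factor, _name, symbol = PREFIXES[-1]
    return f"{seconds / factor:g} {symbol}s"


for value in (1e-9, 100e-9, 8e-3, 150e-3):
    print(value, "->", format_seconds(value))
```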
What is the latency of executing one CPU instruction?
1 ns
What is the latency of a fetch from main memory?
100 ns
What is the latency of a fetch from a new disk location (seek)?
8 ms
What is the latency of sending an internet packet to the US and back?
150 ms
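These numbers span eight orders of magnitude, which is hard to feel. A common trick, sketched here, is to stretch time so that 1 ns becomes 1 s and express the results in everyday units. The scale factor and the helper `human_scale` are illustrative choices, not part of the lecture.

```python
LATENCIES_S = {
    "one CPU instruction": 1e-9,
    "fetch from main memory": 100e-9,
    "fetch from new disk location": 8e-3,
    "packet to the US and back": 150e-3,
}

UNITS = [("yr", 365 * 24 * 3600), ("d", 24 * 3600), ("h", 3600), ("min", 60)]


def human_scale(latency_s):
    """Stretch time by 1e9 so that 1 ns becomes 1 s, then pick a readable unit."""
    stretched = latency_s * 1e9
    for unit, size in UNITS:
        if stretched >= size:
            return f"{stretched / size:.1f} {unit}"
    return f"{stretched:.0f} s"


for name, latency in LATENCIES_S.items():
    print(f"{name:<30} {human_scale(latency)}")
```

On this scale an instruction takes a second, a memory fetch under two minutes, a disk seek about three months, and a transatlantic round trip almost five years.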
What is the throughput of standard Fast Ethernet?
100 Mbit/s
What is the write/read throughput of an SSD?
About 240 MB/s write and 270 MB/s read.
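Network throughput is usually quoted in bits per second and disk throughput in bytes per second, so comparing them needs a factor of 8. This sketch assumes decimal units (1 MB = 10^6 bytes) and a hypothetical 10 GB transfer.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8


def transfer_time_s(size_mb, throughput_mb_per_s):
    """Seconds to move size_mb at a constant throughput."""
    return size_mb / throughput_mb_per_s


fast_ethernet = mbit_to_mbyte(100)   # 12.5 MB/s
ssd_write, ssd_read = 240, 270       # MB/s

size = 10_000  # 10 GB, in decimal MB
print(f"Fast Ethernet: {transfer_time_s(size, fast_ethernet):.0f} s")
print(f"SSD write:     {transfer_time_s(size, ssd_write):.0f} s")
print(f"SSD read:      {transfer_time_s(size, ssd_read):.0f} s")
```

Fast Ethernet turns out to be roughly 20 times slower than a single SSD, so shipping data over such a link can easily become the bottleneck.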
What is the definition of total response time?
Total response time = latency + transfer time, where transfer time = data size / throughput.
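A sketch assuming the common definition, total response time = latency + size / throughput, evaluated with the 150 ms round trip and Fast Ethernet figures from earlier cards. It shows latency dominating small transfers and throughput dominating large ones. The helper name is hypothetical.

```python
def total_response_time(latency_s, size_bytes, throughput_bytes_per_s):
    """Latency plus transfer time (size / throughput)."""
    return latency_s + size_bytes / throughput_bytes_per_s


LATENCY = 0.150              # s, packet to the US and back
FAST_ETHERNET = 100e6 / 8    # 100 Mbit/s = 12.5e6 bytes/s

for size in (1e3, 1e6, 1e9):  # 1 KB, 1 MB, 1 GB
    t = total_response_time(LATENCY, size, FAST_ETHERNET)
    print(f"{size:>8.0e} bytes: {t:8.3f} s, latency share {LATENCY / t:6.1%}")
```

For 1 KB almost all of the time is latency; for 1 GB the 80 s transfer time makes latency negligible.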