Metrics and Evaluation Flashcards
Latency
The time it takes to get from starting a task to finishing it.
Throughput
How many things we can do per second. (or over some time period?)
Throughput = 1 / Latency ?
Not always! Pipelining, for example, can significantly improve throughput over 1 / Latency
Two metrics of performance
Latency and throughput
Speedup
Speedup = N where “X is N times faster than Y”
i.e., N = Speed(X) / Speed(Y) = Throughput(X) / Throughput(Y)
For latency, N = Latency(Y)/Lateny(X)
Performance as a function of latency and throughput
Performance is proportional to throughput, and proportional to 1/Latency
What are the types of benchmarks?
- Real applications
- Kernels
- Synthetic benchmarks
- Peak performance
Pros and cons of real applications for benchmarking?
Pro: Most representative
Con: Most difficult to set up, you need an OS, graphics cards, etc, and these might not all be set up
What is a benchmark?
A set of programs and input data agreed upon for performance measurement
What is an application kernel (re: benchmarking)? Pros and cons?
Only the most time-consuming part of an application. Pro: Better than using real applications
Con: but you still might be missing things that are necessary for running the kernel, i.e., a compiler
What is a synthetic benchmark?
Designed to emulate an application kernel, but is simpler to compile
What is peak performance (re: benchmarking)?
At best, how many instructions per second? This is really only good for marketing!
When should we use each type of benchmark?
When designing a new machine, synthetic benchmarks. We can run them at the design stage because they have minimal requirements, enabling us to choose the best design. However, they’re not great for reporting performance to others.
Once a prototype is built, we can use kernels!
Once we have a real machine that we are trying to sell someone, we should use real applications.
How do you compute the average of several speedup calculations? How do you NOT compute it?
Do NOT use the arithmetic mean! This is because speedups are ratios.
Say we get our speedup of 2 for one app because throughput changes from 2 to 4 and a speedup of 8 for another app because throughput changes from 1 to 8. Ideally we would calculate average speedup directly from these throughputs like this:
(8 + 4) / (1 + 2) = 12/3 = 4
Note this is NOT equal to the average of average speedups, (2 + 8) / 2 = 5.
If we only have the average speedups, we can use their geometric mean, i.e., for n speedups, the geometric mean = the nth root of the product of all n speedups. Geometric mean of 2 and 8 is sqrt(2*8) = 4.
We can use it because the speedup of geometric means of throughputs = the geometric mean of speedups.
sqrt(48)/sqrt(21) = 4 = sqrt(2*8)
Iron Law of Performance
CPU Time = # of instructions in program * CPI * clock cycle time, because:
instructions/program * cycles/instruction * seconds/cycle = seconds/program (after canceling cycles and instructions)