Metrics and Evaluation Flashcards
Latency
The time it takes to get from starting a task to finishing it.
Throughput
How many tasks we can complete per unit time (e.g., per second).
Throughput = 1 / Latency ?
Not always! Pipelining, for example, can significantly improve throughput over 1 / Latency
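For instance (hypothetical numbers), a 5-stage pipeline at a 1 GHz clock has a 5 ns latency per task, yet once the pipeline is full it completes one task per cycle, so throughput is 5x what 1/Latency would suggest. A minimal Python sketch:

```python
# Hypothetical 5-stage pipeline at a 1 GHz clock.
stages = 5
cycle_time_s = 1e-9                  # seconds per cycle
latency_s = stages * cycle_time_s    # 5 ns from start to finish of one task
throughput = 1.0 / cycle_time_s      # one task completes per cycle when full
# Ratio of actual throughput to the 1/Latency estimate is the stage count:
print(throughput * latency_s)        # ~5.0
```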
Two metrics of performance
Latency and throughput
Speedup
Speedup = N where “X is N times faster than Y”
i.e., N = Speed(X) / Speed(Y) = Throughput(X) / Throughput(Y)
For latency, N = Latency(Y)/Latency(X)
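A Python sketch of both speedup formulas (function names and example numbers are made up for illustration):

```python
def speedup_from_latency(lat_y, lat_x):
    """Speedup of X over Y: the slower latency divided by the faster one."""
    return lat_y / lat_x

def speedup_from_throughput(thr_x, thr_y):
    """Speedup of X over Y: X's throughput divided by Y's."""
    return thr_x / thr_y

# X finishes in 2 s what takes Y 10 s, and does 50 tasks/s vs Y's 10 tasks/s:
print(speedup_from_latency(10.0, 2.0))      # 5.0
print(speedup_from_throughput(50.0, 10.0))  # 5.0
```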
Performance as a function of latency and throughput
Performance is proportional to throughput, and proportional to 1/Latency
What are the types of benchmarks?
- Real applications
- Kernels
- Synthetic benchmarks
- Peak performance
Pros and cons of real applications for benchmarking?
Pro: Most representative
Con: Most difficult to set up; they may require an OS, graphics hardware, input data, etc., which may not all be available (especially early in the design process)
What is a benchmark?
A set of programs and input data agreed upon for performance measurement
What is an application kernel (re: benchmarking)? Pros and cons?
Only the most time-consuming part of an application. Pro: easier to set up and run than the full application while still representative of where the time goes
Con: you may still be missing infrastructure needed to run the kernel, e.g., a compiler
What is a synthetic benchmark?
A program designed to emulate the behavior of an application kernel, but even simpler to set up and run
What is peak performance (re: benchmarking)?
The theoretical maximum number of instructions per second the machine could execute. Real programs rarely come close, so this is really only good for marketing!
When should we use each type of benchmark?
When designing a new machine, synthetic benchmarks. We can run them at the design stage because they have minimal requirements, enabling us to choose the best design. However, they’re not great for reporting performance to others.
Once a prototype is built, we can use kernels!
Once we have a real machine that we are trying to sell someone, we should use real applications.
How do you compute the average of several speedup calculations? How do you NOT compute it?
Do NOT use the arithmetic mean! This is because speedups are ratios.
Say one app gets a speedup of 2 because its throughput goes from 2 to 4, and another gets a speedup of 8 because its throughput goes from 1 to 8. Ideally we would calculate the average speedup directly from these throughputs, as total throughput after over total throughput before:
(4 + 8) / (2 + 1) = 12/3 = 4
Note this is NOT equal to the arithmetic mean of the speedups, (2 + 8) / 2 = 5.
If we only have the average speedups, we can use their geometric mean, i.e., for n speedups, the geometric mean = the nth root of the product of all n speedups. Geometric mean of 2 and 8 is sqrt(2*8) = 4.
We can use it because the ratio of the geometric means of the throughputs equals the geometric mean of the speedups:
sqrt(4*8) / sqrt(2*1) = sqrt(32) / sqrt(2) = 4 = sqrt(2*8)
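The same numbers in a small Python sketch (using the throughputs from the example above):

```python
import math

def geometric_mean(xs):
    """nth root of the product of n values."""
    return math.prod(xs) ** (1.0 / len(xs))

speedups = [2.0, 8.0]
print(geometric_mean(speedups))        # 4.0 -- matches the throughput-based answer
print(sum(speedups) / len(speedups))   # 5.0 -- arithmetic mean overstates it

# Ratio of geometric means of throughputs = geometric mean of speedups:
before, after = [2.0, 1.0], [4.0, 8.0]
print(geometric_mean(after) / geometric_mean(before))  # ~4.0
```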
Iron Law of Performance
CPU Time = # of instructions in program * CPI * clock cycle time, because:
instructions/program * cycles/instruction * seconds/cycle = seconds/program (after canceling cycles and instructions)
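The Iron Law as a one-liner in Python (the example numbers are hypothetical):

```python
def cpu_time(instructions, cpi, cycle_time_s):
    """Iron Law: instr/program * cycles/instr * seconds/cycle = seconds/program."""
    return instructions * cpi * cycle_time_s

# Hypothetical: 1 billion instructions, CPI of 2, 2 GHz clock (0.5 ns/cycle).
print(cpu_time(1e9, 2.0, 0.5e-9))  # ~1.0 second
```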
What determines the values of each component in the Iron Law of Performance/CPU time?
Number of instructions in program: algorithm, compiler, instruction set
CPI: instruction set, processor design
clock cycle time: processor design, circuit design, transistor physics
How does computer architecture affect the Iron Law of Performance/CPU time?
Via instruction set and processor design.
Instruction set: more complex instructions will reduce # of instructions, but increase CPI. Vice versa for simple instructions.
Processor design: We can have processor with short clock cycle at the expense of greater CPI, or longer clock cycle with lower CPI.
A good design balances these tradeoffs.
Iron Law for Unequal Instruction Times (different CPIs for different instruction types)
We need to combine instructions/program and cycles/instruction into:
The sum over all instruction types of (the number of instructions of that type) * (CPI for that instruction type)
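A Python sketch of the per-type sum (the instruction mix, CPIs, and clock rate are made up for illustration):

```python
def cpu_time_mixed(instr_counts, cpis, cycle_time_s):
    """Sum over instruction types of count * CPI, times the clock cycle time."""
    total_cycles = sum(instr_counts[t] * cpis[t] for t in instr_counts)
    return total_cycles * cycle_time_s

# Hypothetical mix on a 2 GHz (0.5 ns/cycle) machine:
counts = {"alu": 600e6, "load": 300e6, "branch": 100e6}
cpis   = {"alu": 1.0,   "load": 3.0,   "branch": 2.0}
# cycles = 600e6*1 + 300e6*3 + 100e6*2 = 1.7e9  ->  ~0.85 s
print(cpu_time_mixed(counts, cpis, 0.5e-9))
```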
Amdahl’s Law
We use this to calculate overall speedup when only some instructions are affected by a change.
overall speedup = 1/ ((1-FRAC_enh) + FRAC_enh/speedup_enh)
where FRAC_enh is the fraction of execution TIME affected by the enhancement. TIME ONLY, not the fraction of instructions.
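A direct Python translation of the formula (the example fraction and speedup are made up):

```python
def amdahl_speedup(frac_enh, speedup_enh):
    """Overall speedup when frac_enh of execution TIME gets speedup_enh."""
    return 1.0 / ((1.0 - frac_enh) + frac_enh / speedup_enh)

# Speed up 50% of execution time by 2x -> only ~1.33x overall.
print(amdahl_speedup(0.5, 2.0))
```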
Implications of Amdahl’s Law
It’s usually better to have a small speedup on a larger percentage of execution time than a large speedup on a small percentage, i.e. MAKE THE COMMON CASE FAST!
To see why, imagine an infinite speedup on a small portion of execution time: overall speedup is still capped at 1 / (1 - FRAC_enh).
Lhadma’s Law
While Amdahl tells us to make the common case fast, we shouldn’t mess up the uncommon case too much!
While the fraction of execution time caps the benefit of any speedup, there is no such floor for slowdowns: slow down even a small portion enough and the program takes arbitrarily long.
What happens if we keep speeding up the same part of program?
Diminishing returns! After each improvement, that portion of the program becomes a smaller percentage of the total, so the overall speedups become smaller.
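A short Python loop showing the effect: a hypothetical 2x speedup is applied repeatedly to a part that starts at 50% of execution time, and each round's overall gain shrinks.

```python
def amdahl_speedup(frac, speedup):
    """Amdahl's Law; frac is the fraction of execution TIME affected."""
    return 1.0 / ((1.0 - frac) + frac / speedup)

frac = 0.5            # the optimized part's share of total execution time
round_speedups = []
for _ in range(3):
    round_speedups.append(amdahl_speedup(frac, 2.0))
    # After each 2x speedup, that part shrinks relative to the rest:
    frac = (frac / 2.0) / ((1.0 - frac) + frac / 2.0)
print(round_speedups)  # overall speedup per round shrinks: ~1.33, ~1.20, ~1.11
```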