Performance Evaluation Theory Flashcards
How can we analyze if a processor X is faster than a processor Y?
We can either divide the execution time of Y by the execution time of X, or divide the performance of X by the performance of Y (performance is one over execution time)
What is the execution time of a program?
The execution time of a program or the CPU time corresponds to the number of clock cycles times the time each clock cycle takes to complete
Based on how we calculate the CPU time, what are the means to reduce the execution time?
We can either reduce the number of clock cycles per program or reduce the clock period (which is equivalent to increasing the clock frequency)
What is the other possibility to calculate the CPU time?
The CPU time is the number of seconds the processor takes per program. To calculate it, we multiply the number of instructions per program (IC) by the average clocks per instruction (CPI) and by the time each clock cycle takes to complete
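This IC × CPI × clock-period formula can be sketched in Python; the numbers below are illustrative, not taken from any card:

```python
def cpu_time(instruction_count, cpi, clock_period_s):
    """CPU time in seconds: IC x CPI x clock cycle time."""
    return instruction_count * cpi * clock_period_s

# Illustrative example: 10^9 instructions, CPI = 2, 1 GHz clock (1 ns period).
print(cpu_time(10**9, 2.0, 1e-9))  # 2.0 seconds
```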
How do we calculate the throughput of a processor?
The throughput, or MIPS (million instructions per second), is calculated as the instruction count divided by the execution time times 10^6.
Since the execution time equals the instruction count times the CPI divided by the clock frequency, MIPS can also be calculated as the clock frequency divided by the CPI times 10^6
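The two MIPS formulas can be checked against each other in a short Python sketch (values are illustrative):

```python
def mips_from_time(instruction_count, exec_time_s):
    """MIPS = IC / (execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)

def mips_from_cpi(clock_freq_hz, cpi):
    """MIPS = clock frequency / (CPI x 10^6)."""
    return clock_freq_hz / (cpi * 1e6)

# IC = 2e9 instructions, CPI = 2, f = 1 GHz -> exec time = IC x CPI / f = 4 s.
print(mips_from_time(2e9, 4.0))  # 500.0
print(mips_from_cpi(1e9, 2.0))   # 500.0
```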
How do we evaluate the speed-up of an enhancement?
To calculate the speed-up, we divide the execution time without the enhancement by the execution time with the enhancement, or equivalently the performance with the enhancement by the performance without it
How do we calculate the execution time of a processor with a certain enhancement?
The execution time of a processor with an enhancement equals the fraction of the task not affected by the enhancement times the previous execution time (the execution time without the enhancement), plus the fraction of the task affected by the enhancement times the previous execution time divided by the speed-up of the enhancement
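A minimal sketch of this formula; the task length and fractions below are illustrative assumptions:

```python
def exec_time_enhanced(t_old, fraction_enhanced, speedup_enhanced):
    """T_new = T_old x ((1 - F) + F / S), per the card above."""
    return t_old * ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# 40% of a 100 s task sped up 10x: 60 s untouched + 4 s enhanced.
print(exec_time_enhanced(100.0, 0.4, 10))  # approximately 64.0 seconds
```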
What is Amdahl’s law?
Amdahl’s law says that the performance improvement obtainable from using some faster execution mode is limited by the fraction of the time the faster mode can be used
What is the impact of a pipeline on the throughput and execution time of a processor?
Pipelining increases the CPU instruction throughput, but it does not reduce the execution time of a single instruction
Pipelining usually slightly increases the latency of each instruction due to imbalance among the pipeline stages and the overhead of pipeline control
Why does imbalance among pipeline stages reduce performance?
Imbalance among pipeline stages reduces performance because the clock period cannot be shorter than the time needed by the slowest pipeline stage.
What is the reason for pipeline overhead?
Pipeline overhead arises from pipeline register delay and clock skew
In a five-stage pipeline, how do we account for stall cycles in the throughput calculation?
We add the stall cycles plus 4 (the cycles needed to finish the last instruction in flight) to the clock cycles in the CPI calculation
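That accounting can be sketched as follows; the instruction and stall counts are made-up examples:

```python
def five_stage_cpi(n_instructions, stall_cycles):
    """Total cycles = N + 4 (to drain the last instruction) + stalls."""
    total_cycles = n_instructions + 4 + stall_cycles
    return total_cycles / n_instructions

# 100 instructions with 20 stall cycles: (100 + 4 + 20) / 100.
print(five_stage_cpi(100, 20))  # 1.24
```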
What is the difference between calculating the throughput of a processor as an asymptotic performance metric and not doing so?
The difference lies in the CPI calculation: the final clock cycles needed to finish the last instruction go to zero as the number of instructions goes to infinity, so the asymptotic CPI depends only on the number of instructions and the stall cycles
What is the ideal CPI of a pipeline processor, why it is not achievable, and how do we calculate it?
The ideal CPI of a pipelined processor would be one, but stalls cause the pipeline performance to degrade from this ideal
The actual CPI is calculated as the ideal CPI plus the stall cycles per instruction
Why does a processor stall?
Due to structural, data or control hazards, or even due to memory stalls
What is the rationale behind the fact that a pipelined processor with ideal CPI improves performance by the depth of the pipeline?
We calculate the speed-up of a pipelined processor by dividing the execution time of the unpipelined processor by that of the pipelined one.
Since the CPI of a pipelined processor can be expressed as the ideal CPI plus the stall cycles per instruction, in the ideal case we have no stalls.
If we further assume that the CPI of the unpipelined processor equals the number of stages in the processor (the pipeline depth), the speed-up equals the pipeline depth.
What is the performance impact of conditional branches?
The pipeline speed-up now equals the pipeline depth divided by one plus the stall cycles caused by branches (which equal the branch frequency times the branch penalty)
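A sketch of that speed-up formula; the depth, branch frequency, and penalty below are illustrative assumptions:

```python
def pipeline_speedup_with_branches(depth, branch_freq, branch_penalty):
    """Speed-up = pipeline depth / (1 + branch frequency x branch penalty)."""
    return depth / (1 + branch_freq * branch_penalty)

# Depth 5, 20% branches, 3-cycle penalty: 5 / (1 + 0.6).
print(pipeline_speedup_with_branches(5, 0.2, 3))  # approximately 3.125
```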
What is the impact of memory hierarchy on the CPU time?
The CPU time can be calculated as the IC times the CPI divided by the clock frequency
The CPI is then calculated as the number of clock cycles to execute the instruction plus the stall cycles
To relate memory stalls to the stall cycles in the processor, we calculate the average misses per instruction, which equals the number of memory accesses per instruction times the miss rate
Finally, we calculate the amount of clock stalls by multiplying the misses per instruction calculated previously by the miss penalty
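These two steps can be sketched together; the access ratio, miss rate, and penalty below are assumptions chosen for illustration:

```python
def cpi_with_memory_stalls(base_cpi, mem_accesses_per_instr, miss_rate, miss_penalty):
    """CPI = base CPI + (memory accesses per instr x miss rate) x miss penalty."""
    misses_per_instr = mem_accesses_per_instr * miss_rate
    return base_cpi + misses_per_instr * miss_penalty

# Base CPI 1.0, 0.3 memory accesses/instr, 5% miss rate, 40-cycle penalty.
print(cpi_with_memory_stalls(1.0, 0.3, 0.05, 40))  # approximately 1.6
```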
If machine A executes a program in 10 sec and machine B executes same program in 15 sec:
A is 50% faster than B or A is 33% faster than B?
The statement “A is n% faster than B” can be expressed as:
execution time (B) / execution time (A) = 1 + n/100 ⇒
n = (execution time (B) − execution time (A)) × 100 / execution time (A)
(15 − 10)/10 × 100 = 50 ⇒ A is 50% faster than B.
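The arithmetic above can be verified with a one-liner:

```python
def percent_faster(t_slow, t_fast):
    """n such that the faster machine is 'n% faster' than the slower one."""
    return (t_slow - t_fast) * 100 / t_fast

print(percent_faster(15, 10))  # 50.0
```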
Let us consider an enhancement to a CPU that makes computation 10 times faster than the original, but the original CPU is busy with computation only 40% of the time. What is the overall speedup gained by introducing the enhancement?
Application of Amdahl’s Law where:
• FractionE = 0.4
• SpeedupE = 10
The overall speed up is given by:
1/((1-0.4) + (0.4/10)) = 1.56
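The same Amdahl’s Law calculation in Python:

```python
def overall_speedup(fraction_e, speedup_e):
    """Amdahl's Law: 1 / ((1 - F_E) + F_E / S_E)."""
    return 1 / ((1 - fraction_e) + fraction_e / speedup_e)

print(round(overall_speedup(0.4, 10), 2))  # 1.56
```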
Let us consider a computer with a L1 cache and L2 cache memory hierarchy. Suppose that in 1000 memory references there are 40 misses in L1 and 20 misses in L2.
• What are the various miss rates?
Miss Rate L1 = 40/1000 = 4% (local and global miss rates coincide for L1)
Local Miss Rate L2 = 20/40 = 50%
Global Miss Rate for the last-level cache (L2): Miss Rate L1 × Local Miss Rate L2 = 2%
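The miss-rate arithmetic from this example, as a quick check:

```python
refs, l1_misses, l2_misses = 1000, 40, 20

l1_miss_rate = l1_misses / refs                # 0.04 -> 4% (local == global for L1)
l2_local_rate = l2_misses / l1_misses          # 0.50 -> 50%
l2_global_rate = l1_miss_rate * l2_local_rate  # 0.02 -> 2%

print(l1_miss_rate, l2_local_rate, l2_global_rate)
```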