Exercises Flashcards
A program runs for 1 s on a processor with a clock cycle of 50 ns and Throughput1 = 15 MIPS.
- What is CPI1 for the program?
- Assume that, thanks to some optimization techniques, the throughput of the program improves: 40% of the program's instructions are now executed with CPI = 1, while the remaining 60% are executed with the same CPI as before.
- What is the SpeedUp from case (1) to case (2)?
- What is Throughput2, expressed in MIPS?
Solution (1)
TCLOCK= 50 nsec => fCLOCK = 1/TCLOCK = 20 MHz
CPI1 = fCLOCK / (MIPS1 * 10^6) = (20 * 10^6) / (15 * 10^6) = 1,33
Solution (2)
FE = 0,40
SpeedUpE = CPI1/CPIE = 1,33 / 1 = 1,33
SpeedUp = 1 / [(1-FE) + FE /SpeedUpE] = 1 / (0,6 + 0,4 / 1,33) = 1,11
SpeedUp = MIPS2 / MIPS1 => MIPS2 = SpeedUp * MIPS1 = 1,11 * 15 = 16,65
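The two steps of this solution can be checked numerically. The following is a minimal Python sketch; the function names cpi_from_mips and amdahl_speedup are illustrative choices, not part of the original exercise.

```python
def cpi_from_mips(f_clock_hz: float, mips: float) -> float:
    """CPI = f_clock / (MIPS * 10^6)."""
    return f_clock_hz / (mips * 1e6)


def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: overall speedup when only a fraction is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


t_clock = 50e-9                 # clock cycle: 50 ns
f_clock = 1.0 / t_clock         # 20 MHz
mips1 = 15.0

cpi1 = cpi_from_mips(f_clock, mips1)
speedup_e = cpi1 / 1.0          # the enhanced instructions run at CPI = 1
speedup = amdahl_speedup(0.40, speedup_e)
mips2 = speedup * mips1

print(f"CPI1 = {cpi1:.2f}, SpeedUp = {speedup:.2f}, MIPS2 = {mips2:.2f}")
```

Without intermediate rounding, MIPS2 comes out as about 16,67 rather than 16,65; the difference is due only to rounding the speedup to 1,11 before multiplying.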
A program runs for 1 s on a processor with a clock cycle of 100 ns and CPI1 = 1,5.
- What is Throughput1, expressed in MIPS?
- Assume that, thanks to some optimization techniques, 30% of the program's instructions are executed with CPI = 1, while the remaining 70% are executed with the same CPI as before.
- What is Throughput2, expressed in MIPS?
- What is the SpeedUp from case (1) to case (2)?
Solution (1)
TCLOCK= 100 nsec => fCLOCK = 1/TCLOCK = 10 MHz
MIPS1 = fCLOCK / (CPI1 * 10^6) = (10 * 10^6) / (1,5 * 10^6) = 6,66
Solution (2)
FE = 0,30
SpeedUpE = CPI1/CPIE = 1,5 / 1 = 1,5
SpeedUp = 1 / [(1-FE) + FE /SpeedUpE] = 1 / (0,7 + 0,3 / 1,5) = 1,11
SpeedUp = MIPS2 / MIPS1 => MIPS2 = SpeedUp * MIPS1 = 1,11 * 6,66 = 7,4
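As a cross-check, the same speedup can be obtained without Amdahl's Law by computing the new average CPI directly and taking the ratio of the CPIs (the clock and the instruction count are unchanged). The sketch below is illustrative only; the variable names are assumptions, not part of the exercise.

```python
f_clock = 1.0 / 100e-9       # 100 ns clock cycle -> 10 MHz
cpi1 = 1.5
fe = 0.30                    # fraction of instructions that now run at CPI = 1

# Route 1: Amdahl's Law, as in the solution above.
speedup_amdahl = 1.0 / ((1.0 - fe) + fe / (cpi1 / 1.0))

# Route 2: new average CPI, then the ratio of the two CPIs.
cpi2 = fe * 1.0 + (1.0 - fe) * cpi1
speedup_cpi = cpi1 / cpi2

mips1 = f_clock / (cpi1 * 1e6)
mips2 = f_clock / (cpi2 * 1e6)

print(f"CPI2 = {cpi2:.2f}")
print(f"SpeedUp (Amdahl) = {speedup_amdahl:.3f}, SpeedUp (CPI ratio) = {speedup_cpi:.3f}")
print(f"MIPS1 = {mips1:.2f}, MIPS2 = {mips2:.2f}")
```

Both routes give the same value because 1 / (0,7 + 0,3 / 1,5) is algebraically equal to 1,5 / (0,7 * 1,5 + 0,3 * 1).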
A program runs for 1 s on a processor with a clock cycle of 50 ns and Throughput1 = 10 MIPS.
- What is CPI1 for the program?
- Assume that, thanks to the introduction of a superscalar processor, the throughput of the program improves: 50% of the program's instructions are now executed with 3 parallel issues, while the remaining 50% are executed with a single issue.
- What is the SpeedUp from case (1) to case (2)?
- What is Throughput2, expressed in MIPS?
Solution (1)
TCLOCK= 50 nsec => fCLOCK = 1/TCLOCK = 20 MHz
CPI1 = fCLOCK / (MIPS1 * 10^6) = (20 * 10^6) / (10 * 10^6) = 2
Solution (2)
FE = 0,50
SpeedUpE = ThE/Th1 = 3 Th1 / Th1 = 3
SpeedUp = 1 / [(1-FE) + FE /SpeedUpE] = 1 / (0,5 + 0,5 / 3) = 1,5
SpeedUp = MIPS2 / MIPS1 => MIPS2 = SpeedUp * MIPS1 = 1,5 * 10 = 15
Let us consider a computer executing the following mix of instructions:
Instruction|Frequency (%)|Clock cycles
ALU |50 |1
LOAD |20 |5
STORE |10 |3
BRANCH |20 |2
- What is the average CPI in case (1), assuming a clock period of 5 ns?
- What is the Throughput, expressed in MIPS, in case (1)?
- What is the SpeedUp (2) if an optimized data cache is introduced, so that load instructions require 2 clock cycles?
- What is the SpeedUp (3) if an optimized branch unit is introduced, so that branch instructions require 1 clock cycle?
- What is the SpeedUp (4) if 2 ALUs working in parallel are introduced?
- What is the SpeedUp (5) if all the above optimizations are introduced together?
(1)
CPI1 = CPI1 ave = 0.5 * 1 + 0.2 * 5 + 0.1 * 3 + 0.2* 2 = 2.2
fCLOCK = 1 / TCLOCK = 1 / 5 ns = 200 MHz
MIPS1 = fCLOCK / (CPI1 * 10^6) = (200 * 10^6) / (2.2 * 10^6) = 90.91
(2)
CPI2 = CPI2 average = 0.5 * 1 + 0.2 * 2 + 0.1 * 3 + 0.2* 2 = 1.6
Speedup = CPI1 / CPI2 = 2,2 / 1,6 = 1,375
(3)
CPI3 = CPI3 average = 0.5 * 1 + 0.2 * 5 + 0.1 * 3 + 0.2* 1 = 2
Speedup=CPI1 /CPI3 =2,2/2 =1,1
(4)
CPI4 = CPI4 average = 0.5 * 0,5 + 0.2 * 5 + 0.1 * 3 + 0.2* 2 = 1,95
Speedup = CPI1 / CPI4 = 2,2 / 1,95 = 1,13
(5)
CPI5 = CPI5 average = 0.5 * 0,5 + 0.2 * 2 + 0.1 * 3 + 0.2 * 1 = 1,15
Speedup = CPI1 / CPI5 = 2,2 / 1,15 = 1,91
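The five cases above all apply the same weighted-average formula to a slightly different table. A minimal Python sketch of that calculation follows; the helper names average_cpi and with_cycles and the dictionary layout are illustrative assumptions.

```python
def average_cpi(mix):
    """Weighted average CPI; mix maps name -> (frequency as a fraction, cycles)."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, **new_cycles):
    """Copy of the mix with the cycle count of some instruction classes replaced."""
    changed = dict(mix)
    for name, cycles in new_cycles.items():
        changed[name] = (mix[name][0], cycles)
    return changed


base = {"ALU": (0.5, 1), "LOAD": (0.2, 5), "STORE": (0.1, 3), "BRANCH": (0.2, 2)}

cases = {
    "(2) data cache": with_cycles(base, LOAD=2),
    "(3) branch unit": with_cycles(base, BRANCH=1),
    "(4) two ALUs": with_cycles(base, ALU=0.5),   # 2 ALUs modelled as CPI 0.5, as in the solution
    "(5) all together": with_cycles(base, LOAD=2, BRANCH=1, ALU=0.5),
}

f_clock = 1.0 / 5e-9                  # clock period 5 ns -> 200 MHz
cpi1 = average_cpi(base)
mips1 = f_clock / (cpi1 * 1e6)
speedups = {label: cpi1 / average_cpi(mix) for label, mix in cases.items()}

print(f"CPI1 = {cpi1:.2f}, MIPS1 = {mips1:.2f}")
for label, s in speedups.items():
    print(f"{label}: CPI = {average_cpi(cases[label]):.2f}, speedup = {s:.3f}")
```

Comparing CPIs directly is valid here because the clock frequency and the instruction count are the same in every case.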
Let us consider a computer executing the following mix of instructions:
Instruction|Frequency (%)|Clock cycles
ALU |50 |1
LOAD |20 |4
STORE |10 |4
BRANCH |10 |2
JUMP |10 | 2
- What is the average CPI in case (1), assuming a clock period of 5 ns?
- Assume that, thanks to some optimisation techniques, the clock frequency is increased by 25%, at the cost of a 50% increase in the CPI of ALU instructions and a 25% increase in the CPI of LOAD instructions, while the remaining instructions are executed with the same CPI.
- What is the Throughput, expressed in MIPS, in case (2)?
- What is the Speedup from (1) to (2)?
- Which is better, case (1) or case (2)?
(1)
CPI1 = CPIaverage = 0.5 * 1 + 0.2 * 4 + 0.1 * 4 + 0.1* 2 + 0.1 * 2 = 2.1
MIPS1 = fCLOCK / (CPI1 * 10^6) = (200 * 10^6) / (2.1 * 10^6) = 95.24
(2)
CPI2 = CPIaverage = 0.5 * 1.5 + 0.2 * 5 + 0.1 * 4 + 0.1* 2 + 0.1 * 2 = 2.55
fclock2 = 1,25 fclock1 = 250 MHz
MIPS2 = fCLOCK2 / (CPI2 * 10^6) = (250 * 10^6) / (2.55 * 10^6) = 98.04
(3)
Speedup = MIPS2 / MIPS1 = 98,04 / 95,24 = 1,03
Case (2) is better.
Notice that the Speedup can also be calculated by comparing the execution times taking into account that:
Tclock2 = 0,8 Tclock1 = 4 ns
TCPU1 = IC1 * CPI1 * Tclock1 = 100 * 2,1 * 5 ns = 1050 ns
TCPU2 = IC2 * CPI2 * Tclock2 = 100 * 2,55 * 4 ns = 1020 ns
Speedup = TCPU1 / TCPU2 = 1050 / 1020 = 1,03
Note: It was not possible to calculate the speedup by comparing the CPIs alone, because the clock frequencies were different.
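The equivalence between the two ways of computing the speedup (ratio of throughputs and ratio of CPU times) can be verified with a short Python sketch. The function name cpu_time_ns is an illustrative choice, and the instruction count of 100 is arbitrary, since it cancels in the ratio.

```python
def cpu_time_ns(instruction_count: float, cpi: float, t_clock_ns: float) -> float:
    """T_CPU = IC * CPI * T_clock."""
    return instruction_count * cpi * t_clock_ns


ic = 100                       # arbitrary: the same program runs in both cases
t_clock1 = 5.0                 # ns, i.e. 200 MHz
t_clock2 = t_clock1 / 1.25     # frequency +25% -> 4 ns, i.e. 250 MHz

t_cpu1 = cpu_time_ns(ic, 2.1, t_clock1)
t_cpu2 = cpu_time_ns(ic, 2.55, t_clock2)
speedup_time = t_cpu1 / t_cpu2

# Same speedup from the throughputs (f_clock in MHz divided by CPI gives MIPS).
speedup_mips = (250.0 / 2.55) / (200.0 / 2.1)

# Comparing only the CPIs would wrongly suggest a slowdown.
cpi_ratio = 2.1 / 2.55

print(f"T_CPU1 = {t_cpu1:.0f} ns, T_CPU2 = {t_cpu2:.0f} ns")
print(f"speedup (time) = {speedup_time:.3f}, speedup (MIPS) = {speedup_mips:.3f}")
print(f"CPI1 / CPI2 = {cpi_ratio:.3f}")
```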
Let us consider a single iteration of the loop executed by a 5-stage pipelined MIPS processor with an optimized pipeline, in which the Register File can be read and written at the same address in the same clock cycle:
lw $2, BASEA ($4)
addi $2, $2, INC1
lw $3, BASEB ($4)
addi $3, $3, INC2
add $5,$2,$3
sw $5, BASEC ($4)
addi $4, $4, 4
bne $4,$7,L1
Identify the data hazards (with their type) and the control hazards:
lw $2, BASEA ($4)
addi $2, $2, INC1 - RAW $2
lw $3, BASEB ($4)
addi $3, $3, INC2 - RAW $3
add $5,$2,$3 - RAW $2 & $3
sw $5, BASEC ($4) - RAW $5
addi $4, $4, 4 - no data hazard
bne $4,$7,L1 - RAW $4 + Control Hazard
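The RAW dependencies in this listing can be found mechanically by remembering, for each register, which instruction wrote it last. The following Python sketch is an illustration only: the tuple encoding of the instructions and the function name raw_dependencies are assumptions, and it reports dependencies within one iteration (whether a dependency actually causes stalls depends on the distance between producer and consumer).

```python
# Each entry: (assembly text, destination register or None, source registers).
program = [
    ("lw   $2, BASEA($4)", "$2", ["$4"]),
    ("addi $2, $2, INC1",  "$2", ["$2"]),
    ("lw   $3, BASEB($4)", "$3", ["$4"]),
    ("addi $3, $3, INC2",  "$3", ["$3"]),
    ("add  $5, $2, $3",    "$5", ["$2", "$3"]),
    ("sw   $5, BASEC($4)", None, ["$5", "$4"]),
    ("addi $4, $4, 4",     "$4", ["$4"]),
    ("bne  $4, $7, L1",    None, ["$4", "$7"]),
]


def raw_dependencies(instructions):
    """For each instruction, list (register, producer index) pairs for every
    source register written by an earlier instruction of the same iteration."""
    last_writer = {}
    deps = []
    for index, (_text, dest, sources) in enumerate(instructions):
        deps.append([(reg, last_writer[reg]) for reg in sources if reg in last_writer])
        if dest is not None:
            last_writer[dest] = index
    return deps


deps = raw_dependencies(program)
for (text, _dest, _sources), found in zip(program, deps):
    notes = ", ".join(f"RAW {reg} (from instruction {src})" for reg, src in found)
    print(f"{text:22s} {notes or '-'}")
```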
Let us consider a single iteration of the loop executed by a 5-stage pipelined MIPS processor with an optimized pipeline, in which the Register File can be read and written at the same address in the same clock cycle:
lw $2, BASEA ($4)
addi $2, $2, INC1
lw $3, BASEB ($4)
addi $3, $3, INC2
add $5,$2,$3
sw $5, BASEC ($4)
addi $4, $4, 4
bne $4,$7,L1
Insert into the following pipeline scheme the stalls needed to resolve the data and control hazards identified previously:
lw $2, BASEA ($4)
addi $2, $2, INC1 - 2 stalls
lw $3, BASEB ($4)
addi $3, $3, INC2 - 2 stalls
add $5,$2,$3 - 2 stalls
sw $5, BASEC ($4) - 2 stalls?
addi $4, $4, 4
bne $4,$7,L1 - 2 stalls?
Next Instruction - 3 stalls?
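The stall counts in this scheme can be reproduced with a small timing model. The Python sketch below assumes no forwarding: a consumer may be in ID no earlier than the cycle in which its producer is in WB, that is 3 cycles after the producer's ID (the Register File can be written and read in the same cycle). The 3-stall branch penalty is taken from the scheme above, not derived. The encoding of the program and the function name are illustrative.

```python
# (destination register or None, source registers) for the 8 loop instructions.
program = [
    ("$2", ["$4"]),          # lw   $2, BASEA($4)
    ("$2", ["$2"]),          # addi $2, $2, INC1
    ("$3", ["$4"]),          # lw   $3, BASEB($4)
    ("$3", ["$3"]),          # addi $3, $3, INC2
    ("$5", ["$2", "$3"]),    # add  $5, $2, $3
    (None, ["$5", "$4"]),    # sw   $5, BASEC($4)
    ("$4", ["$4"]),          # addi $4, $4, 4
    (None, ["$4", "$7"]),    # bne  $4, $7, L1
]

WB_AFTER_ID = 3      # ID -> EX -> ME -> WB
BRANCH_PENALTY = 3   # stalls charged to the next instruction, as in the scheme


def stalls_without_forwarding(instructions):
    """Data stalls per instruction when operands are read only from the Register File."""
    writer_id = {}   # register -> ID cycle of its most recent writer
    prev_id = None
    stalls = []
    for dest, sources in instructions:
        earliest = 0 if prev_id is None else prev_id + 1
        ready = max((writer_id[r] + WB_AFTER_ID for r in sources if r in writer_id),
                    default=0)
        cycle = max(earliest, ready)
        stalls.append(cycle - earliest)
        prev_id = cycle
        if dest is not None:
            writer_id[dest] = cycle
    return stalls


data_stalls = stalls_without_forwarding(program)
total_stalls = sum(data_stalls) + BRANCH_PENALTY
print(data_stalls, total_stalls)
```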
Let us consider a single iteration of the loop executed by a 5-stage pipelined MIPS processor with an optimized pipeline, in which the Register File can be read and written at the same address in the same clock cycle:
lw $2, BASEA ($4)
addi $2, $2, INC1 - 2 stalls
lw $3, BASEB ($4)
addi $3, $3, INC2 - 2 stalls
add $5,$2,$3 - 2 stalls
sw $5, BASEC ($4) - 2 stalls
addi $4, $4, 4
bne $4,$7,L1 - 2 stalls
Next Instruction - 3 stalls
Express the formulas, then calculate the following metrics:
• Instruction Count per iteration
• Number of stalls per iteration
• CPI per iteration
• Throughput (expressed in MIPS) per iteration
• Asymptotic CPI (N cycles)
• Asymptotic Throughput (expressed in MIPS) (N cycles)
Express the formulas, then calculate the following metrics:
• Instruction Count per iteration (IC) = 8
• Number of stalls per iteration = 13
• CPI per iteration: CPI = Number of clock cycles / IC = (IC + # stalls + 4) / IC = 25 / 8 = 3.125
• Throughput (expressed in MIPS) per iteration: MIPS = fCLOCK / (CPI * 10^6) = (500 * 10^6) / (3.125 * 10^6) = 160
• Asymptotic CPI (N cycles): CPIAS = (IC + # stalls) / IC = (8 + 13) / 8 = 2.625
• Asymptotic Throughput (expressed in MIPS) (N cycles): MIPSAS = fCLOCK / (CPIAS * 10^6) = (500 * 10^6) / (2.625 * 10^6) = 190.48
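The four metrics follow from two formulas that differ only in the pipeline fill term (4 cycles for a 5-stage pipeline). A minimal Python sketch, with the function name loop_metrics as an illustrative assumption and the 500 MHz clock taken from the figures used in the solution:

```python
def loop_metrics(ic: int, stalls: int, f_clock_hz: float, pipeline_stages: int = 5):
    """Return (CPI per iteration, MIPS per iteration, asymptotic CPI, asymptotic MIPS)."""
    fill_cycles = pipeline_stages - 1              # cycles needed to fill the pipeline
    cpi_iteration = (ic + stalls + fill_cycles) / ic
    cpi_asymptotic = (ic + stalls) / ic            # fill cost vanishes over many iterations

    def to_mips(cpi: float) -> float:
        return f_clock_hz / (cpi * 1e6)

    return cpi_iteration, to_mips(cpi_iteration), cpi_asymptotic, to_mips(cpi_asymptotic)


cpi, mips, cpi_as, mips_as = loop_metrics(ic=8, stalls=13, f_clock_hz=500e6)
print(f"CPI = {cpi}, MIPS = {mips:.2f}, CPI_AS = {cpi_as}, MIPS_AS = {mips_as:.2f}")
```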
Let us consider a single iteration of the loop executed by a 5-stage pipelined MIPS processor with an optimized pipeline, in which the Register File can be read and written at the same address in the same clock cycle:
lw $2, BASEA ($4)
addi $2, $2, INC1
lw $3, BASEB ($4)
addi $3, $3, INC2
add $5,$2,$3
sw $5, BASEC ($4)
addi $4, $4, 4
bne $4,$7,L1
Let us assume the following optimizations have been introduced in the pipeline:
- Forwarding paths;
- Early evaluation of branch in the ID stage;
Insert into the following pipeline scheme the stalls needed to resolve the hazards, marking in GREEN the forwarding paths used:
lw $2, BASEA ($4)
addi $2, $2, INC1 - 1 stall + ME => EX $2
lw $3, BASEB ($4)
addi $3, $3, INC2 - 1 stall + ME => EX $3
add $5,$2,$3 - EX => EX $3
sw $5, BASEC ($4) - EX => EX $5
addi $4, $4, 4
bne $4,$7,L1 - 1 stall + EX => ID $4
Next Instruction - 1 stall
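The stall counts with forwarding can be reproduced with a simple timing model. The Python sketch below rests on stated assumptions: an ALU result can be forwarded from the end of EX, a loaded value from the end of ME, ordinary instructions need their operands at the start of EX, and the branch needs them in ID (early evaluation). The 1-stall branch penalty is taken from the scheme above. The program encoding and all names are illustrative.

```python
# (kind, destination register or None, source registers)
program = [
    ("load",   "$2", ["$4"]),          # lw   $2, BASEA($4)
    ("alu",    "$2", ["$2"]),          # addi $2, $2, INC1
    ("load",   "$3", ["$4"]),          # lw   $3, BASEB($4)
    ("alu",    "$3", ["$3"]),          # addi $3, $3, INC2
    ("alu",    "$5", ["$2", "$3"]),    # add  $5, $2, $3
    ("store",  None, ["$5", "$4"]),    # sw   $5, BASEC($4)
    ("alu",    "$4", ["$4"]),          # addi $4, $4, 4
    ("branch", None, ["$4", "$7"]),    # bne  $4, $7, L1
]

# Cycles after the producer's ID at whose end the result exists (EX = 1, ME = 2).
RESULT_READY = {"alu": 1, "load": 2}
BRANCH_PENALTY = 1   # stall charged to the next instruction, as in the scheme


def stalls_with_forwarding(instructions):
    """Data stalls per instruction under the forwarding assumptions stated above."""
    writers = {}     # register -> (ID cycle, kind) of its most recent writer
    prev_id = None
    stalls = []
    for kind, dest, sources in instructions:
        earliest = 0 if prev_id is None else prev_id + 1
        need = 0 if kind == "branch" else 1    # operands needed in ID or in EX
        ready = 0
        for reg in sources:
            if reg in writers:
                w_id, w_kind = writers[reg]
                # The value is usable in the cycle after it has been produced.
                ready = max(ready, w_id + RESULT_READY[w_kind] + 1 - need)
        cycle = max(earliest, ready)
        stalls.append(cycle - earliest)
        prev_id = cycle
        if dest is not None:
            writers[dest] = (cycle, kind)
    return stalls


data_stalls = stalls_with_forwarding(program)
total_stalls = sum(data_stalls) + BRANCH_PENALTY
print(data_stalls, total_stalls)
```

Under this model the only remaining data stalls are the two load-use cases and the branch that reads $4 in ID right after the addi that produces it.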
Let us consider a single iteration of the loop executed by a 5-stage pipelined MIPS processor with an optimized pipeline, in which the Register File can be read and written at the same address in the same clock cycle:
lw $2, BASEA ($4)
addi $2, $2, INC1 - 1 stall + ME => EX $2
lw $3, BASEB ($4)
addi $3, $3, INC2 - 1 stall + ME => EX $3
add $5,$2,$3 - EX => EX $3
sw $5, BASEC ($4) - EX => EX $5
addi $4, $4, 4
bne $4,$7,L1 - 1 stall + EX => ID $4
Next Instruction - 1 stall
Express the formulas, then calculate the following metrics:
• Instruction Count per iteration (IC)
• Number of stalls per iteration
• CPI per iteration
• Throughput (expressed in MIPS) per iteration
• Asymptotic CPI (N cycles)
• Asymptotic Throughput (expressed in MIPS) (N cycles)
Express the formulas, then calculate the following metrics:
• Instruction Count per iteration (IC) = 8
• Number of stalls per iteration = 4
• CPI per iteration: CPI = # cycles / IC = (IC + # stalls + 4) / IC = 16 / 8 = 2
• Throughput (expressed in MIPS) per iteration: MIPS = fCLOCK / (CPI * 10^6) = (500 * 10^6) / (2 * 10^6) = 250
• Asymptotic CPI (N cycles): CPIAS = (IC + # stalls) / IC = (8 + 4) / 8 = 1,5
• Asymptotic Throughput (expressed in MIPS) (N cycles): MIPSAS = fCLOCK / (CPIAS * 10^6) = (500 * 10^6) / (1,5 * 10^6) = 333,33
Let us consider a computer executing the following mix of instructions:
Instruction|Frequency (%)|Clock cycles
ALU |50 |1
LOAD |20 |4
STORE |10 |4
BRANCH |10 |2
JUMP |10 | 2
1.
What is the average CPI in case (1), assuming a clock frequency of 500 MHz?
What is the Throughput, expressed in MIPS, in case (1)?
2.
Assume that, thanks to some optimisation techniques, 30% of the program's instructions are executed with CPIE = 1.05 and the remaining 70% are executed with the same CPI calculated in case (1). What is the Speedup from (1) to (2)?
What is the Throughput, expressed in MIPS, in case (2)?
1.
CPI1 = CPIaverage = 0.5 * 1 + 0.2 * 4 + 0.1 * 4 + 0.1* 2 + 0.1 * 2 = 2.1
MIPS1 = fCLOCK / (CPI1 * 10^6) = (500 * 10^6) / (2.1 * 10^6) = 238
2.
FE = 0.3; SpeedupE = CPI1 / CPIE = 2.1 / 1.05 = 2; by Amdahl's Law:
Speedup = 1 / [(1 - FE) + (FE / SpeedupE)] = 1 / [0.7 + (0.3 / 2)] = 1.176
MIPS2 = Speedup * MIPS1 = 1.176 * 238 = 279.88
Let us consider a computer executing the following mix of instructions:
Instruction|Frequency (%)|Clock cycles
ALU |50 |1
LOAD |20 |4
STORE |10 |4
BRANCH |10 |2
JUMP |10 | 2
1.
What is the average CPI in case (1), assuming a clock frequency of 500 MHz?
What is the Throughput, expressed in MIPS, in case (1)?
2.
Assume that, thanks to a HW optimisation technique, 40% of the program's instructions are executed with CPIE = 1.05 and the remaining 60% are executed with the same CPI calculated in case (1). What is the Speedup from (1) to (2)?
What is the Throughput, expressed in MIPS, in case (2)?
3.
Assume that, thanks to a HW optimisation technique, branch and jump instructions require only a single clock cycle. What is the Speedup from (1) to (3)?
What is the Throughput, expressed in MIPS, in case (3)?
4.
Which optimisation is better, the one introduced in (2) or the one introduced in (3)?
1.
CPI1 = CPIaverage = 0.5 * 1 + 0.2 * 4 + 0.1 * 4 + 0.1 * 2 + 0.1 * 2 = 2.1
MIPS1 = fCLOCK / (CPI1 * 10^6) = (500 * 10^6) / (2.1 * 10^6) = 238
2.
FE = 0.4; SpeedupE = CPI1 / CPIE = 2.1 / 1.05 = 2
By Amdahl's Law:
Speedup = 1 / [(1 - FE) + (FE / SpeedupE)] = 1 / [0.6 + (0.4 / 2)] = 1.25
MIPS2 = Speedup * MIPS1 = 1.25 * 238 = 297.5
3.
CPI3 = CPIaverage = 0.5 * 1 + 0.2 * 4 + 0.1 * 4 + 0.1* 1 + 0.1 * 1 = 1,9
Speedup = CPI1 / CPI3 = 2,1 / 1,9 = 1,1
MIPS3 = Speedup * MIPS1 = 1,1 * 238 = 261,8
4.
Optimisation (2) is better, since its speedup (1,25) is higher than that of optimisation (3) (1,1).
Let us consider a computer A executing an application in which 30% of the instructions are load/store instructions requiring 1 clock cycle (thanks to an instruction cache with 100% hit rate). Let us consider an optimized computer B with a clock frequency 5% higher than that of A, which executes 30% fewer load/store instructions. What is the Speedup of B over A?
TCPU = IC * CPI * Tclock
fclockB = 1.05 fclockA
TclockB = TclockA / 1.05 ≈ 0.95 TclockA
ICB = (1 - 0.3 * 0.3) ICA = 0.91 ICA
CPIB = CPIA (the CPI is assumed to be unchanged)
SpeedUp = TCPUA / TCPUB = (ICA * CPIA * TclockA) / (ICB * CPIB * TclockB)
= (ICA * CPIA * TclockA) / (0.91 ICA * CPIA * 0.95 TclockA) = 1 / (0.91 * 0.95) = 1.16
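The ratio can be evaluated directly, which also shows the effect of rounding the clock period of B to 0.95. This is a minimal Python sketch with illustrative variable names; it assumes, as the solution does, that the CPI is the same on both computers.

```python
clock_gain = 1.05        # f_clockB = 1.05 * f_clockA
ls_fraction = 0.30       # share of load/store instructions on A
ls_reduction = 0.30      # B executes 30% fewer load/store instructions

ic_ratio = 1.0 - ls_fraction * ls_reduction     # IC_B / IC_A
t_clock_ratio = 1.0 / clock_gain                # T_clockB / T_clockA, about 0.952

# CPI_A = CPI_B, so the CPI cancels in T_CPUA / T_CPUB.
speedup_exact = 1.0 / (ic_ratio * t_clock_ratio)
speedup_rounded = 1.0 / (ic_ratio * 0.95)       # with T_clockB rounded to 0.95 T_clockA

print(f"IC_B / IC_A = {ic_ratio:.2f}")
print(f"speedup (exact) = {speedup_exact:.3f}, speedup (rounded T_clock) = {speedup_rounded:.3f}")
```

The two results differ only in the third decimal place (about 1.154 against about 1.157), so the rounding of the clock period does not change the conclusion.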
Solve the pictures 7 and 8 in the ACA album
Solve pictures 9 and 10 in the ACA album