Exercises Flashcards

Question 1

Q

A program is executed for 1 sec, on a processor with a clock cycle of 50 nsec and Throughput1 = 15 MIPS.

How much is the CPI1, for the program?
Let us assume that, given some optimization techniques, the throughput of the program is optimized. In the new case, the 40% of the program instructions is executed with CPI = 1, while the fraction of remaining instructions (60%) is executed with the same CPI.

How much is the SpeedUp from the case (1) to the case (2)?
How much is the Throughput2 expressed in MIPS?

Answer

A

Solution (1)

TCLOCK= 50 nsec => fCLOCK = 1/TCLOCK = 20 MHz

CPI1 = fCLOCK / MIPS1 10^6 = 20 10^6 /15 10^6 = 1,33

Solution (2)

FE = 0,40
SpeedUpE = CPI1/CPIE = 1,33 / 1 = 1,33

SpeedUp = 1 / [(1-FE) + FE /SpeedUpE] = 1 / (0,6 + 0,4 / 1,33) = 1,11

SpeedUp = MIPS2 /MIPS1=> MIPS2 = SpeedUp MIPS1 = 1,11 * 15 = 16,65

Question 2

Q

A program is executed for 1 sec, on a processor with a clock cycle of 100 nsec and CPI1 = 1,5.

How much is the Throughput1 expressed in MIPS?
Let us assume that, given some optimization techniques, the 30% of the program instructions is executed with CPI = 1, while the fraction of remaining instructions (70%) is executed with the same CPI.

How much is the Throughput expressed in MIPS?
How much is the SpeedUp from the case (1) to the case (2)?

Answer

A

Solution (1)

TCLOCK= 100 nsec => fCLOCK = 1/TCLOCK = 10 MHz

MIPS1 = fCLOCK / CPI1 106 = 10 106 /1,5 106 = 6,66

Solution (2)

FE = 0,30
SpeedUpE = CPI1/CPIE = 1,5 / 1 = 1,5

SpeedUp = 1 / [(1-FE) + FE /SpeedUpE] = 1 / (0,7 + 0,3 / 1,5) = 1,11

SpeedUp = MIPS2 /MIPS1 => MIPS2 = SpeedUp MIPS1 = 1,11 * 6,66 = 7,4

Question 3

Q

A program is executed for 1 sec, on a processor with a clock cycle of 50 nsec and Throughput1 = 10 MIPS.

How much is the CPI1, for the program?
Let us assume that, thanks to the introduction of a superscalar processor, the throughput of the program is optimized. In the new case, the 50% of the program instructions is executed with 3 parallel issues, while the fraction of remaining instructions (50%) is executed with one issue.

How much is the SpeedUp from the case (1) to the case (2)?
How much is the Throughput2 expressed in MIPS?

Answer

A

Solution (1)

TCLOCK= 50 nsec => fCLOCK = 1/TCLOCK = 20 MHz

CPI1 =fCLOCK /MIPS1 10^6 =20 10^6 /10 10^6 = 2

Solution (2)

FE = 0,50
SpeedUpE = ThE/Th1 = 3 Th1 / Th1 = 3

SpeedUp = 1 / [(1-FE) + FE /SpeedUpE] = 1 / (0,5 + 0,5 / 3) = 1,5

SpeedUp = MIPS2 /MIPS1=> MIPS2 =SpeedUpMIPS1 =1,5*10=15

Question 4

Q

Let us consider a computer executing the following mix of instructions:
Istruction|Frequency|Clock Cycles
ALU |50 |1
LOAD |20 |5
STORE |10 |3
BRANCH |20 |2

How much is the CPI average (1) assuming a clock period of 5 ns ?
How much is the Throughput expressed in MIPS, in the case (1)?
How much is the SpeedUp assuming that, introducing an optimized data cache, load instructions require 2 clock cycles?
How much is the SpeedUp assuming that, introducing an optimized branch unit, branch instructions require 1 clock cycles?
How much is the SpeedUp assuming to introduce 2 ALUs working in parallel?
How much is the SpeedUp assuming to introduce all together the above optimizations?

Answer

A

(1)

CPI1 = CPI1 ave = 0.5 * 1 + 0.2 * 5 + 0.1 * 3 + 0.2* 2 = 2.2

MIPS1 = fCLOCK /(CPI1 * 106) = (200 * 106 ) / (2.2 * 106 ) = 90.90

(2)

CPI2 = CPI2 average = 0.5 * 1 + 0.2 * 2 + 0.1 * 3 + 0.2* 2 = 1.6
Speedup = CPI1 / CPI2 = 2,2 / 1,6 = 1,375

(3)

CPI3 = CPI3 average = 0.5 * 1 + 0.2 * 5 + 0.1 * 3 + 0.2* 1 = 2
Speedup=CPI1 /CPI3 =2,2/2 =1,1

(4)

CPI4 = CPI4 average = 0.5 * 0,5 + 0.2 * 5 + 0.1 * 3 + 0.2* 2 = 1,95
Speedup = CPI1 / CPI4 = 2,2 / 1,95 = 1,13

(5)

CPI4 = CPI4 average = 0.5 * 0,5 + 0.2 * 2 + 0.1 * 3 + 0.2* 1 = 1,15
Speedup = CPI1 / CPI4 = 2,2 / 1,15 = 1,91

Question 5

Q

Let us consider a computer executing the following mix of instructions:

Instrcution|Frequency| Clock cycles
ALU |50 |1
LOAD |20 |4
STORE |10 |4
BRANCH |10 |2
JUMP |10 | 2

How much is the CPI average (1) assuming a clock period of 5 ns ?
Let us assume that, given some opimisation techniques, the clock frequency has been incremented by 25% and this implies a CPI increment of ALU instructions of 50% and LOAD instructions of 25% while the remaining instructions are executed with the same CPI.
How much is the Throughput expressed in MIPS, in the case (2)?
How much is the Speedup from (1) to (2)?
Is it better the case (1) or the case (2)?

Answer

A

(1)

CPI1 = CPIaverage = 0.5 * 1 + 0.2 * 4 + 0.1 * 4 + 0.1* 2 + 0.1 * 2 = 2.1

MIPS1 = fCLOCK /(CPI1 * 106) = (200 * 106 ) / (2.1 * 106 ) = 95.24

(2)

CPI2 = CPIaverage = 0.5 * 1.5 + 0.2 * 5 + 0.1 * 4 + 0.1* 2 + 0.1 * 2 = 2.55

fclock2 = 1,25 fclock1 = 250 MHz
MIPS2 = fCLOCK /(CPI2 * 106) = (250 * 106 ) / (2.55 * 106 ) = 98.04

(3)

Speedup = MIPS2 / MIPS1 = 98,04 / 95,24 = 1,03

It is better the case (2)

Notice that the Speedup can also be calculated by comparing the execution times taking into account that:

Tclock2 = 0,8 Tclock1 = 4 ns:

TCPU1 = IC1 CPI1 Tclock1 = 100 * 2,1 * 5 ns = 1050 ns
TCPU2 =IC2 CPI2 Tclock2 =1002,554ns=1020ns

Speedup = TCPU1 / TCPU2 = 1050 / 1020 = 1,03

Note: It was not possible to calculate the speedup by comparing the CPIs because the clock frequencies were diff

Question 6

Q

Let us consider a single iteration of the loop executed by 5-stage pipelined MIPS processor with optimized pipeline, where in the Register File, it is possible to read and write at the same address at the same clock cycle;

lw $2, BASEA ($4)
addi $2, $2, INC1
lw $3, BASEB ($4)
addi $3, $3, INC2
add $5,$2,$3
sw $5, BASEC ($4)
addi $4, $4, 4
bne $4,$7,L1

Define the Data Hazards and their type and Control Hazards

Answer

A

lw $2, BASEA ($4)
addi $2, $2, INC1 - RAW $2
lw $3, BASEB ($4)
addi $3, $3, INC2 - RAW $3
add $5,$2,$3 - RAW $2 & $3
sw $5, BASEC ($4) - ?
addi $4, $4, 4 - ?
bne $4,$7,L1 - Control Hazard

Question 7

Q

Let us consider a single iteration of the loop executed by 5-stage pipelined MIPS processor with optimized pipeline, where in the Register File, it is possible to read and write at the same address at the same clock cycle;

lw $2, BASEA ($4)
addi $2, $2, INC1
lw $3, BASEB ($4)
addi $3, $3, INC2
add $5,$2,$3
sw $5, BASEC ($4)
addi $4, $4, 4
bne $4,$7,L1

Insert in the following pipeline scheme the STALLS for each stage needed to solve the previous data and control hazards

Answer

A

lw $2, BASEA ($4)
addi $2, $2, INC1 - 2 stalls
lw $3, BASEB ($4)
addi $3, $3, INC2 - 2 stalls
add $5,$2,$3 - 2 stalls
sw $5, BASEC ($4) - 2 stalls?
addi $4, $4, 4
bne $4,$7,L1 - 2 stalls?
Next Instruction - 3 stalls?

Question 8

Q

Let us consider a single iteration of the loop executed by 5-stage pipelined MIPS processor with optimized pipeline, where in the Register File, it is possible to read and write at the same address at the same clock cycle;

lw $2, BASEA ($4)
addi $2, $2, INC1 - 2 stalls
lw $3, BASEB ($4)
addi $3, $3, INC2 - 2 stalls
add $5,$2,$3 - 2 stalls
sw $5, BASEC ($4) - 2 stalls
addi $4, $4, 4
bne $4,$7,L1 - 2 stalle
Next Instruction - 3 stalls

Express the formulas, then calculate the following metrics:
• Instruction Count per iteration

• Number of stalls per iteration

• CPI per iteration

• Throughput (expressed in MIPS) per iteration

• Asymptotic CPI (N cycles)

• Asymptotic Throughput (expressed in MIPS) (N cycles)

Answer

A

Express the formulas, then calculate the following metrics:
• Instruction Count per iteration
(IC) = 8

• Number of stalls per iteration = 13

• CPI per iteration:  CPI = Number of clock cycles / IC =  (IC+ # stalls + 4) /IC = 25 / 8 = 3.125

• Throughput (expressed in MIPS) per iteration:  MIPS = fCLOCK / (CPI * 106 ) =  (500 * 106) / (3.125 * 106) = 160

• Asymptotic CPI (N cycles):  CPI AS = (IC + # stalls) / IC = (8 + 13) / 8 = 2.625

• Asymptotic Throughput (expressed in MIPS) (N cycles):  MIPSAS = fCLOCK / (CPIAS * 106 ) = (500 * 106) / (2.625 * 106) = 190

Question 9

Q

Let us consider a single iteration of the loop executed by 5-stage pipelined MIPS processor with optimized pipeline, where in the Register File, it is possible to read and write at the same address at the same clock cycle;

lw $2, BASEA ($4)
addi $2, $2, INC1
lw $3, BASEB ($4)
addi $3, $3, INC2
add $5,$2,$3
sw $5, BASEC ($4)
addi $4, $4, 4
bne $4,$7,L1

Let us assume the following optimizations have been introduced in the pipeline:
- Forwarding paths;
- Early evaluation of branch in the ID stage;

Insert in the following pipeline scheme the stalls needed to solve the hazards by marking in GREEN the forwarding paths used;

Answer

A

lw $2, BASEA ($4)
addi $2, $2, INC1 - 1 stall + ME => EX $2
lw $3, BASEB ($4)
addi $3, $3, INC2 - 1 stall + EX => EX $3
add $5,$2,$3 - EX => EX $3
sw $5, BASEC ($4) - EX => EX $5
addi $4, $4, 4
bne $4,$7,L1 - 1 stall + EX => ID $4
Next Instruction - 1 stall

Question 10

Q

Let us consider a single iteration of the loop executed by 5-stage pipelined MIPS processor with optimized pipeline, where in the Register File, it is possible to read and write at the same address at the same clock cycle;

lw $2, BASEA ($4)
addi $2, $2, INC1 - 1 stall + ME => EX $2
lw $3, BASEB ($4)
addi $3, $3, INC2 - 1 stall + EX => EX $3
add $5,$2,$3 - EX => EX $3
sw $5, BASEC ($4) - EX => EX $5
addi $4, $4, 4
bne $4,$7,L1 - 1 stall + EX => ID $4
Next Instruction - 1 stall

Express the formulas, then calculate the following metrics:

• Instruction Count per iteration  (IC)

• Number of stalls per iteration

• CPI per iteration

• Throughput (expressed in MIPS) per iteration

• Asymptotic CPI (N cycles)

• Asymptotic Throughput (expressed in MIPS) (N cycles)

Answer

A

Express the formulas, then calculate the following metrics:

• Instruction Count per iteration  (IC) = 8

• Number of stalls per iteration = 4
• CPI per iteration: CPI =  # cycles / IC = (IC+ # stalls + 4) /IC = 16 / 8 = 2

• Throughput (expressed in MIPS) per iteration:
MIPS = fCLOCK / (CPI * 106 ) = (500 * 106) / (2 * 106) = 250

• Asymptotic CPI (N cycles) : CPI AS= (IC+ # stalls) / IC = (8 + 4) / 8 = 1,5

• Asymptotic Throughput (expressed in MIPS) (N cycles): MIPSAS = fCLOCK / (CPIAS * 106 ) = (500 * 106) / (1,5 * 106) = 333,33

Question 11

Q

Let us consider a computer executing the following mix of instructions:

Instrcution|Frequency| Clock cycles
ALU |50 |1
LOAD |20 |4
STORE |10 |4
BRANCH |10 |2
JUMP |10 | 2

1.

How much is the CPI average (1) assuming a clock frequency of 500 MHz?
How much is the Throughput expressed in MIPS, in the case (1)?

2.

Let us assume that, given some opimisation techniques, the 30% of program instructions is executed with CPIE = 1.05 and the remaining fraction of instructions (70%) is executed with the same CPI calculated in the case (1). How much is the Speedup from (1) to (2)?
How much is the Throughput expressed in MIPS, in the case (2)?

Answer

A

1.

CPI1 = CPIaverage = 0.5 * 1 + 0.2 * 4 + 0.1 * 4 + 0.1* 2 + 0.1 * 2 = 2.1
MIPS1 = fCLOCK /(CPI1 * 106 ) = (500 * 106 ) / (2.1 * 106 ) = 238

FE = 0.3; SpeedupE = CPI1 / CPIE = 2; for the Amdahl’s Law:
Speedup = 1 / [(1-FE) + ( FE/SpeedupE)] = 1 / [ 0.7 + (0.3 / 2)]= 1, 176

MIPS2 = Speedup * MIPS1 = 1.176 * 238 = 279,88

Question 12

Q

Let us consider a computer executing the following mix of instructions:

Instrcution|Frequency| Clock cycles
ALU |50 |1
LOAD |20 |4
STORE |10 |4
BRANCH |10 |2
JUMP |10 | 2

How much is the CPI average (1) assuming a clock frequency of 500 MHz?
How much is the Throughput expressed in MIPS, in the case (1)?

2.

Let us assume that, given a HW opimisation technique, the 40% of instructions of the program is executed with CPIE = 1.05 and the remaining fraction of instructions (60%) is executed with the same CPI calculated in the case (1). How much is the Speedup from (1) to (2)?
How much is the Throughput expressed in MIPS, in the case (2)?

Let us assume that, given a HW opimisation technique, branch and jump instructions require only a single clock cycle. How much is the Speedup from (1) to (3)?
How much is the Throughput expressed in MIPS, in the case (3)?

Is it better the optimisation introduced in (2) or in (3) ?

Answer

A

CPI1 = CPIaverage = 0.5 * 1 + 0.2 * 4 + 0.1 * 4 + 0.1* 2 + 0.1 * 2 = 2.1
MIPS1 = fCLOCK /(CPI1 * 106 ) = (500 * 106 ) / (2.1 * 106 ) = 238

2.

FE = 0.4; SpeedupE = CPI1 / CPIE = 2,1/1,05 =2;
For the Amdahl’s Law:
Speedup = 1 / [(1-FE) + ( FE/SpeedupE)] = 1 / [ 0.6 + (0.4 / 2)]= 1, 25

MIPS2 = Speedup * MIPS1 = 1.25 * 238 = 297,5

3.

CPI3 = CPIaverage = 0.5 * 1 + 0.2 * 4 + 0.1 * 4 + 0.1* 1 + 0.1 * 1 = 1,9
Speedup = CPI1 / CPI3 = 2,1/1,9 =1,1;

MIPS3 = Speedup * MIPS1 = 1,1 * 238 = 261,8

4.

The optimisation (2) is better.

Question 13

Q

Let us consider a computer A executing an application containing 30% of load/store instructions requiring 1 clock cycle (thanks to an instruction cache with 100% hit rate). Let us consider an optimized computer B with a clock frequency 5% faster than A and executing 30% less load/store instructions. How much is the Speedup?

Answer

A

TCPU = IC * CPI * Tclock

fclockB = 1.05 fclockA

TclockB = 0.95 TclockA

ICB = (1 – 0.3 * 0.3) ICA = 0,91 ICA

SpeedUp = TCPUA / TCPUB = (ICA * CPIA * TclockA )/( ICB * CPIB * Tclock B ) =
= (ICA * CPIA * TclockA ) / ( 0.91 ICA * CPIA * 0,95 Tclock A) = 1 /(0.91 * 0,95 ) = 1.16

Question 14

Q

Solve the pictures 7 and 8 in the ACA album

Question 15

Q

Solve pictures 9 and 10 in the ACA album

Question 16

Q

Solve the pictures 11 and 12 in the ACA album

Answer

Study These Flashcards

A

Question 17

Q

Solve the picture 13 in ACA album

Answer

Study These Flashcards

A

Question 18

Q

Solve the picture 14 in ACA album

Answer

Study These Flashcards

A

Question 19

Q

Solve the picture 15 in ACA album

Answer

Study These Flashcards

A

Question 20

Q

What is the difference between optimizations and forwarding paths for the MIBS pipeline architecture

Answer

Study These Flashcards

A

Optimized: can write and read the RF at the same clock cycle

Forwarding paths: can forward results from EX to ID (in case of branching), ME to EX (in case of load and store operations), and EX to EX

Question 21

Q

Let us consider a memory hierarchy (main memory + cache) given by:
- Memory size 1 Giga words of 16 bit (word addressed);
- Cache size 1 Mega words of 16 bit (word addressed);
- Cache block size 256 words of 16 bit.

Calculate the number of cache blocks;
Calculate the structure of the addresses for the following cache structures:
- direct mapped cache;
- fully associative cache;
- 2-way set-associative cache;
- 4-way set-associative cache;
- 8-way set–associative cache.
Calculate the number of sets for the previous set-associative caches.