Appendix C Flashcards

Question 1

Q

Pipelining

Answer

A

process of overlapping multiple instructions
takes advantage of parallelism
most processors use this to implement parallelism
breaks instruction execution into multiple stages
- usually Instruction Fetch, Instruction decode, Instruction execute, Memory write, Write back
  - also called pipe stages or segments
goal is to increase throuput - number of instructions per time unit
one step inside of a pipeline is called processor cycle

Speed up on ideal machine:
time per instruction on unpipelined machine / number of pipe stages

Question 2

Q

Instruction cycle in Risc-V

Answer

A

5 phases:
IF - instruction fetch:
- send PC to memory and load instruction
- increment PC by 4 (instructions are 4 bytes)
ID - instruction decode:
- decode instruction
- read registers
- resolve branches and possible branch targets
- sign extension (register and immidiate values)
EX - execution/effective address:
- ALU operates on three functions:
- memory reference - ALU adds base register and offset to form effective address
- Register-register ALU instruction - ALU performs operation specified by the ALU opcode on the values in registers
- Register-immidiate ALU instruction - ALU performs operation specified by ALU opcode on value read from register and the sign-extended immidiate value
- Conditional branch - determine whether branch condition is true
- load-store architectures may perform effective address and execution in one cycle, because instructions don`t need to perform operation and calculating data address at the same time
MEM - memory access
- LOAD instruction reads using effective address
- STORE instruction performs store of the element from one register based on the address specified in second register
WB - write-back cycle:
- Register-register instruction
- load instruction
- writes result of the instruction into register

With this model branch instructions require 3 cycles, store instructions 4 cycles, all other instructions require 5 cycles
each stage contains pipeline registers

Question 3

Q

Fixed-field decoding

Answer

A

all registers are in fixed locations
speed-ups decoding of the instructions
used in RISC

Question 4

Q

Hazards

Answer

A

structural hazards - hardware cannot support overlapping multiple instructions
- in modern systems occurs only in slow special purpose units
- can easily be solved by compiler constructors
data hazards - result of one instruction depends on the result of another
control hazards - results from branches and other instructions that change the contents of PC

solving these hazards causes stalls

Question 5

Q

Pipeline performance measurement

Answer

A

CPI pipelined = Ideal CPI + pipeline stalled cycles = 1 + pipelined stalled cycles

Speedup = (CPI unpipelined x clock cycle unpipelined)/(CPI pipelined x Clock cycle pipelined)

Question 6

Q

Data hazards - types

Answer

A

occurs when pipeline changes order of the sequential instructions in unpiplined design

Types:
1. Read after write hazard - instruction j reads x as result from instruction i, which was not written yet. j thus has old value
2. Write after read hazard - instruction i reads value x after instruction j written its result. i thus has wrong value of x
Cannot occur in simple five stage pipeline, but can be produced if instructions are out-of-order
3. Write after write hazard - write of value x by instruction i occurs after write of x by instruction j. x will thus have a wrong value
Cannot occur in simple five stage pipeline, but can be produced if instructions are out-of-order

Question 7

Q

Forwarding(bypassing)

Answer

A

technique used to reduce hazards
result from ALU both from EX/MEM and MEM/WB is fed back to the ALU
if forwarding hardware detects that result from the ALU corresponds to the register as source to the ALU it uses this value instead of value from the register
doesn`t solve all hazards

Question 8

Q

Data hazards with stalls

Answer

A

when data dependency cannot be forwarded (e.g. instruction needs data earlier than it`s actually produced, such as ld instruction dependency)
pipeline interlock - detects hazards and stalls the pipeline
pipeline is stalled until the hazard is solved

Question 9

Q

Branch hazards

Answer

A

control hazard
cause greater performance loss than data hazards
if branch is taken, PC register will change
if branch is untaken, value of PC is not taken
this can cause repetition of IF stage or other different problems (e.g. incorrect instruction is taken)
branch stalls can reduce the performance by 10% to 30%!

Question 10

Q

Reducing branch penalties

Answer

A

simplest solution - freeze or flush instructions after the branch until branch is resolved - hardware simplicity
- causes fixed branch penalty
taking every branch as not taken and performing instructions after the branch and not changing the state of the processor until the branch is solved
- determining when the state might have changed and “back out” from this state
- alternative is to take every branch as taken
delayed branch - fetching next instruction after branch and then fetching instruction as if branch was taken
- effective without branch prediction
- problematic if the branch delay instruction is another branch - many times prohibited!

Question 11

Q

Performance of branch schemes

Answer

A

Pipeline speedup = Pipeline depth / (1 + Branch frequency x Branch penalty)

Question 12

Q

Branch prediction

Answer

A

static - branch predictions are known in compile time

- dynamic - based on program behaviour

Question 13

Q

Static branch prediction

Answer

A

based on the fact that branch prediction is bimodally distributed either in favour of branch taken or branch untaken

Question 14

Q

Dynamic branch predictions

Answer

A

simplest version - branch-prediction buffer, or branch history table

- simple buffer of lower bits of addresses with bit that indicates if the branch there was taken
 - could be modified also by different branch with the same lower bits of address
  - if branch was not taken, then bit is reversed to 0
  - improvement is using 2 bit prediction schemes - prediction must miss twice before it changes - 11 taken, 10 taken, 00 untaken, 01 untaken
   - also more bits can be used, but practice shows that 2 bits are enough

Appendix C Flashcards

(14 cards)