Appendix C Flashcards

1
Q

Pipelining

A
  • process of overlapping multiple instructions
  • takes advantage of parallelism
  • most processors use this to implement parallelism
  • breaks instruction execution into multiple stages
    • usually Instruction Fetch, Instruction decode, Instruction execute, Memory write, Write back
      - also called pipe stages or segments
  • goal is to increase throuput - number of instructions per time unit
  • one step inside of a pipeline is called processor cycle

Speed up on ideal machine:
time per instruction on unpipelined machine / number of pipe stages

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Instruction cycle in Risc-V

A

5 phases:
IF - instruction fetch:
- send PC to memory and load instruction
- increment PC by 4 (instructions are 4 bytes)
ID - instruction decode:
- decode instruction
- read registers
- resolve branches and possible branch targets
- sign extension (register and immidiate values)
EX - execution/effective address:
- ALU operates on three functions:
- memory reference - ALU adds base register and offset to form effective address
- Register-register ALU instruction - ALU performs operation specified by the ALU opcode on the values in registers
- Register-immidiate ALU instruction - ALU performs operation specified by ALU opcode on value read from register and the sign-extended immidiate value
- Conditional branch - determine whether branch condition is true
- load-store architectures may perform effective address and execution in one cycle, because instructions don`t need to perform operation and calculating data address at the same time
MEM - memory access
- LOAD instruction reads using effective address
- STORE instruction performs store of the element from one register based on the address specified in second register
WB - write-back cycle:
- Register-register instruction
- load instruction
- writes result of the instruction into register

  • With this model branch instructions require 3 cycles, store instructions 4 cycles, all other instructions require 5 cycles
  • each stage contains pipeline registers
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Fixed-field decoding

A
  • all registers are in fixed locations
  • speed-ups decoding of the instructions
  • used in RISC
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Hazards

A
  1. structural hazards - hardware cannot support overlapping multiple instructions
    • in modern systems occurs only in slow special purpose units
    • can easily be solved by compiler constructors
  2. data hazards - result of one instruction depends on the result of another
  3. control hazards - results from branches and other instructions that change the contents of PC
  • solving these hazards causes stalls
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Pipeline performance measurement

A

CPI pipelined = Ideal CPI + pipeline stalled cycles = 1 + pipelined stalled cycles

Speedup = (CPI unpipelined x clock cycle unpipelined)/(CPI pipelined x Clock cycle pipelined)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Data hazards - types

A
  • occurs when pipeline changes order of the sequential instructions in unpiplined design

Types:
1. Read after write hazard - instruction j reads x as result from instruction i, which was not written yet. j thus has old value
2. Write after read hazard - instruction i reads value x after instruction j written its result. i thus has wrong value of x
Cannot occur in simple five stage pipeline, but can be produced if instructions are out-of-order
3. Write after write hazard - write of value x by instruction i occurs after write of x by instruction j. x will thus have a wrong value
Cannot occur in simple five stage pipeline, but can be produced if instructions are out-of-order

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Forwarding(bypassing)

A
  • technique used to reduce hazards
  • result from ALU both from EX/MEM and MEM/WB is fed back to the ALU
  • if forwarding hardware detects that result from the ALU corresponds to the register as source to the ALU it uses this value instead of value from the register
  • doesn`t solve all hazards
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Data hazards with stalls

A
  • when data dependency cannot be forwarded (e.g. instruction needs data earlier than it`s actually produced, such as ld instruction dependency)
  • pipeline interlock - detects hazards and stalls the pipeline
  • pipeline is stalled until the hazard is solved
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Branch hazards

A
  • control hazard
  • cause greater performance loss than data hazards
  • if branch is taken, PC register will change
  • if branch is untaken, value of PC is not taken
  • this can cause repetition of IF stage or other different problems (e.g. incorrect instruction is taken)
  • branch stalls can reduce the performance by 10% to 30%!
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Reducing branch penalties

A
  1. simplest solution - freeze or flush instructions after the branch until branch is resolved - hardware simplicity
    • causes fixed branch penalty
  2. taking every branch as not taken and performing instructions after the branch and not changing the state of the processor until the branch is solved
    • determining when the state might have changed and “back out” from this state
    • alternative is to take every branch as taken
  3. delayed branch - fetching next instruction after branch and then fetching instruction as if branch was taken
    - effective without branch prediction
    - problematic if the branch delay instruction is another branch - many times prohibited!
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Performance of branch schemes

A

Pipeline speedup = Pipeline depth / (1 + Branch frequency x Branch penalty)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Branch prediction

A
  • static - branch predictions are known in compile time

- dynamic - based on program behaviour

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Static branch prediction

A
  • based on the fact that branch prediction is bimodally distributed either in favour of branch taken or branch untaken
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Dynamic branch predictions

A

simplest version - branch-prediction buffer, or branch history table

- simple buffer of lower bits of addresses with bit that indicates if the branch there was taken
 - could be modified also by different branch with the same lower bits of address
  - if branch was not taken, then bit is reversed to 0
  - improvement is using 2 bit prediction schemes - prediction must miss twice before it changes - 11 taken, 10 taken, 00 untaken, 01 untaken
   - also more bits can be used, but practice shows that 2 bits are enough
How well did you know this?
1
Not at all
2
3
4
5
Perfectly