Lectures 3 and 4 - Pipelining Flashcards

0
Q

Hazard Classes

A

Structural Hazards - arise from resource conflicts

Data Hazards - Arise from inter-instruction dependencies

Control Hazards - Cause by instructions that change the program counter

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
1
Q

Pipeline hazards

A

Situations that arise that force the pipeline to stall.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Data dependencies

A

Real - If j uses the result of i then it is data dependent on i. They are also dependent if there is a chain of dependencies.

Name - simply a result of register reuse with no data actually communicated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Data Hazard

A

Created whenever the parallel execution of instructions makes it possible for a dependency to be violated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Methods for Avoiding Data Hazards

A

Add hardware to avoid hazard such as data forwarding or bypass paths

Transform or schedule the code in order to prevent a data dependence becoming a data hazard, this could be done by inserting independent instructions to separate dependent instructions that would cause a stall. It can be performed by the compiler or by hardware.

Hazards arising from name dependencies can be avoided by renaming either in compiler or hardware.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Load-use Data Hazard

A

ALU operation must wait for result from load, it cannot execute in same lock cycle even with forwarding.

Cukd insert independent instructions or do nothing instructions in load-delay slot.

Alternatively hardware could detect hazard and stall pipeline

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Control Hazards

A

Branch instructions may or may not change the value of the PC so how do we know what to do after we fetch a branch. Simplest approach is to just stall until outcome of branch is known.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Precise Exceptions

A

A processor is said to support precise exceptions if it can guarantee all instructions prior to the faulting instruction have executed and all those prior to it have not begun to execute.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Method of handling exceptions

A

Tag instruction if it causes an exception.

Deal with exception when instruction reaches the write back stage.

Faulting instructions are prevented from updating state. They’re handled at WB stage to ensure they’re handle in order. We save the program counter of the faulting instruction and use it to restart the program after the exception handler had completed.
Flush remaining instructions from pipeline.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Alternative approaches to exception handling

A

Settle for imprecise exceptions.

Use special hardware for handling exceptions which doesn’t require flushing but just stalling while dedicated hardware deals with exception.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Limits of pipelining

A

Difficult to achieve linear performance gains with deeper pipelines because:

  • pipelining registers introduce overheads and it is difficult to provide low-skew global clock within a reasonable power budget
  • adding stages may increase complexity
  • must balance logic in each stage
  • number of stalls due to pipeline hazards grow
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Optimal Pipeline Depth

A

Throughout = 1/(1+(s-1)b) x 1/(t/s+c)

Then can differentiate to get ideal pipeline length ends up being approximately 23 stages.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Condition codes or flags

A

Branches test special flags set by ALU operations. These usually include zero, carry, negative and overflow.

CCs may constrain instruction ordering.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Condition Registers

A

Perform comparison and place condition codes in general purpose registers. This permits a simple implementation but costs the use of a register.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Compare and branch

A

Performs compare and branch in single instruction. This however potentially complicates pipelining.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Predict-not-taken scheme

A

Assume branch won’t be taken and continue fetching instructions. This means the penalty for intakes branches is now zero.

16
Q

Evaluate branch earlier in pipeline.

A

Move branch test to decode stage.

Add an additional adder to calculate new pc in decode stage.

The branch penalty is now reduced to 1 cycle.

This requires a simple to evaluate branch condition. The tested condition must be ready during instruction decode. A data hazard is introduced if an instruction that sets the condition is followed by a branch.

17
Q

Delayed Branch

A

Assume instruction after branch is always executed regardless of the branch condition. This slot is then filled with independent instruction by the compiler if one exists otherwise with no-op. The compiler can fill a single branch delay slot around 60-70% of the time.

18
Q

Static branch prediction

A

Exploits observation that a given branch instruction is likely to be highly biased in one direction. Displacement-based prediction - predict branch taken if target address < program counter. Alternatively have compiler set prediction bit in instruction. If prediction incorrect then cancel branch. It can be improved using information collected during a profiling run.

19
Q

Bi-modal prediction

A

1-bit:

Store branch outcome
Use state as prediction next time is taken.

2-bit:

Use 2-bit saturating counter which prevents prediction being flipped in case of single misprediction.

20
Q

Local History Predictor

A

If branch behaviour has a repetitive pattern in its local history this can be used to improve prediction accuracy.

21
Q

Global correlating predictor

A

Behaviour of branch is often correlated with behaviour of other recent branches so can access a different counter dispensing on behaviour of recent branches. It’s is because an individual branch can be reached along different paths so maintain unique prediction for particular path. History of recent branches is stored in k-bit register.

22
Q

Tournament predictor

A

For particular branches global or local predictors will be superior so implement both and see which performs best on a per branch basis. This requires local predictor, global predictor and 2-bit saturating counter per branch to decide between them.

23
Q

Limits to ranch prediction

A

Some branches are unpredictable and will appear random.

Time is needed to train predictor before correct predictions can be made

Limited Size - history and number of predictors

Aliasing/interference -

Negative aliasing - branches map to same entry in table and move counter in opposite directions. There’s also positive/neutral aliasing.

24
Q

Determining branch target

A

To avoid stalling on branch we need to known the next address to be fetched by end of IF stage. We use a small cache indexed by the PC of recent branches and their most recent target addresses.

25
Q

Return Address Predictors

A

Procedures and functions may be called from multiple places in th program. Accuracy of branch target stored in BTB may be very low. This can be solve by using a small hardware stack to store these addresses.

26
Q

Avoiding branches

A

Loop count register - decrement and branch instruction

Predicated execution - execution of all instructions is conditional this transforms control dependency into data dependency but costs opcode bits and nullified instructions in pipeline.

Conditional moves - move if flag/nominated reg is set.