Difficult to achieve linear performance gains with deeper pipelines because: pipelining registers introduce overheads and it is difficult to provide low-skew global clock within a reasonable power budget adding stages may increase complexity must balance logic in each stage number of stalls due to pipeline hazards grow

Lectures 3 and 4 - Pipelining Flashcards by Joel Moss

Hazard Classes

Structural Hazards - arise from resource conflicts

Data Hazards - Arise from inter-instruction dependencies

Control Hazards - Cause by instructions that change the program counter

How well did you know this?

Not at all

Perfectly

Pipeline hazards

Situations that arise that force the pipeline to stall.

How well did you know this?

Not at all

Perfectly

Data dependencies

Real - If j uses the result of i then it is data dependent on i. They are also dependent if there is a chain of dependencies.

Name - simply a result of register reuse with no data actually communicated.

How well did you know this?

Not at all

Perfectly

Data Hazard

Created whenever the parallel execution of instructions makes it possible for a dependency to be violated.

How well did you know this?

Not at all

Perfectly

Methods for Avoiding Data Hazards

Add hardware to avoid hazard such as data forwarding or bypass paths

Transform or schedule the code in order to prevent a data dependence becoming a data hazard, this could be done by inserting independent instructions to separate dependent instructions that would cause a stall. It can be performed by the compiler or by hardware.

Hazards arising from name dependencies can be avoided by renaming either in compiler or hardware.

How well did you know this?

Not at all

Perfectly

Load-use Data Hazard

ALU operation must wait for result from load, it cannot execute in same lock cycle even with forwarding.

Cukd insert independent instructions or do nothing instructions in load-delay slot.

Alternatively hardware could detect hazard and stall pipeline

How well did you know this?

Not at all

Perfectly

Control Hazards

Branch instructions may or may not change the value of the PC so how do we know what to do after we fetch a branch. Simplest approach is to just stall until outcome of branch is known.

How well did you know this?

Not at all

Perfectly

Precise Exceptions

A processor is said to support precise exceptions if it can guarantee all instructions prior to the faulting instruction have executed and all those prior to it have not begun to execute.

How well did you know this?

Not at all

Perfectly

Method of handling exceptions

Tag instruction if it causes an exception.

Deal with exception when instruction reaches the write back stage.

Faulting instructions are prevented from updating state. They’re handled at WB stage to ensure they’re handle in order. We save the program counter of the faulting instruction and use it to restart the program after the exception handler had completed.
Flush remaining instructions from pipeline.

How well did you know this?

Not at all

Perfectly

Alternative approaches to exception handling

Settle for imprecise exceptions.

Use special hardware for handling exceptions which doesn’t require flushing but just stalling while dedicated hardware deals with exception.

How well did you know this?

Not at all

Perfectly

Limits of pipelining

Difficult to achieve linear performance gains with deeper pipelines because:

pipelining registers introduce overheads and it is difficult to provide low-skew global clock within a reasonable power budget
adding stages may increase complexity
must balance logic in each stage
number of stalls due to pipeline hazards grow

How well did you know this?

Not at all

Perfectly

Optimal Pipeline Depth

Throughout = 1/(1+(s-1)b) x 1/(t/s+c)

Then can differentiate to get ideal pipeline length ends up being approximately 23 stages.

How well did you know this?

Not at all

Perfectly

Condition codes or flags

Branches test special flags set by ALU operations. These usually include zero, carry, negative and overflow.

CCs may constrain instruction ordering.

How well did you know this?

Not at all

Perfectly

Condition Registers

Perform comparison and place condition codes in general purpose registers. This permits a simple implementation but costs the use of a register.

How well did you know this?

Not at all

Perfectly

Compare and branch

Performs compare and branch in single instruction. This however potentially complicates pipelining.

How well did you know this?

Not at all

Perfectly

Predict-not-taken scheme

Study These Flashcards

Assume branch won’t be taken and continue fetching instructions. This means the penalty for intakes branches is now zero.

Evaluate branch earlier in pipeline.

Study These Flashcards

Move branch test to decode stage.

Add an additional adder to calculate new pc in decode stage.

The branch penalty is now reduced to 1 cycle.

This requires a simple to evaluate branch condition. The tested condition must be ready during instruction decode. A data hazard is introduced if an instruction that sets the condition is followed by a branch.

Delayed Branch

Study These Flashcards

Assume instruction after branch is always executed regardless of the branch condition. This slot is then filled with independent instruction by the compiler if one exists otherwise with no-op. The compiler can fill a single branch delay slot around 60-70% of the time.

Static branch prediction

Study These Flashcards

Exploits observation that a given branch instruction is likely to be highly biased in one direction. Displacement-based prediction - predict branch taken if target address < program counter. Alternatively have compiler set prediction bit in instruction. If prediction incorrect then cancel branch. It can be improved using information collected during a profiling run.

Bi-modal prediction

Study These Flashcards

1-bit:

Store branch outcome
Use state as prediction next time is taken.

2-bit:

Use 2-bit saturating counter which prevents prediction being flipped in case of single misprediction.

Local History Predictor

Study These Flashcards

If branch behaviour has a repetitive pattern in its local history this can be used to improve prediction accuracy.

Global correlating predictor

Study These Flashcards

Behaviour of branch is often correlated with behaviour of other recent branches so can access a different counter dispensing on behaviour of recent branches. It’s is because an individual branch can be reached along different paths so maintain unique prediction for particular path. History of recent branches is stored in k-bit register.

Tournament predictor

Study These Flashcards

For particular branches global or local predictors will be superior so implement both and see which performs best on a per branch basis. This requires local predictor, global predictor and 2-bit saturating counter per branch to decide between them.

Limits to ranch prediction

Study These Flashcards

Some branches are unpredictable and will appear random.

Time is needed to train predictor before correct predictions can be made

Limited Size - history and number of predictors

Aliasing/interference -

Negative aliasing - branches map to same entry in table and move counter in opposite directions. There’s also positive/neutral aliasing.

Determining branch target

To avoid stalling on branch we need to known the next address to be fetched by end of IF stage. We use a small cache indexed by the PC of recent branches and their most recent target addresses.

Return Address Predictors

Procedures and functions may be called from multiple places in th program. Accuracy of branch target stored in BTB may be very low. This can be solve by using a small hardware stack to store these addresses.

Avoiding branches

Loop count register - decrement and branch instruction Predicated execution - execution of all instructions is conditional this transforms control dependency into data dependency but costs opcode bits and nullified instructions in pipeline. Conditional moves - move if flag/nominated reg is set.

Lectures 3 and 4 - Pipelining Flashcards

(27 cards)