5.0 Branch prediction Flashcards
What stage is branch prediction a part of?
Instruction fetch.
This is to allow the stage to fetch as many, and as useful, instructions as possible at the same time.
What two things does branch prediction do?
Predict if branch is taken, and the target address
How does pipeline depth affect cost of mispredicted branches?
Cost is proportional to pipeline depth
What are the two ways of doing branch direction prediction
Statically: Before program execution, done by compiler/programmer inserting a hint
Dynamically: during program execution, use global and local history to make predictions, done in hardware
What are some advantages with static branch prediction?
Easy to implement, little hardware needed.
What are a disadvantage with static branch prediction
Prediction is the same regardless of dynamic behaviour
What are the three types of static prediction?
Rule, program, profile-based
Define rule-based static prediction
Implement a rule that is always used, in the pipeline
Rules:
- Not taken (sequential fetch)
- Always taken (requires compute branch target, a little more complex, branch target known at decode stage)
- Back taken forward not taken (combine these two, loops are typically taken)
Define program-based prediction
Typically more accurat then rule-based
requires a hint in opcode, specifying behaviour for different branch-types:
- If branch is loop - take
- if compare pointer to NULL - not take
- if compare pointers - take branch to inequality
What is profile based prediction?
When you first compile and execute a binary with a certain set of training input data.
This run collects profile data that defines how branching is taken statistically
Use this to predict branches. over 50% - taken, under - not taken
What context can dynamic branch prediction use to predict branches?
Local history: The branch’s own history
Global history: Other branches’ history
what is a bimodal predictor
The hash of the PC is used as a pointer into a buffer holding a counter that defines branch outcomes
Using a two bit predictor, predict taken if 11 or 10, and not taken if 01 or 00. if branch was actually taken +1, if not taken -1.
What are the components of 2-level predictors?
How does a global 2-level predictor work?
Uses two levels of branch history to make predictions (bimodal only uses the branch address)
Two level use local history or global history
History: Outcome of prior branch executions
History acts as shift register. Newest outcome is inserted, oldest is discarded
1st level for a global predictor is a Branch History Register containing h bits and the global history. h bits shows branch outcome of last h branches
Then take the hash of the branch address and concatenate with the global history and use this as index to a pattern history table containing counters.
Why does a two-level global predictor work?
Because there is correlation between branches.
If a branch is taken - it is probable that the next also will be taken
What is a two level predictor using local history?
Have a 1st level branch history table containing local history.
Hash the PC and use this as index to the BHT.
Concat the PC hash with the entry from BHT and index into pattern history table that contains counter.
Why does a two level local predictor work?
Branches have local behaviour (alternating, loops)
How does the number of bits used in the PC hash affect the branch predicor implementation-type?
m: number of bits retrieved from the PC, used to index pattern history table
if m is 0: all branches share the same pattern history table - this would be a global history table
if m is not 0: per-address pattern history table
What is a gshare predictor
Instead of concatinating the h bits from the global history with the m bits from the branch address, these are XORed
What are some causes of mispredictions when using dynamic prediction?
Some branches are just hard to predict
Can be caused by interference: predictor is to small for branches - branches with different behaviour may be mapped onto the same PHT entry
Branch behaviour doesn’t match branch predictor type. Global predictor, but branch doesn’t use global state
What is hybrid branch prediction?
combine different type of branch predictors, and learn which ones are most accurate.
What is a tournament predictor?
PC hash is used to index into the metapredictor table. This table contains a counter that records the behaviour of the predictors used. The metapredictor entry is used to choose what predictor result to use. If entry < 2 P1 is used, entry >= 2 P2 is used.
The metapredictor counter is updated on every prediction
- if both predictors correct/incorrect - do nothing
- if P1 correct - decrement
- if P2 correct - increment
Entries in P1 and P2 are updated every prediction, regardless of which one is used
What is a BTB (Branch target buffer) or Branch Target Address Cache (BTAC)
Input: branch address (PC)
Output: Target address
Accessed in the same cycle as we access the instruction cache
If an instruction is a branch, we use the target address as PC in next cycle
Used in IF stage. If target is cached, we are able to continue fetching without loosing cycles
Target is used to speculatively fetch in next cycle
What is a RAS (return address stack)
Have a circular buffer where you push the return address (PC + 4) when doing a function call. Pop address upon return.
It is easy to predict function calls, not return address. This is because multiple addresses can call a function - and therefore be a possible return address
If depth of calls exceeds the RAS-size, we will have problems
What is speculative execution?
Predict branch target address and start execution along predicted path.
there might be multiple branches in flight along the predicted path - to keep track of this, tags are added to the instructions along the given paths.
What happens when a branch is correctly predicted?
Tags are deallocated
instructions become non-speculative
What happens when a branch is mispredicted?
Wrong path instructions are nullified
Start fetching instructions along correct path
What is the structure of the branch target buffer (BTB)
PC is used as input(tag) to table
If we hit, take the target address and use it speculatively as the next PC to fetch in next cycle
How does a BTB/RAS access work in hardware?
On function call, take branch-address + 4 and store into RAS.
In parallell, you access the BTB with the branch address to get the target address.
I the branch is a return, instead of taking the result from the BTB, pop the top most element of the RAS, and use this as the target address.
How does return address prediction work in the pipeline?
Fetch stage:
- Use branch address (comes from PC) to access BTB/RAS
- If you hit in the BTB use this as target address, if not, continue fetching continually (PC + 4)
- (BTB hit is used as control signal in mux deciding between target address and PC +4)
Decode stage:
- Perform branch prediction - gives taken/not-taken
- Perform address calculation - gives the actual branch target
- If the branch is taken, and the predicted target address from the fetch stage is wrong - use the calculated address (causes stall for a cycle)
If BTB/RAS address != calculated address - causes stall and will need to insert a bubble
What is speculative branch execution?
Predict target address and start fetching and executing along this path.
There might be multiple branches in flight during this.
To keep track of each branch, a tag is added to each speculatively taken execution path
What happens when a branch evaluates to correctly predicted
Tags are deallocated
Instructions are set as non-speculative
What happens when a branch evaluates to mispredicted
Instructions along the wrong path are nullified.
Start fetching along the correct path.