Pipeline Hazards and Branch Prediction Flashcards
DEFINE: Pipeline Hazard.
A situation created by pipelining where the next instruction cannot safely execute in its slot, because it would read stale data or continue down the wrong instruction path.
There are four read/write orderings to consider (read-after-read, read-after-write, write-after-read, write-after-write). In-order processors only need to deal with which one?
Read-after-Write hazards. (RaR is never a hazard, and WaR/WaW only become hazards with out-of-order execution.)
DEFINE: Read-after-write (RaW) hazards.
When an instruction reads a register after a previous instruction has computed a new value for it, but before that value has been committed in the Writeback stage, so a naive read returns the stale value.
TRUE/FALSE: Stalling is the only major way to resolve read-after-write hazards.
FALSE. You can also use Forwarding/Bypassing.
DEFINE: Stalling.
Waiting for a previous instruction to complete (or at least produce its value) before continuing, usually because of a data dependency. This creates pipeline bubbles: cycles in which a stage does no useful work.
In an in-order processor that implements Stalling to resolve read-after-write hazards, how are potential hazards detected?
In the Decode stage, the new instruction's source registers are checked against the destination registers of instructions further down the pipeline. If a match (dependency) is detected, the new instruction is stalled.
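As a minimal C sketch of that Decode-stage check (assuming a classic 5-stage pipeline; the must_stall name and struct fields are illustrative, not from any particular ISA):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    writes_reg;  /* does this in-flight instruction write a register? */
    uint8_t rd;          /* its destination register number */
} InFlight;

/* Stall if either source register of the instruction in Decode matches a
   destination register still in flight in Execute or Memory. */
bool must_stall(uint8_t rs1, uint8_t rs2, InFlight ex, InFlight mem) {
    if (ex.writes_reg  && (ex.rd  == rs1 || ex.rd  == rs2)) return true;
    if (mem.writes_reg && (mem.rd == rs1 || mem.rd == rs2)) return true;
    return false;
}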
DEFINE: Forwarding/Bypassing.
Intercepting the new value of a destination register as soon as it's computed (after the Execute or Memory stage) and routing it straight back to the stage that needs it, rather than waiting for it to reach the register file at Writeback, so as to prevent read-after-write stalls.
If forwarding adds to the Decode stage’s critical path, why is it often implemented anyway?
Because benchmarking shows it pays off: the stall cycles it eliminates outweigh the slightly slower clock forced by the longer critical path.
What’s the tradeoff for adding forwarding/bypassing paths back to the register file?
More multiplexers and wiring, and in turn more chip area.
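A minimal C sketch of one such forwarding multiplexer (names and stage split are illustrative): the freshest in-flight value wins, otherwise the register-file read is used.

#include <stdbool.h>
#include <stdint.h>

/* Select the value for one source operand, preferring the newest
   in-flight result over the (possibly stale) register-file read. */
uint32_t forward_operand(uint8_t rs, uint32_t regfile_val,
                         bool ex_writes,  uint8_t ex_rd,  uint32_t ex_result,
                         bool mem_writes, uint8_t mem_rd, uint32_t mem_result) {
    if (ex_writes  && ex_rd  == rs) return ex_result;   /* freshest value */
    if (mem_writes && mem_rd == rs) return mem_result;
    return regfile_val;                                 /* no in-flight writer */
}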
DEFINE: Load-Use Data Hazard.
When a value being loaded from memory into a register is needed before the load has delivered it; since the value only arrives after the Memory stage, a dependent instruction immediately after the load cannot get it in time even with forwarding.
TRUE/FALSE: An in-order processor’s only solution to a Load-Use Data Hazard is through stalling until the value is written back.
TRUE. The compiler or software developer can often reorder code to avoid the stall, however; see the example below.
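For example (a hypothetical RISC-V-style sequence), an independent instruction can be moved into the load-use gap:
Before reordering (the first ADD stalls waiting for s5):
- LW s5, 0(s3)
- ADD s6, s5, s7
- ADD s8, s1, s2
After reordering (the independent ADD fills the bubble):
- LW s5, 0(s3)
- ADD s8, s1, s2
- ADD s6, s5, s7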
What hazards are contained in the following list of instructions? These are run in order, with all registers initially holding valid values:
- ADD s0, s1, s2
- SUB s3, s0, s4
- LW s5, 0(s3)
- ADD s6, s5, s7
- ADD s8, s6, s9
There are 3 read-after-write hazards and 1 load-use hazard:
- Instruction 2 reads s0 immediately after it's written (but not yet written back). RaW.
- Instruction 3 uses s3's value as a memory address before it's written back. RaW.
- Instruction 4 uses s5's value before the load has fetched it from memory and written it back. Load-use.
- Instruction 5 uses s6's value before Writeback. RaW.
DEFINE: Control Hazard.
When a conditional branch means the next instruction to fetch isn't known until the branch resolves, so the pipeline may fetch down the wrong path.
What’s the main way we deal with Control Hazards?
Branch prediction.
TRUE/FALSE: The best way to handle branch prediction is to stall.
FALSE. Benchmarking shows that predicting the outcome and speculatively fetching past the branch beats stalling on every branch.
TRUE/FALSE: When a branch mis-prediction happens, we simply need to cancel all currently pipelined instructions associated with that branch.
FALSE. We also need to undo (or make sure we never commit) any work those wrong-path instructions have already done.
How could we use “epochs” to handle branch mispredictions?
Tag each instruction with the current epoch number as it is fetched. When a misprediction is detected, increment the epoch; any in-flight instruction tagged with a stale (older) epoch is squashed instead of committed.
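One way to sketch this in C (all names here are hypothetical):

#include <stdbool.h>
#include <stdint.h>

static uint32_t current_epoch = 0;

typedef struct {
    uint32_t epoch;   /* epoch this instruction was fetched in */
    /* ...decoded fields... */
} Instr;

void on_fetch(Instr *i)  { i->epoch = current_epoch; }  /* tag at fetch */
void on_mispredict(void) { current_epoch++; }           /* start a new epoch */

/* Late stages drop instructions fetched under a stale epoch. */
bool should_commit(const Instr *i) { return i->epoch == current_epoch; }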
If branch mis-predictions are rare, why do we still care about them?
The miss penalty is disproportionately large: every wrong-path instruction in the pipeline must be flushed and refetched.
DEFINE: Static branch prediction.
Branch prediction based only on fixed properties of the instruction, not on runtime history. Examples are an always-not-taken policy, where the assumption is always that the branch is not taken, and BTFNT (Backwards-Taken, Forwards-Not-Taken), where backward branches (such as loop back-edges) are predicted taken and forward branches are predicted not taken.
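BTFNT needs no history at all; a minimal C sketch (assuming the branch target is known at predict time):

#include <stdbool.h>
#include <stdint.h>

/* Predict taken only for backward branches (target below the branch's PC),
   e.g. loop back-edges; predict not-taken for forward branches. */
bool btfnt_predict(uint32_t pc, uint32_t target) {
    return target < pc;
}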
DEFINE: Dynamic branch prediction.
Branch prediction based on the observed runtime history of that branch.
After which pipeline stage does a mis-prediction cause a bubble?
The Execute stage.
When considering branch prediction, what 2 questions must we answer?
1) Will this instruction cause a branch?
2) Where will that branch lead?
What entity is commonly used to decide whether a branch was taken?
The Branch History Table (BHT).
What entity is commonly used to decide where a branch leads to?
The Branch Target Buffer (BTB).
What is missing in the blanks of the following pseudocode, describing the basic branch-predictor logic?
Word next_pc = pc + 4
Bit[10] lsb = truncate(pc)
if (___[lsb]):
    next_pc = ___[lsb]
return next_pc
- bht or Branch History Table
- btb or Branch Target Buffer
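Filled in as C, the same logic might look like this (table sizes and names are illustrative; a real design would likely drop the low alignment bits of the PC before indexing):

#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 1024  /* 10-bit index => 1024-entry tables */

static bool     bht[ENTRIES];  /* Branch History Table: taken last time? */
static uint32_t btb[ENTRIES];  /* Branch Target Buffer: where it went    */

uint32_t predict_next_pc(uint32_t pc) {
    uint32_t next_pc = pc + 4;          /* default: fall through */
    uint32_t lsb = pc & (ENTRIES - 1);  /* truncate(pc): low 10 bits */
    if (bht[lsb])
        next_pc = btb[lsb];
    return next_pc;
}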
TRUE/FALSE: The optimal branch predictor without time travel can’t ever be perfect, but can be very accurate.
TRUE.
I run the following function in C:
const char* getCountry(int cc) {
    if (cc == 0) return "A0";
    if (cc == 1) return "A1";
    if (cc == 2) return "A2";
    ...
    if (cc == 256) return "A256";
    return "";  /* fallback so control can't fall off the end */
}
It’s running very slowly, but when I remove the first line, it improves dramatically. Why might that be?
The branch prediction tables (BHT/BTB) likely hold only 256 entries. With 257 branches, entries alias and evict each other, so mispredictions become very common; removing one branch lets every remaining branch keep its own entry.
Describe a 1-bit predictor.
A branch predictor that stores only the most recent outcome for each branch: if the branch was taken last time, it predicts taken this time, with no further state.
Describe a programming concept that works very well with 1-bit predictors.
Loops. A loop's backward branch is taken on every iteration except the last, so predicting "same as last time" is almost always right.
Describe a 2-bit predictor.
A branch predictor that stores a 2-bit saturating counter (four states) per branch; every taken outcome ticks the counter up by 1 and every not-taken outcome ticks it down, so a single anomalous outcome (such as a loop exit) doesn't flip the prediction.
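A minimal C sketch of one such counter (states 0-1 predict not-taken, 2-3 predict taken):

#include <stdbool.h>
#include <stdint.h>

typedef uint8_t Counter2;  /* saturating counter holding 0..3 */

bool predict_taken(Counter2 c) { return c >= 2; }

Counter2 update(Counter2 c, bool taken) {
    if (taken  && c < 3) c++;   /* saturate at strongly taken */
    if (!taken && c > 0) c--;   /* saturate at strongly not-taken */
    return c;
}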
What concept can we use to avoid branch prediction altogether?
Branchless programming.
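For instance, a max() can be computed with arithmetic instead of a branch (a sketch; real compilers often do this for you via conditional-move instructions):

#include <stdint.h>

/* Branchy version: the comparison becomes a branch the predictor can miss. */
int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b) return a;
    return b;
}

/* Branchless version: compute a mask and select, no branch to predict. */
int32_t max_branchless(int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(a > b);   /* all ones if a > b, else zero */
    return (a & mask) | (b & ~mask);
}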
DEFINE: Loop Unrolling.
For loops with fixed bounds (for i = 0 to 15, do this), writing out the body once per iteration so the loop's conditional branch, and the need to predict it, is eliminated.
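A sketch of the idea in C (fully unrolled for illustration; partial unrolling, e.g. 4x, is more common in practice):

/* Rolled: the loop branch is checked (and predicted) 16 times. */
int sum_rolled(const int *a) {
    int s = 0;
    for (int i = 0; i < 16; i++) s += a[i];
    return s;
}

/* Fully unrolled: no loop branch at all. */
int sum_unrolled(const int *a) {
    return a[0] + a[1] + a[2]  + a[3]  + a[4]  + a[5]  + a[6]  + a[7]
         + a[8] + a[9] + a[10] + a[11] + a[12] + a[13] + a[14] + a[15];
}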
What two coding situations will prevent loop unrolling from making significant improvements?
Loops with dynamic bounds (for i = 0 to n-1, do this) and loops whose bodies contain complex logic, where the branch overhead is a small fraction of the total work.