Branches Flashcards
What are the requirements for branch prediction?
Works using only PC of current instruction.
Must correctly guess whether a branch is taken.
If taken, what is the target PC?
Equation for branch prediction accuracy
CPI = 1 (or ideal CPI) + (# mispreds/# total insts) * (penalty per mispred)
What percent of instructions are branches in the average program?
20%
Does it ever make sense to NOT use branch prediction?
No! If you don’t use branch prediction, you always have to wait until the instruction is decoded to move forward, so in the canonical 5 stage pipeline you lose at least two cycles. If it’s a branch, you lose 3 cycles waiting to find out which way the branch goes.
With the simple not-taken predictor, you only waste cycles when there’s a mispredicted branch.
Why do we need predictors that are better than predict-not-taken?
Because the difference because large for deep pipelines or when we can do multiple instructions per cycle. i.e., it lets us maximize performance on more sophisticated hardware.
Branch Target Buffer (BTB)
The simplest history predictor. Indexes each instruction PC into a table where we stored the PC for the actual next instruction the last time this instruction was run. Keep the table small so it works fast, we only need the instructions that are coming soon anyway.
How do you map instructions PCs to the their entry in the BTB?
Using the least significant bits!
Direction predictor
We index into a Branch History Table using the least significant digits, just like the BTB. The BHT has a 0 if the branch was not taken last time and 1 if it was. If it’s a branch, we go to the BTB to get the target. Once the branch resolves, we update the table entry. If taken, we also update the BTB.
The BHT can be large because each entry is 1 bit!
When do 1-bit predictors work/not work?
They work when the number of not taken branches vastly outweighs the number taken or vice versa.
They don’t work when the number taken/not taken are closer to each other, or in short loops.
Note that each anomaly causes 2 mispredictions, not one.
Explain the 2-bit predictor
It’s a counter that counts up when a branch is taken and counts down when it’s not.The advantage is that single anomalies only cost one misprediction.
It has 2 bits, a predictor bit and conviction bit. From Strong Not Taken to Strong Taken it looks like
00
01
10
11
Why not use more bits for a predictor/counter (3-bit, 4-bit, etc)?
First, cost. Second, these predictors are only really helpful when anomalies come in streaks, but empirically, they don’t!
1-bit history with two 2-bit counters
This allows us to learn a simple pattern like (T, N, T, N). The 1-bit history just tells us whether the last branch was taken or not, and it indexes into the two counters (one for each of two possible states of the 1-bit history).
It looks like this: (0, 01, 11). That would mean the last branch was not taken and we predict the current branch is not taken because of the 01 (weak not taken) value.
If the 1-bit history had a value of one, we would index into the 11 (strong taken).
2-bit history predictor
It looks like the 1-bit history with two 2-bit counters, just bigger! We have something like (2-bit history, counter for history of 00, counter for 01, 10, and 11). Five total 2-bit entries, so 10 bits per branch.
This allows us to learn a pattern of three like (N, N, T).
What patterns can an n-bit history predict correctly?
All patterns of length <= n+1
What is the cost/size of an n-bit history predictor?
n + 2*2^n bits