Pipelining Flashcards
What is the canonical processor pipeline?
Fetch Read/Decode ALU Memory Write
Fetch: Use program counter (PC) to get instruction from instruction memory
Read/Decode: Read registers and decode instruction
ALU: Perform arithmetic and logical operations on info read from registers, using logic decoded from instruction
Memory: If instruction was a load or store, ALU will return address and we’ll load/store from that address in data memory
Write: Write result to register
Why would CPI be greater than 1 in a pipeline?
- Initial filling of the pipeline, though this becomes negligible as # of instructions grows large
- Pipeline stalls! (the real answer)
Why might there be a stall in the pipeline?
- Branches/jumps: e.g., when you have a jump instruction, you can’t load the correct subsequent instructions until after the jump has been decoded and processed by the ALU. So when jump gets to the ALU, you have to flush all instructions earlier in the pipeline.
- Data dependencies: An early stage in the pipeline might need the result of an instruction later in the pipeline to execute correctly, like the Decode stage waiting on the ALU to load something from memory to a particular register.
What is a control dependence?
When execution of instructions depends on the behavior of a branch.
What two factors determine how control dependence will affect CPI in a pipeline?
- Branch misprediction penalty: the # of pipeline stages that come before the stage where a branch is decided
- Branch prediction: If you can accurately predict branches, you can avoid the penalty!
The deeper the pipeline, the better your branch prediction needs to be!
Types of data dependencies
RAW (read after write): When an instruction needs to read from a register that a previous instruction is writing to, it can’t move forward until the previous one completes. Also called a “flow” or “true” dependence.
WAW (write after write): When an instruction needs to write to a register that previous instruction is writing to (if they go out of order, the wrong item will be in the register). Also called an “output” dependence.
WAR (write after read): When an instruction needs to write to a register after another reads from it. Order matters so that the earlier instruction reads the correct data. Also called “anti” dependence because it reverses the order of RAW.
Which data dependencies are “true” and which “false”?
True: RAW (also called “flow” or “true” dependence)
False: WAW and WAR
Another name for “false” dependence
Name dependence
Is RAR (read after read) a dependence?
Nope! It doesn’t matter what order instructions read in as long as no one is writing to that register
Dependencies vs Hazards
Dependencies are a property of the program alone. Not all dependencies are hazards (just RAW I guess?)
Hazards are a property of the program and pipeline! If an instruction has a data dependency on an earlier instruction, but the earlier one leaves the pipeline before it reads the register, there’s no hazard! Also, false dependencies are never(?) hazards.
How can we handle hazards?
- Flush dependent instructions. For branches, this is the only option.
- Stall dependent instructions. This works for data dependencies.
- Fix values read by dependent instructions. This works for SOME data dependencies. As long as the correct value exists somewhere in the pipeline, we can “forward” it to where it’s needed. It doesn’t need to be written to register first.
How does the number of stages affect performance?
As number of stages increases, the number of hazards increase (and so does misprediction penalty), increasing CPI.
But the work needed per stage decreases, decreases clock time per cycle.
How to choose number of stages to optimize performance?
Best performance is achieved by adding more stages so that we decrease work needed per stage (clock time), without adding so many stages that we significantly increase hazards. The graph of execution time vs # stages is an upward facing parabola.
In modern processors, the minima is around 30-40 stages.
Should we optimize pipelines strictly for performance?
No! Consider power consumption too. Power consumption increases linearly with number of stages, so we choose the point where this intersects the performance optimization parabola.
On modern processors, this is around 10-15 stages.