Instruction Scheduling Flashcards
What do we use to take care of control dependencies?
Branch prediction
What do we use to take care of WAR and WAW (false) data dependencies?
Register renaming
What do we use to take care of RAW data dependencies?
Out of order execution
What do we use to take care of structural (hardware) dependencies?
Invest in wider issue (the ability to execute more instructions simultaneously)
How do we actually implement register renaming and out of order execution in hardware?
Tomasulo’s algorithm
What is Tomasulo’s algorithm for?
1) Determining which instructions have their inputs ready
2) Register renaming
Big picture, how does Tomasulo’s algorithm work?
We have our instructions in a queue. Instructions are issued (in order?) to their respective ALU units (MUL, ADD, etc).
Each ALU unit has a Reservation Station that holds instructions until all of their inputs are ready. Any inputs that are already stored in registers will be filled in at this point.
Instructions in the RS that have all ready inputs will be dispatched for execution. The outputs of these instructions are broadcast to both the registers and all of the Reservation Stations.
Loads and Stores go to a separate Address Generation Unit and from there to either a Load or Store Buffer. Completed Load instructions will broadcast as well.
What happens during an Issue?
1) Take next instruction from queue. This has to happen in program order for register renaming to work.
2) Determine where the inputs will come from (are they already in the register file or are they going to be outputs from other instructions). This requires a RAT table.
3) Get a free space at the correct Reservation Station. If there is no free space, you’ll have to wait a cycle.
4) Put instruction in the Reservation Station
5) Tag the destination register so the result will go there when it’s produced, and so future instructions will know where to find this value.
What is the Register Alias Table (RAT)?
The RAT has an entry for each register and keeps track of which instruction is going to produce that result. Before any issues, each entry points to its corresponding entry in the Registry File. Once we get going, RAT entries point to RS entries.
What is the Register File?
The Register File tracks the values in each register.
How does Dispatch work?
When a result is broadcast, we check its tag again those in the Reservation Stations. If they match, we will fill in those values. After this, any instructions with fully ready inputs are ready to execute. We now dispatch as many as the hardware will allow (maybe that’s one ADD and one MUL)
When more than one instruction is ready to Dispatch to the same ALU, how do we choose which to send first?
Our options are…
1) Oldest First: A compromise between the other two. Manageable, but also better performance than Random.
2) Most dependencies first: Most efficient, but hard! Requires searching.
3) Random: Easiest to implement, but worst performance.
Oldest first is a good compromise.
What happens when we write a result/broadcast?
1) Put tag and result in the “bus” for broadcasting
2) Write our result to the appropriate Register File (RF) entry
3) Update the RAT to point back to the appropriate RF entry
4) Free the Reservation Station (RS)
(When do we use the broadcast values to fill in RS inputs? Is that what the bus is for?)
What do we do when more than one instruction is ready to be broadcast?
We usually give priority to the result from the slower ALU (like MUL over ADD), on the assumption that the operations waiting for its result have been waiting longer.
What are the memory dependencies for Load and Store instructions?
RAW: Store to A then Load from A
WAR: Load from A then Store to A
WAW: Store to A then Store to A