8.0 Microarchitectural design options Flashcards
What is scoreboarding?
Mostly used in combination with a super-scalar in-order pipeline.
Assumes no register renaming, so WAW and WAR hazards must be handled explicitly.
Provides precise exceptions, as long as results are written back in program order.
How is scoreboarding implemented?
Extend the pipeline by splitting the issue stage in two:
- issue
- operand fetch
Add a scoreboard: a table that keeps track of every instruction that is in flight.
Scoreboarding is a centralized technique for hazard detection.
Describe the issue stage of scoreboarding
Check if the functional unit where the instruction can run is available.
Check that no active instructions have the same destination register.
This takes care of WAW and structural hazards.
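The two issue checks can be sketched in a few lines of Python. This is only an illustration under assumed names: the `InFlight` entry layout and the `can_issue` helper are hypothetical and not taken from any particular processor.

```python
from dataclasses import dataclass


@dataclass
class InFlight:
    """One scoreboard entry (hypothetical layout, for illustration only)."""
    unit: str  # functional unit the instruction occupies
    dest: str  # destination register it will eventually write


def can_issue(unit: str, dest: str, in_flight: list) -> bool:
    """Return True if an instruction may issue this cycle."""
    unit_busy = any(e.unit == unit for e in in_flight)     # structural hazard
    dest_pending = any(e.dest == dest for e in in_flight)  # WAW hazard
    return not unit_busy and not dest_pending


# One multiply is in flight and will write r1.
scoreboard = [InFlight(unit="mul", dest="r1")]
print(can_issue("add", "r2", scoreboard))  # True: free unit, no WAW
print(can_issue("mul", "r3", scoreboard))  # False: multiplier is busy
print(can_issue("add", "r1", scoreboard))  # False: r1 already has a pending write
```

An instruction that fails either check simply waits in the issue stage until the conflicting entry leaves the scoreboard.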
Describe the operand fetch stage of scoreboarding
An instruction can fetch its operands when they’re available.
Operands are available when no in-flight instruction is still going to write to them.
Takes care of RAW hazards
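As a rough sketch, the operand-fetch check amounts to a lookup in the set of registers that in-flight instructions have yet to write. The function name `operands_ready` and the set `pending_writes` are assumptions made for this example.

```python
def operands_ready(sources, pending_writes):
    """True when no in-flight instruction still has to write a source register."""
    return all(src not in pending_writes for src in sources)


# Hypothetical state: an in-flight instruction will write r1.
pending_writes = {"r1"}
print(operands_ready(("r2", "r3"), pending_writes))  # True: no RAW hazard
print(operands_ready(("r1", "r2"), pending_writes))  # False: must wait for r1
```

The instruction stays in the operand fetch stage until the check succeeds, which is what enforces RAW ordering.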
Describe the execution stage of scoreboarding
The functional unit executes the instruction.
When complete, the scoreboard is notified.
No registers are read or written, so no new hazards arise in this stage.
Describe the writeback stage of scoreboarding
Writing must be delayed if a preceding instruction, in program order, has not yet fetched its operands and reads the register that is about to be written. This takes care of WAR hazards.
This may cause out-of-order completion. A younger instruction that goes to a shorter-latency functional unit may complete before an older instruction that is in a longer-latency unit. If the older instruction then raises an exception, we don’t get precise exceptions.
To ensure precise exceptions, we require the instructions to write results in program order.
The results are buffered in the pipeline registers until they can be written. Because of this, we need to issue instructions in order within a functional unit to avoid deadlocks.
Out-of-order execution is still possible across the different functional units, but not within one.
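The WAR check at writeback can be sketched as follows. The `Entry` fields and the `can_write_back` helper are hypothetical names chosen for this example; the sketch covers only the WAR condition, not the in-order writeback needed for precise exceptions.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical scoreboard entry for one in-flight instruction."""
    seq: int                        # position in program order (lower = older)
    sources: tuple                  # registers the instruction reads
    dest: str                       # register the instruction writes
    operands_fetched: bool = False  # set once operand fetch has completed


def can_write_back(me, in_flight):
    """Delay writeback while an older instruction still has to read me.dest."""
    return not any(
        other.seq < me.seq
        and not other.operands_fetched
        and me.dest in other.sources
        for other in in_flight
    )


older = Entry(seq=0, sources=("r1", "r2"), dest="r3")  # still waiting to read r1
younger = Entry(seq=1, sources=("r4",), dest="r1")     # wants to overwrite r1

print(can_write_back(younger, [older, younger]))  # False: WAR hazard on r1
older.operands_fetched = True
print(can_write_back(younger, [older, younger]))  # True: r1 has been read
```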
What is the difference between scoreboarding and dataflow execution with Tomasulo's algorithm?
Scoreboarding is a centralized scheme, whereas Tomasulo is distributed. Once the work has been distributed across the reservation stations, it proceeds without any further coordination.
Tomasulo uses register renaming (hazard handling in the frontend), while scoreboarding takes care of the hazards in the issue and writeback stages.
Scoreboarding executes instructions in order within a functional unit, whereas Tomasulo executes instructions as soon as they are ready, which means they can execute out of order.
Tomasulo dispatches instructions in order (inserting into the ROB and the reservation stations at the same time), while scoreboarding issues instructions to different functional units out of order, once the dependences are met and the units are available.
Describe the behaviour of in-order execution.
Name a limitation and an advantage.
Instructions are executed in order, where the order is determined by the software (programmer).
A limitation is that a single stalled instruction blocks the entire pipeline.
Offload the ILP analysis to the compiler. Saves complexity.
ILP (instruction-level-parallelism) in hardware is limited.
Less complex hardware, which also allows a higher clock speed.
Lower power consumption because of less hardware complexity.
These types of processors can for example be useful in embedded systems.
What are the different orders of instructions we find in a pipeline?
Issue order
Dispatch order
Execution order
Completion order (commit or writeback)
What affects how much you can deviate from program order?
The microarchitectural techniques implemented in the given processor.
As long as hazards are respected - anything goes.
What are some techniques that can be used when designing a processor?
Dataflow execution (Tomasulo)
Scoreboarding
Prediction and speculation
Caching
Register renaming
Depending on how these techniques are combined, they will achieve different design points (area, performance, power, cost).
Describe an architecture where register renaming is combined with scoreboarding
WAW and WAR hazards are handled automatically.
Instruction scheduling becomes simpler, at the expense of more physical area and added register-management complexity.
The issue and operand fetch stages are merged: they no longer need to be separate, because the separation only existed to handle one of the hazards. Register renaming happens before the merged issue/operand fetch stage.
Issue stage:
- Issue when the operands are available
- Issue when the functional unit is available
- Handles RAW and structural hazards
Writeback stage:
- Don’t need to stall the functional unit
- Write the result back to the rename register
- The rename register is promoted to architectural register when the instruction is the oldest in-flight instruction
The execution stage remains the same.
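To see why renaming removes WAW and WAR hazards, here is a minimal rename-table sketch. The `Renamer` class, the `p0`, `p1`, ... physical register names, and the register counts are assumptions for illustration; the sketch does not model promoting rename registers or returning them to the free list.

```python
class Renamer:
    """Maps architectural registers to physical (rename) registers."""

    def __init__(self, arch_regs, num_phys):
        # Initially each architectural register has its own physical register.
        self.table = {r: f"p{i}" for i, r in enumerate(arch_regs)}
        self.free = [f"p{i}" for i in range(len(arch_regs), num_phys)]

    def rename(self, dest, sources):
        # Read the current source mappings first, so RAW dependences are kept.
        srcs = tuple(self.table[s] for s in sources)
        # Every write gets a fresh physical register: no WAW or WAR conflicts.
        new_dest = self.free.pop(0)
        self.table[dest] = new_dest
        return new_dest, srcs


r = Renamer(["r1", "r2", "r3"], num_phys=6)
print(r.rename("r1", ("r2", "r3")))  # ('p3', ('p1', 'p2'))
print(r.rename("r2", ("r1", "r3")))  # ('p4', ('p3', 'p2'))  WAR on r2 removed
print(r.rename("r1", ("r3", "r3")))  # ('p5', ('p2', 'p2'))  WAW on r1 removed
```

The second instruction still reads `p3`, the value produced by the first, so the true (RAW) dependence survives, while the two writes to `r1` land in different physical registers.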
What is a dependency chain?
A sequence of instructions that depend on each other.
What is a common pattern of dependency chains?
- An instruction computes the address of a load (Address Generating Instruction)
- Data is loaded from memory
- Computation is performed
- Result is written back to memory
Within loops, steps 1 and 2 of a later iteration can be overlapped with steps 3 and 4 of an earlier iteration.
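The chain inside one loop iteration can be made explicit by listing the RAW edges between its instructions. The instruction names, register names, and the `raw_edges` helper below are hypothetical and serve only to illustrate the pattern.

```python
# One loop iteration as (name, destination, sources); names are illustrative.
body = [
    ("agi",     "addr", ("base", "i")),  # step 1: compute the address
    ("load",    "x",    ("addr",)),      # step 2: load data from memory
    ("compute", "y",    ("x",)),         # step 3: perform the computation
    ("store",   None,   ("addr", "y")),  # step 4: write the result to memory
]


def raw_edges(instrs):
    """Return (producer, consumer) pairs for every read-after-write dependence."""
    last_writer = {}
    edges = []
    for name, dest, sources in instrs:
        for src in sources:
            if src in last_writer:
                edges.append((last_writer[src], name))
        if dest is not None:
            last_writer[dest] = name
    return edges


print(raw_edges(body))
# [('agi', 'load'), ('load', 'compute'), ('agi', 'store'), ('compute', 'store')]
```

In this sketch the address computation reads only `base` and `i`, neither of which is produced by the load, compute, or store steps, which is why the first two steps of the next iteration are free to start early.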
What is AGI?
Address generating instruction: an instruction that computes the address used by a memory access (e.g. a load).