Superscalar and Out-of-Order Processors Flashcards
In a processor without superscalar execution, what is the optimal effective CPI?
1.
When we account for AMAT in overall processor performance, what metric are we generally using?
CPI or IPC: memory stalls from AMAT show up as extra cycles per instruction.
DEFINE: An n-way Superscalar Processor.
A processor that can fetch, decode, and execute up to n instructions per cycle by running multiple pipelines in parallel, increasing throughput.
What prevents us from achieving theoretically optimal CPI in a superscalar system?
The portion of the code that is inherently unparallelizable, i.e., Amdahl's Law.
In an n-way superscalar processor, what is the theoretically optimal CPI?
1/n CPI (equivalently, an IPC of n). For example, a 4-way superscalar that issues 4 instructions every cycle achieves a CPI of 1/4 = 0.25.
TRUE/FALSE: Superscalar processing systems are an example of manycores in action.
FALSE. Superscalar works within a single core.
What are the two major concerns with in-order superscalar processors?
1) For many data hazards we can only stall: two instructions issued in the same cycle cannot forward results to one another, since the producer's result does not exist yet.
2) The wider pipeline requires additional complexity and chip resources to implement.
DEFINE: VLIW.
Very Long Instruction Word. An ISA style that packs several operations into one long instruction word so the hardware can issue them at once.
What have historically been the problems with VLIW?
It makes the compiler's job very hard, since the compiler must find and schedule the parallelism; binaries aren't compatible across hardware generations with different widths; and hazard management is very complex. And, as always, Amdahl's Law limits the gains.
I write the following code:
int foo = 0;
for (int i = 0; i < 256; ++i) {
foo = 2 * foo + 1;
}
return foo;
I have a 256-way superscalar out-of-order processor, and I have a magical branch predictor that always guesses correctly. I optimize my code, allowing for loop unrolling, but it still doesn’t run well. Why is this?
A loop-carried read-after-write dependence remains: each iteration reads the foo written by the previous one, so even after unrolling the instructions form a serial chain and the wide machine mostly sits idle (see the sketch below).
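As a sketch (the function name unrolled_foo is hypothetical), here is what the unrolled code effectively looks like; each statement reads the foo produced by the statement before it, so no amount of issue width helps:

int unrolled_foo(void) {
    int foo = 0;
    foo = 2 * foo + 1;  /* iteration 0 */
    foo = 2 * foo + 1;  /* iteration 1: needs iteration 0's foo */
    foo = 2 * foo + 1;  /* iteration 2: needs iteration 1's foo */
    /* ...and so on for the remaining 253 iterations... */
    return foo;
}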
DEFINE: Out-of-order processor.
A processor that uses dynamic scheduling to execute instructions out of program order, while tracking dependencies to avoid hazards.
Why do superscalar processors tie-in well with out-of-order processors?
A superscalar needs groups of n independent instructions to issue each cycle; out-of-order execution lets it look ahead in the upcoming instruction stream to find them, maximizing throughput.
Which pipeline stage is the Scheduler added to in out-of-order processors?
The Decode stage.
What kind of data structure represents dependencies in a set of instructions?
The Instruction Dependence Graph.
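A small sketch (hypothetical values and names): treating each C statement as one instruction, the graph has an edge from each instruction that produces a value to every instruction that reads it.

int dependence_example(int x, int y) {
    int a = x + y;   /* I1 */
    int b = a * 2;   /* I2: reads a -> edge I1 -> I2 (read-after-write)       */
    int c = x - y;   /* I3: independent of I1 and I2 -> can issue in parallel */
    int d = b + c;   /* I4: reads b and c -> edges I2 -> I4 and I3 -> I4      */
    return d;
}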
Why is out-of-order processing more appealing than VLIW in the software space?
It's transparent to software: the VLIW compiler has to manage dependencies itself, whereas in OoO the microarchitecture manages them instead.
What’s the most common type of hazard for out-of-order processors to manage?
Read-after-write hazards.
What techniques do we have to deal with read-after-write dependencies when operating with an out-of-order processor?
Beyond having the scheduler fill the gap with independent instructions, we can only stall: a read-after-write is a true dependence and cannot be renamed away.
TRUE/FALSE: We can execute arithmetic and logical operations in parallel in out-of-order frameworks, but we have to commit those operations in order to preserve a correct architectural state.
TRUE.
TRUE/FALSE: Unfortunately, using a scheduler for out-of-order processing adds a great deal of overhead due to its O(n log n) scheme (a result of the heaps it would need).
FALSE. Thanks to Tomasulo's algorithm (reservation stations plus broadcast of results), the hardware schedules at a constant rate per cycle, with no heap-like structure needed.
DEFINE: Write-after-read hazards.
In out-of-order processing, when a later instruction writes a register before an earlier instruction has read it, so the earlier instruction sees the overwritten (wrong) value.
DEFINE: Write-after-write hazards.
In out-of-order processors, when an earlier write completes after a later write to the same register, so the later instruction's result is overwritten and the register is left stale.
DEFINE: Read-after-read hazards.
Not a real hazard. In out-of-order execution, two reads of the same register can happen in either order, as long as no write comes between them.
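A short illustration of the write hazards above (hypothetical names; plain C variables stand in for registers). If the hardware reordered these statements freely without renaming, r1 or r2 could end up holding the wrong value.

int hazard_example(int r3, int r4) {
    int r1, r2;
    r1 = r3 + r4;    /* I1: writes r1                                         */
    r2 = r1 * 2;     /* I2: reads r1 (read-after-write on I1)                 */
    r1 = r3 - 1;     /* I3: write-after-write with I1 (both write r1) and
                        write-after-read with I2 (I2 must read the old r1
                        before I3 overwrites it)                              */
    return r1 + r2;  /* correct only with I3's r1 and I2's r2                 */
}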
How can we resolve write-after-write and write-after-read hazards without stalling in out-of-order processors?
Using Register Renaming, a.k.a. Shadow Registers.
If the named registers are defined by the ISA, which registers are used by register renaming?
Physical Registers. As opposed to the named registers (architectural registers), these are not specified by the ISA.
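A minimal sketch of the idea (not any real core's rename logic; the register counts and names here are made up): every write to an architectural register is mapped to a fresh physical register, so write-after-read and write-after-write hazards disappear and only true read-after-write dependences remain.

#include <stdio.h>

#define NUM_ARCH 4                  /* hypothetical: architectural registers r0..r3 */
#define NUM_PHYS 16                 /* hypothetical: physical registers p0..p15     */

static int rename_table[NUM_ARCH];  /* architectural reg -> current physical reg    */
static int next_phys = NUM_ARCH;    /* p0..p3 hold the initial register values      */

static int read_reg(int arch)  { return rename_table[arch]; }               /* sources use the current mapping */
static int write_reg(int arch) { return rename_table[arch] = next_phys++; } /* every write gets a fresh reg    */

int main(void) {
    for (int i = 0; i < NUM_ARCH; i++) rename_table[i] = i;

    /* I1: r1 = r2 + r3 */
    int s2 = read_reg(2), s3 = read_reg(3);
    printf("I1: p%d = p%d + p%d\n", write_reg(1), s2, s3);

    /* I2: r2 = r1 * 2 -- the read-after-write on r1 remains: it reads I1's physical reg */
    int s1 = read_reg(1);
    printf("I2: p%d = p%d * 2\n", write_reg(2), s1);

    /* I3: r1 = r3 - 1 -- write-after-write with I1 and write-after-read with I2 in
       program order, but it lands in a brand-new physical register, so neither matters */
    s3 = read_reg(3);
    printf("I3: p%d = p%d - 1\n", write_reg(1), s3);
    return 0;
}

Running it prints I1: p4 = p2 + p3, I2: p5 = p4 * 2, I3: p6 = p3 - 1: the two writes to r1 land in different physical registers, so I2's read of p4 is never disturbed.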
DEFINE: SIMD.
Single Instruction, Multiple Data. A processor technique that uses a single ISA instruction on many pieces of data. An example is multiplying every value in an array by 2.
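As a sketch of the "multiply every value in an array by 2" example using x86 SSE2 intrinsics (this assumes an x86-64 target; other ISAs have equivalents such as ARM NEON), each vector instruction here operates on four 32-bit integers at once:

#include <stdio.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

int main(void) {
    int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    for (int i = 0; i < 8; i += 4) {
        __m128i v = _mm_loadu_si128((__m128i *)&data[i]);  /* load 4 ints                          */
        v = _mm_slli_epi32(v, 1);                          /* shift left by 1 = *2, on all 4 lanes */
        _mm_storeu_si128((__m128i *)&data[i], v);          /* store the 4 results back             */
    }

    for (int i = 0; i < 8; i++) printf("%d ", data[i]);    /* prints: 2 4 6 8 10 12 14 16 */
    printf("\n");
    return 0;
}

Compilers can sometimes auto-vectorize a plain scalar loop into code like this, but, as the next card notes, SIMD often has to be invoked explicitly.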
Of VLIW and SIMD, which makes a greater impact in today’s processors?
SIMD. It is used in laptops and smartphones to this day. That said, most of the time it must be called explicitly.
What modern capability is SIMD especially good for?
Machine learning, because of the prevalence of matrix operations.