Architectures - Pipelining Flashcards
What is a pipelined architecture?
An architecture in which more than one instructions are executed in parallel by the processor.
Every clock cycle an instruction begins a new task corresponding to one of the five stages that are part of the instruction. The instructions being executed in parallel, whatever their current stage is, have to be synchronized
Which are the stages by which an instruction is composed?
- Instruction Fetch
- Instruction Decode
- Execute
- Memory
- Write Back
What is the time of machine cycle?
the time required to the processor to execute one stage. It is determined by the slowest stage
What is the CPI?
clock cycle per instructions: number of clock cycle required to finish an instruction
What is the throughput? How are the un-pipelined throughput and the pipelined one related?
throughput = # of instructions that exit the pipeline in the time unit
Throughput (ideal pipeline) = Throughput (un-pipeline) * # of stages
The time required to execute a step is higher in a pipelined architecture but the overall throughput is lower.
t is higher because due to the pipeline control overheads
How many clock cycles does an instruction require?
5
4 only for branch instructions
What do we need at level memory to implement a pipeline? Why?
To implement a pipeline we need to work on different instructions at the same time being sure to not “break” the memory coherence so we need new structures:
4 pipeline registers:
1. IF - ID
2. ID - EX
3. EX - MEM
4. MEM - WB
What is an hazard? Which types of hazards are there?
An hazard identifies a situation in which the processor cannot complete an instruction execution in the designed time
- Structural
- Data
- Control
Structural hazards: characteristics, example, solutions
Resource conflicts
There is a structural hazard when the processor cannot complete the operation of a certain stage in one clock cycle.
Examples:
- there is just one register-file write port but there are cycles in which two registers writes are required
- there is a single-port memory but different instructions try to access the memory together
Sol:
- adding new hardware or improving the existing one
-> a trade-pff between the performance and the cost is required based on the hazards frequency
Data hazards: characteristics, example, solutions
An instruction depends on the result of a previous one and they are close enough -> OVERLAPPING -> the execution in the pipeline could change the order of the read and write operations and this could lead to wrong results o indeterminists behaviour
This can happen both for register operands and memory ones.
In case of memory operands a data hazard ca happen if a store and a load are not made in the same stage or if the execution can go on even if an instruction is waiting for a cache miss to be solved.
Example:
ADD R1, r2, r3
SUB r4, R1, r5 -> 2 stages before R1 availability
AND r6, R1, r7 -> 1 stage before R1 availability
OR r8, R1, r9 -> OK before writing is done before reading
XOR -> OK
Sol:
- stalling the instructions requiring some data until they are available
- forwarding (or bypassing) technique
What is the forwarding technique and for what is it used?
This technique requires special hardware that can understand when a previous and a current ALU operations have to use the same register
In such a case the special hardware selects the result directly from the ALU output rather than from the register file
The hardware must:
- forward a data from any pipeline register to any inout of any functional unit
- not forward anything if the next instruction is stalled or an interrupt occurred
Can all hazards be solved using data forwarding?
No! Forwarding cannot be done back in time
What happens when an instruction is stalled?
The following instructions are also stalled.
The previous ones continue.
A bubble is introduced
How do we implement a stall for an instruction in the ID stage?
We put the stall in the EX stage:
- nop instruction: the ID/EX stage is forced to zero
- forcing the pipeline register IF/ID to maintain the same value
- not updating the PC’s value -> same IF
Control hazards: characteristics, example, solutions
They are related to conditional and unconditional branches that could change the PC when the following instructions have already been fetched
Solution (basic): when a branch instruction is detected the pipeline is stalled and this is done but deciding earlier if the branch has to be taken (by moving the comparison unit ahead of one stage) and by computing earlier the new PC value (by adding a new adder to the comparison unit)
What do branches lead to? How can we reduce it?
Performance degradation
Techniques to reduce it:
- freezing the pipeline: stalling the pipeline when a branch instruction is detected -> deciding earlier and updating the PC -> comparison unit move ahead of one stage and adder added to that unit
- predict taken: if the target address is known before the branch outcome it may be possibile to assume the branch as taken
- predict untaken: the branch is assumed to not be taken. If it will be taken the previous operations will be undone. If the branch decision is not taken there are no changes in the pipeline status
- delayed branches: the processor decodes the branch instructions but it does nothing related to it. The processor just fills some “branch-delay” slots with instructions that have to be executed no metter the outcome -> using it only 30% of the branches produce a penalty. Several RISC architectures no long support this technique
How do we manage to execute multi cycle operations?
By modifying the EX stage: it will be composed of different functional units and it can be repeated many times as needed.
It is composed of:
- 1 integer unit
- 7 floating point and integer multiply unit (M1 - M7)
- 4 floating point adders (A1 - A7)
- a floating point or integer divider block
extended structure of the EX stage -> more frequent hazards
operations have longer latency