Single-cycle Datapath and pipelining Flashcards
What are the 5 steps in the datapath when executing an instruction?
1.
Use PC to fetch instruction from instruction memory.
The next PC is also computed, as either PC + 4 or a branch target address
2.
Read the register numbers from the instruction and access the register file to read the registers that need to be read
3.
Execute the instruction.
ALU (arithmetic result, calculate address for load/store, calculate branch target address)
4.
Access memory (if necessary)
5.
Write result to register file (if there is a result)
What are multiplexers used for?
Wires can't simply be joined together, so a multiplexer with a control signal selects which input is used
What logical value does Low voltage/ground have?
0
How many bits are there per wire?
One bit per wire
How are multi-bit data encoded?
On multi-wire buses
What are the two types of circuit elements used to build a datapath?
Combinational element
State (sequential) element
What is a combinational element?
Operate on data
The output is a function of input.
AND, Adder, Multiplexer, ALU
What is a state (sequential) element?
Stores data
Basic unit is a flip flop.
Stores the value on its input, and does so on the rising edge of the clock.
It is common to assemble flip-flops into registers to store multi-bit values whose bits belong together
What does an adder do?
Binary addition of two binary numbers
Y = A + B
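One way to picture the adder as a combinational element is a ripple-carry design built from one-bit full adders. This is only a sketch: the names `full_adder` and `ripple_add` and the 32-bit width are illustrative assumptions, and real adders often use faster carry schemes.

```python
def full_adder(a, b, carry_in):
    """Add three single bits; return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_add(a, b, width=32):
    """Y = A + B on a width-bit bus, one full adder per wire, LSB first."""
    carry = 0
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        sum_bit, carry = full_adder(bit_a, bit_b, carry)
        result |= sum_bit << i
    return result  # the final carry out is dropped, so the sum wraps around


print(ripple_add(5, 9))           # 14
print(ripple_add(0xFFFFFFFF, 1))  # 0 (wraps around at 32 bits)
```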
How does a multiplexer compute its output?
Y = S ? I1 : I0
I1: Input 1
I0: Input 0
S: Selector signal
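The selection rule above can be written as a tiny function. The name `mux2` and the example values are hypothetical, chosen only to illustrate the rule.

```python
def mux2(s, i0, i1):
    """2-to-1 multiplexer: output i1 when the selector s is 1, otherwise i0."""
    return i1 if s else i0


# Hypothetical use: choose between a register value and an immediate.
register_value, immediate = 7, 100
print(mux2(0, register_value, immediate))  # 7
print(mux2(1, register_value, immediate))  # 100
```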
What is a critical path?
The worst case latency between two state elements.
It is the longest delay a signal must traverse within one clock cycle, so the clock period must be at least this long.
This avoids, for example, storing values before the input is completely ready.
What is the data path?
Elements that process data and addresses in the CPU (registers, ALUs, multiplexers, memories)
How are instructions fetched using a PC?
The PC is a 32-bit register (addresses are 32 bits)
The output of the PC is an address in instruction memory
The PC output is used to fetch the instruction from instruction memory
What happens in parallel during instruction fetch?
While the PC is used to fetch the instruction from instruction memory,
The current PC is incremented by 4 to get the address of the next instruction.
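A minimal sketch of the fetch step, assuming instruction memory is modelled as a dictionary from byte address to 32-bit instruction word. The function name `fetch` and the memory contents are made up for illustration.

```python
def fetch(pc, instruction_memory):
    """Return the instruction at pc together with the address of the next one."""
    instruction = instruction_memory[pc]  # read instruction memory
    next_pc = (pc + 4) & 0xFFFFFFFF       # done in parallel in hardware
    return instruction, next_pc


# Hypothetical instruction memory: two words at consecutive word addresses.
instruction_memory = {0x00: 0x012A4020, 0x04: 0x00000000}
instruction, next_pc = fetch(0x00, instruction_memory)
print(hex(instruction), hex(next_pc))  # 0x12a4020 0x4
```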
How are ALU instructions executed?
- Need to read 2 register operands
- Perform the operation
- Write result
Components: Register file, ALU
What are the components of a register file?
2 read register inputs:
- both 5 bits wide (the instruction format has 5 bits for specifying each register)
2 read data outputs that feed the inputs of the ALU
Write register input: selects the register to write
Write data input: the data to write
RegWrite signal: Makes sure we don’t write register unless we need to.
What are the components of the ALU?
Two inputs
Control input: tells the ALU which operation to perform (4 bits give 16 possible operations, depending on the architecture)
Result output
Zero bit: Not used for arithmetic, but for branching
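The ALU interface above can be sketched as a function with two data inputs, a control input, a result, and a zero bit. The 4-bit control codes follow the common textbook MIPS encoding, which is an assumption here; the cards do not list them.

```python
MASK32 = 0xFFFFFFFF

# Assumed 4-bit control codes (textbook MIPS convention, not from the cards).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB = 0b0000, 0b0001, 0b0010, 0b0110


def alu(control, a, b):
    """Return (result, zero) for 32-bit operands a and b."""
    if control == ALU_AND:
        result = a & b
    elif control == ALU_OR:
        result = a | b
    elif control == ALU_ADD:
        result = (a + b) & MASK32
    elif control == ALU_SUB:
        result = (a - b) & MASK32
    else:
        raise ValueError(f"unsupported ALU control code {control:04b}")
    zero = 1 if result == 0 else 0  # used by branches, not by arithmetic
    return result, zero


print(alu(ALU_ADD, 2, 3))  # (5, 0)
print(alu(ALU_SUB, 7, 7))  # (0, 1)
```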
What is needed to be able to execute load/store instructions?
1.
Read register operands
2.
Calculate the address (ALU): the offset must be sign-extended because it is not a full 32 bits (it is an immediate supplied in the instruction)
3.
Finish instruction by providing address and the write data (on stores), or grab data (on loads).
2 signals:
MemRead signal: tells the memory that we need to read.
MemWrite signal: tells the memory that we need to write.
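The address calculation for loads and stores can be sketched as follows, assuming a 16-bit immediate as in MIPS. The helper names `sign_extend16` and `effective_address` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a signed two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """Base register value plus the sign-extended offset, as computed by the ALU."""
    return (base + sign_extend16(imm16)) & MASK32


print(hex(effective_address(0x1000, 8)))       # 0x1008
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc (offset is -4)
```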
What is needed to support branch instructions?
1.
Read register operands (register file)
2.
Compare operands (ALU)
3.
Calculate target address
- sign-extend displacement
- Shift left 2 places (instructions are 32 bits, i.e. 4 bytes, so the displacement counts words rather than bytes)
- Add to PC + 4
How do we decide whether we should branch?
Take values from register and send them to ALU.
Provide ALU with operation signal for subtract.
If result of subtraction is zero, they are equal and we should branch.
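Putting the branch target calculation and the branch decision together gives roughly the following sketch for a branch-if-equal, assuming a 16-bit displacement as in MIPS. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a signed two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Sign-extend the displacement, shift left 2, add to PC + 4."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


def next_pc_beq(pc, rs_value, rt_value, imm16):
    """Subtract the operands; branch when the ALU zero bit is set."""
    zero = ((rs_value - rt_value) & MASK32) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & MASK32


print(hex(next_pc_beq(0x100, 5, 5, 3)))  # 0x110 (taken)
print(hex(next_pc_beq(0x100, 5, 6, 3)))  # 0x104 (not taken)
```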
What is a single-cycle architecture?
An architecture where all of the stages in the datapath are executed within one clock cycle
In a MIPS datapath, what is the ALU used for?
Load/store: function signal ‘add’
Branch: function signal ‘sub’
R-type: function depends on the funct field in the instruction
What is the ALUOp control signal?
Comes from the main control unit, which derives it from the opcode.
00: load or store
01: Branch equal
10: R-types
ALU control is set based on ALUOp and funct.
For 00 and 01, it doesn’t matter what the funct field is.
For R-types (add, sub, and, or), the operation is selected by the funct field as defined in the instruction
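The mapping from ALUOp and funct to the ALU control input can be sketched as a small lookup. The funct values and 4-bit control codes below follow the standard MIPS encoding and the usual textbook convention; they are assumptions added here, since the cards do not list them.

```python
# Assumed MIPS funct field -> assumed 4-bit ALU control code.
FUNCT_TO_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
}


def alu_control(alu_op, funct):
    """Derive the ALU control input from the 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:  # load or store: always add, funct is ignored
        return 0b0010
    if alu_op == 0b01:  # branch equal: always subtract, funct is ignored
        return 0b0110
    if alu_op == 0b10:  # R-type: operation comes from funct
        return FUNCT_TO_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(format(alu_control(0b10, 0b100010), "04b"))  # 0110 (sub)
```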
What is the main control unit?
Based on the principle that all control signals are derived from instruction.
It selects the appropriate parts of the instruction and routes them to the correct places.
What are examples of control signals?
Multiplexer selection, ALUOp, etc.
What are the three instruction types?
Branch, load/store, R-type
Explain what happens when an R-type instruction is executed (e.g. ADD).
PC -> Instruction memory -> Instruction
PC is incremented by 4
Opcode sent to control unit
2 registers are read
1 destination register: the opcode identifies this as an R-type instruction, so the write-register multiplexer select is set to 1 to pick the destination field
immediate bits are not used in this case
funct bits are sent to ALU control
ALUOp is set in the main control unit
combine ALUOp with funct to get correct operation in ALU
The control unit sets ALUSrc so that registers are used as both inputs to the ALU.
The control unit disables MemRead and MemWrite
ALU output uses multiplexer to bypass memory - no load/store
Provide output to writeData in register file
RegWrite is set in control unit
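The control settings in the walkthrough above can be sketched as a lookup table keyed by opcode. The signal names `RegDst`, `MemtoReg`, and `Branch` and the opcode values come from the usual textbook MIPS datapath rather than from the cards, so treat them as assumptions; `None` marks a don't-care.

```python
# Assumed textbook MIPS control table: opcode -> control signals.
CONTROL_TABLE = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def main_control(instruction):
    """Derive all control signals from the opcode (bits 31..26)."""
    opcode = (instruction >> 26) & 0x3F
    return CONTROL_TABLE[opcode]


# 0x012A4020 encodes the R-type instruction add $t0, $t1, $t2.
signals = main_control(0x012A4020)
print(signals["RegWrite"], signals["ALUSrc"], signals["MemRead"], signals["MemWrite"])
# 1 0 0 0
```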
What are some issues with a single cycle processor?
The longest delay determines the clock period (critical path = load)
The period cannot vary between instructions, so every instruction takes as long as the slowest one. This violates the design principle "make the common case fast".
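A small calculation shows why the load sets the clock period. The delays below are purely illustrative assumptions (200 ps for memory and ALU, 100 ps for register file access), not measurements.

```python
# Assumed component delays in picoseconds (illustrative only).
DELAY_PS = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}

# Which components each instruction class uses on its path.
PATH = {
    "load":   ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "store":  ["fetch", "reg_read", "alu", "mem"],
    "r_type": ["fetch", "reg_read", "alu", "reg_write"],
    "branch": ["fetch", "reg_read", "alu"],
}


def latency_ps(kind):
    """Total delay along the path used by one instruction class."""
    return sum(DELAY_PS[component] for component in PATH[kind])


def clock_period_ps():
    """In a single-cycle design the slowest instruction sets the period."""
    return max(latency_ps(kind) for kind in PATH)


for kind in PATH:
    print(kind, latency_ps(kind))
print("clock period:", clock_period_ps())  # clock period: 800
```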
What is pipelining?
Overlap execution of multiple instructions.
Partition instruction into n stages. Needs pipeline registers to be able to do this. Uses these registers as input to next stage, so that a new instruction can start execution completely independently.
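The benefit of overlapping can be estimated with a simple timing model. It assumes an ideal pipeline with no stalls or flushes, 5 stages of 200 ps each, and an 800 ps single-cycle period; all of these numbers are illustrative assumptions.

```python
def single_cycle_time_ps(n_instructions, period_ps=800):
    """Each instruction occupies one full (long) clock cycle."""
    return n_instructions * period_ps


def pipelined_time_ps(n_instructions, stages=5, stage_ps=200):
    """The first instruction needs `stages` cycles; each later one finishes a cycle after."""
    return (stages + n_instructions - 1) * stage_ps


n = 1000
print(single_cycle_time_ps(n))  # 800000
print(pipelined_time_ps(n))     # 200800
```

For long instruction sequences the ratio approaches the single-cycle period divided by the stage time, here a factor of 4.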
Describe the RISC-V datapath including pipeline registers
PC goes to instruction memory.
Put instruction into pipeline register
Decode the instruction and read the registers. The output of the registers is stored in pipeline registers.
Execute ALU and branching and store all of the results in pipeline registers.
In the memory stage, data memory is accessed; the write to the normal (architectural) registers happens afterwards, in the write-back stage.
Name some pipeline stages
IF: Instruction fetch
ID: Instruction decode
EX: Execute, treats ALU, load and branch instructions differently. Branch is completed after EX stage
MEM: Memory
WB: Write back
What are the three types of dependencies?
Data dependences through memory
Data dependences through registers
Control dependences
What is a hazard?
A situation where a program dependence is not respected.
This results in incorrect program execution.
Name the three data dependences
True dependence: read-after-write (RAW), an instruction reads a value written by an earlier instruction.
Anti dependence: write-after-read (WAR), data is used as input in one instruction and then overwritten by a later one.
Output dependence: write-after-write (WAW), write to same register
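The three register data dependences can be detected by comparing the destination and source registers of two instructions. In this sketch an instruction is a hypothetical `(destination, sources)` tuple, and `classify` is an illustrative name.

```python
def classify(first, second):
    """Return the dependences of `second` on `first` (first is earlier in program order)."""
    first_dest, first_sources = first
    second_dest, second_sources = second
    found = set()
    if first_dest is not None and first_dest in second_sources:
        found.add("RAW")  # second reads what first writes
    if second_dest is not None and second_dest in first_sources:
        found.add("WAR")  # second overwrites what first reads
    if first_dest is not None and first_dest == second_dest:
        found.add("WAW")  # both write the same register
    return found


add_instr = ("x1", ["x2", "x3"])  # add x1, x2, x3
sub_instr = ("x4", ["x1", "x5"])  # sub x4, x1, x5
print(classify(add_instr, sub_instr))  # {'RAW'}
```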
What does it mean to stall a pipeline?
The instruction stalls (does not progress to the next stage) until the required data is available (the dependence is satisfied).
What is forwarding?
Mitigates data dependences without the need to stall.
Result data is forwarded from later stages to earlier stages in the following cycles, making it available to instructions before the results are written back to the registers.
The instruction in each stage checks where the most recent data comes from and chooses that as the source for its operation
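The "most recent data" check can be sketched as a forwarding-unit decision for one ALU operand. The pipeline register names EX/MEM and MEM/WB follow the usual five-stage textbook pipeline; the dictionary layout and the function name `forward_source` are assumptions made for this example.

```python
def forward_source(source_reg, ex_mem, mem_wb):
    """Pick where an ALU operand in EX should come from.

    ex_mem and mem_wb describe the instructions one and two stages ahead,
    each as a dict with 'reg_write' (bool) and 'rd' (destination register number).
    """
    # The newest result wins, so check EX/MEM before MEM/WB.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == source_reg:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == source_reg:
        return "MEM/WB"
    return "REGFILE"  # no newer value in flight, use the register file output


ex_mem = {"reg_write": True, "rd": 1}
mem_wb = {"reg_write": True, "rd": 1}
print(forward_source(1, ex_mem, mem_wb))  # EX/MEM
print(forward_source(2, ex_mem, mem_wb))  # REGFILE
```

Register 0 is excluded because it is hardwired to zero in MIPS and RISC-V, so a "write" to it must never be forwarded.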
What type of hazard can happen due to a control dependence?
When branching, the processor might not know what the next instruction will be before the branch condition has been evaluated.
What is one solution to control hazards?
Branch prediction and speculative execution. If the prediction was wrong, flush out the instructions along the wrong execution path.
What happens when a branch is mispredicted?
Pipeline starts executing instruction on predicted path.
When the first instruction reaches the MEM stage, the pipeline finds out that the path is incorrect.
It needs to flush the pipeline and make sure there are no architectural consequences: reset the pipeline registers and then fetch the instruction on the correct path.
What happens when you have deeper pipelines?
Higher clock frequency
But higher cost due to mispredictions -> higher CPI
Power consumption increases with frequency - also a cost with deeper pipelines.
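The trade-off between clock frequency and misprediction cost can be explored with a simple CPI model. Every number below (branch fraction, misprediction rate, penalties, clock frequencies) is a made-up assumption for illustration.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Base CPI plus the average number of cycles lost to mispredicted branches."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def time_per_instruction_ns(cpi, clock_ghz):
    """Average time per instruction: cycles per instruction divided by cycles per ns."""
    return cpi / clock_ghz


# Hypothetical shallow pipeline: 3-cycle penalty at 2 GHz.
shallow = effective_cpi(1.0, 0.2, 0.1, penalty_cycles=3)
# Hypothetical deep pipeline: 15-cycle penalty at 3 GHz.
deep = effective_cpi(1.0, 0.2, 0.1, penalty_cycles=15)

print(round(shallow, 2), round(deep, 2))
print(time_per_instruction_ns(shallow, 2.0), time_per_instruction_ns(deep, 3.0))
```

With these assumed numbers the deeper pipeline still wins on time per instruction, but its CPI is noticeably higher, and the model ignores the extra power drawn at the higher frequency.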