Datapath And Pipelining Flashcards
Steps for instruction execution
- Use the PC to fetch the instruction from instruction memory.
- Decode the instruction and read the registers named by its register numbers.
- Depending on the instruction class, use the ALU to calculate:
  - Arithmetic result;
  - Memory address for load/store;
  - Branch address.
- Access data memory for load/store.
- Update the PC to the target address or to PC + 4.
Describe Register Design
- A register stores data in a circuit.
- It uses a clock signal to determine when to update the stored value.
- Edge-triggered: the value updates when Clk changes from 0 to 1 (the rising edge).
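A minimal Python sketch of edge-triggered behaviour, assuming the clock is modelled as a sequence of sampled 0/1 levels. The class `Register` and its `tick` method are hypothetical names for illustration, not part of any library.

```python
class Register:
    """Edge-triggered register: captures its input only on a rising clock edge."""

    def __init__(self, value=0):
        self.q = value       # stored value, visible at the output
        self._prev_clk = 0   # previous clock level, used to detect the 0 -> 1 edge

    def tick(self, clk, d):
        """Sample clock level `clk` with data input `d`; return the stored value."""
        if self._prev_clk == 0 and clk == 1:  # rising edge: load the input
            self.q = d
        # on any other sample the stored value is simply held
        self._prev_clk = clk
        return self.q


r = Register()
r.tick(0, 5)   # clock low: still 0
r.tick(1, 5)   # rising edge: now 5
r.tick(1, 9)   # clock stays high, no edge: still 5
print(r.q)     # 5
```

Only the 0 to 1 transition matters; holding the clock high does not let new data through.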
Describe Register with Write Control
- Register with write control:
- Only updates on a clock edge when the write control input is 1;
- Used when the stored value must be held across cycles until it is required later.
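A small Python sketch of a register with write control, using hypothetical names (`WriteRegister`, `tick`, `write`) and assuming sampled 0/1 clock levels: the stored value changes only when a rising edge coincides with `write == 1`.

```python
class WriteRegister:
    """Edge-triggered register with a write-control input."""

    def __init__(self, value=0):
        self.q = value
        self._prev_clk = 0

    def tick(self, clk, d, write):
        """Load `d` only on a rising edge when `write` is 1; otherwise hold."""
        rising = self._prev_clk == 0 and clk == 1
        if rising and write == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


pc = WriteRegister()
pc.tick(0, 4, write=0)
pc.tick(1, 4, write=0)   # rising edge but write disabled: still 0
pc.tick(0, 4, write=1)
pc.tick(1, 4, write=1)   # rising edge with write enabled: now 4
print(pc.q)              # 4
```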
Describe Clocking Methodology
Combinational logic transforms data during clock cycles:
• Between clock edges;
• Input from state elements, output to a state element;
• State elements are latches or registers;
• Longest delay determines clock period.
Define a datapath
Elements that process data and addresses in the CPU,
e.g. registers, ALUs, multiplexers (muxes), memories, …
Steps for R-format Instruction
- Read two register operands.
- Perform arithmetic/logical operation.
- Write the result back to the destination register.
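A Python sketch of the three R-format steps, assuming the standard MIPS R-format field layout (op, rs, rt, rd, shamt, funct) and only a small subset of funct codes. `decode_r`, `execute_r` and `FUNCT_OPS` are hypothetical helpers, and overflow traps are ignored.

```python
MASK32 = 0xFFFFFFFF

def decode_r(instr):
    """Split a 32-bit R-format word into its fields."""
    return {
        "op":    (instr >> 26) & 0x3F,
        "rs":    (instr >> 21) & 0x1F,
        "rt":    (instr >> 16) & 0x1F,
        "rd":    (instr >> 11) & 0x1F,
        "shamt": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }

# funct code -> ALU operation (small subset; overflow traps are ignored)
FUNCT_OPS = {
    0x20: lambda x, y: x + y,  # add
    0x22: lambda x, y: x - y,  # sub
    0x24: lambda x, y: x & y,  # and
    0x25: lambda x, y: x | y,  # or
}

def execute_r(instr, regs):
    f = decode_r(instr)
    a, b = regs[f["rs"]], regs[f["rt"]]            # 1. read two register operands
    result = FUNCT_OPS[f["funct"]](a, b) & MASK32   # 2. ALU operation, kept to 32 bits
    if f["rd"] != 0:                                # register 0 ($zero) is never written
        regs[f["rd"]] = result                      # 3. write result back to rd
    return result


regs = [0] * 32
regs[9], regs[10] = 5, 7          # $t1 = 5, $t2 = 7
execute_r(0x012A4020, regs)       # add $t0, $t1, $t2
print(regs[8])                    # 12
```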
Steps for Load/Store Instructions
- Read register operands.
- Calculate address using 16-bit offset:
- Use ALU, but sign-extend offset.
- Load: Read memory and update register.
- Store: Write register value to memory.
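A Python sketch of the load/store steps, assuming memory is modelled as a dict keyed by byte address and ignoring alignment checks. `sign_extend16` and `mem_access` are hypothetical helpers; 35 and 43 are the lw/sw opcodes.

```python
LW, SW = 35, 43          # MIPS opcodes for lw and sw
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Interpret the low 16 bits of `imm` as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def mem_access(op, regs, rs, rt, imm, memory):
    """Execute lw/sw; `memory` is a dict from byte address to 32-bit word."""
    addr = (regs[rs] + sign_extend16(imm)) & MASK32   # ALU: base + sign-extended offset
    if op == LW:
        regs[rt] = memory.get(addr, 0)   # load: read memory, update register
    elif op == SW:
        memory[addr] = regs[rt]          # store: write register value to memory
    else:
        raise ValueError(f"opcode {op} is not lw/sw")
    return addr


regs, memory = [0] * 32, {}
regs[29], regs[8] = 0x1000, 42                 # $sp = 0x1000, $t0 = 42
mem_access(SW, regs, 29, 8, 0xFFFC, memory)    # sw $t0, -4($sp)
mem_access(LW, regs, 29, 9, 0xFFFC, memory)    # lw $t1, -4($sp)
print(hex(0x1000 - 4), regs[9])                # 0xffc 42
```

Sign extension is what lets the offset 0xFFFC mean -4 rather than +65532.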
Steps For Branch Instructions
- Read register operands.
- Compare operands:
  - Use ALU, subtract and check Zero output.
- Calculate target address:
  - Sign-extend displacement;
  - Shift left 2 places (word displacement);
  - Add to PC + 4 (already calculated by instruction fetch).
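A Python sketch of the `beq` steps, assuming 32-bit unsigned register values. `beq_next_pc` is a hypothetical helper that returns the next PC: the target when the Zero flag is set, otherwise PC + 4.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def beq_next_pc(pc, rs_val, rt_val, imm):
    """Next PC for `beq` given both register values and the 16-bit displacement."""
    pc_plus_4 = (pc + 4) & MASK32                    # already computed during fetch
    zero = ((rs_val - rt_val) & MASK32) == 0         # ALU subtracts; Zero set if equal
    offset = sign_extend16(imm) << 2                 # sign-extend, then word -> byte offset
    target = (pc_plus_4 + offset) & MASK32
    return target if zero else pc_plus_4


print(hex(beq_next_pc(0x00400000, 3, 3, 2)))   # taken: 0x40000c
print(hex(beq_next_pc(0x00400000, 3, 4, 2)))   # not taken: 0x400004
```

The displacement counts words relative to PC + 4, so a displacement of 2 skips two instructions after the branch.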
Draw Full Datapath without Pipelining
Describe ALU Control
- Assume a 2-bit ALUOp signal derived from the opcode.
- Combinational logic derives the ALU control lines from ALUOp and, for R-type instructions, the funct field.
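A Python sketch of the ALU control logic as a lookup. It assumes these notes follow the textbook (Patterson and Hennessy) MIPS design, whose 4-bit ALU control encodings are used below; `alu_control` is a hypothetical name.

```python
def alu_control(alu_op, funct):
    """Map 2-bit ALUOp (+ funct for R-type) to 4-bit ALU control lines."""
    if alu_op == 0b00:       # lw/sw: add to form the address
        return 0b0010
    if alu_op == 0b01:       # beq: subtract, then test Zero
        return 0b0110
    if alu_op == 0b10:       # R-type: the funct field selects the operation
        return {
            0b100000: 0b0010,  # add
            0b100010: 0b0110,  # subtract
            0b100100: 0b0000,  # AND
            0b100101: 0b0001,  # OR
            0b101010: 0b0111,  # set on less than
        }[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")


print(format(alu_control(0b10, 0b100010), "04b"))   # 0110 (sub)
```

For lw/sw and beq the funct bits are ignored, which is why ALUOp alone decides the operation in those cases.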
Describe Opcodes For instructions
Control signals are derived from the opcode field (bits 31:26) of the instruction:
- R-type = 0
- Load/Store = 35 (lw), 43 (sw)
- Branch (beq) = 4
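A Python sketch of the main control unit as a table indexed by opcode. The signal names and values assume the textbook single-cycle MIPS design; `CONTROL` and `main_control` are hypothetical names, and `None` stands for a don't-care (X).

```python
# Main control outputs per opcode; None marks a don't-care (X).
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),          # R-type
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),          # lw
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),          # sw
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),          # beq
}

def main_control(instr):
    """Look up control signals from the opcode in bits 31:26."""
    return CONTROL[(instr >> 26) & 0x3F]


print(main_control(0x8FA80008))   # lw $t0, 8($sp)
```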
Steps For implementing Jumps in Datapath
- Jump uses a word address.
- Update the PC with the concatenation of:
  - Top 4 bits of the old PC (strictly, of PC + 4);
  - 26-bit jump address;
  - 00.
- Need an extra control signal decoded from the opcode.
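A Python sketch of jump-target formation, assuming the MIPS `j` opcode (2). It takes the top 4 bits from PC + 4, as the MIPS architecture specifies; this differs from the old PC's top bits only when the jump is the last word of a 256 MB region. `jump_target` is a hypothetical name.

```python
J_OPCODE = 2   # MIPS opcode for j

def jump_target(pc, instr):
    """Form the jump target from the 26-bit field of a j instruction at `pc`."""
    if ((instr >> 26) & 0x3F) != J_OPCODE:
        raise ValueError("not a j instruction")
    addr26 = instr & 0x03FFFFFF        # 26-bit word address from the instruction
    upper4 = (pc + 4) & 0xF0000000     # top 4 bits of PC + 4
    return upper4 | (addr26 << 2)      # {upper4, addr26, 00}


instr = (J_OPCODE << 26) | 0x0100000         # j with word address 0x0100000
print(hex(jump_target(0x00400020, instr)))   # 0x400000
```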
What are the five processing stages?
- Inst. Fetch (IF): Read the next instruction from memory (pointed to by the PC) and store it in the Instruction Register (IR).
- Inst. Decode (ID): Set up control signals, including register addresses and the ALU function.
- Execute (EX): Use the ALU to compute a result or an address.
- Memory (MEM): Access memory (read or write), if applicable.
- Write Back (WB): Write the result back to a register, if applicable.
How is the performance of a single cycle processor evaluated?
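- The maximum clock rate is determined by the delay of the longest path: the clock period must be at least that long.
- Each class of instructions uses different components of the datapath.
- Load uses the longest path (instruction memory, register read, ALU, data memory, register write), so it sets the clock period for every instruction; to improve performance, the load path must be made as fast as possible.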
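A worked Python sketch of how the longest path sets the single-cycle clock. The component delays are assumed, illustrative figures, not values from these notes; only their relative sizes matter.

```python
# Assumed component delays in picoseconds (illustrative figures only).
DELAY = {"inst_mem": 200, "reg_read": 100, "alu": 200, "data_mem": 200, "reg_write": 100}

# Components on each instruction class's path through a single-cycle datapath.
PATHS = {
    "lw":     ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["inst_mem", "reg_read", "alu", "data_mem"],
    "R-type": ["inst_mem", "reg_read", "alu", "reg_write"],
    "beq":    ["inst_mem", "reg_read", "alu"],
}

def path_delay(cls):
    return sum(DELAY[part] for part in PATHS[cls])

# One clock period for every instruction, so it must cover the slowest path.
clock_period = max(path_delay(cls) for cls in PATHS)

for cls in PATHS:
    print(f"{cls:<7} {path_delay(cls)} ps")
print("clock period:", clock_period, "ps")   # 800 ps, set by lw
```

Under these assumptions a beq needs only 500 ps of work but still occupies a full 800 ps cycle.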
Performance of Multicycle Datapath
- Split into 5 stages, same clock for each.
- Clock determined by longest action (not longest instruction).
- Unused stages can be skipped, so different instruction types take different numbers of clock cycles; performance improves because simpler instructions finish in fewer cycles.
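A Python sketch of average CPI for a multicycle design. The cycle counts follow from which of the 5 stages each class uses; the instruction mix and the 200 ps stage time are hypothetical figures chosen for illustration, not measured data.

```python
# Clock cycles per class when unused stages are skipped
# (R-type and sw skip one stage, beq finishes after EX).
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3}

# Hypothetical instruction mix in percent (not measured data).
MIX = {"lw": 25, "sw": 10, "R-type": 50, "beq": 15}

STAGE_PS = 200   # assumed clock period = delay of the longest single action

def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction over the mix."""
    weighted = sum(pct * cycles[cls] for cls, pct in mix.items())
    return weighted / sum(mix.values())

cpi = average_cpi(MIX, CYCLES)
print(f"CPI = {cpi:.2f}, average time = {cpi * STAGE_PS:.0f} ps")   # CPI = 4.10, average time = 820 ps
```

Whether this beats a single-cycle design depends on the instruction mix and on how evenly the stage delays are balanced.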
Clocking Methodology Of MultiCycle
- Use a single clock; all data is written on the rising edge.
- Split execution into phases, each phase is one clock cycle.
- Different instructions use different phases.
- Use additional registers to hold temporary results across clock cycles.
- We assume one clock cycle allows completion of:
- Register file access (read or write);
- ALU operation;
- Memory access.
Describe Multicycle Datapath
- PC, IR and the register file are only written when their write control is enabled.
- Temporary registers A, B and ALUOut are written on every clock cycle.
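A Python sketch of one clock edge in the multicycle datapath, showing the difference between write-enabled registers and the temporaries. The enable names `PCWrite` and `IRWrite` follow the textbook convention and are assumptions here; `clock_edge` is hypothetical, and the register file is left out for brevity.

```python
def clock_edge(state, nxt, enables):
    """Return the register values after one rising clock edge.

    state   -- current values, e.g. {"PC": ..., "IR": ..., "A": ..., "B": ..., "ALUOut": ...}
    nxt     -- the values presented at each register's input during this cycle
    enables -- write-control signals, e.g. {"PCWrite": 1, "IRWrite": 0}
    """
    new = dict(state)                          # leave the caller's state untouched
    for name in ("PC", "IR"):                  # written only when enabled
        if enables.get(name + "Write", 0) == 1:
            new[name] = nxt[name]
    for name in ("A", "B", "ALUOut"):          # no enable: overwritten every cycle
        new[name] = nxt[name]
    return new


state = {"PC": 0x400000, "IR": 0x012A4020, "A": 0, "B": 0, "ALUOut": 0}
inputs = {"PC": 0x400004, "IR": 0xFFFFFFFF, "A": 5, "B": 7, "ALUOut": 12}
after = clock_edge(state, inputs, {"PCWrite": 0, "IRWrite": 0})
print(hex(after["IR"]), after["A"], after["ALUOut"])   # 0x12a4020 5 12
```

The IR keeps the fetched instruction for the whole execution, while A, B and ALUOut only need to survive until the next cycle.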
Describe Pipelined Processor
- Can have up to one instruction in each phase at a time.
- Cannot skip phases: instructions advance along the pipeline one stage per cycle.
- Improves the throughput of the system, but individual instructions execute no faster.
- There are complications with shared resources (structural hazards), dependencies (data hazards), decisions (control hazards)…
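A Python sketch that prints an ideal pipeline diagram, assuming no stalls or hazards, one new instruction entering per cycle, and every instruction passing through all five stages. `pipeline_diagram` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(instructions):
    """Return (name, row) pairs; row[c] is the stage occupied in clock cycle c."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        row = ["."] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage   # instruction i is in stage s during cycle i + s
        rows.append((name, row))
    return rows


for name, row in pipeline_diagram(["lw", "sub", "add"]):
    print(f"{name:<5}" + " ".join(f"{c:<3}" for c in row))
```

Each column contains every stage at most once, which is what "up to one instruction per phase" means; real hazards would insert bubbles into this picture.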
How does a pipelined processor improve performance?
Through parallelism: the processor works on different stages of different instructions at the same time. Once the pipeline is full, one instruction can complete every clock cycle, so throughput rises even though each individual instruction still passes through all five stages.
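A worked Python sketch of throughput versus latency, using assumed delays of 200 ps per pipeline stage and 800 ps per single-cycle instruction (illustrative figures, not from these notes). `total_time_ps` is a hypothetical helper modelling an ideal pipeline with no stalls.

```python
def total_time_ps(n, stage_ps=200, stages=5, single_cycle_ps=800):
    """Time to run n instructions: (single-cycle, ideal 5-stage pipeline)."""
    single = n * single_cycle_ps             # every instruction takes one long cycle
    pipelined = (stages + n - 1) * stage_ps  # fill the pipeline, then one completes per cycle
    return single, pipelined


for n in (1, 5, 1_000_000):
    single, piped = total_time_ps(n)
    print(f"n={n}: single={single} ps, pipelined={piped} ps, speedup={single / piped:.2f}")
```

Under these assumptions a lone instruction is slower when pipelined (1000 ps vs 800 ps), and the long-run speedup approaches 800 / 200 = 4 rather than 5, because the faster stages are padded out to the common 200 ps clock.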