Datapath And Pipelining Flashcards

1
Q

Steps for instruction execution

A

From PC number, go to instruction memory, fetch instruction.

Then decode the instruction and corresponding register number.

For load/store may need to calculate memory address.

ALU used to calculate:

  • Arithmetic Result
  • Memory Address for load/store
  • Branch Address

Then access data memory for load/store

Then update PC to PC +4

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Describe Register Design

A
  • Register: stores data in a circuit:
  • Uses a clock signal to determine when to update the stored value;
  • Edge-triggered: update when Clk changes from 0 to 1.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Describe Register with Write Control

A
  • Register with write control:
  • Only updates on clock edge when the write control input is 1;
  • Used when stored value is required later.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Describe Clocking Methodology

A

Combinational logic transforms data during clock cycles:
• Between clock edges;
• Input from state elements, output to state element;
• State elements are latches or registers;
• Longest delay determines clock period.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Define a datapath

A

Elements that process data and addresses in the CPU.
They are registers, ALUs, muxes, memories, …

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Steps for R-format Instruction

A
  • Read two register operands.
  • Perform arithmetic/logical operation.
  • Write the result back to register.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Steps for Load/Store Instructions

A
  • Read register operands.
  • Calculate address using 16-bit offset:
  • Use ALU, but sign-extend offset.
  • Load: Read memory and update register.
  • Store: Write register value to memory.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Steps For Branch Instructions

A

Read register operands.
• Compare operands:
• Use ALU, subtract and
check Zero output.
• Calculate target address:
• Sign-extend displacement;
• Shift left 2 places (word
displacement);
• Add to PC + 4…
• Already calculated by
instruction fetch.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Draw Full Datapath without Pipelining

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Describe ALU Control

A
  • Assume 2-bit ALUOp derived from opcode:
  • Combinational logic derives ALU control.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Describe Opcodes For instructions

A

Control Signals Derived From Instructions

R-Type = 0

Load/Store = 35, 43

Branch Instructions = 4

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Steps For implementing Jumps in Datapath

A
  • Jump uses word address
  • Update PC with the concatenation of
  • Top 4 bits of old PC, and
  • 26-bit jump address, and
  • 00.
  • Need an extra control signal decoded from the opcode.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are the five processing stages?

A
  • *Inst. Fetch (IF):** Read next instruction from memory (pointed to by PC) Store in Instruction Register (IR).
  • *Inst. Decode (ID):** Set up control signals inc. register addresses, ALU function.
  • *Execute (EX):** Use ALU to compute result or address.
  • *Memory (MEM):** Access memory (read or write), if applicable.
  • *Write Back (WB):** Write the result of the function back to register, if applicable.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How is the performance of a single cycle processor evaluated?

A
  • Maximum clock determined by the delay of the longest path.
  • Each class of instructions uses different components of the datapath.

To improve performance, load instruction must be implemented as fast as possible as the longest path.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Performance of Multicycle Datapath

A
  • Split into 5 stages, same clock for each.
  • Clock determined by longest action (not longest instruction).
  • As we can skip non-used stages, performance is accrued as different instruction types take different numbers of clock cycles:
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Clocking Methodology Of MultiCycle

A
  • Use single clock, all data written on the rising edge:
  • Split execution into phases, each phase is one clock cycle.
  • Different instructions use different phases.
  • Use additional registers to hold temporary results across clock cycles.
  • We assume one clock cycle allows completion of:
  • Register file access (read or write);
  • ALU operation
  • Memory access.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Describe Multicycle Datapath

A
  • PC, IR, Register file only written when enabled.
  • Temporary registers, A, B, ALUOut written on every clock cycle.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Describe Pipelined Processor

A
  • Can have up to one instruction per phase.
  • Cannot skip phases – instructions advance along pipeline.
  • Improves throughput of system, but individual instructions execute no faster.
  • There are complications with shared resources, dependencies, decisions…
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How does a pipelined processor improve performance

A

parallelism improves performance as the processor can simultaneously execute different stages allowing faster execution. In summary, parallelism allows the processor to perform several computations at the same time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What has most recently improved performance in computer architecture?

A

Although increases in clock speed has provided benefits, the greatest improvements have been due to better pipelining and multithreadiing.

21
Q

What are the basic principles of processing?

A

Sequential processing
• Each instruction executes completely before next instruction starts
Pipeline processing
• Instruction i + 1 starts executing before instruction i finishes
• Overlapped execution
• Maximum number of instructions executing simultaneously = Number of pipeline stages
• Increases performance by increasing throughput
• individual instructions not executed faster
• Ideal case: no idle components
• Significant issue: hardware in each stage must be independent, so can no longer share
hardware between stages

22
Q

Is an efficient multicycle datapath suitable for pipelining?

A

This shares many components to try and maximise the efficiency i.e. ALU is used
for next instruction calculations… Shared resources, not suitable for pipelining!

23
Q

How are pipelining implemented in single cycle datapaths?

A
  • Need registers between stages
  • To hold information produced in the previous cycle, allowing it to ‘ripple’ through the pipeline.
24
Q

How does pipelined speedup work?

A

• If all stages are balanced
• i.e., all take the same time (very unlikely!)
• Time between instructions pipelined = Time between instructions non-pipelined
Number of stages
• If not balanced, the speedup is less!
• Overall speedup due to increased throughput
• Latency (time for each instruction) does not decrease (normally increases).

25
Q

Factors Limiting Performance

A
  • Theoretical speedup of ‘n’ not delivered in practice:
  • Not all instructions require all pipeline stages.
  • Overheads of filling/emptying pipeline.
  • Dependencies between instructions.
  • Have to set the clock for the slowest stage in the pipeline.

• Note: instructions cannot ‘skip ahead’ in the pipeline! Must go through every stage.

26
Q

How are control modules implementing in pipelined processors?

A
  • Control signals derived from instruction:
  • As in single-cycle implementation, but
  • Pass control signals along just like the data.
27
Q

What are structure hazards?

A

A structure hazard refers to when a required resource is busy.

28
Q

What is a hazard in computing?

A

Situations that prevent starting the next instruction in the next cycle.

29
Q

What is a data hazard?

A

Need to wait for previous instruction to complete its data read/write.

30
Q

What is a control hazard?

A

Deciding on control action depends on previous instruction…

31
Q

What is the effect of hazards?

A

Hazards may require us to stall the pipeline, effectively wasting a cycle also known as a “bubble”.

32
Q

How do you stall a pipeline?

A

Force control values in ID/EX register to 0:

  • EX, MEM and WB do nop (no-operation).

Prevent update of PC and IF/ID register:

  • The current instruction is decoded again
  • Following instruction is fetched again;
  • 1-cycle stall allows MEM to read data for lw
    • Can subsequently forward to EX stage.
33
Q

Why are structure hazards a problem and how are they dealt with?

A

A structure hazard refers to conflict for use of resource
• In a constrained MIPS pipeline with a single memory for both data/instructions:
• Load/store requires data access;
• Instruction fetch would have to stall for that cycle…
• Would cause a pipeline “bubble”.
• Hence, pipelined datapaths require separate instruction/data memories.
• Or, at least, separate instruction/data caches.

34
Q

How are data hazards solved?

A

Data hazards are when an instruction depends on the completion of data access by a previous instruction

One method of solving this is forwarding, this is to

  • Use result when it is computed
  • Don’t wait for it to be stored in a register;
  • Requires extra connections in the datapath.
35
Q

When can forwarding not be used?

A
  • If value not computed when needed…
  • We can’t forward, backwards in time!
36
Q

When should you forward?

A

Result available at end of EX phase (in pipeline register)…
• No need to wait until WB phase;
• Add extra logic to feed this result back to ALU input;
• If hazard detected, logic selects forwarded ALU input, discarding erroneous register value.
Can also forward result from MEM stage:
• Either value fetched from memory, or
• ALU result passed on from previous clock cycle.

37
Q

When to detect the need to forward?

A

MEM hazard

• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
• if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01

• EX hazard
• if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
• if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10

38
Q

How to detect load-use hazard detection?

A

Check when using instruction is decoded in ID stage
ALU operand register numbers in ID stage are given by
• IF/ID.RegisterRs, IF/ID.RegisterRt

Load-use hazard when
• ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt))
If detected, stall and insert bubble

39
Q

Describe Datapath with Hazard Detection

A
40
Q

Can all stalls be avoided?

A

Not all stalls can be avoided as sometimes they are required for correct results, so to solve this the compiler can arrange code to avoid this issue.

41
Q

What is code scheduling to avoid stalls?

A

Reorder code to avoid the use of load result in the next instruction

42
Q

What are control hazards?

A

Branch determines the flow of control
• Fetching next instruction depends on branch outcome;
• Pipeline won’t always fetch correct instruction
• Still working on ID stage of branch…
In MIPS pipeline
• Need to compare registers and compute target early in the pipeline;
• Add hardware to do it in the ID stage.

43
Q

How are hazards due to branches dealt with?

A
  • Remove branch hazard by redefining semantics of branch instructions:
  • instruction immediately following branch instruction (in the “branch delay slot”) is always executed;
  • rely on the compiler to fill the slot with something useful (or nop if it can’t).

Branch Prediction

Alternatively, compiler can predict the result of the branch.

• Longer pipelines can’t readily determine branch outcome early:
• Stall penalty becomes unacceptable.
Predict outcome of branch:
• Only stall if the prediction is wrong.
In MIPS pipeline:
• Can predict branches not taken;
• Fetch instruction after branch, with no delay.

44
Q

Describe Branch Prediction in more detail

A

Static branch prediction

  • Based on typical branch behaviour
  • Example: loop and if-statement branches
    • Predict backward branches taken
    • Predict forward branches not taken

Dynamic branch prediction

  • Hardware measures actual branch behaviour
    • e.g., record recent history of each branch
  • Assume future behavior will continue the trend
    • When wrong, stall while re-fetching, and update history
45
Q

What is the effect of branch delays on superscalar and deeper pipelines?

A

Here, branch penalty is more significant.

Use dynamic prediction:

  • Branch prediction buffer (aka branch history table);
  • Indexed by recent branch instruction addresses;
  • Stores outcome (taken/not taken).
  • To execute a branch:
    • Check table, expect the same outcome;
    • Start fetching from fall-through or target;
    • If wrong, flush pipeline and flip prediction.
46
Q

Describe 1-bit Predictor?

A
  • Inner loop branches mispredicted twice!
  • Mispredict as taken on the last iteration of the inner loop
  • Then mispredict as not taken on the first iteration of the inner loop next time around
47
Q

Describe 2-bit Predictors

A

Only change prediction on two successive mispredictions

48
Q

Define Superscalar architecture

A

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor.

49
Q
A