Finals Flashcards
(5pt) Specify the 5-stage pipeline and explain what operation is performed at each stage.
IF (Instruction Fetch): Fetch the instruction from memory.
ID (Instruction Decode): Decode the instruction and read registers.
EX (Execute): Perform the operation or calculate the address.
MEM (Memory Access): Access memory if needed (load/store).
WB (Write Back): Write the result back to the register file.
(5pt) Write the output value (C) for the given circuits if A=1 and B=1.
1
(5pt) Specify three hazards in processor pipelining and explain hardware and software approaches to solve the hazards.
- Structural Hazards
Explanation: Occur when multiple instructions need the same resource simultaneously.
Hardware Approaches:
Resource Duplication: Adding more instances of the resource (e.g., multiple ALUs).
Pipeline Stalling: Introducing delays (bubbles) to wait for resources.
Software Approaches:
Instruction Scheduling: Rearranging instructions to avoid conflicts.
- Data Hazards
Explanation: Happen when an instruction depends on the result of a previous one that hasn’t completed. Types include RAW (Read After Write), WAR (Write After Read), and WAW (Write After Write).
Hardware Approaches:
Forwarding (Bypassing): Sending results directly to dependent instructions.
Pipeline Interlocks: Adding stalls until the data is ready.
Software Approaches:
Instruction Scheduling: Reordering instructions to prevent dependencies.
Register Renaming: Using different physical registers to avoid conflicts.
- Control Hazards
Explanation: Occur with branch instructions, causing the pipeline to fetch the wrong instructions.
Hardware Approaches:
Branch Prediction: Guessing branch outcomes to pre-fetch instructions.
Branch Target Buffer (BTB): Storing target addresses of branches.
Delayed Branching: Placing useful instructions after branches.
Software Approaches:
Branch Prediction Hints: Compiler hints based on likely branch outcomes.
Loop Unrolling: Reducing branch frequency by expanding loops.
- (30pt) The following figure shows single cycle implementation for datapath and control path for MIPS architecture.
Fill 0, 1 or X(don’t care) in the following table to implement for the architecture.
0 0 1 0 0 0
1 1 1 1 0 0
1 X 0 0 1 0
0 X 0 0 0 1
1 0 1 0 0 0
[10pt] Consider question 6. We want to construct a control unit which generates the table output in question 6. Note that ALU Src denotes the highest bit (5), and Brach represents the lowest bit (0). For the control unit, we use a ROM. Note that the op code of “lw” is 100011, and that of “sw” is 101011. Specify what is the proper address/value pair for “lw” and “sw” in ROM?
ROM Entries:
Address (Opcode) | Value (Control Signals)
—————–|————————-
100011 | 111110
101011 | 100010
- Assume that the following code is executed on a 5-stage pipelined datapath:
add $5,$2,$1
lw $3,4($5)
lw $2,0($2) or $3,$5,$3 sw $3,0($5)
9.1. (5pt) If there is no forwarding or hazard detection, insert nops to ensure correct execution. (Note that you need to insert the bubbles using nops.)
9.2. (5pt) Repeat the above question but now use nops only when a hazard cannot be avoided by
changing or rearranging these instructions. You can assume register $7 can be used to hold temporary values in your modified code. (Note that whenever bubbles are required, you need to insert nops.)
9.1
add $5, $2, $1
nop
nop
lw $3, 4($5)
nop
nop
or $3, $5, $3
nop
nop
sw $3, 0($5)
9.2
add $5, $2, $1
lw $2, 0($2)
nop
lw $3, 4($5)
or $3, $5, $3
sw $3, 0($5)
Consider the following repeating branch outcome patterns.
a. T, T, T, T, T, T, T, T, NT, NT
b. T, T, T, T, NT
8.1. (5pt) What is the accuracies of the always-taken and the always-not-taken predictors for each pattern if these patterns are repeated forever?
8.2. (5pt) What are the accuracies of the one-bit and the two-bit predictors for each pattern if these patterns are repeated forever?
Pattern a: T, T, T, T, T, T, T, T, NT, NT
Always-Taken Predictor: 80%
Always-Not-Taken Predictor: 20%
One-Bit Predictor: 90%
Two-Bit Predictor: 80%
Pattern b: T, T, T, T, NT
Always-Taken Predictor: 80%
Always-Not-Taken Predictor: 20%
One-Bit Predictor: 80%
Two-Bit Predictor: 80%
- The processor company has designed a new 10 stage pipeline (shown below). The new pipeline is designed to perform better than the standard 5-stage pipeline. The useful results for the purpose of forwarding are available at the last step in each stage (F2, R1, X3, M3, W1). For example the ALU result is not available until the end of the X2 stage.
F1 F2 R1 X1 X2 X3 M1 M2 M3 W1
10.1 (5pt) Is it possible to eliminate data hazards between R-type instructions in this pipeline by using forwarding? Why or why not?
10.2 (10pt) In the code fragments below, identify the data hazards and describe the number of pipeline stalls (or bubbles) required even with your full bypassing.
(a) mul $4, $5, $6 add $8,$9,$4
(b) lw $1, 8($2) add $3,$1,$4
10.3 (10pt) Suppose that branches are predicted as not-taken until the actual branch decision is known. Also, suppose that the branch target is known at the end of the X1 stage. How many in-progress instructions must be flushed (invalidated) when a branch is taken? (Or how many bubbles are required for branch mis-prediction?) Explain the reason.
10.1: In a 10-stage pipeline, data hazards between R-type instructions can be eliminated by using forwarding (bypassing). This is because the pipeline provides forwarding points at the end of each stage (F2, R1, X3, M3, W1), allowing results to be forwarded directly to dependent instructions without waiting for them to be written back to registers. Thus, forwarding effectively eliminates data hazards between R-type instructions in this pipeline.
10.2:
a) mul $4, $5, $6
add $8, $9, $4
b) lw $1, 8($2)
add $3, $1, $4
10.3: When a branch is mis-predicted as not-taken and later determined to be taken in a 10-stage pipeline:
Flushing Instructions: All instructions fetched and in progress after the branch instruction must be flushed (invalidated).
Reason: Instructions fetched after the branch were based on the incorrect prediction (not-taken). Flushing ensures that the correct instructions are fetched and executed starting from the correct target address after the branch decision is known.
Number of Bubbles (Stalls): The exact number of stalls (bubbles) required depends on how quickly the pipeline can recover from a mis-prediction. Typically, instructions from X2 stage onwards up to W1 stage would need to be flushed, introducing several cycles of stalls to re-synchronize the pipeline.