Week 10 & 11 - CPU Performance & Design and Pipelining Flashcards
What are the 2 types of processor?
CPU: Central Processing Unit (very fast, 3GHz, only 8 cores)
GPU: Graphics Processing Unit (not so fast, <1GHz, many cores (100+))
CPUs are latency-focused, and GPUs are throughput-focused.
CPU software is serial, GPU software is parallel.
What is response time and throughput?
How long it takes to do a task (duh), and total work done per unit of time (eg. tasks per hour)
lab exam question
What is the formula for performance?
Example: 10s on A, 15s on B
Performance = 1/Execution Time
PerformanceA/PerformanceB =
Execution TimeB/Execution TimeA
A is then 1.5 times faster than B.
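A quick sketch of the ratio above in Python. The function names (performance, speedup) are made up for illustration; only the 10s and 15s figures come from the card.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def speedup(time_a_s: float, time_b_s: float) -> float:
    """How many times faster A is than B (PerformanceA / PerformanceB)."""
    return performance(time_a_s) / performance(time_b_s)


# 10s on A, 15s on B, as in the example above.
ratio = speedup(10.0, 15.0)
print(f"A is {ratio:.2f} times faster than B")
```

Note that the ratio of performances is the inverse ratio of execution times, so the slower machine's time goes on top.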
What are elapsed time and CPU time?
Elapsed time = response time: the total time to finish a task, including all aspects (processing, I/O, OS overhead, idle time). This one determines system performance.
CPU time: time spent processing a given job (discounts I/O time and time spent on other jobs). Comprises user CPU time and system CPU time.
What does clock rate f depend on?
Clock rate f depends on implementation technology used and CPU organization used.
Clock frequency (rate) f: cycles per second
Clock cycle time => C = 1/f
eg.
f=1GHz
C = 1/f = (1/10^9) sec = 1ns
What is performance improved by?
By reducing the number of clock cycles or by increasing the clock rate.
CPU time = Clock Cycles / Clock Rate
Clock rate must often be traded off against cycle count.
Formula for CPU time?
clock cycles * clock cycle time
clock cycles / clock rate
Icount * CPI * clock cycle time
Icount * CPI / clock rate
or even
Instructions/Program *
ClockCycles / Instruction *
Seconds / ClockCycle
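The equivalent forms above, sketched in Python. Function names and the sample numbers (1E6 instructions, CPI of 2, 1GHz) are hypothetical, picked only to show that the forms agree.

```python
def cycle_time_s(clock_rate_hz: float) -> float:
    """Clock cycle time C = 1 / f."""
    return 1.0 / clock_rate_hz


def cpu_time_from_cycles(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Icount * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1E6 instructions, CPI of 2, 1GHz clock.
icount, cpi, f = 1e6, 2.0, 1e9
t1 = cpu_time(icount, cpi, f)
t2 = cpu_time_from_cycles(icount * cpi, f)
t3 = icount * cpi * cycle_time_s(f)  # Icount * CPI * clock cycle time
print(t1, t2, t3)
```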
What is a micro-operation?
An elementary hardware operation that can be carried out in one clock cycle.
e.g. register transfer, arithmetic, logic
Instructions can be divided into four classes according to their CPI (classes A, B, C, D). P1 has a clock rate of 2.5GHz and CPIs of 1, 2, 3, 3; P2 has a clock rate of 3GHz and CPIs of 2, 2, 2, 2. Instruction count is 1.0E6: 10% A, 20% B, 50% C, 20% D. Which is faster: P1 or P2?
a) global CPI
b) clock cycles for both?
Class X = 1E6 * %
Class A = 1E6 * 0.1 etc
CPU time = Icount * CPI / clockrate
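CPU time = SUM over classes (Icount of class * CPI of class) / clockrate
CPU time for P1 = (10^5 * 1 + 2 * 10^5 * 2 +
5 * 10^5 * 3 + 2 * 10^5 * 3) / (2.5 * 10^9) = 10.4 x 10^(-4) s
CPU time for P2 = (10^6 * 2) / (3 * 10^9) = 6.67 x 10^(-4) s, so P2 is faster.
a) CPI = CPU time * Clock rate / IC
CPI (P1) = 10.4 * 10^(-4) x 2.5 * 10^9 /
10^6 = 2.6
CPI (P2) = 2 (every class has a CPI of 2)
b) Clock cycles = IC * CPI: P1 = 2.6 x 10^6, P2 = 2 x 10^6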
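To check the arithmetic for both processors, here is a small Python sketch. The names (MIX, summarize, etc.) are made up; the numbers are the ones from the question.

```python
INSTRUCTION_COUNT = 1_000_000
# Fraction of instructions in each class, from the question.
MIX = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}


def total_cycles(cpi_by_class: dict) -> float:
    """Sum over classes of (instructions in class * CPI of class)."""
    return sum(INSTRUCTION_COUNT * MIX[c] * cpi_by_class[c] for c in MIX)


def summarize(cpi_by_class: dict, clock_rate_hz: float) -> dict:
    cycles = total_cycles(cpi_by_class)
    return {
        "cycles": cycles,
        "cpi": cycles / INSTRUCTION_COUNT,      # global CPI
        "cpu_time_s": cycles / clock_rate_hz,   # CPU time
    }


p1 = summarize({"A": 1, "B": 2, "C": 3, "D": 3}, 2.5e9)
p2 = summarize({"A": 2, "B": 2, "C": 2, "D": 2}, 3.0e9)
faster = "P1" if p1["cpu_time_s"] < p2["cpu_time_s"] else "P2"
print(p1, p2, faster)
```

P1 has the lower CPI for class A only, and class A is just 10% of the mix, so P2's uniform CPI of 2 and higher clock rate win.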
Reducing power example:
Suppose a new CPU has:
85% of capacitive load of old CPU
15% voltage and 15% frequency reduction
What’s the power now, relative to the old CPU? (P = C x V^2 x F)
P new / P old =
(C old x 0.85 x (V old x 0.85)^2 x F old x 0.85)
/ (C old x V old^2 x F old) = 0.85^4 = 0.52
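The same ratio as a Python sketch, assuming the usual dynamic power model P = C x V^2 x F. The function names are hypothetical.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power model: P = C * V^2 * F."""
    return capacitive_load * voltage ** 2 * frequency


def power_ratio(load_scale: float, voltage_scale: float, frequency_scale: float) -> float:
    """P_new / P_old when each factor is scaled; the old values cancel out."""
    return load_scale * voltage_scale ** 2 * frequency_scale


# 85% capacitive load, 15% voltage reduction, 15% frequency reduction.
ratio = power_ratio(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")
```

Voltage counts twice because it is squared, which is why the result is 0.85^4 and not 0.85^3.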
Formulas for MIPS?
Millions of Instructions per Second
MIPS = Instruction count / (Execution time * 1E6) =
Instruction count / ((Instruction count x CPI / Clock rate) x 1E6) =
Clock rate / (CPI * 1E6)
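A minimal Python sketch of both MIPS forms. The function names and the sample values (2GHz, CPI of 2, 1E6 instructions) are hypothetical and only there to show the two forms give the same number.

```python
def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 1E6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 1E6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical machine: 2GHz clock, CPI of 2, running 1E6 instructions.
clock_rate_hz, cpi, icount = 2e9, 2.0, 1e6
execution_time_s = icount * cpi / clock_rate_hz
print(mips_from_time(icount, execution_time_s), mips_from_cpi(clock_rate_hz, cpi))
```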
Latency and throughput of a pipeline.
Fetch 100ps, Decode 200ps, Execute 150ps, Memory 350ps, Writeback 50ps
a) For a non-pipelined processor, what is the cycle time, latency and throughput?
b) same for pipelined
a) Cycle Time = 100ps+200ps + 150ps+350ps+50ps = 850ps
Latency = 850ps
Throughput = 1/850ps
b) Cycle Time = max(100, 200, 150, 350, 50) + 20 = 370ps (the extra 20ps is not in the question; presumably it is the pipeline register overhead)
Latency =370 * 5 ps = 1850 ps
Throughput = 1/370ps
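Both cases sketched in Python. The 20ps overhead is an assumption taken from the answer above (the question itself does not state it), and the function names are made up.

```python
STAGES_PS = [100, 200, 150, 350, 50]  # Fetch, Decode, Execute, Memory, Writeback
REGISTER_OVERHEAD_PS = 20             # assumed per-stage overhead, as used in answer b)


def non_pipelined(stages_ps):
    """Returns (cycle time, latency, throughput in instructions per ps)."""
    cycle = sum(stages_ps)  # one long cycle covers every stage
    return cycle, cycle, 1 / cycle


def pipelined(stages_ps, overhead_ps):
    """Returns (cycle time, latency, throughput in instructions per ps)."""
    cycle = max(stages_ps) + overhead_ps  # slowest stage sets the clock
    latency = cycle * len(stages_ps)      # each instruction passes through every stage
    return cycle, latency, 1 / cycle


print(non_pipelined(STAGES_PS))
print(pipelined(STAGES_PS, REGISTER_OVERHEAD_PS))
```

Pipelining makes the latency of a single instruction worse (1850ps vs 850ps) but improves throughput, because one instruction completes every 370ps.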
5 stages of pipelining?
Instruction execution in 5 stages:
○ Instruction fetch (IF)
○ Instruction Decode (ID)
○ ALU operation (EX)
○ Memory Access (MEM)
○ Write Back result to register file (WB)
I hope you know how to draw a diagram (n instructions on a k-stage pipeline take k+n-1 cycles)
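If the diagram is hard to picture, this Python sketch prints one for an ideal 5-stage pipeline with no stalls. The function names are hypothetical; each row is one instruction and each column is one clock cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(num_stages: int, num_instructions: int) -> int:
    """k + n - 1 cycles for n instructions on a k-stage pipeline (no stalls)."""
    return num_stages + num_instructions - 1


def diagram(num_instructions: int, stages=STAGES) -> str:
    width = total_cycles(len(stages), num_instructions)
    lines = []
    for i in range(num_instructions):
        row = ["."] * width
        # Instruction i enters the pipeline one cycle after instruction i-1.
        for j, stage in enumerate(stages):
            row[i + j] = stage
        lines.append(" ".join(f"{cell:<3}" for cell in row).rstrip())
    return "\n".join(lines)


print(diagram(3))
print("cycles:", total_cycles(len(STAGES), 3))
```

The first instruction needs k cycles to fill the pipeline; every instruction after it finishes one cycle later, which gives k + (n - 1).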
What hazards are there when it comes to pipelining?
- Different instructions need the same resources (structural hazards)
- Different instructions need results from other instructions (data hazards)
- Different instructions execute depending on other instructions (branch instructions, control hazards)
The instruction pipeline has the following stages: instruction fetch (IF), instruction decode (ID), operand fetch (OF), perform operation (PO) and writeback (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards.
The number of clock cycles required for completion of execution of the sequence of instructions is ______.
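First instr (a 3-cycle PO one) → 1 + 1 + 1 + 3 + 1 = 7 clock cycles
Every later instruction adds only its PO cycles, because PO is the bottleneck stage:
39 instr → 39 x 3 cycles = 117
35 instr → 35 x 2 cycles = 70
25 instr → 25 x 1 cycle = 25
Total 7 + 117 + 70 + 25 = 219 cycles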
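A small Python simulation to sanity-check the 219. This is a simplified sketch, not a full pipeline model: it assumes no hazards (as the question says) and lets instructions queue freely in front of a busy stage, which does not change the finish time here. The function name is hypothetical.

```python
def completion_cycles(po_cycles, stages_before=3, stages_after=1):
    """Cycle at which the last instruction leaves the pipeline.

    po_cycles: PO-stage latency of each instruction, in program order.
    IF, ID, OF (stages_before) and WB (stages_after) take 1 cycle each.
    """
    num_stages = stages_before + 1 + stages_after
    free_at = [0] * num_stages  # cycle at which each stage is next free
    finish = 0
    for po in po_cycles:
        latencies = [1] * stages_before + [po] + [1] * stages_after
        t = 0
        for stage, latency in enumerate(latencies):
            start = max(t, free_at[stage])  # wait for the stage to be free
            t = start + latency
            free_at[stage] = t
        finish = t
    return finish


sequence = [3] * 40 + [2] * 35 + [1] * 25
print(completion_cycles(sequence))
# Closed form: 3 cycles to reach PO + all PO cycles + 1 cycle of WB.
print(3 + sum(sequence) + 1)
```

Since PO is never idle once the first instruction reaches it, the order of the instructions does not affect the total.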