Week 10, Chapter 8: CPU and Memory Design, Enhancement, and Implementation Flashcards

1
Q

Current CPU architecture designs:

A

▪ Traditional modern architectures

▪ VLIW (Transmeta) – Very Long Instruction Word

▪ EPIC (Intel) – Explicitly Parallel Instruction Computing

2
Q

Current CPU Architectures:

A

▪ IBM Mainframe series
▪ Intel x86 family
▪ IBM POWER/PowerPC family
▪ Sun SPARC family

3
Q

Problems with early CPU Architectures and solutions:

A

▪ A large number of specialized instructions were rarely used but added hardware complexity and slowed down other instructions
▪ Slow data memory accesses could be reduced by increasing the number of general purpose registers
▪ Using general registers to hold addresses could reduce the number of addressing modes and simplify architecture design
▪ Fixed-length, fixed-format instruction words would allow instructions to be fetched and decoded independently and in parallel

4
Q

What characterizes the VLIW architecture?

A

▪ Transmeta Crusoe CPU
▪ 128-bit instruction bundle = molecule
• Four 32-bit atoms (atom = instruction)
• Parallel processing of 4 instructions
▪ 64 general purpose registers
▪ Code morphing layer
• Translates instructions written for other CPUs into molecules
• Instructions are not written directly for the Crusoe CPU

5
Q

EPIC Architecture

A

▪ 128-bit instruction bundle
• Three 41-bit instructions
• 5 bits to identify the types of instructions in the bundle
▪ 128 64-bit general purpose registers
▪ 128 82-bit floating point registers
▪ Intel x86 instruction set included
▪ Programmers and compilers follow guidelines to ensure parallel execution of instructions
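
Example: the bundle arithmetic above (3 × 41 + 5 = 128 bits) can be checked with a small packing sketch. The field order used here (type bits in the low 5 bits, then the three instruction slots) and the names `pack_bundle` and `unpack_bundle` are assumptions made for illustration, not something the card specifies.

```python
SLOT_BITS = 41      # width of one instruction slot
TEMPLATE_BITS = 5   # width of the field that identifies the instruction types
SLOTS = 3           # instructions per bundle

# The three slots plus the type field fill the bundle exactly.
assert TEMPLATE_BITS + SLOTS * SLOT_BITS == 128


def pack_bundle(template, slots):
    """Pack a 5-bit type field and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        # Assumed layout: slot i sits above the type field and earlier slots.
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Recover the type field and the three slots from a packed bundle."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(SLOTS)]
    return template, slots
```

Packing and then unpacking returns the original fields, and every packed value fits in 128 bits.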

6
Q

Fetch-Execute Cycle Timing Issues?

A

▪ The computer clock times each step of the instruction cycle
▪ GHz (gigahertz) – billion steps per second
▪ Instructions can (and often do) take more than one step
▪ Data word width can require multiple steps

7
Q

CPU Features and Enhancements

A

▪ Separate Fetch/Execute Units
▪ Pipelining
▪ Multiple, Parallel Execution Units
▪ Scalar Processing
▪ Superscalar Processing
▪ Branch Instruction Processing

8
Q

What does the fetch unit include?

A

▪ Instruction fetch unit
▪ Instruction decode unit
• Determines the opcode
• Identifies the type of instruction and operands
▪ Several instructions are fetched in parallel and held in a buffer until decoded and executed
▪ IP – Instruction Pointer register holds the location of the instruction currently being processed

9
Q

What does the execute unit do?

A

▪ Receives instructions from the decode unit
▪ Appropriate execution unit services the instruction

10
Q

Define instruction pipelining.

A

▪ Assembly-line technique to allow overlapping between fetch-execute cycles of sequences of instructions
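
Example: a minimal sketch of why overlapping helps. It assumes an idealized pipeline in which every stage takes exactly one clock cycle and nothing ever stalls; the default of five stages matches the IF/ID/EX/MEM/WB breakdown used elsewhere in this deck. The function names are illustrative.

```python
def cycles_unpipelined(n_instructions, n_stages=5):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=5):
    """The first instruction fills the pipeline; each later one finishes one cycle after the previous."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)
```

For 100 instructions this gives 500 cycles without overlap and 104 cycles with it, so throughput approaches one instruction per clock cycle under these assumptions.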

11
Q

Define scalar processing.

A

Average instruction execution rate is approximately equal to the clock rate of the CPU (about one instruction per clock cycle)

12
Q

Branch Problem Solutions

A

▪ Separate pipelines for both possibilities
▪ Probabilistic approach
▪ Requiring the following instruction to not be dependent on the branch
▪ Instruction reordering (superscalar processing)

13
Q

What are multiple, parallel execution units?

A

▪ Different instructions have different numbers of steps in their cycle
▪ Differences in each step
▪ Each execution unit is optimized for one general type of instruction
▪ Multiple execution units permit simultaneous execution of several instructions

14
Q

What is superscalar processing?

A

▪ Process more than one instruction per clock cycle

▪ Separate fetch and execute cycles as much as possible

▪ Buffers for fetch and decode phases

▪ Parallel execution units

15
Q

What are the issues with superscalar processing?

A

▪ Out-of-order processing – dependencies (hazards)

▪ Data dependencies

▪ Branch (flow) dependencies and speculative execution

▪ Parallel speculative execution or branch prediction

▪ Branch History Table

▪ Register access conflicts

▪ Rename or logical registers
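
Example: one common way to build a branch history table is a table of 2-bit saturating counters indexed by the branch address. The card only names the table, so the 2-bit counter scheme, the table size, and the class and function names below are assumptions for illustration.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, branch_address):
        # Hypothetical indexing: low-order part of the branch address.
        return branch_address % self.entries

    def predict(self, branch_address):
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def prediction_accuracy(outcomes, branch_address=0x40):
    """Fraction of correct predictions for one branch with the given outcome history."""
    bht = BranchHistoryTable()
    correct = 0
    for taken in outcomes:
        if bht.predict(branch_address) == taken:
            correct += 1
        bht.update(branch_address, taken)
    return correct / len(outcomes)
```

For a loop branch that is taken nine times and then falls through once, this sketch mispredicts only the first and last iterations.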

16
Q

Why are memory enhancements needed, and what are they?

A

Memory is slow compared to CPU processing speeds!
▪ 2 GHz CPU = 1 cycle in ½ of a billionth of a second (0.5 ns)
▪ 70 ns DRAM = 1 access in 70 billionths of a second
▪ Methods to improve memory accesses:
▪ Wide Path Memory Access
• Retrieve multiple bytes instead of 1 byte at a time
▪ Memory Interleaving
• Partition memory into subsections, each with its own address register and data register
▪ Cache Memory
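
Example: the speed gap above can be expressed as the number of clock cycles that pass during one memory access. This is plain arithmetic on the card's own figures (2 GHz clock, 70 ns DRAM); the function names are illustrative.

```python
def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def cycles_per_access(clock_hz, access_time_ns):
    """How many clock cycles elapse during one memory access."""
    return access_time_ns / cycle_time_ns(clock_hz)
```

With these figures, `cycle_time_ns(2e9)` is 0.5 and `cycles_per_access(2e9, 70)` is 140, so a CPU that waits on every DRAM access would idle for roughly 140 cycles each time.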

17
Q

Cache Memory

A

▪ Blocks: 8 or 16 bytes
▪ Tags: pointer to location in main memory
▪ Cache controller
• Hardware that checks tags
▪ Cache line
• Unit of transfer between storage and cache memory
▪ Hit ratio: ratio of hits out of total requests
▪ Synchronizing cache and memory
• Write through
• Write back
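
Example: a minimal sketch of tag checking and hit-ratio counting. It assumes a direct-mapped cache (each memory block can live in exactly one line), which the card does not specify; the 8-byte block size comes from the card, while the line count and all names are hypothetical.

```python
class DirectMappedCache:
    """Tracks only tags, not data, to show how hits and misses are decided."""

    def __init__(self, num_lines=4, block_size=8):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # one tag per cache line, None = empty
        self.hits = 0
        self.requests = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the block into its line."""
        block_number = address // self.block_size
        line = block_number % self.num_lines   # which line the block maps to
        tag = block_number // self.num_lines   # identifies the block within that line
        self.requests += 1
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.tags[line] = tag
        return False

    def hit_ratio(self):
        """Hits divided by total requests."""
        return self.hits / self.requests if self.requests else 0.0
```

Reading addresses 0 through 31 in order misses once per 8-byte block and hits on the other seven bytes, for a hit ratio of 28/32.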

18
Q

Step-by-Step Use of Cache1

A
19
Q

Performance Advantages of cache memory?

A

▪ Hit ratios of 90% common
▪ 50%+ improved execution speed
▪ Locality of reference is why caching works
• Most memory references confined to small region of memory at any given time
• Well-written program in small loop, procedure or function
• Data likely in array
• Variables stored together
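
Example: a rough way to see what a 90% hit ratio buys is the average access time. The model below is a simplification that charges a hit the cache access time and a miss the main-memory access time; the 10 ns cache figure in the usage note is a hypothetical value, while 70 ns is the DRAM figure used elsewhere in this deck.

```python
def effective_access_time(hit_ratio, cache_ns, memory_ns):
    """Average access time when hits cost cache_ns and misses cost memory_ns."""
    return hit_ratio * cache_ns + (1 - hit_ratio) * memory_ns
```

With a 0.9 hit ratio, a 10 ns cache, and 70 ns memory, the average comes to about 16 ns, compared with 70 ns for every access when there is no cache.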

20
Q

Why do the sizes of the caches have to be different?

A
21
Q

What are the reasons for multiprocessing?

A

▪ Increase the processing power of a system
▪ Parallel processing

22
Q

In multiprocessing, a multiprocessor system is?

A

▪ Tightly coupled
▪ Multicore processors – when CPUs are on a single integrated circuit

23
Q

What do multiprocessor systems provide?

A

▪ Identical access to programs, data, shared memory, I/O, etc.
▪ Easily extends multi-tasking and redundant program execution

24
Q

Two ways to configure multiprocessor systems?

A

▪ Master-slave multiprocessing
▪ Symmetrical multiprocessing (SMP)

25
Q

Master-slave multiprocessing: what does the master CPU do?

A

▪ Manages the system
▪ Controls all resources and scheduling
▪ Assigns tasks to slave CPUs

26
Q

Advantages of master-slave multiprocessing?

A

▪ Simplicity
▪ Protection of system and data

27
Q

Disadvantages of master-slave multiprocessing?

A

▪ Master CPU becomes a bottleneck
▪ Reliability issues – if the master CPU fails, the entire system fails

28
Q

Symmetrical Multiprocessing

A

▪ Each CPU has equal access to resources
▪ Each CPU determines what to run using a standard algorithm

29
Q

Disadvantages of symmetrical multiprocessing?

A

▪ Resource conflicts – memory, I/O, etc.
▪ Complex implementation

30
Q

Advantages of symmetrical multiprocessing?

A

▪ High reliability
▪ Fault tolerant support is straightforward
▪ Balanced workload

31
Q

General Enhancements – Use RISC-based techniques

A

– Fewer instruction formats, fixed-length → faster decoding
– More general purpose registers → fewer memory accesses

32
Q

Clock cycle and instruction cycle

A

– Most instructions take several clock cycles to execute:
• Fetch the new instruction [IF]
• Decode the instruction [ID]
• Execute the instruction [EX]
• Access memory (if needed) [MEM]
• Write back to the registers [WB]

33
Q

Each stage takes a clock cycle, so complete execution takes 5 cycles. Can we do better?

A

Waiting for all five stages of one instruction to complete before starting the next is like building each item from start to finish, one at a time, instead of using an assembly line.

34
Q

Clock cycle and instruction cycle: can the CPU instead overlap the execution of several instructions at once, since they all pass through similar stages?

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_221-14A61CABCA1789F393E.png

35
Q

– Five stages of instruction execution

A
36
Q

Five stages of instruction execution

A
37
Q

Clock cycle and instruction cycle

A
38
Q

– Problems with pipelining

A

Dependencies (register interlock)—if an instruction needs a result from the immediately preceding instruction, that result won’t be written back until WB, but the result is needed in EX.
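
Example: a minimal sketch of spotting this kind of dependency between back-to-back instructions. The three-field instruction record (`op`, `dest`, `sources`) and the function names are hypothetical; real hardware compares register numbers inside the pipeline rather than scanning a list.

```python
from collections import namedtuple

# Hypothetical instruction record: operation name, destination register, source registers.
Instruction = namedtuple("Instruction", ["op", "dest", "sources"])


def needs_previous_result(previous, current):
    """True if current reads a register that the immediately preceding instruction writes."""
    return previous.dest is not None and previous.dest in current.sources


def find_dependencies(program):
    """Indexes of instructions that depend on the instruction right before them."""
    return [i for i in range(1, len(program))
            if needs_previous_result(program[i - 1], program[i])]
```

For the sequence `ADD r1, r2, r3` / `SUB r4, r1, r5` / `OR r6, r7, r8`, only the `SUB` (index 1) is flagged, because it reads `r1` before the `ADD` has written it back.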

39
Q

Problems with pipelining

A

– Branching—when the instruction being executed is a branch, we can’t know if the branch will be taken until after stage 3. But by that time, other instructions are “in flight.”

40
Q

Clock cycle and instruction cycle

A
41
Q

Superscalar Processing

A

– RISC and pipelining let each functional unit in a CPU be fully utilized all of the time.
– But what if there were multiple ALUs or multiple decoders? Then multiple instructions could be executed at once.
– Prerequisite: multiple instructions should be fetched at once via a wide path to memory.

42
Q

Superscalar Processing D

A
43
Q

Superscalar Processing DD

A
44
Q

– Problems with superscalar processing

A

– Same general categories as with pipelining: dependencies and branches
– Except now forwarding, stalls, or canceling may need to happen between several functional units!
– CPUs become very complex again, yet it is common to have 2 to 4 separate pipelines per core in modern processors.

45
Q

Did you know 1

A

RISC-based CPUs offer general performance enhancements due to simplified formats and single-clock cycle execution.

46
Q

Pipelining allows…..

A

multiple instructions to be in various stages of execution at once.
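
Example: a small sketch that builds the usual pipeline timing chart, one row per instruction and one column per clock cycle. It assumes the five stages named elsewhere in this deck (IF, ID, EX, MEM, WB), one cycle per stage, and no stalls; the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_rows(n_instructions):
    """Return one row per instruction listing the stage it occupies in each cycle."""
    total_cycles = len(STAGES) + n_instructions - 1 if n_instructions else 0
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i  # instruction i enters the pipeline in cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        rows.append(row)
    return rows


def format_rows(rows):
    """Render the rows as fixed-width text, one line per instruction."""
    lines = []
    for i, row in enumerate(rows):
        cells = "".join(f"{cell:<4}" for cell in row).rstrip()
        lines.append(f"I{i + 1}: {cells}")
    return "\n".join(lines)
```

Calling `print(format_rows(pipeline_rows(3)))` shows that in the third cycle the first instruction is in EX, the second in ID, and the third in IF.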

47
Q

Superscalar processing duplicates……

A

pipelines in a single core to have multiple instructions executing simultaneously.

48
Q

Data dependencies and branches are……?

A

hazards to both pipelining and superscalar architectures.

49
Q

Recall CPU Pipelining

A
50
Q

Three complementary approaches to memory enhancement?

A

Wide path memory access, memory interleaving, and cache memory. All three are used simultaneously in the system design.

51
Q

Wide path memory access

A
52
Q

Wide path memory access 1

A
53
Q

Wide path memory access 2

A
54
Q

Wide path memory access

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_234-14A61E290734D580E81.png

55
Q

Memory interleaving

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_235-14A61E356D54EB3628B.png

56
Q

Memory interleaving D

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/imagesgm2qx-14A61E3B32D3AF72413.png
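
Example: interleaving partitions memory into subsections (banks) that can be accessed independently. A minimal sketch of one possible address mapping follows; it assumes low-order interleaving with four banks, which may differ from the scheme shown in the linked figure, and the function name is illustrative.

```python
def bank_and_offset(address, n_banks=4):
    """Map an address to (bank, offset within bank) using low-order interleaving."""
    return address % n_banks, address // n_banks
```

Under this mapping, consecutive addresses 0 through 7 land in banks 0, 1, 2, 3, 0, 1, 2, 3, so a sequential read can keep every bank busy at once.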

57
Q

Cache Memory

A

Use a small amount of expensive SRAM as a buffer against the large amount of DRAM

58
Q

Cache Memory

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/imagekfh3qx-14A61E51C745DDD5B10.png

59
Q

Cache Memory

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_237-14A61E5ABA96F16AC1E.png

60
Q

What does a cache entry consist of?

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_238-14A61E64D2943E88C7F.png

61
Q

Cache replacement algorithm?

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_239-14A61E6D6156AC4FB57.png

62
Q

Cache memory: what should happen on a memory write?

A

Cache coherency gets particularly tricky with multiple cores and multiple levels of cache.
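
Example: the two synchronization policies named on the earlier cache card, write through and write back, differ in how many writes reach main memory. The sketch below counts them for a sequence of store addresses. It assumes a single cache large enough that no dirty block is evicted before the end, so it says nothing about multi-core coherency; the function name and the 8-byte block size are illustrative.

```python
def memory_writes(store_addresses, policy, block_size=8):
    """Count writes that reach main memory for a sequence of store addresses."""
    if policy == "write-through":
        # Every store updates the cache and main memory together.
        return len(store_addresses)
    if policy == "write-back":
        # Stores only mark the cached block dirty; each dirty block is
        # written to memory once, when it is finally evicted.
        dirty_blocks = {address // block_size for address in store_addresses}
        return len(dirty_blocks)
    raise ValueError("policy must be 'write-through' or 'write-back'")
```

For stores to addresses 0, 1, 2, 3, 8, and 9, write through sends 6 writes to memory while write back sends 2, one per dirty block.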

63
Q

Did you know?

A

https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_241-14A61E86E6900A46784.png