Week 10, Chapter 8: CPU and Memory Design Enhancement and Implementation (Flashcards)

1
Q


Current CPU Architecture Designs:

A

▪ Traditional modern architectures

▪ VLIW (Transmeta) – Very Long Instruction Word

▪ EPIC (Intel) – Explicitly Parallel Instruction Computer

2

Q

Current CPU Architectures:

A

▪ IBM Mainframe series
▪ Intel x86 family
▪ IBM POWER/PowerPC family
▪ Sun SPARC family

3

Q

Problems with early CPU architectures, and their solutions:

A

▪ Large numbers of specialized instructions were rarely used but added hardware complexity and slowed down other instructions
▪ Slow data memory accesses could be reduced by increasing the number of general-purpose registers
▪ Using general registers to hold addresses could reduce the number of addressing modes and simplify architecture design

▪ Fixed-length, fixed format instruction words would allow instructions to be fetched and decoded independently and in parallel

4

Q

How does the VLIW architecture work?

A

▪ Transmeta Crusoe CPU
▪ 128-bit instruction bundle = "molecule"
▪ Four 32-bit "atoms" (atom = instruction)
▪ Parallel processing of 4 instructions
▪ 64 general-purpose registers
▪ Code morphing layer
• Translates instructions written for other CPUs into molecules
• Instructions are not written directly for the Crusoe CPU

5

Q

EPIC Architecture

A

▪ 128-bit instruction bundle
▪ Three 41-bit instructions
▪ 5 bits to identify the types of instructions in the bundle
▪ 128 64-bit general-purpose registers
▪ 128 82-bit floating-point registers
▪ Intel x86 instruction set included
▪ Programmers and compilers follow guidelines to ensure parallel execution of instructions

6

Q

Fetch-Execute Cycle Timing Issues?

A

▪ The computer clock is used to time each step of the instruction cycle
▪ GHz (gigahertz): a billion steps per second
▪ Instructions can (and often do) take more than one step
▪ Wide data words can require multiple steps

7

Q

CPU Features and Enhancements

A

▪ Separate fetch/execute units
▪ Pipelining
▪ Multiple, parallel execution units
▪ Scalar processing
▪ Superscalar processing
▪ Branch instruction processing

8

Q

What does the fetch unit include?

A

▪ Instruction fetch unit
▪ Instruction decode unit
• Determines the opcode
• Identifies the type of instruction and the operands
▪ Several instructions are fetched in parallel and held in a buffer until decoded and executed
▪ IP (Instruction Pointer) register holds the location of the instruction currently being processed

9

Q

What is included in the execute unit?

A

▪ Receives instructions from the decode unit
▪ The appropriate execution unit services each instruction

10

Q

Define instruction pipelining.

A

▪ An assembly-line technique that allows the fetch-execute cycles of a sequence of instructions to overlap
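The benefit of this overlap can be quantified with a small sketch (my own illustration, not from the deck): with k pipeline stages and no stalls, n instructions take k + (n − 1) cycles instead of n × k.

```python
# Illustrative sketch (assumed ideal conditions: no stalls or hazards).

def cycles_unpipelined(n, k):
    # Each instruction runs all k stages before the next one starts.
    return n * k

def cycles_pipelined(n, k):
    # The first instruction takes k cycles; each later instruction
    # completes one cycle after its predecessor.
    return k + (n - 1)

if __name__ == "__main__":
    n, k = 100, 5
    print(cycles_unpipelined(n, k))  # 500
    print(cycles_pipelined(n, k))    # 104
```

For 100 instructions on a 5-stage pipeline, the ideal speedup approaches the number of stages as n grows.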

11

Q

Define scalar processing.

A

The average instruction execution rate is approximately equal to the clock speed of the CPU (about one instruction completed per clock cycle)

12

Q

Branch Problem Solutions

A

▪ Separate pipelines for both possibilities
▪ Probabilistic approach
▪ Requiring the following instruction to not be dependent on the branch
▪ Instruction reordering (superscalar processing)

13

Q

What are multiple, parallel execution units?

A

▪ Different instructions have different numbers of steps in their cycle
▪ There are differences in each step
▪ Each execution unit is optimized for one general type of instruction
▪ Multiple execution units permit simultaneous execution of several instructions

14

Q

Describe superscalar processing.

A

▪ Process more than one instruction per clock cycle

▪ Separate fetch and execute cycles as much as possible

▪ Buffers for fetch and decode phases

▪ Parallel execution units

15

Q

Describe the issues with superscalar processing.

A

▪ Out-of-order processing – dependencies (hazards)

▪ Data dependencies

▪ Branch (flow) dependencies and speculative execution

▪ Parallel speculative execution or branch prediction

▪ Branch History Table

▪ Register access conflicts

▪ Solved by register renaming (logical registers)
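As a hypothetical illustration of the branch history table mentioned above, a common design keeps a 2-bit saturating counter per branch, so a single surprising outcome does not flip the prediction. This sketch is my own and is not from the deck.

```python
# Sketch of a 2-bit saturating counter branch predictor (illustrative).

class TwoBitPredictor:
    def __init__(self):
        self.state = 0  # 0,1 = predict not taken; 2,3 = predict taken

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
for outcome in [True, True, False, True]:  # a mostly-taken branch
    p.update(outcome)
print(p.predict())  # True: one not-taken outcome did not flip the prediction
```

Two consecutive mispredictions are required before the predicted direction changes, which tolerates the occasional loop exit.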

16

Q

Why are memory enhancements needed, and what are they?

A

Memory is slow compared to CPU processing speeds!
▪ 2 GHz CPU = 1 cycle in half a billionth of a second (0.5 ns)
▪ 70 ns DRAM = 1 access in 70 billionths of a second
▪ Methods to improve memory access:
▪ Wide path memory access
• Retrieve multiple bytes instead of 1 byte at a time
▪ Memory interleaving
• Partition memory into subsections, each with its own address register and data register
▪ Cache memory
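A minimal sketch of low-order memory interleaving (the bank count is an assumed illustrative parameter, not from the deck): consecutive addresses map to different subsections, so sequential accesses can overlap.

```python
# Sketch: n-way low-order interleaving selects the bank from the
# low-order address bits, spreading consecutive addresses across banks.

N_BANKS = 4  # assumption for illustration

def bank_and_offset(addr, n_banks=N_BANKS):
    # Returns (which bank services this address, the row within that bank).
    return addr % n_banks, addr // n_banks

print([bank_and_offset(a)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Because addresses 0-3 land in four different banks, a sequential read of all four can be in flight at once.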

17

Q

Cache Memory

A

▪ Blocks: 8 or 16 bytes
▪ Tags: pointer to the location in main memory
▪ Cache controller: hardware that checks tags
▪ Cache line: unit of transfer between storage and cache memory
▪ Hit ratio: ratio of hits to total requests
▪ Synchronizing cache and memory:
• Write-through
• Write-back
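The hit ratio feeds directly into the average memory access time. A minimal sketch, with assumed illustrative timings (10 ns cache, 70 ns DRAM, neither from this card):

```python
# Sketch: effective access time = hits served from cache plus misses
# paying the main-memory latency (assumed timings, for illustration).

def effective_access_ns(hit_ratio, t_cache=10.0, t_memory=70.0):
    return hit_ratio * t_cache + (1 - hit_ratio) * t_memory

print(round(effective_access_ns(0.9), 2))  # 16.0 ns average at a 90% hit ratio
```

At a 90% hit ratio the average access drops from 70 ns to roughly 16 ns, which is where the large speed improvements claimed for caching come from.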

18

Q

Step-by-Step Use of Cache

A
19

Q

What are the performance advantages of cache memory?

A

▪ Hit ratios of 90% are common
▪ 50%+ improvement in execution speed
▪ Locality of reference is why caching works
• Most memory references are confined to a small region of memory at any given time
• A well-written program spends most of its time in a small loop, procedure, or function
• Data are likely to be in an array
• Variables are stored together

20

Q

Why do the sizes of the caches have to be different?

A
21

Q

What are the reasons for multiprocessing?

A

▪ Increase the processing power of a system
▪ Parallel processing

22

Q

What is a multiprocessor system?

A

▪ Tightly coupled CPUs
▪ Multicore processors: the CPUs are on a single integrated circuit

23

Q

What are multiprocessor systems for?

A

▪ Identical access to programs, data, shared memory, I/O, etc.
▪ Easily extends multi-tasking and redundant program execution

24

Q

Two ways to configure multiprocessor systems

A

▪ Master-slave multiprocessing
▪ Symmetrical multiprocessing (SMP)

25
# Master-Slave Multiprocessing: Master CPU
▪ Manages the system
▪ Controls all resources and scheduling
▪ Assigns tasks to slave CPUs
26
# Advantages of Master-Slave Multiprocessing
▪ Simplicity
▪ Protection of system and data
27
# Disadvantages of Master-Slave Multiprocessing
▪ The master CPU becomes a bottleneck
▪ Reliability issues: if the master CPU fails, the entire system fails
28
# Symmetrical Multiprocessing
▪ Each CPU has equal access to resources
▪ Each CPU determines what to run using a standard algorithm
29
# Disadvantages of Symmetrical Multiprocessing
▪ Resource conflicts (memory, I/O, etc.)
▪ Complex implementation
30
# Advantages of Symmetrical Multiprocessing
▪ High reliability
▪ Fault-tolerant support is straightforward
▪ Balanced workload
31
# General Enhancements: Use RISC-Based Techniques
– Fewer instruction formats and fixed-length instructions → faster decoding
– More general-purpose registers → fewer memory accesses
32
# Clock Cycle and Instruction Cycle
– Most instructions take several clock cycles to execute:
1. Fetch the instruction [IF]
2. Decode the instruction [ID]
3. Execute the instruction [EX]
4. Access memory, if needed [MEM]
5. Write back to the registers [WB]
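The five stages above can be visualized as an ideal pipeline schedule. This is an illustrative sketch of my own, assuming one instruction issued per cycle with no stalls:

```python
# Sketch: which stage each instruction occupies in each clock cycle of
# an ideal 5-stage pipeline (no stalls), showing the overlap.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def timeline(n_instructions):
    rows = []
    for i in range(n_instructions):
        # Instruction i starts one cycle after instruction i-1.
        rows.append(["  "] * i + STAGES)
    return rows

for i, row in enumerate(timeline(3)):
    print(f"I{i}: " + " ".join(row))
# Prints a staggered table along the lines of:
# I0: IF ID EX MEM WB
# I1:    IF ID EX MEM WB
# I2:       IF ID EX MEM WB
```

While instruction 0 is in EX, instruction 1 is in ID and instruction 2 is in IF, so every stage stays busy.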
33
# Each stage takes a clock cycle, so complete execution takes 5 cycles. Can we do better?
Waiting for all five stages of one instruction to complete before starting the next is like building a product from start to finish before beginning the next one; an assembly line keeps every stage busy instead.
34
# Clock Cycle and Instruction Cycle: can the CPU overlap the execution of several instructions at once, since they all go through similar stages?
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_221-14A61CABCA1789F393E.png)
35
# Five Stages of Instruction Execution
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/imagekqh5qx-14A61CB0B744AF6B2F4.png)
36
# Five Stages of Instruction Execution
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_222-14A61CB7A5758786932.png)
37
# Clock Cycle and Instruction Cycle
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_223-14A61CBF1695639B190.png)
38
# Problems with Pipelining
Dependencies (register interlock)—if an instruction needs a result from the immediately preceding instruction, that result won’t be written back until WB, but the result is needed in EX.
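The register interlock described above can be detected mechanically. A toy sketch of my own (the tuple encoding of instructions is purely illustrative): compare the destination register of the previous instruction against the source registers of the current one.

```python
# Sketch: detecting a read-after-write hazard (register interlock) between
# two adjacent instructions, each described as (dest_register, source_registers).

def raw_hazard(prev, curr):
    prev_dest, _ = prev
    _, curr_srcs = curr
    # A hazard exists if the current instruction reads a register that the
    # previous instruction has not yet written back.
    return prev_dest in curr_srcs

add = ("r1", ["r2", "r3"])   # add r1, r2, r3
sub = ("r4", ["r1", "r5"])   # sub r4, r1, r5 -- needs r1 before WB completes
print(raw_hazard(add, sub))  # True: the pipeline must stall or forward
```

Real pipelines resolve this by forwarding the EX result directly to the next instruction or by stalling until WB completes.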
39
# Problems with Pipelining
– Branching—when the instruction being executed is a branch, we can’t know if the branch will be taken until after stage 3. But by that time, other instructions are “in flight.”
40
# Clock Cycle and Instruction Cycle
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_226-14A61CDF94829EB48ED.png)
41
# Superscalar Processing
RISC and pipelining let each functional unit in a CPU be fully utilized all of the time.
– But what if there were multiple ALUs or multiple decoders? Then multiple instructions could be executed at once.
– Prerequisite: multiple instructions must be fetched at once via a wide path to memory.
42
# Superscalar Processing
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_227-14A61CEE5785DB511F3.png)
43
# Superscalar Processing
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_228-14A61CF64A8699CC04A.png)
44
# Problems with Superscalar Processing
– Same general categories as with pipelining: dependencies and branches
– Except now forwarding, stalls, or cancellation may be needed between several functional units!
– CPUs become very complex again, yet it is common to have 2 to 4 separate pipelines per core in modern processors.
45
# Did you know (1)
RISC-based CPUs offer general performance enhancements due to simplified formats and single-clock cycle execution.
46
# Pipelining allows...
multiple instructions to be in various stages of execution at once.
47
# Superscalar processing duplicates...
pipelines in a single core to have multiple instructions executing simultaneously.
48
# Data dependencies and branches are...?
hazards to both pipelining and superscalar architectures.
49
# Recall CPU Pipelining
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_229-14A61D305A2321FEE0C.png)
50
# Three complementary memory enhancement approaches?
All three (wide path memory access, memory interleaving, and cache memory) are used simultaneously in the system design
51
# Wide Path Memory Access
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_231-14A61D4F7AD5DBD3F51.png)
52
# Wide Path Memory Access (1)
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_232-14A61E16BE55635A342.png)
53
# Wide Path Memory Access (2)
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_233-14A61E211B84B36A4AF.png)
54
# Wide Path Memory Access
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_234-14A61E290734D580E81.png)
55
# Memory Interleaving
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_235-14A61E356D54EB3628B.png)
56
# Memory Interleaving
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/imagesgm2qx-14A61E3B32D3AF72413.png)
57
# Cache Memory
Use a small amount of expensive SRAM as a buffer against the large amount of DRAM
58
# Cache Memory
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/imagekfh3qx-14A61E51C745DDD5B10.png)
59
# Cache Memory
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_237-14A61E5ABA96F16AC1E.png)
60
# What do cache entries consist of?
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_238-14A61E64D2943E88C7F.png)
61
# Cache replacement algorithm?
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_239-14A61E6D6156AC4FB57.png)
62
# Cache memory: what should happen on a memory write?
Cache coherency gets particularly tricky with multiple cores and multiple levels of cache.
63
# Did you know (2)
![](https://s3.amazonaws.com/classconnection/655/flashcards/7082655/png/selection_241-14A61E86E6900A46784.png)