Week 10 – Chapter 8: CPU and Memory Design Enhancement and Implementation – Flashcards
; tab ,
▪ Current CPU Architecture Designs:
▪ Traditional modern architectures
▪ VLIW (Transmeta) – Very Long Instruction Word
▪ EPIC (Intel) – Explicitly Parallel Instruction Computing
; tab ,
Current CPU Architectures:
▪ IBM mainframe series
▪ Intel x86 family
▪ IBM POWER/PowerPC family
▪ Sun SPARC family
; tab ,
Problems with early CPU Architectures and solutions:
▪ Large numbers of specialized instructions were rarely used but added hardware complexity and slowed down other instructions
▪ Slow data memory accesses could be reduced by increasing the number of general-purpose registers
▪ Using general registers to hold addresses could reduce the number of addressing modes and simplify architecture design
▪ Fixed-length, fixed-format instruction words would allow instructions to be fetched and decoded independently and in parallel
; tab ,
How does the VLIW architecture work?
▪ Transmeta Crusoe CPU
▪ 128-bit instruction bundle = molecule
▪ Four 32-bit atoms (atom = instruction)
▪ Parallel processing of 4 instructions
▪ 64 general-purpose registers
▪ Code morphing layer
• Translates instructions written for other CPUs into molecules
• Instructions are not written directly for the Crusoe CPU
; tab ,
EPIC Architecture
▪ 128-bit instruction bundle
▪ Three 41-bit instructions
▪ 5 bits to identify the type of instructions in the bundle
▪ 128 64-bit general-purpose registers
▪ 128 82-bit floating-point registers
▪ Intel x86 instruction set included
▪ Programmers and compilers follow guidelines to ensure parallel execution of instructions
; tab ,
Fetch-Execute Cycle Timing Issues?
▪ The computer clock is used for timing each step of the instruction cycle
▪ GHz (gigahertz) – billion steps per second
▪ Instructions can (and often do) take more than one step
▪ Data word width can require multiple steps
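The timing arithmetic above can be sanity-checked in Python (the 4-step instruction is an illustrative assumption, not from the card):

```python
# Illustrative numbers: a 2 GHz clock and a hypothetical 4-step instruction.
clock_hz = 2e9                       # 2 GHz = 2 billion steps per second
cycle_time_s = 1 / clock_hz          # one step = half a billionth of a second
steps_per_instruction = 4            # many instructions take several steps

instruction_time_s = steps_per_instruction * cycle_time_s
print(cycle_time_s)                  # 5e-10
print(instruction_time_s)            # 2e-09
```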
; tab ,
CPU Features and Enhancements
▪ Separate fetch/execute units
▪ Pipelining
▪ Multiple, parallel execution units
▪ Scalar processing
▪ Superscalar processing
▪ Branch instruction processing
; tab ,
What does the fetch unit include?
▪ Instruction fetch unit
▪ Instruction decode unit
• Determines the opcode
• Identifies the type of instruction and the operands
▪ Several instructions are fetched in parallel and held in a buffer until decoded and executed
▪ IP – Instruction Pointer register holds the location of the instruction currently being processed
; tab ,
What is included in the execute unit?
▪ Receives instructions from the decode unit ▪ Appropriate execution unit services the instruction
; tab ,
Define instruction pipelining.
▪ Assembly-line technique that overlaps the fetch-execute cycles of a sequence of instructions
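The overlap can be sketched by counting cycles for an idealized k-stage pipeline with no stalls (the stage and instruction counts below are hypothetical):

```python
def total_cycles(n_instructions, n_stages, pipelined):
    """Idealized cycle count: no hazards, no stalls."""
    if pipelined:
        # The first instruction fills the pipeline; after that,
        # one instruction completes every cycle.
        return n_stages + (n_instructions - 1)
    # Without overlap, every instruction pays the full stage count.
    return n_stages * n_instructions

print(total_cycles(100, 5, pipelined=False))  # 500 cycles
print(total_cycles(100, 5, pipelined=True))   # 104 cycles
```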
; tab ,
Define scalar processing.
The average instruction execution rate is approximately equal to the clock speed of the CPU
; tab ,
Branch Problem Solutions
▪ Separate pipelines for both possibilities
▪ Probabilistic approach
▪ Requiring the following instruction to not be dependent on the branch
▪ Instruction reordering (superscalar processing)
; tab ,
What are multiple, parallel execution units?
▪ Different instructions have different numbers of steps in their cycle
▪ Differences in each step
▪ Each execution unit is optimized for one general type of instruction
▪ Multiple execution units permit simultaneous execution of several instructions
; tab ,
Describe superscalar processing.
▪ Process more than one instruction per clock cycle
▪ Separate fetch and execute cycles as much as possible
▪ Buffers for fetch and decode phases
▪ Parallel execution units
; tab ,
Describe superscalar issues.
▪ Out-of-order processing – dependencies (hazards)
▪ Data dependencies
▪ Branch (flow) dependencies and speculative execution
▪ Parallel speculative execution or branch prediction
▪ Branch History Table
▪ Register access conflicts
▪ Rename registers (logical registers)
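The branch history table idea can be sketched with per-branch 2-bit saturating counters, a common prediction scheme (the branch address and loop pattern below are hypothetical):

```python
class BranchHistoryTable:
    """2-bit saturating counters: states 0,1 predict not-taken; 2,3 predict taken."""

    def __init__(self):
        self.counters = {}                       # branch address -> counter state

    def predict(self, addr):
        # Unseen branches start weakly not-taken (state 1).
        return self.counters.get(addr, 1) >= 2

    def update(self, addr, taken):
        # Saturate at 0 and 3 so one odd outcome doesn't flip the prediction.
        c = self.counters.get(addr, 1)
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)

bht = BranchHistoryTable()
# A loop-closing branch: taken 8 times, not taken once (loop exit), taken 8 more.
outcomes = [True] * 8 + [False] + [True] * 8
correct = 0
for taken in outcomes:
    if bht.predict(0x400) == taken:
        correct += 1
    bht.update(0x400, taken)
print(correct, "of", len(outcomes))              # 15 of 17
```

Because the counter saturates, the single loop exit costs only one misprediction instead of two.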
; tab ,
Why are memory enhancements needed, and what are they?
Memory is slow compared to CPU processing speeds!
▪ 2 GHz CPU = 1 cycle in ½ of a billionth of a second
▪ 70 ns DRAM = 1 access in 70 billionths of a second
▪ Methods to improve memory access:
▪ Wide Path Memory Access
• Retrieve multiple bytes instead of 1 byte at a time
▪ Memory Interleaving
• Partition memory into subsections, each with its own address register and data register
▪ Cache Memory
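The interleaving partition above can be sketched as low-order interleaving, where consecutive addresses fall in different subsections so sequential accesses can overlap (the 4-bank count is an illustrative assumption):

```python
N_BANKS = 4                          # assumed number of memory subsections

def bank_of(address):
    # Low-order bits pick the bank, so addresses 0,1,2,3 hit banks 0,1,2,3.
    return address % N_BANKS

def offset_in_bank(address):
    # Remaining bits pick the word within that bank.
    return address // N_BANKS

for addr in range(8):
    print(addr, "-> bank", bank_of(addr), "offset", offset_in_bank(addr))
```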
; tab ,
Cache Memory
▪ Blocks: 8 or 16 bytes
▪ Tags: pointer to the location in main memory
▪ Cache controller: hardware that checks tags
▪ Cache line: unit of transfer between storage and cache memory
▪ Hit ratio: ratio of hits to total requests
▪ Synchronizing cache and memory
• Write-through
• Write-back
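A minimal direct-mapped model of the tag check described above; the line size and line count are illustrative, and the write-through/write-back step is omitted:

```python
LINE_SIZE = 16                       # bytes per cache line (assumed)
N_LINES = 8                          # cache lines (assumed)

cache_tags = [None] * N_LINES        # tag stored per line; None = empty
hits = misses = 0

def access(address):
    global hits, misses
    block = address // LINE_SIZE     # which line-sized block of memory
    index = block % N_LINES          # which cache line it maps to
    tag = block // N_LINES           # identifies the block held in that line
    if cache_tags[index] == tag:     # cache controller compares tags
        hits += 1
    else:
        misses += 1                  # on a miss, load the line from memory
        cache_tags[index] = tag

for addr in range(64):               # sequential accesses: locality of reference
    access(addr)

print(hits, misses)                  # 60 4 (one miss per 16-byte line)
```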
; tab ,
Step-by-Step Use of Cache
; tab ,
Performance Advantages of cache memory?
▪ Hit ratios of 90% are common
▪ 50%+ improved execution speed
▪ Locality of reference is why caching works
• Most memory references are confined to a small region of memory at any given time
• Well-written program: small loop, procedure, or function
• Data likely in an array
• Variables stored together
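Using the 90% hit ratio above, plus an assumed 5 ns cache access and the 70 ns DRAM access from the earlier card, the effective access time works out much closer to cache speed than to DRAM speed:

```python
def effective_access_ns(hit_ratio, cache_ns=5, memory_ns=70):
    # Weighted average: hits are served by the cache, misses by main memory.
    # The 5 ns cache time is an assumption for illustration.
    return hit_ratio * cache_ns + (1 - hit_ratio) * memory_ns

print(round(effective_access_ns(0.90), 1))   # 11.5
```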
; tab ,
Why do the sizes of the caches have to be different?
; tab ,
What are the reasons for multiprocessing?
▪ Increase the processing power of a system
▪ Parallel processing
; tab ,
What is a multiprocessor system?
▪ Tightly coupled
▪ Multicore processors – when the CPUs are on a single integrated circuit
; tab ,
What are multiprocessor systems for?
▪ Identical access to programs, data, shared memory, I/O, etc.
▪ Easily extends multi-tasking and redundant program execution
; tab ,
What are the two ways to configure multiprocessor systems?
▪ Master-slave multiprocessing
▪ Symmetrical multiprocessing (SMP)
; tab ,
In master-slave multiprocessing, what does the master CPU do?
▪ Manages the system ▪ Controls all resources and scheduling ▪ Assigns tasks to slave CPUs