Lecture 3 - Processor and memory Flashcards
What is processor architecture?
What components the processor consists of, and how they are connected
What are the two types of processor architectures?
The two types differ in how memory is organized.
Von Neumann: Shared instruction and data memory (general purpose).
Harvard: Separate memories for instructions and data (mostly used in embedded systems).
Why are Harvard architectures well suited to embedded systems?
To optimize memory use.
The text segment is known before run time. There won't be new instructions, so data and text can be kept in separate memories.
What are the main processor functions?
Fetch: Get the instruction from memory (send the PC to memory and read the instruction at this address).
Decode: Determine what operation needs to be performed and what the operands are. Locate the operands.
Execute: Read the operands and execute the instruction. Save the result.
What is a single cycle processor design?
All functions (fetch, decode, execute) are performed in one cycle.
The cycle must be long enough to cover all the functions.
What is a downside with using single cycle designs?
Only one piece of hardware is in use at a time. During fetch, the decode and execute hardware is idle; during decode, the fetch and execute hardware is idle.
What is pipelined processor design?
Breaks instruction execution into separate phases.
Each phase is executed in one cycle. Less work is done per cycle, so the clock frequency can be increased.
The execution of phases can overlap, since they run on different pieces of hardware.
Which stages update the PC in a pipelined design?
Fetch must update the PC, so that the next instruction is ready to be fetched next cycle.
For branch instructions, the execute stage sets the PC - the target is not known earlier because the branch condition must be resolved first.
What is the formula for execution time?
Instruction count * CPI * cycle time
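A quick numeric sketch of the formula (the numbers below are made-up illustrative values, not from the lecture):

```python
# Execution time = instruction count * CPI * cycle time
# Illustrative values (assumptions, not from the lecture):
instruction_count = 1_000_000
cpi = 1.2             # average cycles per instruction
cycle_time_ns = 0.5   # 2 GHz clock -> 0.5 ns per cycle

execution_time_ns = instruction_count * cpi * cycle_time_ns
print(execution_time_ns)  # 600000.0 ns = 0.6 ms
```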
What are the types of dependencies?
Data dependencies and control dependencies.
Compare execution time of single- and pipeline processors
Instruction count is the same
CPI is the same (although one instruction takes n cycles in the pipeline, the overlap means one instruction finishes every cycle)
Pipeline has 1/n the cycle time of the single-cycle design
Pipeline can provide n-times the performance (in the ideal case)
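The comparison above can be sketched numerically; a 5-stage pipeline and a 5 ns single-cycle clock are assumptions for illustration:

```python
# Single-cycle vs ideal n-stage pipeline (illustrative numbers).
n_stages = 5
instructions = 1_000_000
single_cycle_time_ns = 5.0   # one long cycle covering fetch+decode+execute

# Single-cycle design: CPI = 1, long cycle.
t_single = instructions * 1 * single_cycle_time_ns

# Ideal pipeline: CPI is still 1 (one instruction finishes per cycle),
# but the cycle time is 1/n of the single-cycle design.
t_pipeline = instructions * 1 * (single_cycle_time_ns / n_stages)

speedup = t_single / t_pipeline
print(speedup)  # 5.0: n-times performance in the ideal case
```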
Why is n-times only the ideal performance increase of a pipeline, and not the actual one?
To achieve 1/n cycle time in a pipelined processor, the work must be evenly distributed across the phases. This is difficult to do.
Even if the cycle time is 1/n, a CPI of 1 is hard to reach because dependencies cause hazards.
What are data dependencies?
An instruction reads the result of a previous instruction before the result is ready to be used
What is a control dependency?
Instruction execution depends on the outcome of branch instructions
What is one way to avoid hazards?
Pause (stall) the pipeline - this increases execution time.
What are true (data) dependencies?
Data written by one instruction is being used by another
What are named dependencies / false dependencies?
No data movement between the instructions, but they use the same registers.
What are anti-dependencies?
A type of named dependency
One instruction reads a register, and a later instruction writes to that register. These instructions can't be reordered, since the first would read a different value if it executed after the second.
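A sketch of an anti-dependency, using Python variables as stand-ins for registers (the values are hypothetical):

```python
# Anti-dependency (write-after-read), with variables as "registers".
r1, r2, r3 = 10, 0, 99

# Original order:
r2 = r1 + 1        # instruction A: reads r1
r1 = r3 * 2        # instruction B: writes r1
original_r2 = r2   # A saw the old r1 -> 11

# Reordered (B before A): A now reads the NEW value of r1.
r1, r2, r3 = 10, 0, 99
r1 = r3 * 2        # B first
r2 = r1 + 1        # A reads r1 = 198, not 10
reordered_r2 = r2  # -> 199, a different result

print(original_r2, reordered_r2)  # 11 199
```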
What are output dependencies?
A type of named/false dependency.
Instructions with an output dependency write to the same output register.
For example, two instructions write to the same location. If other instructions depend on the value stored by one of them, reordering the writes causes wrong execution.
The instructions themselves are not affected by each other, but other instructions are.
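The same kind of sketch for an output dependency, again with Python variables as stand-in registers (hypothetical values):

```python
# Output dependency (write-after-write): two instructions write the same
# register; a later reader sees whichever write happened last.
r1 = 0
r1 = 5          # instruction A
r1 = 7          # instruction B
after_ab = r1   # a later reader sees 7

r1 = 0
r1 = 7          # B first
r1 = 5          # A last
after_ba = r1   # reordering changes what the reader sees: 5

print(after_ab, after_ba)  # 7 5
```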
What is branch penalty?
When the pipeline needs to be flushed because the wrong execution path was taken.
Lost cycles.
What can be done to avoid control hazards?
While a branch instruction is passing through the pipeline, start executing instructions that are completely independent of it.
Disadvantages:
Difficult in general to find such instructions.
This gets harder for deeper pipelines.
Exposes the pipeline design to the programmer/compiler -> architecture-dependent code.
A branch predictor can also be used.
What is a branch predictor?
Hardware that predicts the direction taken by a branch before the next fetch is executed.
What can be done to boost performance?
Out-of-order execution
Superscalar processors (fetch, decode and execute multiple instructions per cycle)
Multithreaded processors (execute multiple instruction streams in parallel)
Caching, prefetch
What are the different memory technologies that are available?
Approximate speed and cost per 1 GB:
SRAM: (1-10 ns), expensive ($1000)
DRAM: (100 ns), $10
Flash SSD: (100 µs), $1
Magnetic disk: (10 ms), $0.1
What are the trade offs within memory?
Speed and cost
What is SRAM and how does it store a bit?
Static Random Access Memory.
Uses two cross-coupled inverters: the output of the first inverter is connected to the input of the second, and the output of the second back to the input of the first. This feedback loop holds the bit.
A bit written to the first inverter is inverted, then inverted back by the second, so the loop continually reinforces the stored value.
Needs 6 transistors: 2 per inverter and 2 to access the cell.
The access transistors are used to read/write the bit.
What is DRAM?
Dynamic Random Access Memory.
One capacitor stores a single bit. The charge of the capacitor indicates the value of the bit. Charged - 1, discharged - 0.
One access transistor per cell.
What is a problem with DRAM?
The capacitor leaks charge over time.
A value stored in a DRAM cell can disappear over time; a stored 1 can flip to 0.
How is leakage prevented in DRAM?
Refresh DRAM periodically.
Check what is stored in the capacitor. If the value is 1, there is still some charge in the capacitor, and more charge must be added to keep it full.
How is memory fetched from a 2D DRAM memory?
Address is stored in address register
The most significant bits of the address represent the row, the least significant bits the column.
The row address is decoded by the row decoder. Based on the address bits, one row is activated, and all cells in that row are read from their capacitors.
The values from the row are driven out of the cells and amplified, since some capacitors may have partially discharged.
Column decoder chooses the bits of interest.
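The row/column split can be sketched in Python; the 10/10 bit split below assumes a hypothetical 1024x1024 cell array:

```python
# Split a DRAM address into row and column bits.
# Assumption for illustration: 1024x1024 array -> 10 row bits, 10 column bits.
ROW_BITS, COL_BITS = 10, 10

def split_address(addr):
    col = addr & ((1 << COL_BITS) - 1)                 # least significant bits
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)   # most significant bits
    return row, col

row, col = split_address(0b0000000011_0000000101)
print(row, col)  # row 3, column 5
```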
Compare DRAM and SRAM
DRAM is slower because it uses capacitors
DRAM provides higher density - it only needs one capacitor and one transistor per cell.
DRAM has lower cost
DRAM requires refresh because of leakage; the refresh costs power and area. During refresh the processor cannot use the memory, which reduces performance.
What is the memory hierarchy?
Combines different kinds of memory. Small amounts of fast memory, close to the processor.
Large amounts of slow memory, farther from the processor.
This creates the appearance of a single large, fast memory.
Registers (flip flops): 100s of bytes
Cache (SRAM): 1kB-10MB, can be multiple levels
Main memory (DRAM): 1-64GB
Disk/SSD: 100s GB (SSD), 1-2 TB (Disk)
What is the granularity at which memory is transferred between memory levels?
Registers - cache: words, 4-8 bytes
Cache - main memory: blocks, 16-128 bytes
Main memory - disk: pages (1KB - 2MB)
When transferring data between memory layers, why is more data transferred the further down the hierarchy we go?
To amortize the initial access latency.
Fetching the first byte of data is expensive, but the subsequent ones come fast.
Why is the memory hierarchy effective?
Because of temporal and spatial locality
What is temporal locality?
A recently accessed memory location is likely to be accessed again in the near future
What is spatial locality?
Memory locations close to a recently accessed location are likely to be accessed in the near future.
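Both kinds of locality show up in an ordinary summation loop (a minimal sketch):

```python
# Summing an array exhibits both kinds of locality.
data = list(range(16))

total = 0
for i in range(len(data)):
    total += data[i]   # `total` and `i` are touched every iteration
                       #   -> temporal locality
                       # data[0], data[1], ... are adjacent in memory
                       #   -> spatial locality

print(total)  # 120
```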
What is a cache block/line?
The unit/granularity of data stored in the cache (32-128 bytes)
What is a cache hit?
Data is found in the cache
What is a cache miss?
Data is not found in cache
The access must move further down the hierarchy (lower-level caches, main memory).
When found, the data is copied into the cache (because of temporal locality).
What is hit rate?
The fraction of accesses that hit in the given memory level.
What is hit time?
Time required to access a memory level
What is miss penalty?
Time required to fetch block into some level, from the next level down the hierarchy
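These three definitions combine into the standard average memory access time formula, AMAT = hit time + miss rate * miss penalty. The formula is not stated in the notes but follows from the definitions; the numbers below are illustrative assumptions:

```python
# Average memory access time (AMAT) from hit time, hit rate, miss penalty.
# Illustrative values, not from the lecture.
hit_time_ns = 1.0
hit_rate = 0.95
miss_penalty_ns = 100.0

miss_rate = 1.0 - hit_rate
amat_ns = hit_time_ns + miss_rate * miss_penalty_ns
print(amat_ns)  # ~6.0 ns: most accesses hit, so the average stays near hit time
```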
How is data identified in main memory?
By their full 32-bit address
How do we map a 32-bit address to a smaller memory such as a cache?
Only part of the address is used to map to a location in the cache.
A tag field is introduced in the cache.
The tag field stores the higher-order bits of the address that are not used to locate the line in the cache.
Since multiple addresses map to the same cache line, the tag is compared with the higher-order bits of the requested address to identify which address is currently stored in the line.
What is the valid bit in the cache?
Indicates whether valid data has been loaded into the cache line.
Even though a cache line may be empty, a memory address can still map to it. The valid bit indicates that the line has not been written yet, and that the data must still be fetched from memory.
What types of bits are used in a direct mapped cache?
Tag: identifies which address is stored in the cache line
Index: used to choose the cache line
Byte offset: chooses the byte within the cache line
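The three fields can be sketched in Python; the cache geometry below (64-byte lines, 1024 lines) is an assumption for illustration:

```python
# Split a 32-bit address into tag / index / byte offset for a
# direct-mapped cache. Assumed geometry: 64-byte lines, 1024 lines.
OFFSET_BITS = 6    # log2(64)   -> byte within the line
INDEX_BITS = 10    # log2(1024) -> which line

def split(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)   # remaining high-order bits
    return tag, index, offset

tag, index, offset = split(0x12345678)
print(hex(tag), index, offset)  # 0x1234 345 56
```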