Computer Systems Flashcards
Why is a write-invalidate protocol usually preferred to a write-update protocol?
- Multiple updates to the same cache block with no intervening reads require multiple update broadcasts in an update protocol, which is wasteful over the limited bus bandwidth.
- The invalidate protocol only requires a single invalidate broadcast on the first write.
- Minimizing bus traffic is of paramount importance, as the bus is usually the bottleneck that limits the number of processors that can be accommodated.
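The bus-traffic difference can be made concrete with a toy count of broadcasts for a run of writes to one block with no intervening reads (a deliberate simplification, not a full MESI model):

```python
def update_broadcasts(n_writes: int) -> int:
    # Write-update: every write broadcasts the new value on the bus.
    return n_writes

def invalidate_broadcasts(n_writes: int) -> int:
    # Write-invalidate: only the first write broadcasts an invalidate;
    # subsequent writes hit a block the writer already owns exclusively.
    return 1 if n_writes > 0 else 0
```

For ten back-to-back writes, the update protocol issues ten broadcasts where the invalidate protocol issues one.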
What is meant by a ‘load-store instruction set architecture’?
A load-store ISA is an example of a general-purpose register architecture in which the operands of the arithmetic and logic instructions are located in registers, not memory. The only instructions that can access memory are loads and stores.
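The distinction can be illustrated with a toy interpreter for an invented load-store machine (the opcode names and register count are made up for illustration, not taken from any real ISA). Only `LW`/`SW` touch memory; `ADD` works purely on registers:

```python
def run(program, mem):
    regs = [0] * 8
    for op, *args in program:
        if op == "LW":               # load: the only way to read memory
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "SW":             # store: the only way to write memory
            rs, addr = args
            mem[addr] = regs[rs]
        elif op == "ADD":            # ALU op: register operands only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
    return mem

# A = B + C must compile to: load B, load C, add, store A.
```

A single memory-to-memory `ADD` instruction would be illegal on such a machine; the compiler must split it into the four-instruction sequence above.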
How does the adoption of a load-store ISA influence the design of CPU hardware?
- Facilitates fixed length, fixed format instructions that are easy to decode
- Demands a large set of general-purpose registers to store intermediate results
- Does not admit complex instructions, so the datapath can be controlled without excessive use of microcode.
- Results in all instructions having similar complexity, which facilitates effective pipelining.
Explain why the latency of the ALU is of paramount importance in computer hardware design?
- According to Amdahl’s Law, speeding up the operations that occur most frequently has the greatest effect on overall performance (i.e. make the common case fast).
- In most ISAs, the ALU is used at least once in most instructions, hence improving the latency of the ALU will significantly improve the speed of computer systems.
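Amdahl's Law can be stated in one line; the 80% figure below is an illustrative assumption about the fraction of time spent in ALU operations, not a measured value:

```python
def amdahl_speedup(f: float, s: float) -> float:
    # f: fraction of execution time affected, s: speedup of that fraction.
    return 1.0 / ((1.0 - f) + f / s)

# If ALU operations account for 80% of execution time, halving the ALU
# latency (s = 2) yields an overall speedup of about 1.67x.
```

Note also the limit: even an infinitely fast ALU could not speed up the remaining 20%, capping the overall gain at 5x in this example.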
What is meant by a pipelined datapath?
- Extra registers between the principal datapath stages.
- Each instruction advances through just one datapath stage per clock cycle, storing interim results in these registers.
- In this way, several instructions can be in the pipeline at the same time.
- Pipelining therefore increases instruction throughput.
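The throughput gain can be sketched with a simple cycle count, assuming an ideal pipeline with no hazards or stalls:

```python
def pipelined_cycles(n_instr: int, n_stages: int) -> int:
    # The first instruction takes n_stages cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    return n_stages + n_instr - 1

def unpipelined_cycles(n_instr: int, n_stages: int) -> int:
    # Without pipelining, each instruction occupies the whole datapath.
    return n_instr * n_stages
```

For 1000 instructions on a 5-stage pipeline this is 1004 cycles versus 5000: nearly a 5x throughput improvement, even though each individual instruction still takes 5 cycles.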
What are pipeline hazards?
Hazards are dependencies between in-flight instructions that disrupt operation of the pipeline.
Data hazard
Occurs when an instruction requires data before a previous instruction has written its result to the register file.
Branch hazard
Occurs when the address of the next instruction is required before an earlier conditional branch instruction has been evaluated.
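A read-after-write (RAW) data hazard check can be sketched in a few lines; the `(dest, srcs)` instruction representation is an invented simplification for illustration:

```python
def raw_hazard(older, younger):
    # older, younger: (dest_reg, src_regs) for two in-flight instructions.
    # A RAW hazard exists if the younger instruction reads a register
    # the older instruction has not yet written back.
    dest, _ = older
    _, srcs = younger
    return dest in srcs

# ADD r1, r2, r3 followed by SUB r4, r1, r5 hazards on r1:
# the SUB needs r1 before the ADD has reached write-back.
```

Real pipelines resolve such hazards by forwarding results between stages or, failing that, by stalling the younger instruction.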
What extra hardware would be required in a pipelined datapath to support superscalar operation?
- Superscalar CPUs issue > 1 instruction/clock cycle.
- Need to duplicate essential elements (for arithmetic and storage) in the pipeline.
- Instructions must be sequenced to avoid simultaneous use of the same resource (e.g. pairing an ALU operation with a load/store).
- Widen the pipeline registers to hold the state of two instructions.
- Widen the instruction fetch path to 64 bits so two 32-bit instructions can be fetched per cycle.
- Add register-file ports to read four source registers and write two results per cycle.
- Extra hazard detection logic.
What extra hardware would be required in a pipelined datapath to support SMT operation?
- Extra hardware to hold each thread’s independent state.
- Separate registers, PC, and TLB per thread.
- Since issue slots can be filled from whichever thread is ready, expect fewer stalls and higher throughput.
- Possible downside: more cache conflicts and misses, because the threads share the caches.
Explain why the inclusion of a cache between the CPU and main memory (MM) generally improves a computer’s performance?
- Faster because of physical proximity to the CPU.
- Built from static RAM (SRAM) instead of slower dynamic RAM (DRAM).
- Data still needs to be fetched from MM into the cache, but this overhead is easily amortized through temporal and spatial locality of reference, so the main-memory latency is paid only once per block.
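The benefit is captured by the standard average memory access time (AMAT) formula; the cycle counts below are illustrative assumptions, not figures from the text:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    # Average memory access time: every access pays the hit time,
    # and a fraction miss_rate additionally pays the miss penalty.
    return hit_time + miss_rate * miss_penalty

# With a 1-cycle cache hit, 5% miss rate, and 100-cycle MM access,
# the average access costs 6 cycles instead of 100 without a cache.
```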
What are the relative advantages of direct-mapped and set-associative caches?
- A direct-mapped (DM) cache’s index points to a unique block (i.e. no searching): low hit time.
- But a DM cache may evict a block that will soon be referenced, since there is no choice of victim: higher miss rate.
- A set-associative (SA) cache’s index points to a set of blocks that must all be searched: increased hit time.
- But an SA cache can choose which block in the set to replace (e.g. LRU): lower miss rate.
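The placement rules can be sketched at block-address granularity (cache sizes in the comments are illustrative):

```python
def dm_index(block_addr: int, n_blocks: int) -> int:
    # Direct-mapped: the block address determines exactly one slot,
    # so a hit or miss is decided by a single tag comparison.
    return block_addr % n_blocks

def sa_set(block_addr: int, n_sets: int) -> int:
    # Set-associative: the block address determines only the set;
    # every way in that set must be searched in parallel.
    return block_addr % n_sets
```

Note that an n-way SA cache with the same capacity has fewer sets than a DM cache has blocks, so more addresses map to each set, but conflicts within a set no longer force an immediate eviction.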
What requirements of a modern computer system motivate the adoption of a virtual memory system?
- Should be able to write programs without having to worry about the amount of physical memory in the computer.
- CPU should be able to execute multiple processes concurrently, with each process unaware of, and protected from, the others.
What is the purpose of the TLB?
The page table lives in main memory, so looking up a translation there would add a memory access to every reference, undermining the benefit of the cache.
The CPU therefore contains a small, fast cache called the TLB, which stores recently used page-table entries, each mapping a virtual page number to a physical page number.
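A TLB lookup can be sketched with dictionaries standing in for the TLB and page table (a simplification: no capacity limit, replacement policy, or permission bits):

```python
def translate(vaddr: int, tlb: dict, page_table: dict,
              page_size: int = 4096) -> int:
    vpn, offset = divmod(vaddr, page_size)
    if vpn in tlb:                   # TLB hit: no page-table access
        ppn = tlb[vpn]
    else:                            # TLB miss: consult the page table
        ppn = page_table[vpn]
        tlb[vpn] = ppn               # cache the translation for next time
    return ppn * page_size + offset
```

After the first access to a page, later accesses to the same page hit in the TLB and skip the page-table walk entirely.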
TLB Properties
- Total size: 32-4096 page table entries
- Block size: 1-2 page entries
- Hit time: < 1 clock cycle
- Miss rate: 0.01-1%