Lectures 1 and 2 - Introduction and Fundamentals Flashcards
Implementation or Microarchitecture
Logical organisation of the inner structure of the computer.
Instruction Set Architecture
Defines the functional behaviour of the processor and the hardware/software interface. It enables compatible implementations to be built that make different trade-offs.
Realisation
Physical structure embodying the implementation.
Four major players in computer architecture arena
Applications, architectures, technology, markets
Design goals and constraints are imposed by the ….
Target market, available technology and target applications.
Important metrics
Energy (J), Power (W), Performance, Cost/Performance, Power Efficiency, Reliability (Mean Time to Failure), Flexibility
How were historic performance gains achieved?
Technology Scaling
Gates per clock
Instructions Per Cycle Increase
Instruction Count Decrease
Technology Scaling Performance Improvement
Provides 1.4x transistor performance improvement per generation.
7 historic process generations provided a 10.5x performance improvement.
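The quoted figures are consistent because per-generation gains compound multiplicatively; a quick check:

```python
# Transistor speed improves ~1.4x per process generation; over 7
# generations the gains compound multiplicatively.
per_generation = 1.4
generations = 7
total = per_generation ** generations
print(round(total, 1))  # ~10.5x, matching the quoted figure
```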
Gates per clock performance improvement
Gate delays per clock fell from 100 to 10, a 10x performance improvement. 4x of this came from pipelining, with pipeline depth increasing from 5 to 20 stages, and 2.5x from circuit-level advances, e.g. new logic families.
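The two contributions multiply, which is consistent with the overall 100-to-10 reduction in gate delays per clock:

```python
pipelining_gain = 20 / 5  # deeper pipeline: 5 -> 20 stages, ~4x
circuit_gain = 2.5        # circuit-level advances, e.g. new logic families
total_gain = pipelining_gain * circuit_gain
print(total_gain)  # 10.0 -- matches 100 -> 10 gate delays per clock
```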
IPC and instruction count performance gain
~5-8x improvement in SPECint/MHz
Advances in compiler technology and impact of increased bus widths.
VLSI technology scaling
Fabrication processes are characterised by their feature size, so process improvements allow transistor and wire dimensions to be reduced. A linear reduction in transistor size enables a quadratic increase in transistor count. Smaller transistors are also faster, as their resistance stays roughly constant while their capacitance decreases with feature size.
Porting a design to a new technology
√2 (~1.4x) performance increase.
Area reduced by a factor of 2.
Dynamic power consumption roughly halved: P = C·V²·f, and both capacitance and supply voltage scale down with feature size while f increases by 1.4x to deliver the performance gain.
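This is classic Dennard-style scaling; a sketch, assuming capacitance and supply voltage each scale by 1/S and frequency rises by S:

```python
# Dennard-style scaling sketch: porting to a new process with linear
# shrink factor S ~ 1.4. Capacitance and supply voltage each scale by
# 1/S; frequency is raised by S for the performance gain.
S = 1.4
C_scale = 1 / S
V_scale = 1 / S
f_scale = S
# Dynamic power P = C * V^2 * f, so the scaled power ratio is:
P_scale = C_scale * V_scale**2 * f_scale
print(round(P_scale, 2))  # ~0.51 -- power roughly halves
```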
Interconnect versus transistor scaling
Smaller transistors are faster and lower power, but wires don't scale in the same way: resistance per micron increases and capacitance per micron stays roughly the same. Adding fat wires on the upper metal layers helps mitigate this.
Architectural implications of poor wire scaling
Less state can be reached in a single clock cycle, so decentralised structures work better. A bypass network between functional units may be preferable.
Amdahl’s Law
Performance improvements are limited by the fraction of time the proposed enhancement can be employed.
Speedup = 1 / ((1 − fraction_enhanced) + fraction_enhanced / speedup_enhanced)
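The formula can be expressed directly; the example figures below are illustrative, not from the lectures:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time benefits."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Even a 10x enhancement applied to half the program gives well under 2x overall.
print(round(amdahl_speedup(0.5, 10), 2))  # 1.82
```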
Law of diminishing returns
Incremental improvements in speedup gained by enhancing just a portion of the computation diminish as improvements are added.
Are all enhancements worthwhile?
No; they consume design and implementation resources and may slow the common case, either indirectly, by leaving less time to optimise the common case, or directly, by extending the cycle time.
Typical program behaviour
Locality of reference - exploited by memory hierarchy
Predictable flow control - exploited by branch prediction
Predictable data values - exploited at higher levels e.g. Memoization
Principle of locality
Programs tend to reuse data and instructions that they have used recently.
Temporal locality
Recently accessed data are likely to be accessed again in the near future.
Spatial locality
Accesses to nearby memory locations often occur close together in time.
Locality in instruction reference stream
Temporal - loops and function calls
Spatial - instructions executed sequentially in absence of branch instructions. Many branches are to nearby instructions
Locality in data reference stream
Temporal - Widely used program variables & the call stack.
Spatial - Access arrays sequentially, process streams, function calls and the stack frame
CPU performance equation
1/performance = time/program = instructions/program × cycles/instruction × time/cycle
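The equation translates directly into code; the figures used below are hypothetical, chosen only to show the units working out:

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """CPU performance equation: time = instruction count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_s

# Hypothetical figures: 1e9 instructions, CPI of 1.5, 1 GHz clock (1 ns cycle).
t = execution_time(1e9, 1.5, 1e-9)
print(t)  # 1.5 seconds
```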