Lectures 1 and 2 - Introduction and Fundamentals Flashcards

0
Q

Implementation or Microarchitecture

A

Logical organisation of the inner structure of the computer.

1
Q

Instruction Set Architecture

A

Defines the functional behaviour of the processor and the hardware/software interface. It enables compatible implementations to be made that make different trade-offs.

2
Q

Realisation

A

Physical structure embodying the implementation.

3
Q

Four major players in computer architecture arena

A

Applications, architectures, technology, markets

4
Q

Design goals and constraints are imposed by the ….

A

Target market, available technology and target applications.

5
Q

Important metrics

A
Energy (Joules)
Power (W)
Performance
Cost/Performance
Power Efficiency
Reliability (Mean Time to Failure)
Flexibility
6
Q

How were historic performance gains achieved?

A

Technology Scaling

Gates per clock

Instructions Per Cycle Increase

Instruction Count Decrease

7
Q

Technology Scaling Performance Improvement

A

Provides 1.4x transistor performance improvement per generation.

7 historic process generations provided a 10.5x performance improvement.
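As a quick sanity check on these figures (a sketch, not from the lecture), compounding 1.4x over seven generations:

```python
# Compound the per-generation transistor speedup (1.4x, from the card above)
# over 7 historic process generations.
per_generation = 1.4
generations = 7
total_speedup = per_generation ** generations
print(round(total_speedup, 1))  # ~10.5
```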

8
Q

Gates per clock performance improvement

A

Reduction from 100 to 10 gate delays per clock, so a 10x performance improvement. 4x of this came from pipelining, with an increase from 5 to 20 stages, and 2.5x from circuit-level advances, e.g. new logic families.

9
Q

IPC and instruction count performance gain

A

~5-8x improvement in SPECint/MHz

Advances in compiler technology and impact of increased bus widths.

10
Q

VLSI technology scaling

A

Fabrication processes are characterised by their feature size, so process improvements allow transistor and wire dimensions to be reduced. A linear reduction in transistor size enables a quadratic increase in transistor count. Smaller transistors are also faster, as their resistance is roughly independent of feature size while their capacitance decreases with it.

11
Q

Porting a design to a new technology

A

√2 (~1.4x) performance increase.

Area reduced by a factor of 2.

Dynamic power consumption reduced by half: with P = C·V²·f, capacitance and (under constant-field scaling) voltage both shrink by the scale factor S = √2, while f increases by ~1.4x for the extra performance.
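The scaling factors above can be checked numerically; this sketch assumes constant-field (Dennard) scaling, where supply voltage shrinks with feature size alongside capacitance:

```python
import math

S = math.sqrt(2)                    # linear scale factor per process generation

perf = S                            # gate delay shrinks, so frequency rises ~1.4x
area = 1 / S ** 2                   # linear dimensions shrink by S, area by S^2
# P = C * V^2 * f with C -> C/S, V -> V/S, f -> f*S
power = (1 / S) * (1 / S) ** 2 * S  # relative dynamic power of the ported design

print(round(perf, 2), round(area, 2), round(power, 2))  # 1.41 0.5 0.5
```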

12
Q

Interconnect versus transistor scaling

A

Smaller transistors are faster and lower power, but wires don’t scale in the same way: resistance per unit length increases while capacitance per micron stays roughly the same. Adding fat wires on upper metal layers can help mitigate this.

13
Q

Architectural implications of poor wire scaling

A

Can reach less state in a single clock cycle so decentralised structures work better. A bypass network between functional units may be preferable.

14
Q

Amdahl’s Law

A

Performance improvements are limited by the fraction of time the proposed enhancement can be employed.

Speedup = 1 / ((1 − fraction_enhanced) + fraction_enhanced / speedup_enhanced)
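The formula can be turned into a one-line helper; the example numbers below are illustrative, not from the lecture:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is improved."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# A 10x enhancement that applies to only half the program gives well under 2x.
print(round(amdahl_speedup(0.5, 10), 2))  # 1.82
```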

15
Q

Law of diminishing returns

A

Incremental improvements in speedup gained by enhancing just a portion of the computation diminish as improvements are added.

16
Q

Are all enhancements worthwhile?

A

No. They consume design and implementation resources and can slow the common case, either indirectly (less time is left to optimise the common case) or directly (e.g. by causing the cycle time to be extended).

17
Q

Typical program behaviour

A

Locality of reference - exploited by memory hierarchy

Predictable flow control - branch prediction exploits

Predictable data values - exploited at higher levels e.g. Memoization
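A minimal sketch of the memoization point: a pure function asked again for a value it has already computed can serve the cached result instead of recomputing it (illustrative, not from the lecture):

```python
from functools import lru_cache

@lru_cache(maxsize=None)            # cache results keyed on the arguments
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040 -- repeated subproblems are served from the cache
```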

18
Q

Principle of locality

A

Programs tend to reuse data and instructions that they have used recently.

19
Q

Temporal locality

A

Recently accessed data are likely to be accessed again in the near future.

20
Q

Spatial locality

A

Accesses to nearby memory locations often occur close together in time.

21
Q

Locality in instruction reference stream

A

Temporal - loops and function calls

Spatial - instructions executed sequentially in absence of branch instructions. Many branches are to nearby instructions

22
Q

Locality in data reference stream

A

Temporal - Widely used program variables & the call stack.

Spatial - Access arrays sequentially, process streams, function calls and the stack frame
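Both kinds of locality show up in even trivial code: sequential traversal gives spatial locality, and a second pass over the same array soon after gives temporal locality (an illustrative sketch):

```python
data = list(range(1000))

total = 0
for x in data:      # sequential access: spatial locality
    total += x
for x in data:      # revisiting the same data soon after: temporal locality
    total += x

print(total)  # 999000
```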

23
Q

CPU performance equation

A

1/performance = time/program = (instructions/program) × (cycles/instruction) × (time/cycle)
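Plugging illustrative numbers (not from the lecture) into the equation:

```python
instructions = 1_000_000        # instructions/program (illustrative)
cpi = 1.5                       # cycles/instruction (illustrative)
clock_hz = 2_000_000_000        # 2 GHz clock -> time/cycle = 0.5 ns

exec_time = instructions * cpi * (1 / clock_hz)   # time/program in seconds
print(exec_time)  # 0.00075
```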

24
Q

Improving performance

A

Shorten clock cycle time - circuit design style/Microarchitecture

CPI - Microarchitecture and ISA

Instruction count - ISA and compiler technology

25
Q

Dynamic power

A

1/2 × capacitive load × voltage² × frequency switched
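Plugging illustrative values into the formula also shows why voltage scaling is so effective (the quadratic term):

```python
def dynamic_power(c_load, voltage, freq):
    # P = 1/2 * C * V^2 * f
    return 0.5 * c_load * voltage ** 2 * freq

base = dynamic_power(1e-9, 1.0, 1e9)     # 1 nF switched at 1 V, 1 GHz
lower_v = dynamic_power(1e-9, 0.5, 1e9)  # halving V quarters the power

print(round(base, 3), round(lower_v, 3))  # 0.5 0.125
```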

26
Q

Static power

A

Static current × voltage

27
Q

Reducing power consumption

A

Reduce performance to save energy by scaling voltage/frequency.

Reduce wastage: cut superfluous switching with clock gating and operand isolation, extract and optimise for the common case, and use power gating.

28
Q

Components of the ISA

A

Word size, operations, registers, operand types, addressing modes, instruction encodings, memory architecture, trap and interrupt architecture.

29
Q

Changing ISAs in embedded processors

A

The ISA has a significant impact, as area and cost constraints mean the transistor and design budgets are limited. Recompilation is not a big hurdle: many embedded devices run one set of binaries for their whole life, product lifetimes are short, and we want to squeeze as much as possible out of the compiler.

30
Q

How can we break link between applications and ISA?

A

JIT VM technologies such as .NET and the JVM

Alternatively support conventional source ISA and translate at run time to target ISA.

31
Q

Problems with microcode control

A

Efficiently controlling a highly concurrent data path using microcode is complex.

Handling exceptions in a pipelined CISC machine is complicated

The microcode engine represents unnecessary overhead, especially when executing simple instructions.

32
Q

RISC Design Goals

A

Choose common instructions and addressing modes

Target efficient pipelined implementation by ensuring instructions go through the same pipeline stages and executing them in a single cycle if possible.

Assume use of high-level languages - the compiler optimises register usage and instruction scheduling.

Produce simple high-frequency implementation.

33
Q

Load-store architecture

A

Memory can only be accessed with load and store instructions. This permits simple fixed length instructions simplifying decoding.

It also simplifies pipelining as instructions take a similar time to execute and memory is accessed at most once in one pipeline stage.

34
Q

Register calling conventions

A

A simple convention specifies temporary and saved registers, giving both the caller and the callee a chance to reduce unnecessary register spilling. Temporary registers are not preserved by the callee, while saved registers must be saved and restored by the callee.

35
Q

Register Windows

A

Register windows, used by some processors, improve the performance of the procedure call/return sequence by avoiding the need to explicitly spill registers to memory.

36
Q

Addressing modes

A

Register

Immediate

Register Indirect with displacement

Register indirect

37
Q

Condition registers and flags

A

Options are condition codes, condition register and compare and branch.

Trade-offs:

Impact on local code scheduling optimisations

Use of general purpose registers

Number of instructions required to implement conditional branch

Cycle time implications

38
Q

Encoding an instruction set

A

Balance:

Number of registers supported

Number of addressing modes supported

Against:

Size of instructions and compiled program

Fetch and decoding logic complexity and pipeline complexity in general

Options are:

Variable length

Fixed length

Hybrid format

39
Q

16-bit instruction set extensions

A

Address a subset of operations, addressing modes and registers, but allow static code size reductions of 25-40%.

They also carry a 10-20% performance penalty.