Single Processor Computing Flashcards

1
Q

Von Neumann Architecture

A

A single stream of instructions executed in sequence, operating on data held in memory.

2
Q

Control Flow

A

A prescribed sequence of instructions

3
Q

Stored Program

A

Instructions and program data stored in the same memory

4
Q

Fetch, Execute, Store

A

Fetch: load the next instruction onto the processor.
Execute: the operation is carried out.
Store: the result is written back to memory.

5
Q

Direct To Memory Architecture

A

Instructions can take operands directly from memory and write results directly back, instead of routing everything through registers.

The working data stays in memory.

6
Q

Out-of-order instruction handling

A

Instructions may be executed in a different order than the program specifies them, as long as the result is unchanged.

7
Q

Why do modern processors use out-of-order instruction handling?

A

To keep the pipeline full: reordering lets independent instructions execute while others wait on data, avoiding stalls and increasing throughput.

8
Q

Pipelining

A

Dividing instruction execution into stages so the CPU can work on several instructions at once, each in a different stage.

9
Q

Execution time for pipelined instructions?

A

t(n) = (n + n_1/2) · x

x = time per pipeline stage (the basic operation time)
n = number of operations
n_1/2 = pipeline setup overhead (the extra time needed to fill the pipeline)
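With illustrative numbers (x, n_1/2, and the stage count below are assumptions, not values from the card), the formula works out as:

```python
# Pipelined execution time t(n) = (n + n_half) * x.
# x (stage time) and n_half (setup overhead) are illustrative assumptions.
def pipelined_time(n, x=1.0, n_half=5):
    """Time to push n operations through the pipeline, including fill time."""
    return (n + n_half) * x

def unpipelined_time(n, x=1.0, stages=5):
    """Without pipelining, each operation occupies all stages in turn."""
    return n * stages * x

print(pipelined_time(1000))    # 1005.0
print(unpipelined_time(1000))  # 5000.0
```

For large n the setup overhead n_1/2 is negligible and the pipeline approaches one result per stage time x.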

10
Q

Instruction level parallelism (ILP)

A

Finding independent instructions and running them in parallel.

11
Q

Speculative Execution

A

Executing instructions beyond a conditional before its outcome is known, assuming (for example) the test will turn out true; the results are discarded if the assumption was wrong.

12
Q

Prefetching

A

Data can be speculatively requested before any instruction needing it is actually encountered.

13
Q

Branch Prediction

A

Guessing whether a conditional instruction will evaluate to true and executing accordingly.
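A common hardware scheme is a 2-bit saturating counter per branch; a minimal sketch of that standard textbook technique (not something the card specifies):

```python
# A 2-bit saturating-counter branch predictor (standard textbook scheme).
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # 0-1 predict not-taken, 2-3 predict taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
# A loop branch: taken 8 times, not taken once (loop exit), taken 8 more.
outcomes = [True] * 8 + [False] + [True] * 8
correct = 0
for taken in outcomes:
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct, "/", len(outcomes))  # 16 / 17: only the loop exit mispredicts
```

The two-bit hysteresis is what keeps a single loop exit from flipping the prediction for the next iteration of the loop.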

14
Q

von Neumann bottleneck

A

Transferring data from memory takes more time than actually processing the data.

15
Q

How does the memory hierarchy address the von Neumann bottleneck?

A

By keeping frequently used data in small, fast memories close to the CPU, reducing the number of times the processor has to wait for data from main memory.

16
Q

Latency

A

The delay between the processor issuing a request for a memory item and the item arriving.

17
Q

Bandwidth

A

The rate at which data arrives at its destination after the initial latency is overcome.

18
Q

What role do registers play in the memory hierarchy?

A

Registers are the fastest level of the hierarchy; the CPU can use data held in registers immediately, without going to the slower caches or main memory.

19
Q

What role do cache systems play in the memory hierarchy?

A

The cache holds frequently accessed data that the CPU is likely to need, so the CPU can avoid trips to the much slower main memory.

20
Q

Cache Tags

A

Information (address tag and status bits) that keeps track of which memory block occupies a cache location and its status.

21
Q

What are the 3 types of cache misses in single-processor systems?

A

Compulsory: Caused when data is first accessed and is not present in the cache.

Capacity: Caused by data having been evicted because the cache cannot hold all of the problem's data.

Conflict: Caused by one data item being mapped to the same cache location as another.
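A toy direct-mapped cache (sizes are made up for the demo) makes compulsory and conflict misses concrete:

```python
# A tiny direct-mapped cache (4 one-word lines -- illustrative sizes)
# showing compulsory and conflict misses.
NUM_LINES = 4
cache = [None] * NUM_LINES  # each slot remembers the tag it holds

def access(addr):
    index = addr % NUM_LINES  # which line the address maps to
    tag = addr // NUM_LINES
    if cache[index] == tag:
        return "hit"
    cache[index] = tag        # evict whatever was there
    return "miss"

results = [access(a) for a in (0, 0, 4, 0)]
print(results)  # ['miss', 'hit', 'miss', 'miss']
# 1st miss: compulsory (first touch of address 0).
# 3rd miss: conflict (4 maps to line 0 and evicts it, even though lines 1-3 are free).
# 4th miss: conflict again (0 was evicted by 4).
```

In a fully associative cache of the same size, addresses 0 and 4 would coexist and only the compulsory misses would remain.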

22
Q

What additional type of cache miss occurs in multi-core systems?

A

Coherence: multiple processor cores have copies of the same data in their cache, but those copies become inconsistent due to changes.

23
Q

What two properties of typical program memory access patterns are exploited by cache systems?

A

Temporal Locality: When a memory location is accessed, it is more likely to be accessed again.

Spatial locality: When a memory location is accessed, it is likely that nearby memory locations will also be accessed.

24
Q

LRU Replacement Policy

A

Least Recently Used

25
Q

FIFO Replacement Policy

A

First In First Out
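Both policies can be sketched in a few lines; the capacity and access trace below are made up for the demo:

```python
from collections import OrderedDict, deque

def lru_misses(trace, capacity=2):
    """Count misses under Least Recently Used replacement."""
    cache = OrderedDict()
    misses = 0
    for x in trace:
        if x in cache:
            cache.move_to_end(x)  # refresh: x is now most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[x] = True
    return misses

def fifo_misses(trace, capacity=2):
    """Count misses under First In First Out replacement."""
    cache, order, misses = set(), deque(), 0
    for x in trace:
        if x not in cache:
            misses += 1
            if len(cache) >= capacity:
                cache.discard(order.popleft())  # evict oldest-inserted item
            cache.add(x)
            order.append(x)
    return misses

trace = ["a", "b", "a", "c", "a", "b"]
print(lru_misses(trace), fifo_misses(trace))  # 4 5
```

On this trace LRU does better because it keeps the repeatedly reused "a" resident, while FIFO evicts it simply because it was inserted first.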

26
Q

Cache Line

A

The smallest unit of data transferred between main memory and the cache (commonly 64 bytes).

27
Q

What property of program memory accesses do cache lines exploit?

A

Spatial locality.

28
Q

What effect does stride have on cache line efficiency?

A

If the cache line is large and the stride is large, only a small fraction of each loaded line is actually used, wasting bandwidth.

If the cache line is very small (or the stride is 1), striding wastes little or nothing: every element brought in gets used.
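A rough model of line utilization (the 8-element line size is an assumption):

```python
# Fraction of each fetched cache line actually used by a stride-s sweep.
# A line of 8 elements is an assumption for the demo.
def line_utilization(stride, line=8):
    # A stride-s sweep uses line/s of a line's elements
    # (at least one, at most all of them).
    return max(1.0 / line, min(1.0, 1.0 / stride))

for s in (1, 2, 4, 8, 16):
    print(s, line_utilization(s))
# stride 1 uses every element (1.0); stride >= 8 uses only one element
# per fetched line (0.125), wasting 7/8 of the memory bandwidth.
```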

29
Q

Cache Mapping

A

The policy that decides where a given memory block can be placed in the cache.

30
Q

Direct Mapped Cache

A

Each memory block is mapped to only one specific cache location.

31
Q

Fully associative cache

A

Each memory block can be mapped to any cache location.

32
Q

K-way set associative cache

A

The cache is divided into a number of sets, each containing k cache lines.
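The three mapping schemes differ only in how an address is split into tag, set index, and offset; a sketch with assumed sizes (64 B lines, 8 sets):

```python
# Splitting a byte address into (tag, set index, line offset) for a
# set-associative cache. Sizes (64 B lines, 8 sets) are illustrative.
LINE_BYTES = 64
NUM_SETS = 8

def map_address(addr):
    offset = addr % LINE_BYTES  # position within the cache line
    block = addr // LINE_BYTES  # which memory block the address is in
    index = block % NUM_SETS    # which set the block must go in
    tag = block // NUM_SETS     # identifies this block within its set
    return tag, index, offset

print(map_address(100))   # (0, 1, 36): block 1 lands in set 1
print(map_address(4096))  # (8, 0, 0):  block 64 wraps back to set 0
```

Direct-mapped is the special case k = 1 (one line per set); fully associative is a single set containing every line.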

33
Q

Cache Memory

A

Small, fast memory used to store frequently accessed data

34
Q

Dynamic Memory

A

Main memory (DRAM) holding the data of the currently executing program; larger but slower than cache.

35
Q

Stall

A

Delay in the execution of instructions caused by a dependency between instructions.

36
Q

Cache Miss

A

When the data is not available in the cache and has to be fetched from main memory.

37
Q

Prefetch data stream

A

When the processor predicts which data will be accessed next (e.g., a regular stream of addresses) and fetches it into the cache ahead of time.

38
Q

What is a hardware prefetch and how is it different from a prefetch intrinsic?

A

Hardware prefetch: the processor itself detects a regular access pattern and fetches ahead automatically, with no change to the code.
Prefetch intrinsic: an explicit prefetch instruction inserted by the programmer or compiler, i.e., controlled by software.

39
Q

What is Little’s Law?
What does it tell us about the effect of latency on computing systems?

A

Concurrency = Bandwidth × Latency.

To sustain full bandwidth at a given latency, that many requests must be in flight at once; if latency grows and the available concurrency cannot keep up, effective bandwidth (and performance) drops.
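A quick calculation with made-up figures:

```python
# Little's law: concurrency = bandwidth * latency.
# The bandwidth and latency figures are illustrative, not measured values.
def required_concurrency(requests_per_ns, latency_ns):
    """In-flight requests needed to keep the memory system fully busy."""
    return requests_per_ns * latency_ns

print(required_concurrency(10, 100))  # 1000 requests must be in flight
print(required_concurrency(10, 200))  # 2000: doubling latency doubles it
```

If the hardware can only track, say, a few hundred outstanding requests, the achieved bandwidth falls short of the peak.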

40
Q

Why are memory banks used in memory systems?

A

To increase memory bandwidth.

If multiple requests target the same bank (bank conflict), the requests must be handled in serial, which reduces memory bandwidth.

41
Q

What is a bank conflict?

A

When two memory operations target the same memory bank before it has finished serving the previous request, forcing the accesses to be serialized.

42
Q

What is the TLB?

What function does it have in the memory hierarchy?

A

Translation Look-aside Buffer: a cache of frequently used page table entries, providing fast address translation of pages.

If a program needs a memory location, the processor first checks the TLB, else it looks up the page in the main memory.

43
Q

What property would a program have that would cause performance degradation due to TLB misses?

A

Poor spatial locality: accesses scattered over many memory pages, so the small TLB cannot hold all the needed page table entries.

44
Q

How big is a typical TLB?

A

Between 64 and 512 entries.

45
Q

What are cache-aware and cache-oblivious algorithms?

A

Aware: Designed to take advantage of specific cache sizes.

Oblivious: Designed to perform well across a wide range of cache sizes without requiring any knowledge of cache parameters.

46
Q

What is loop tiling and what is it used for?

A

Breaking a loop's iteration space into smaller blocks (nested loops over tiles) so that the data each tile touches fits in cache, increasing reuse and performance.
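A minimal sketch, using a tiled matrix transpose with assumed sizes N = 8 and B = 4:

```python
# Loop tiling sketch: transpose an N x N matrix in B x B blocks so each
# block's source rows and destination columns stay cache-resident.
# N and B are illustrative choices.
N, B = 8, 4

def transpose_tiled(a):
    out = [[0] * N for _ in range(N)]
    for ii in range(0, N, B):            # outer loops walk over tiles
        for jj in range(0, N, B):
            for i in range(ii, ii + B):  # inner loops stay inside one tile
                for j in range(jj, jj + B):
                    out[j][i] = a[i][j]
    return out

a = [[i * N + j for j in range(N)] for i in range(N)]
# Same result as a plain transpose, but with better locality for large N.
assert transpose_tiled(a) == [list(row) for row in zip(*a)]
```

The result is identical to the untiled loop; only the traversal order changes, so each tile of the output is written while its lines are still in cache.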

47
Q

What is the motivation for multi-core architectures?

A

Separate cores can work on unrelated tasks, or introduce data parallelism.

48
Q

Define Core, Socket, and Node

A

Core: an independent processing unit within a CPU, executing its own instruction stream.

Socket: the physical package/connector that mounts one (multi-core) CPU chip on the motherboard.

Node: a machine containing one or more sockets that all access the same shared memory.

49
Q

Cache Coherence

A

Ensuring all cached copies of a memory location stay consistent with each other and with main memory.

50
Q

False Sharing

A

When multiple threads access different variables that happen to lie in the same cache line; the coherence protocol bounces the line between cores as if the data were truly shared, degrading performance.

51
Q

NUMA

A

Non-uniform memory access:

For a processor running on some core, the memory attached to its socket is faster to access than the memory attached to another socket.

52
Q

First touch phenomenon

A

Physical pages of dynamically allocated memory are not assigned until they are first written to; on a NUMA system, each page is placed in the memory of the socket whose core first touches it.

53
Q

What is arithmetic intensity and how do we compute it?

A

Number of operations per memory access.

f(n) / n where f(n) is the number of operations it takes, and n is the number of data items that an algorithm operates on.
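Applying f(n)/n to two standard kernels (the operation and data counts are the usual textbook ones):

```python
# Arithmetic intensity f(n)/n: operations per data item touched.
def intensity(ops, data_items):
    return ops / data_items

n = 1000
# Vector addition: n additions over 3n items (two inputs, one output).
print(round(intensity(n, 3 * n), 3))            # 0.333 -- stays O(1)
# Matrix-matrix multiply: 2n^3 operations over 3n^2 matrix elements.
print(round(intensity(2 * n**3, 3 * n**2), 1))  # 666.7 -- grows with n
```

High-intensity kernels like matrix multiply can approach peak arithmetic performance; low-intensity ones like vector addition are bound by memory bandwidth.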

54
Q

What is the roofline model?
What defines the roofline?

A

A plot of attainable performance against arithmetic intensity.

The roofline is defined by two limits: a sloped line set by peak memory bandwidth (bandwidth × intensity) and a flat line set by peak arithmetic performance.
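A sketch with made-up hardware peaks:

```python
# Roofline model: attainable = min(peak compute, bandwidth * intensity).
# The peak figures are illustrative hardware parameters, not real specs.
PEAK_GFLOPS = 100.0
PEAK_BW = 50.0  # GB/s, with intensity measured in flops/byte

def attainable(intensity):
    return min(PEAK_GFLOPS, PEAK_BW * intensity)

print(attainable(0.5))  # 25.0: on the sloped (bandwidth) part -> memory-bound
print(attainable(4.0))  # 100.0: on the flat roof -> compute-bound
# The ridge point sits at intensity PEAK_GFLOPS / PEAK_BW = 2.0 flops/byte.
```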

55
Q

How can we use the roofline model to optimize code performance?

A

To see whether the code is memory-bandwidth-bound or compute-bound at its arithmetic intensity, and hence whether to optimize data movement (raising intensity) or arithmetic throughput (e.g., vectorization).