Single Processor Computing Flashcards
Von Neumann Architecture
A single stream of instructions executed in order.
Control Flow
A prescribed sequence of instructions
Stored Program
Instructions and program data stored in the same memory
Fetch, Execute, Store
Fetch: the next instruction is loaded onto the processor.
Execute: the operation is executed.
Store: the result is written back to memory.
Direct To Memory Architecture
Allows data to be sent directly between memory and the processor.
The work stays in memory.
Out-of-order instruction handling
Instructions can be processed in a different order from the one in which they appear in the program.
Why do modern processors use out-of-order instruction handling?
It increases performance: independent instructions can be reordered to keep the pipeline full instead of stalling on dependencies.
Pipelining
Splitting instruction execution into smaller stages so the CPU can work on multiple instructions at the same time, each at a different stage, like an assembly line.
Execution time for pipelined instructions?
t(n) = (n + n_{1/2}) · x
x = time for one basic operation (one pipeline stage, i.e., one clock cycle)
n = number of operations
n_{1/2} = pipeline setup overhead expressed as an equivalent number of operations (the n at which half the asymptotic speed is reached)
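Worked example (hypothetical numbers): with setup overhead n_{1/2} = 4 and x = 1 ns per stage, processing n = 1000 operations takes t(1000) = (1000 + 4) × 1 ns = 1004 ns, so once the pipeline is full it delivers nearly one result per cycle.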
Instruction level parallelism (ILP)
Finding independent instructions and running them in parallel.
Speculative Execution
Executing instructions before it is known whether they are needed, assuming the test they depend on will turn out a particular way; the results are discarded if the guess was wrong.
Prefetching
Data can be speculatively requested before any instruction needing it is actually encountered.
Branch Prediction
Guessing whether a conditional instruction will evaluate to true and executing accordingly.
von Neumann bottleneck
Transferring data from memory takes more time than actually processing the data.
How does the memory hierarchy address the von Neumann bottleneck?
By keeping frequently used data in small, fast memories close to the processor, reducing the number of times the CPU has to wait for data from slow main memory.
Latency
The delay between the processor issuing a request for a memory item and the item arriving.
Bandwidth
The rate at which data arrives at its destination after the initial latency is overcome.
What role do registers play in the memory hierarchy?
Registers are the fastest and allow the CPU to access data instantly without going to slower caches or memory.
What role do cache systems play in the memory hierarchy?
Cache stores frequently accessed data that the CPU might need without having the CPU go to the main memory.
Cache Tags
Info keeping track of the location and status of data in the cache.
What are the 3 types of cache misses in single-processor systems?
Compulsory: Caused when data is first accessed and is not present in the cache.
Capacity: Caused by data having been overwritten because the cache cannot contain all the problem data.
Conflict: Caused by one data item being mapped to the same cache location as another.
What additional type of cache miss occurs in multi-core systems?
Coherence: multiple processor cores have copies of the same data in their cache, but those copies become inconsistent due to changes.
What two properties of typical program memory access patterns are exploited by cache systems?
Temporal Locality: When a memory location is accessed, it is more likely to be accessed again.
Spatial locality: When a memory location is accessed, it is likely that nearby memory locations will also be accessed.
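A minimal C sketch of both properties (assuming C's row-major array layout): the running total exploits temporal locality, and the inner loop exploits spatial locality.

    #include <stdio.h>

    #define N 1024

    static double a[N][N];

    int main(void) {
        double sum = 0.0;
        /* Inner loop over j touches consecutive addresses: spatial locality.
           'sum' is reused on every iteration: temporal locality.
           Swapping the loops (j outer, i inner) would jump N*8 bytes per
           access and waste most of each cache line. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        printf("%f\n", sum);
        return 0;
    }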
LRU Replacement Policy
Least Recently Used
FIFO Replacement Policy
First In First Out
Cache Line
The smallest unit of data transferred between memory and the cache.
What property of program memory accesses do cache lines exploit?
Spatial locality.
What effect does stride have on cache line efficiency?
If the cache line is large, a large stride means only a small part of each line is actually used, wasting the rest of the transfer (see the sketch below).
If the cache line were very small, striding wouldn't necessarily cause a problem.
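A minimal C sketch (the stride and the 64-byte line size are assumed values):

    #include <stdio.h>

    #define N (1 << 20)

    static double a[N];

    int main(void) {
        double sum = 0.0;
        int stride = 8; /* 8 doubles = 64 bytes: one assumed cache line */
        /* Each iteration lands on a new cache line but uses only 1 of the
           8 doubles that line contains, so roughly 7/8 of the memory
           traffic is wasted compared to stride-1 access. */
        for (int i = 0; i < N; i += stride)
            sum += a[i];
        printf("%f\n", sum);
        return 0;
    }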
Cache Mapping
The policy deciding where in the cache a block of memory is placed.
Direct Mapped Cache
Each memory block is mapped to only one specific cache location.
Fully associative cache
Each memory block can be mapped to any cache location.
K-way set associative cache
The cache is divided into a number of sets, each containing k cache lines.
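Worked example (hypothetical numbers): a 32 KB cache with 64-byte lines has 32768 / 64 = 512 lines. Direct mapped: a block at address A can go only in line (A / 64) mod 512. Fully associative: it can go in any of the 512 lines. 4-way set associative: the lines form 128 sets, and the block can go in any of the 4 lines of set (A / 64) mod 128.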
Cache Memory
Small, fast memory used to store frequently accessed data
Dynamic Memory
Main memory (DRAM): holds the data of the currently running program; larger but slower than cache, and its contents must be periodically refreshed.
Stall
Delay in the execution of instructions caused by a dependency between instructions.
Cache Miss
When the data is not available in the cache and has to be fetched from main memory.
Prefetch data stream
When the processor predicts which data will be accessed next and fetches it into the cache before it is needed.
What is a hardware prefetch and how is it different from a prefetch intrinsic?
Intrinsic = the software explicitly inserts prefetch instructions (see the sketch below).
Hardware = the processor's prefetch unit detects regular access patterns (e.g., fixed strides) and issues prefetches automatically.
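A minimal sketch of a software prefetch using the GCC/Clang intrinsic __builtin_prefetch; the prefetch distance of 64 iterations is an assumed tuning value, and a hardware prefetcher would typically detect this unit-stride pattern on its own.

    #include <stdio.h>

    #define N (1 << 20)

    static double a[N];

    int main(void) {
        double sum = 0.0;
        for (int i = 0; i < N; i++) {
            /* Request data 64 iterations ahead so it is already in cache
               when the loop reaches it (args: address, 0 = read,
               1 = low temporal locality). */
            if (i + 64 < N)
                __builtin_prefetch(&a[i + 64], 0, 1);
            sum += a[i];
        }
        printf("%f\n", sum);
        return 0;
    }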
What is Little’s Law?
What does it tell us about the effect of latency on computing systems?
Concurrency = Bandwidth × Latency.
To sustain a given bandwidth, enough requests must be kept in flight to cover the latency; if latency grows and the hardware cannot keep more requests outstanding, achieved bandwidth (and thus performance) drops.
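Worked example (hypothetical numbers): with 100 ns memory latency and 10 GB/s bandwidth, 10 × 10^9 B/s × 100 × 10^-9 s = 1000 bytes must be in flight at all times to keep the memory system fully busy.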
Why are memory banks used in memory systems?
To increase memory bandwidth.
If multiple requests target the same bank (bank conflict), the requests must be handled in serial, which reduces memory bandwidth.
What is a bank conflict?
When two or more memory operations need the same memory bank at the same time (or before the bank has recovered from the previous access), forcing them to be handled serially.
What is the TLB?
What function does it have in the memory hierarchy?
Translation Look-aside Buffer: a cache of frequently used page table entries, providing fast address translation of pages.
When a program references a memory location, the processor first checks the TLB for the address translation; on a miss it must look up the page in the page table in main memory, which is much slower.
What property would a program have that would cause performance degradation due to TLB misses?
Poor spatial locality: accesses jump between more pages than the TLB has entries for, so translations keep missing.
How big is a typical TLB?
Between 64 and 512 entries.
What are cache-aware and cache-oblivious algorithms?
Aware: Designed to take advantage of specific cache sizes.
Oblivious: Designed to perform well across a wide range of cache sizes without requiring any knowledge of cache parameters.
What is loop tiling and what is it used for?
Breaking a loop's iteration space into smaller blocks (tiles) and processing one block at a time, so each block's data stays in cache and is reused before being evicted (see the sketch below).
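A minimal C sketch of a tiled matrix transpose; the tile size B = 32 is an assumed value chosen so that a pair of tiles fits in cache, and N is a multiple of B.

    #include <stdio.h>

    #define N 1024
    #define B 32 /* tile size: assumed small enough for two BxB tiles to fit in cache */

    static double a[N][N], t[N][N];

    int main(void) {
        /* Each BxB tile of 'a' is read, and the matching tile of 't'
           written, while both are still cache-resident, instead of
           striding through whole rows and columns. */
        for (int ii = 0; ii < N; ii += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++)
                        t[j][i] = a[i][j];
        printf("%f\n", t[0][0]);
        return 0;
    }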
What is the motivation for multi-core architectures?
Single-core clock speeds cannot be raised much further (power and heat limits), so instead separate cores can work on unrelated tasks, or cooperate on one task through data parallelism.
Define Core, Socket, and Node
Core: an independent processing unit that executes its own instruction stream.
Socket: the physical package (chip) that plugs into the motherboard, containing one or more cores.
Node: contains one or more sockets accessing the same shared memory.
Cache Coherence
Ensuring that all cached copies of a data item stay consistent with each other and with main memory.
False Sharing
When multiple threads access different variables that happen to lie in the same cache line: every write invalidates the line in the other cores' caches, causing coherence traffic even though no data is actually shared (see the sketch below).
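A minimal OpenMP/C sketch of the usual fix, padding each thread's counter onto its own cache line (compile with -fopenmp; the 64-byte line size and the thread count are assumed values):

    #include <stdio.h>
    #include <omp.h>

    #define NTHREADS 4
    #define PAD 8 /* 8 doubles = 64 bytes: one assumed cache line */

    /* With plain counts[NTHREADS], all counters would share one cache
       line and every increment would invalidate the other cores' copies;
       padding gives each counter its own line. */
    static double counts[NTHREADS][PAD];

    int main(void) {
        #pragma omp parallel num_threads(NTHREADS)
        {
            int t = omp_get_thread_num();
            for (int i = 0; i < 1000000; i++)
                counts[t][0] += 1.0;
        }
        printf("%f\n", counts[0][0]);
        return 0;
    }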
NUMA
Non-uniform memory access:
For a processor running on some core, the memory attached to its socket is faster to access than the memory attached to another socket.
First touch phenomenon
Dynamically allocated memory is not mapped to physical pages until it is first written to; each page is then placed in the memory attached to the socket of the thread that first touches it (see the sketch below).
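A minimal OpenMP/C sketch exploiting first touch (compile with -fopenmp; assumes a first-touch placement policy as on Linux, and the array size is an arbitrary assumed value):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int n = 1 << 24;
        double *x = malloc(n * sizeof(double));
        /* Initialize in parallel with the same static schedule the compute
           loop will use: each page is first touched, and therefore placed,
           on the NUMA node of the thread that will later work on it. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++)
            x[i] = 0.0;
        /* ... compute loops using the same schedule(static) ... */
        printf("%f\n", x[0]);
        free(x);
        return 0;
    }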
What is arithmetic intensity and how do we compute it?
Number of operations per memory access.
f(n) / n where f(n) is the number of operations it takes, and n is the number of data items that an algorithm operates on.
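Worked example: the update x[i] = x[i] + y[i] over n elements does n additions against roughly 3n memory accesses (two loads and a store per element), an intensity of 1/3; dense matrix-matrix multiplication does O(n^3) operations on O(n^2) data, an intensity of O(n).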
What is the roofline model?
What defines the roofline?
A plot of attainable performance (operations per second) against arithmetic intensity.
The roofline is formed by two performance limits: a slanted part set by peak memory bandwidth (bandwidth × intensity) and a flat part set by peak arithmetic throughput.
How can we use the roofline model to optimize code performance?
Locate the code's arithmetic intensity on the plot to see whether it is limited by memory bandwidth or by arithmetic throughput, which tells us whether to optimize data movement (e.g., raise intensity through reuse) or the computation itself.
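Worked example (hypothetical machine): with a peak of 100 GFLOP/s and 25 GB/s memory bandwidth, the ridge point sits at 100 / 25 = 4 FLOP/byte; a kernel with intensity 1/3 is bandwidth-bound at about 25 × 1/3 ≈ 8.3 GFLOP/s, while a kernel with intensity 8 can reach the 100 GFLOP/s flat roof.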