Performance Flashcards

1
Q

What is Parallelism?

A

Two or more processes that execute simultaneously and independently

2
Q

What is Concurrency?

A

Two or more processes that execute simultaneously and share at least one resource

3
Q

What is a Processor?

A

The electronic circuitry that executes the instructions comprising a program. A computer system can contain many processors, e.g. a graphics processor or a video processor. When left unqualified, the term usually refers to the CPU.

4
Q

What is a CPU?

A

The central processing unit: (one of) the main general-purpose processor(s) in a computer system, as opposed to a special-purpose processor (e.g. one dedicated to video decompression).

5
Q

What is a Die?

A

The piece of silicon, cut from a wafer, that contains the processor (usually the CPU when no other context is given) along with other components required for interfacing, e.g. a memory controller.

6
Q

What is a Socket?

A

The component containing the processor die, including the physical connectors that plug into the motherboard.

7
Q

What is a Core?

A

(Hardware) A small CPU or processor built into a larger CPU or CPU socket. It can independently perform or process all computational tasks.

8
Q

What is a Thread?

A

(Software) A single sequential flow of control in a program.
It is the smallest unit that can be managed by an OS scheduler.
Each thread has its own program counter, registers and stack.

9
Q

What is a Node?

A

• Normally thought of as one “computer”
• A single motherboard
• May have:
  • More than one CPU
    • Each CPU will have many cores
  • One or more accelerators (GPU etc.)
We use OpenMP to program within the node.

10
Q

What is a Cluster/Supercomputer?

A

A collection of 100s/1000s of nodes connected through a high-speed interconnect (the interconnect is what distinguishes it from a server).

11
Q

What is a single precision floating-point?

A

Formally called the IEEE 754 single-precision binary floating-point format: binary32. It represents floating-point numbers using a total of 32 bits.
It corresponds to the C datatype float. Also called float32/FP32.

12
Q

What is a double precision floating-point?

A

Also defined by IEEE 754, as the double-precision binary floating-point format: binary64.
It represents floating-point numbers using a total of 64 bits. It corresponds to the C datatype double. Also called float64/FP64.

13
Q

What is a Flop?

A

Abbreviation for floating-point operation.
Usually means either an addition or a multiplication of two floating-point numbers, but other operations could be included.
A common unit of measurement of processor speed is flops/s.
A flop count usually has to be qualified with either single or double precision to be specific.

14
Q

What is the RPeak for a single node?

A

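R_{peak} = 2 \times W_{vec} \times r_{clock} \times n_{core} \times n_{socket}

Where

2: a fused multiply-add (FMA) instruction counts as two flops (a multiply and an add)
W_{vec}: Vector width (floating-point elements per vector register, so it depends on the precision)
r_{clock}: Clock speed
n_{core}: Cores per socket
n_{socket}: Sockets per node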

15
Q

How to calculate observed runtime?

A

r_o = r_t + \epsilon

Where

• r_o – Observed runtime (the physical time the run took)
• r_t – True runtime (the time the program itself needs to run)
• \epsilon – Noise (hindrances to your program running, e.g. other processes or OS activity)

16
Q

Difference between observed and true runtime?

A

Because noise can only add time, your observed runtime will never be lower than the true runtime.
Therefore, the minimum of several observed runtimes is closest to the true runtime and is the one to report.

17
Q

What is Arithmetic Intensity?

A

I(n) = W(n) / Q(n)

Where W is the number of flops carried out by the program,
Q is the number of bytes transferred from memory to cache, and n is the problem size.
• Units: flops/byte
• Programs with low AI are called memory-bound programs
• Programs with high AI are called compute-bound programs
• For memory-bound programs, the processor spends more time waiting for operands to be delivered from memory than performing computation.

18
Q

Hardware size in order (smallest to largest)

A

Core < Die < Socket < Node < Cluster