GPUs Flashcards

1
Q

What are the differing requirements of CPU and GPU programs?

A

CPU: requires low latency

GPU: requires high throughput

2
Q

What is a key property of the graphics processing stages?

A

The stages are dependent on each other, but the data elements within each stage are independent - perfect for parallelism.

3
Q

What execution model do GPUs use?

A

SIMT - Single Instruction, Multiple Threads

Each thread computes on different data elements.

Threads within a warp must be executed in lockstep (same instruction at the same time).

This enables parallel execution without per-thread control overhead.
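A minimal CUDA sketch of SIMT (illustrative; the kernel name is an assumption, not from the source): every thread runs the same kernel body, but each derives its own element index from its thread and block IDs.

// All threads execute this same instruction stream (single instruction),
// but each one computes on its own array element (multiple threads).
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x; // unique per-thread index
    if (i < n)
        c[i] = a[i] + b[i];
}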

4
Q

Compare MIMD, SIMD, SIMT

A

MIMD: independent threads on multiple cores; handles control-flow-heavy programs well; each thread finishes as soon as possible. Exploits DLP only indirectly.

SIMD: one thread, multiple data; cannot scale to huge amounts of data; increased DLP; decent latency; control and vector overhead.

SIMT: multiple threads in lockstep, all executing at the same point in the program. Massive throughput and parallelism; scales to very large amounts of data; control flow is difficult; high latency.

GPUs: do not support arbitrarily large vectors; work must be mapped onto warps.

5
Q

What is the GPU architecture (NVIDIA)?

A

Graphics Processing Clusters (GPCs)
->
Texture Processing Clusters (TPCs)
->
Streaming Multiprocessors (SMs)
->
cores, grouped into processing blocks
->
1 thread per core

6
Q

What is a Streaming Multiprocessor?

A

It coordinates the execution of warps across its processing blocks. Each block has a shared register file, execution units, and several resident warps.

The L1 cache and shared memory are shared across the cores within an SM.

Each processing block has multiple threads to choose from and executes up to 32 (one warp) at a time. The warp scheduler chooses between the available threads in a round-robin fashion, skipping inactive or blocked ones.

7
Q

Describe the flow in an SM

A

L1 instruction cache -> warp scheduler (-> selects threads) -> register file -> operands -> functional units (FUs)

Operands can also come from the constant cache, the L1 cache, or shared memory. When data is accessed through shared memory, it is available to all threads that share it.

8
Q

What do GPUs do on a stall?

A

Switch to another ready warp - this hides the stall while keeping control overhead low.

9
Q

What is a warp?

A

A grouping of threads, all executing the same instruction but on different data elements. All threads within a warp must be at the same stage.

All threads in a GPU are executed within a warp.

A warp is scheduled by the warp scheduler.

Each processing block has several warp schedulers, each with several warps to choose from.

An SM has multiple such processing blocks.
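As a small illustration (a sketch, not from the source), CUDA exposes the warp width as the built-in warpSize (32 on current NVIDIA GPUs), so a thread can compute which warp it belongs to and its lane within that warp:

__global__ void warp_info(int *warp_id, int *lane_id) {
    int t = threadIdx.x;
    warp_id[t] = t / warpSize; // which warp within the block this thread belongs to
    lane_id[t] = t % warpSize; // the thread's lane (position) within its warp
}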

10
Q

What happens if one thread in a warp is stalled?

A

The entire warp is stalled

11
Q

What is divergence?

A

When a warp executes a control statement, each thread operates on different data, so the statement may evaluate differently across threads.

The processing block must execute every path that is taken by at least one thread in the warp. The different paths cannot be executed in parallel, meaning execution time is wasted.
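An illustrative CUDA fragment (kernel name assumed for the example): the inner condition depends on per-thread data, so threads in one warp can take different paths and the hardware must serialize them.

__global__ void diverge(const int *x, int *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (x[i] > 0)        // may evaluate differently across threads in a warp
            y[i] = 2 * x[i]; // path A: runs with path-B threads masked off
        else
            y[i] = -x[i];    // path B: runs afterwards, path-A threads masked off
    }
}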

12
Q

What is a predicate operation?

A

Control statements are converted into predicate operations, where one of two values is selected based on the predicate's value.

Instead of using predication to execute both sides of a branch, it is used to evaluate the control statement directly and select a specific value based on the result.
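A hedged sketch of the idea (kernel name assumed): the divergent branch from the previous card can be replaced by a select, where the predicate picks one of two computed values, so no divergent control flow remains. On simple operands like these, compilers typically emit a predicated select instruction.

__global__ void select_kernel(const int *x, int *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        bool p = x[i] > 0;           // the predicate
        y[i] = p ? 2 * x[i] : -x[i]; // select one value based on the predicate
    }
}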

13
Q

How are branches supported in the GPU today?

A

Supported as part of the programming model.

The GPU still cannot execute divergent branch paths simultaneously. An execution mask decides which threads execute which branch path.

Paths that are completely dead (mask = 0x0) can be skipped.

Dedicated HW structures keep track of branches - cheaper than encoding it all in a predication system.

Programmers can move edge-case conditions outside the main execution block, exploiting greater parallelism in the normal case (see the sketch below).
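One hypothetical way to read the last point (a sketch under assumed names, not the source's example): keep the common case free of divergence by confining the edge-case path to the few warps that actually contain boundary elements.

__global__ void stencil(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i > 0 && i < n - 1) {
        // Normal case: interior elements. Warps made up entirely of interior
        // threads evaluate the condition uniformly and never diverge.
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
    } else {
        // Edge case: only the warps containing the first and last elements
        // pay the cost of divergence.
        out[i] = in[i];
    }
}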

14
Q

What is a self-predicate?

A

Implemented in CUDA

Allows single-cycle execution of what previously would be a control statement.

15
Q

What are the memory requirements of GPUs?

A

High bandwidth

Low latency is not needed, because other warps can be scheduled while one waits on memory

Memory requests are grouped (coalesced) together

16
Q

What memory does a normal GPU have?

A

Shared memory: scratchpad style (directly accessible by the programmer)

L1 cache per SM

Constant or scalar cache

Shared L2 cache

Device memory - needs high bandwidth

Might have: vector/scalar caches, Infinity Cache (AMD)

17
Q

What is GDDR?

A

Used as GPU device memory.

Offers higher bandwidth than standard DDR, at the cost of higher latency.

18
Q

What is the difference between integrated and discrete GPUs?

A

Discrete GPUs can use GDDR, while integrated GPUs share memory and other resources with the rest of the system.

Discrete GPUs are connected through PCIe slots - incredibly high speed.

19
Q

What are some unique features of GPU memory?

A

Memory coalescing
Shared memory

20
Q

What is shared memory in GPUs?

A

A scratchpad memory that is directly accessible to the programmer.

Instead of going through regular memory addresses, the programmer states directly where the data lives; no address translation is needed.
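A minimal CUDA sketch (assuming a launch with 256 threads per block; names are illustrative): the tile lives in on-chip shared memory, is filled once from global memory, and is then reused by every thread in the block.

__global__ void block_sum(const float *in, float *out) {
    __shared__ float tile[256];                // scratchpad shared by the block
    int t = threadIdx.x;
    tile[t] = in[blockIdx.x * blockDim.x + t]; // one global load per thread
    __syncthreads();                           // everyone sees the loaded tile
    // Tree reduction in fast on-chip memory instead of repeated global loads.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (t < s)
            tile[t] += tile[t + s];
        __syncthreads();
    }
    if (t == 0)
        out[blockIdx.x] = tile[0];             // one result per block
}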

21
Q

What is memory coalescing?

A

Group together memory requests to the same cache line into one request
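An illustrative pair of CUDA kernels (a sketch, names assumed): in the first, consecutive threads touch consecutive addresses, so a warp's 32 loads coalesce into a few wide transactions; in the second, a stride spreads the accesses across cache lines and defeats coalescing.

__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i]; // neighboring threads hit the same cache line
}

__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n)
        out[i] = in[i]; // neighboring threads hit different cache lines
}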

22
Q

How have GPUs developed in recent years?

A

Increased cache sizes

More heterogeneous, accelerating specific tasks to meet specific demands

23
Q

What are GPU kernels?

A

Programs that can execute on the GPU

24
Q

How does GPU programming work?

A

Use compilers that can create kernels.

Program with GPU directives and primitives that compile into a kernel.

The rest of the program is compiled into a CPU program that invokes the right drivers to run the kernel.
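A minimal host-side sketch of this flow in CUDA (illustrative; uses managed memory to keep the example short): the __global__ function is the kernel, and main() is the CPU program that launches it through the CUDA runtime.

#include <cuda_runtime.h>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float)); // memory visible to CPU and GPU
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }
    add<<<(n + 255) / 256, 256>>>(a, b, c, n); // launch: blocks of 256 threads
    cudaDeviceSynchronize();                   // wait for the kernel to finish
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}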

25
Q

What is SYCL?

A

An open standard for writing efficient code for heterogeneous applications (using GPUs/FPGAs).

A unified language from which you can target different APIs for different accelerators.

Removes the need for separate CPU and GPU development tracks.

26
Q

What is Vulkan?

A

Offers more direct access to GPU function calls.
Requires the programmer to take more direct control of the core parts of graphics processing, but enables a lot for expert users.

27
Q

Why are GPUs important for AI?

A

AI training is inherently parallel.
For deep neural networks, GPUs allow efficient training.

The evolution of GPUs drove the evolution of AI.

28
Q

What is TensorFlow?

A

Open-source library for machine learning, with GPU support.

Focused on ease of use.

29
Q

Were GPUs designed for AI?

A

No, originally designed for graphics and retro-fitted for AI.
This is apparent when looking at features of GPUs that provide little benefit to AI.

30
Q

What is the clock speed of GPUs compared to CPUs?

A

Much lower for GPUs