8 - GPU Acceleration and AI HW Flashcards

1
Q

Why is AI computationally expensive?

A
  • Massive models with billions of parameters
  • Complex operations: training involves matrix multiplication, gradient calculation, and backpropagation
  • Iterative training: parameters are repeatedly updated via gradient descent
  • Vast Datasets
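To make the cost concrete, here is a rough FLOP count for one dense layer (my own back-of-the-envelope sketch, not from the card; the ~3x-forward rule of thumb for a training step is a common approximation):

```python
# Rough illustration: counting floating-point operations for one
# dense-layer forward pass, then for a full training step.
def matmul_flops(m: int, n: int, k: int) -> int:
    """An (m x k) @ (k x n) matmul does m*n*k multiplies and about
    m*n*k additions, i.e. roughly 2*m*n*k FLOPs."""
    return 2 * m * n * k

# One batch of 32 through a 4096 -> 4096 layer:
forward = matmul_flops(32, 4096, 4096)

# Backprop needs two more matmuls of similar size (gradients w.r.t.
# inputs and weights), hence the common ~3x-forward rule of thumb:
training_step = 3 * forward
print(f"forward: {forward:,} FLOPs, training step: ~{training_step:,} FLOPs")
```

Even this single small layer costs over a billion FLOPs per batch, and a real model repeats this across many layers and millions of iterations.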
2
Q

FLOPs

A

Floating Point Operations Per Second

3
Q

TOPs

A

Tera Operations Per Second. Typically quoted for integer operations (as opposed to floating-point operations, measured in FLOPS).

4
Q

Why aren’t CPUs good for AI?

A

Their largely serial design (few, powerful cores) limits their ability to handle massively parallel AI workloads.

5
Q

CUDA

A

Nvidia’s parallel computing platform and programming model for general-purpose GPU tasks beyond graphics.

6
Q

Tensor Cores

A

Specialised hardware units that handle matrix multiplication.

Each one performs D = A x B + C, where A, B, C, and D are all 4x4 matrices.

Performs all of these operations in one clock cycle rather than sequentially.
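A small numpy sketch (my own illustration) of the fused multiply-add a Tensor Core performs on 4x4 tiles, conceptually as a single operation:

```python
import numpy as np

# Tensor-core-style fused multiply-add on 4x4 tiles: D = A @ B + C.
# In hardware the inputs are often lower precision (e.g. fp16) with
# fp32 accumulation; fp32 throughout is used here for simplicity.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4), dtype=np.float32)
B = rng.standard_normal((4, 4), dtype=np.float32)
C = rng.standard_normal((4, 4), dtype=np.float32)

# The whole 4x4 multiply-accumulate, done as one fused step rather
# than a separate multiply pass followed by an add pass:
D = A @ B + C
print(D.shape)  # (4, 4)
```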

7
Q

CPUs vs GPUs in core numbers and serial vs parallel

A

CPUs are masters of serial tasks, with a small number of more powerful cores.

GPUs, on the other hand, have hundreds or thousands of simpler cores and excel at parallel tasks.

8
Q

CUDA Cores

A

Nvidia’s traditional general-purpose GPU cores, as distinct from the specialised Tensor Cores.

9
Q

Google TPU make up

A

Matrix Multiply Unit (MXU): like tensor cores
Unified Buffer: high-speed memory feeding the MXU
Weight FIFO: stores neural-net weights for the MXU
Scalar and Vector Units: handle additional arithmetic and control flow

10
Q

Systolic Arrays

A

Grid of MACs: the MXU is a large grid of Multiply-Accumulate units, each computing A*B + C.

Input data (A) flows horizontally through the array while weights (B) flow vertically; partial sums accumulate down the columns.
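A toy simulation of this dataflow (my own sketch, abstracting away the clock-by-clock timing): each column of MAC cells holds one column of weights, and a partial sum gains one multiply-accumulate at every cell as it moves down the column.

```python
# Toy model of a systolic-array matmul: each grid cell performs one
# multiply-accumulate, and partial sums accumulate down the columns.
def systolic_matmul(A, B):
    m, k = len(A), len(A[0])        # A is m x k: inputs, streamed in rows
    n = len(B[0])                   # B is k x n: the weights in the grid
    out = [[0] * n for _ in range(m)]
    for i in range(m):              # each input row flows through the array
        for col in range(n):        # each column of MAC cells
            acc = 0                 # partial sum enters the column as 0
            for row in range(k):    # and flows down, one MAC per cell
                acc += A[i][row] * B[row][col]
            out[i][col] = acc       # a finished result exits the bottom
    return out

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

The payoff of the real hardware layout is that no cell ever fetches data from memory mid-computation: operands arrive from neighbouring cells every clock.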

11
Q

Types of TPUs

A

Cloud TPUs: data-centre accelerators accessed via Google Cloud
Edge TPUs: small, power-efficient chips for on-device AI
TPU Pods: massive clusters of interconnected TPUs

12
Q

ASICs

A

Application Specific Integrated Circuits

Custom circuits built for specific AI tasks: inflexible and costly to develop, but highly efficient.

13
Q

FPGAs

A

Field Programmable Gate Arrays

Reconfigurable hardware: more versatile than ASICs, but with lower performance.

14
Q

FPGA Architecture

A
  • Configurable Logic Blocks (CLBs): consist of lookup tables, flip-flops, and multiplexers
  • Routing Resources: network of programmable switches and wires
  • Input/Output Blocks: communication with the outside world
  • Block RAM: dedicated on-chip memory blocks
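The lookup table inside a CLB is worth a sketch of its own (my own illustration): an n-input LUT is simply a 2**n-entry truth table, so programming the stored bits makes it implement any n-input boolean function.

```python
# Sketch of how a CLB's lookup table works: an n-input LUT stores a
# 2**n-entry truth table, so it can implement ANY n-input boolean
# function just by loading the right bits at configuration time.
def make_lut(truth_table):
    """truth_table: list of 2**n output bits, indexed by the n input
    bits packed into an integer (first argument = most significant)."""
    def lut(*inputs):
        index = 0
        for bit in inputs:          # pack the input bits into an index
            index = (index << 1) | bit
        return truth_table[index]   # "read" the stored output bit
    return lut

# A 2-input XOR as a 4-entry table (index = a<<1 | b):
xor = make_lut([0, 1, 1, 0])
print(xor(0, 1), xor(1, 1))  # 1 0
```

Reprogramming the FPGA amounts to rewriting these tables (and the routing between them), which is what makes the fabric reconfigurable.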
15
Q

In memory Computing

A

Integrates computational capability directly within memory units.
Seeks to minimise or eliminate costly data movement between memory and processor.

16
Q

Neuromorphic computing

A

Aims to emulate the structure and function of the brain.

Often implemented with spiking neural networks, where information is encoded in the timing of spikes.

17
Q

Photonic Tensor Cores

A

Specialised hardware units that use light instead of electrons to perform calculations.

Input modulators: encode electrical signals onto light
Photonic waveguides: confined paths that route the light
Photonic computing elements: perform the actual operations
Photonic memory
Output photodetectors: convert light back into electrical signals