8 - GPU Acceleration and AI Hardware Flashcards
Why is AI computationally expensive?
- Massive models with billions of parameters
- Complex operations: Training involves matrix multiplication, gradient calculation, and backpropagation
- Iterative Training: Repeatedly updating parameters using gradient descent (see the sketch after this list)
- Vast Datasets
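To make the "iterative training" point concrete, here is a minimal sketch of gradient descent for linear regression; the names (X, y, w, lr) are illustrative, not from any particular framework. Each pass does a matrix multiplication, a gradient calculation, and a parameter update, which is exactly the loop repeated enormously many times at scale.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))    # 1000 samples, 50 features
y = rng.standard_normal(1000)
w = np.zeros(50)
lr = 0.01                              # learning rate

for _ in range(100):                   # iterative training loop
    preds = X @ w                      # matrix multiplication (forward pass)
    grad = X.T @ (preds - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad                     # gradient-descent parameter update
```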
FLOPS
Floating Point Operations Per Second: a measure of compute throughput. (Lowercase "FLOPs" is often used for a raw count of floating point operations rather than a rate.)
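As a back-of-envelope sketch: a dense (m × k) @ (k × n) matrix multiply needs about 2mkn floating point operations (one multiply and one add per term). The device throughput below is an assumed figure for illustration only.

```python
m, k, n = 4096, 4096, 4096
flops = 2 * m * k * n                # ~1.4e11 floating point operations

device_flops = 100e12                # assumed: a hypothetical 100 TFLOPS GPU
ideal_ms = flops / device_flops * 1e3
print(f"{flops / 1e9:.0f} GFLOPs, ideal time {ideal_ms:.2f} ms")
```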
TOPS
Tera Operations Per Second. Typically quoted for integer operations (e.g., INT8), as opposed to floating point.
Why aren’t CPUs good for AI?
Their design favours a few powerful cores optimised for serial execution, which limits throughput on highly parallel AI workloads.
CUDA
Nvidia’s parallel computing platform and programming model for running general-purpose (non-graphics) workloads on GPUs.
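A minimal sketch of the CUDA programming model using Numba's CUDA bindings (assumes an NVIDIA GPU with the CUDA toolkit and Numba installed; the kernel and variable names are illustrative). The same elementwise work is spread across many GPU threads:

```python
import numpy as np
from numba import cuda

@cuda.jit
def vec_add(x, y, out):
    i = cuda.grid(1)          # this thread's global index
    if i < out.size:          # guard against over-provisioned threads
        out[i] = x[i] + y[i]  # each thread handles one element

n = 1_000_000
x = np.ones(n, dtype=np.float32)
y = np.full(n, 2.0, dtype=np.float32)
out = np.zeros(n, dtype=np.float32)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vec_add[blocks, threads_per_block](x, y, out)   # launch the kernel on the GPU
assert np.allclose(out, 3.0)
```

The `[blocks, threads_per_block]` launch configuration is the CUDA-specific part: it tells the GPU how to carve the problem into parallel thread blocks.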
Tensor Cores
Specialised hardware units to handle matrix multiplication.
Each one performs D = A × B + C, where A, B, C, and D are 4×4 matrices.
The whole 4×4 operation completes in a single clock cycle rather than as a sequence of scalar operations.
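A NumPy sketch of the fused operation a tensor core performs in hardware. The mixed precision mirrors the common FP16-multiply / FP32-accumulate scheme; a real tensor core does the whole tile in one clock rather than via software loops.

```python
import numpy as np

A = np.random.rand(4, 4).astype(np.float16)   # half-precision inputs
B = np.random.rand(4, 4).astype(np.float16)
C = np.random.rand(4, 4).astype(np.float32)   # single-precision accumulator

# One tensor-core op: D = A x B + C on a 4x4 tile,
# multiplying in FP16 and accumulating in FP32.
D = A.astype(np.float32) @ B.astype(np.float32) + C
```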
CPUs vs GPUs: core counts and serial vs parallel
CPUs are masters of serial tasks, with fewer but individually more powerful cores.
GPUs, on the other hand, have hundreds or thousands of simpler cores and excel at parallel tasks.
CUDA Cores
The traditional, general-purpose GPU cores, as opposed to specialised units like tensor cores.
What makes up a Google TPU?
- Matrix Multiply Unit (MXU): performs the matrix multiplications (analogous to tensor cores)
- Unified Buffer: high-speed on-chip memory feeding the MXU
- Weight FIFO: streams neural network weights into the MXU
- Scalar and Vector Units: handle additional arithmetic and control flow
Systolic Arrays
Grid of MACs: The MXU is a large grid of Multiply-Accumulate units, each computing a × b + c.
Input data (A) flows horizontally through the array while weights (B) are fed in vertically and held in place; partial sums accumulate as they flow vertically down the columns.
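A cycle-level Python sketch of a weight-stationary systolic array (the geometry, skewing, and names here are illustrative assumptions; a real MXU does this in hardware with thousands of MACs):

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate a weight-stationary systolic array computing A @ B.

    PE (k, j) holds weight B[k, j]; activations stream in from the left,
    skewed one cycle per row; partial sums flow down and exit the bottom.
    """
    M, K = A.shape
    _, N = B.shape
    W = B                       # weights stay resident in the PEs
    act = np.zeros((K, N))      # activation sitting in each PE this cycle
    psum = np.zeros((K, N))     # partial sum arriving at each PE this cycle
    C = np.zeros((M, N))
    for t in range(M + K + N):
        # Inject the skewed activation stream at the left edge:
        # row k of the array sees column k of A, delayed by k cycles.
        for k in range(K):
            m = t - k
            act[k, 0] = A[m, k] if 0 <= m < M else 0.0
        out = psum + act * W    # every PE does one MAC this cycle
        # The bottom row emits one finished dot product per column.
        for j in range(N):
            m = t - (K - 1) - j
            if 0 <= m < M:
                C[m, j] = out[K - 1, j]
        # Partial sums flow down one row; activations flow right one column.
        psum = np.vstack([np.zeros((1, N)), out[:-1, :]])
        act = np.hstack([np.zeros((K, 1)), act[:, :-1]])
    return C

A = np.random.rand(3, 5)
B = np.random.rand(5, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

The skewed injection is the key trick: staggering inputs by one cycle per row means each partial sum meets exactly the right activation as it flows down.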
Types of TPUs
- Cloud TPUs: data-centre TPUs accessed via Google Cloud
- Edge TPUs: small, power-efficient TPUs for on-device AI
- TPU Pods: many TPUs interconnected at massive scale
ASICs
Application-Specific Integrated Circuits
Custom circuits designed for specific AI tasks: inflexible and costly to develop, but highly efficient.
FPGAs
Field Programmable Gate Arrays
Reconfigurable hardware: more versatile than ASICs, but with lower performance and efficiency.
FPGA Architecture
- Configurable Logic Blocks (CLBs): consist of lookup tables (LUTs), flip-flops, and multiplexers (see the LUT sketch after this list)
- Routing Resources: a network of programmable switches and wires connecting the blocks
- Input/Output Blocks: communication with external pins and devices
- Block RAM: Dedicated memory blocks
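A small Python sketch of the idea behind a LUT (the 3-input XOR example is illustrative): a k-input LUT is just a 2^k-entry truth table, so filling the table "programs" the block to implement any boolean function of its inputs.

```python
def make_lut(func, k):
    """Build a k-input lookup table: precompute func on all 2^k inputs."""
    table = [func(*(((i >> b) & 1) for b in range(k))) for i in range(2 ** k)]
    def lut(*bits):
        index = sum(bit << b for b, bit in enumerate(bits))
        return table[index]            # no logic at runtime, just a lookup
    return lut

# "Programming" the fabric = filling the table; here, a 3-input XOR.
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
assert xor3(1, 0, 1) == 0
assert xor3(1, 1, 1) == 1
```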
In-Memory Computing
Integrates computational capabilities directly within memory units.
Seeks to minimise or eliminate the need to move data between memory and processor.
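One common in-memory computing scheme is the analog resistive crossbar; here is a NumPy sketch of the idea (sizes and values are illustrative assumptions). Weights are stored as conductances, input voltages drive the rows, and Ohm's and Kirchhoff's laws produce the matrix-vector product as summed column currents, so the weights never move:

```python
import numpy as np

G = np.random.rand(4, 3)   # 4x3 crossbar of programmable conductances (weights)
V = np.random.rand(4)      # input voltages applied to the 4 row lines

# Each cross-point passes current G[i, j] * V[i]; each column line sums
# its currents, so the readout is a matrix-vector product computed in place:
I = G.T @ V                # currents on the 3 column lines
```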