8 - GPU Acceleration and AI Hardware Flashcards
Why is AI computationally expensive?
- Massive models with billions of parameters
- Complex operations: Training involves matrix multiplication, gradient calculation, and backpropagation
- Iterative Training: Repeatedly updating parameters using gradient descent (see the sketch after this list)
- Vast Datasets
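To make the "iterative training" point concrete, here is a minimal sketch of gradient descent for linear regression; the names (X, y, w, lr) are illustrative, not from any particular framework. Each pass does a matrix multiplication, a gradient calculation, and a parameter update, which is exactly the loop repeated enormously many times at scale.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))    # 1000 samples, 50 features
y = rng.standard_normal(1000)
w = np.zeros(50)
lr = 0.01                              # learning rate

for _ in range(100):                   # iterative training loop
    preds = X @ w                      # matrix multiplication (forward pass)
    grad = X.T @ (preds - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad                     # gradient-descent parameter update
```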
FLOPS
Floating Point Operations Per Second: a measure of compute throughput. (Lowercase "FLOPs" is often used for a raw count of floating point operations rather than a rate.)
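As a back-of-envelope sketch: a dense (m × k) @ (k × n) matrix multiply needs about 2mkn floating point operations (one multiply and one add per term). The device throughput below is an assumed figure for illustration only.

```python
m, k, n = 4096, 4096, 4096
flops = 2 * m * k * n                # ~1.4e11 floating point operations

device_flops = 100e12                # assumed: a hypothetical 100 TFLOPS GPU
ideal_ms = flops / device_flops * 1e3
print(f"{flops / 1e9:.0f} GFLOPs, ideal time {ideal_ms:.2f} ms")
```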
TOPS
Tera Operations Per Second. Typically quoted for integer operations (e.g., INT8), as opposed to floating point.
Why aren’t CPUs good for AI?
Their design favours a few powerful cores optimised for serial execution, which limits throughput on highly parallel AI workloads.
CUDA
Nvidia’s parallel computing platform and programming model for running general-purpose (non-graphics) workloads on GPUs.
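A minimal sketch of the CUDA programming model using Numba's CUDA bindings (assumes an NVIDIA GPU with the CUDA toolkit and Numba installed; the kernel and variable names are illustrative). The same elementwise work is spread across many GPU threads:

```python
import numpy as np
from numba import cuda

@cuda.jit
def vec_add(x, y, out):
    i = cuda.grid(1)          # this thread's global index
    if i < out.size:          # guard against over-provisioned threads
        out[i] = x[i] + y[i]  # each thread handles one element

n = 1_000_000
x = np.ones(n, dtype=np.float32)
y = np.full(n, 2.0, dtype=np.float32)
out = np.zeros(n, dtype=np.float32)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vec_add[blocks, threads_per_block](x, y, out)   # launch the kernel on the GPU
assert np.allclose(out, 3.0)
```

The `[blocks, threads_per_block]` launch configuration is the CUDA-specific part: it tells the GPU how to carve the problem into parallel thread blocks.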
Tensor Cores
Specialised hardware units to handle matrix multiplication.
Each one performs D = A × B + C, where A, B, C, and D are 4×4 matrices.
The whole 4×4 operation completes in a single clock cycle rather than as a sequence of scalar operations.
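A NumPy sketch of the fused operation a tensor core performs in hardware. The mixed precision mirrors the common FP16-multiply / FP32-accumulate scheme; a real tensor core does the whole tile in one clock rather than via software loops.

```python
import numpy as np

A = np.random.rand(4, 4).astype(np.float16)   # half-precision inputs
B = np.random.rand(4, 4).astype(np.float16)
C = np.random.rand(4, 4).astype(np.float32)   # single-precision accumulator

# One tensor-core op: D = A x B + C on a 4x4 tile,
# multiplying in FP16 and accumulating in FP32.
D = A.astype(np.float32) @ B.astype(np.float32) + C
```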
CPUs vs GPUs: core counts and serial vs parallel
CPUs are masters of serial tasks, with fewer but individually more powerful cores.
GPUs, on the other hand, have hundreds or thousands of simpler cores and excel at parallel tasks.
CUDA Cores
The traditional, general-purpose GPU cores, as opposed to specialised units like tensor cores.
What makes up a Google TPU?
- Matrix Multiply Unit (MXU): performs the matrix multiplications (analogous to tensor cores)
- Unified Buffer: high-speed on-chip memory feeding the MXU
- Weight FIFO: streams neural network weights into the MXU
- Scalar and Vector Units: handle additional arithmetic and control flow
Systolic Arrays
Grid of MACs: The MXU is a large grid of Multiply-Accumulate units, each computing a × b + c.
Input data (A) flows horizontally through the array while weights (B) are fed in vertically and held in place; partial sums accumulate as they flow vertically down the columns.
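A cycle-level Python sketch of a weight-stationary systolic array (the geometry, skewing, and names here are illustrative assumptions; a real MXU does this in hardware with thousands of MACs):

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate a weight-stationary systolic array computing A @ B.

    PE (k, j) holds weight B[k, j]; activations stream in from the left,
    skewed one cycle per row; partial sums flow down and exit the bottom.
    """
    M, K = A.shape
    _, N = B.shape
    W = B                       # weights stay resident in the PEs
    act = np.zeros((K, N))      # activation sitting in each PE this cycle
    psum = np.zeros((K, N))     # partial sum arriving at each PE this cycle
    C = np.zeros((M, N))
    for t in range(M + K + N):
        # Inject the skewed activation stream at the left edge:
        # row k of the array sees column k of A, delayed by k cycles.
        for k in range(K):
            m = t - k
            act[k, 0] = A[m, k] if 0 <= m < M else 0.0
        out = psum + act * W    # every PE does one MAC this cycle
        # The bottom row emits one finished dot product per column.
        for j in range(N):
            m = t - (K - 1) - j
            if 0 <= m < M:
                C[m, j] = out[K - 1, j]
        # Partial sums flow down one row; activations flow right one column.
        psum = np.vstack([np.zeros((1, N)), out[:-1, :]])
        act = np.hstack([np.zeros((K, 1)), act[:, :-1]])
    return C

A = np.random.rand(3, 5)
B = np.random.rand(5, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

The skewed injection is the key trick: staggering inputs by one cycle per row means each partial sum meets exactly the right activation as it flows down.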
Types of TPUs
- Cloud TPUs: data-centre TPUs accessed via Google Cloud
- Edge TPUs: small, power-efficient TPUs for on-device AI
- TPU Pods: many TPUs interconnected at massive scale
ASICs
Application-Specific Integrated Circuits
Custom circuits designed for specific AI tasks: inflexible and costly to develop, but highly efficient.
FPGAs
Field Programmable Gate Arrays
Reconfigurable hardware: more versatile than ASICs, but with lower performance and efficiency.
FPGA Architecture
- Configurable Logic Blocks (CLBs): consist of lookup tables (LUTs), flip-flops, and multiplexers (see the LUT sketch after this list)
- Routing Resources: a network of programmable switches and wires connecting the blocks
- Input/Output Blocks: communication with external pins and devices
- Block RAM: Dedicated memory blocks
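A small Python sketch of the idea behind a LUT (the 3-input XOR example is illustrative): a k-input LUT is just a 2^k-entry truth table, so filling the table "programs" the block to implement any boolean function of its inputs.

```python
def make_lut(func, k):
    """Build a k-input lookup table: precompute func on all 2^k inputs."""
    table = [func(*(((i >> b) & 1) for b in range(k))) for i in range(2 ** k)]
    def lut(*bits):
        index = sum(bit << b for b, bit in enumerate(bits))
        return table[index]            # no logic at runtime, just a lookup
    return lut

# "Programming" the fabric = filling the table; here, a 3-input XOR.
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
assert xor3(1, 0, 1) == 0
assert xor3(1, 1, 1) == 1
```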
In-Memory Computing
Integrates computational capabilities directly within memory units.
Seeks to minimise or eliminate the need to move data between memory and processor.
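One common in-memory computing scheme is the analog resistive crossbar; here is a NumPy sketch of the idea (sizes and values are illustrative assumptions). Weights are stored as conductances, input voltages drive the rows, and Ohm's and Kirchhoff's laws produce the matrix-vector product as summed column currents, so the weights never move:

```python
import numpy as np

G = np.random.rand(4, 3)   # 4x3 crossbar of programmable conductances (weights)
V = np.random.rand(4)      # input voltages applied to the 4 row lines

# Each cross-point passes current G[i, j] * V[i]; each column line sums
# its currents, so the readout is a matrix-vector product computed in place:
I = G.T @ V                # currents on the 3 column lines
```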