P3 - Processing Power & Scalability Flashcards
Why Do Chatbots Need High Processing Power?
Training large language models involves billions of parameters and vast datasets.
Deployment requires real-time processing of user queries.
What type of processor is best suited for small-scale AI models?
CPUs (Central Processing Units) are general-purpose processors that are cost-effective for small-scale models but become inefficient for larger ones.
Why are GPUs (Graphics Processing Units) ideal for training large AI models?
GPUs are designed for parallel processing, which makes them ideal for training large models and handling multiple computations simultaneously.
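A minimal sketch of this idea, assuming PyTorch and an available CUDA GPU (both are assumptions, not part of the card): a single large matrix multiplication is dispatched across thousands of GPU cores at once.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Two large matrices, e.g. an activation batch and a weight matrix.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# On a GPU this single call runs across many cores in parallel.
c = a @ b
print(c.shape, device)
```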
What are TPUs (Tensor Processing Units) and why are they significant in AI?
TPUs are specialized hardware developed by Google specifically for machine learning tasks, offering high efficiency for both training and inference at scale.
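A hedged sketch, assuming a JAX installation with TPU support (e.g. a Cloud TPU runtime): the function is compiled with XLA so the matrix math runs on the TPU's matrix units.

```python
import jax
import jax.numpy as jnp

print(jax.devices())  # lists available accelerators, e.g. TPU devices

@jax.jit
def forward(x, w):
    # A Transformer-style projection: one large matrix multiplication.
    return jnp.dot(x, w)

x = jnp.ones((1024, 512))
w = jnp.ones((512, 512))
y = forward(x, w)
print(y.shape)
```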
What scalability challenges do AI workloads face as chatbot usage grows?
Increasing usage requires more computational power to handle larger datasets and user bases, and high hardware costs and energy consumption can become limiting factors.
What are the cost vs. performance trade-offs for small-scale AI implementations?
For small-scale implementations, cost-effective options like CPUs or cloud-based GPU instances are preferable.
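An illustrative helper showing how such a sizing decision might be coded; the function name `pick_hardware` and the parameter thresholds are assumptions for illustration, not established rules.

```python
def pick_hardware(num_parameters: int) -> str:
    """Rough, illustrative mapping from model size to hardware tier."""
    if num_parameters < 100_000_000:       # small model: CPU is usually cheapest
        return "cpu"
    if num_parameters < 10_000_000_000:    # mid-size: a cloud GPU instance
        return "cloud-gpu"
    return "tpu-or-datacenter"             # very large: dedicated accelerators

print(pick_hardware(7_000_000))         # -> cpu
print(pick_hardware(1_500_000_000))     # -> cloud-gpu
```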
How do large-scale AI implementations differ in hardware investment?
Large-scale implementations often require investments in TPUs or dedicated data centers to ensure scalability and efficient processing of extensive workloads.
What are some key CPU options for AI workloads, and what features enhance their performance?
CPUs such as Intel Xeon and AMD Threadripper are common choices. Recent Intel Xeon processors include an AI acceleration engine in each core and benefit from faster memory, more cores, and a larger last-level cache, which together can reduce LLM latency by up to 5x compared to default PyTorch.
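A hedged sketch of how those CPU-side optimizations are typically applied from Python, assuming the intel_extension_for_pytorch package is installed; the actual speed-up depends on the model and the Xeon generation.

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).eval()

# ipex.optimize applies operator fusion and Xeon-friendly memory layouts.
model = ipex.optimize(model)

with torch.no_grad():
    out = model(torch.randn(16, 32, 512))  # (seq, batch, d_model)
print(out.shape)
```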
What is the von Neumann bottleneck in CPUs?
It is the limitation imposed by the shared pathway between the CPU and memory: data cannot be moved to the processor as fast as it can be computed on, so throughput for data-intensive machine learning tasks is constrained by memory access rather than raw compute speed.
Why are GPUs, such as the NVIDIA RTX 4090 and A100, preferred for large-scale AI training?
GPUs are designed for massively parallel processing, making them excellent for matrix operations and training models (like Transformers) that require simultaneous computations. They provide significant speed-ups over CPUs for these tasks.
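A rough benchmark sketch, assuming PyTorch and (optionally) a CUDA GPU: it times the same matrix multiplication on both devices to make the parallel-processing advantage visible.

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    _ = a @ b                          # warm-up so one-time setup isn't timed
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()       # wait for the asynchronous GPU kernel
    return time.perf_counter() - start

print(f"CPU: {time_matmul('cpu'):.3f} s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f} s")
```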
What are some challenges associated with using GPUs?
GPUs come with high upfront costs, consume a lot of energy, and require complex cooling (often needing water cooling). They’re also specialized hardware, so compatibility with smaller systems can be limited.
What makes TPUs uniquely suited for AI workloads, especially for Transformers?
TPUs (Tensor Processing Units) are specialized for matrix processing. They use a systolic array architecture—like Google’s Cloud TPU v3 with a 128×128 grid of ALUs—to perform fast, highly parallel matrix multiplications, which are core to Transformer attention and backpropagation.
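A conceptual sketch only: a software tiled matrix multiply in NumPy, loosely mimicking how a systolic array streams blocks of the operands through a fixed grid of ALUs. The tile size of 128 mirrors the grid width mentioned above, but the code is purely illustrative, not how a TPU is programmed.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 128) -> np.ndarray:
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m), dtype=a.dtype)
    # Each output tile accumulates partial products as operand tiles "flow" past,
    # analogous to data pulsing through the systolic grid.
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)
print(np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3))
```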
What is a primary disadvantage of TPUs compared to CPUs?
TPUs are not Turing complete; they’re designed exclusively for neural network workloads and cannot perform general-purpose computing tasks like word processing.
In what scenarios are CPUs considered more cost-effective despite their slower parallel processing?
CPUs are ideal for smaller-scale models or algorithms (e.g., certain time series or RNN/LSTM models) that do not require extensive parallel processing, making them a cost-effective alternative when GPU use isn’t justified.
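A minimal sketch, assuming PyTorch: an LSTM of this size trains comfortably on a CPU, so renting or buying a GPU is often not worth the cost for such workloads.

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=8, hidden_size=32, num_layers=1, batch_first=True)
head = nn.Linear(32, 1)
optimizer = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy time-series batch: 64 sequences of 20 steps with 8 features each.
x = torch.randn(64, 20, 8)
y = torch.randn(64, 1)

for _ in range(5):
    out, _ = model(x)               # out: (batch, seq, hidden)
    pred = head(out[:, -1, :])      # predict from the last time step
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.4f}")
```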
How do the hardware options balance cost vs. performance for different AI workloads?
For small-scale implementations, cost-effective options like CPUs or cloud-based GPU instances are sufficient. For large-scale deployments—like high-quality chatbots with billions of parameters—investments in TPUs or dedicated data centers are needed to ensure scalability and efficient performance.