Processing Power Flashcards

1
Q

3 aspects of processing power

A
  • Computational capacity
  • Memory resources
  • Efficiency & Speed
2
Q

Computational Capacity

A

The ability of the hardware to carry out a large number of complex instructions quickly. This is measured in FLOPS (floating-point operations per second).
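As an illustration, achieved FLOPS can be estimated by timing a dense matrix multiplication, which needs roughly 2·n³ floating-point operations. A minimal Python/NumPy sketch (the matrix size is an arbitrary choice, and a real benchmark would average over many runs):

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b                        # dense matmul: roughly 2 * n^3 floating-point ops
elapsed = time.perf_counter() - start

print(f"~{2 * n**3 / elapsed / 1e9:.1f} GFLOPS achieved")
```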

3
Q

Memory resources

A

The availability of sufficient RAM and VRAM for running large models and handling large volumes of data quickly. These resources ensure smooth processing and quick access to necessary information.

4
Q

Efficiency & Speed

A

The capability to manage high throughput and low latency while optimizing for low energy consumption.

5
Q

Throughput

A

The number of data batches a model can process per unit of time.
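A rough way to measure this in practice, with a NumPy matrix multiply standing in for the model (batch size, dimensions, and batch count are arbitrary assumptions):

```python
import time
import numpy as np

batch_size, dim = 64, 512
weights = np.random.rand(dim, dim)                 # stand-in for a trained model
batches = [np.random.rand(batch_size, dim) for _ in range(100)]

start = time.perf_counter()
for batch in batches:
    _ = batch @ weights                            # stand-in for a forward pass
elapsed = time.perf_counter() - start

print(f"throughput: {len(batches) / elapsed:.1f} batches/sec")
```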

6
Q

Preprocessing

A

One of the main tasks of a NN.

Preparing raw data for training by cleaning and transforming it into a format suitable for the LLM.

7
Q

Training the model

A

One of the main tasks of a NN.

Teaching the LLM to understand and generate human-like text by optimizing its parameters using a large dataset.

8
Q

Deploying the model

A

One of the main tasks of a NN.

Making the trained LLM available for use.

9
Q

Pre-processing Process

A
  1. Cleaning (Removing noise and irrelevant information from the dataset)
  2. Selection (Choosing relevant data and features for model training.)
  3. Transformation (Converting data into a suitable format for model training)
  4. Reduction of data (Decreasing the volume of data while retaining important information.)
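A minimal sketch of the four steps on a toy text dataset (the cleaning regex, the selection rule, and keeping only the 4 most frequent words are illustrative assumptions, not a prescribed recipe):

```python
import re
from collections import Counter

raw_docs = ["  The model <b>trains</b> FAST!!  ", "Noise??? the DATA is big", ""]

# 1. Cleaning: strip markup and punctuation, lowercase, and tokenize
cleaned = [re.sub(r"<[^>]+>|[^a-z\s]", "", d.lower()).split() for d in raw_docs]

# 2. Selection: keep only documents that still contain usable tokens
selected = [tokens for tokens in cleaned if tokens]

# 3. Transformation: represent each document as word counts
transformed = [Counter(tokens) for tokens in selected]

# 4. Reduction: keep only the k most frequent words overall (k=4 is arbitrary)
totals = Counter()
for counts in transformed:
    totals.update(counts)
keep = {word for word, _ in totals.most_common(4)}
reduced = [{w: c for w, c in counts.items() if w in keep} for counts in transformed]

print(reduced)
```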
10
Q

Bag-of-Words Algorithm

A

Type of pre-processing involving 3 steps:

Tokenization - Text is split into individual words (tokens), often removing punctuation and common “stop words” (such as “and”, “the”, etc.)

Vocabulary Creation - A collection of all the unique words in the text (known as the corpus) is created, with each word assigned a unique index.

Vectorization - Each document is represented as a vector of word counts, where the vector length equals the size of the vocabulary, and each element corresponds to the count of a specific word in the document.
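All three steps fit in a few lines of plain Python (the stop-word list and example sentences below are made up for illustration):

```python
# 1. Tokenization: split text into words, dropping punctuation and stop words
stop_words = {"and", "the", "is"}
docs = ["the cat sat on the mat", "the dog and the cat"]
tokenized = [[w for w in d.lower().split() if w not in stop_words] for d in docs]

# 2. Vocabulary creation: assign each unique word in the corpus a unique index
vocab = {w: i for i, w in enumerate(sorted({w for t in tokenized for w in t}))}

# 3. Vectorization: one count vector per document, length = vocabulary size
vectors = []
for tokens in tokenized:
    vec = [0] * len(vocab)
    for w in tokens:
        vec[vocab[w]] += 1
    vectors.append(vec)

print(vocab)    # {'cat': 0, 'dog': 1, 'mat': 2, 'on': 3, 'sat': 4}
print(vectors)  # [[1, 0, 1, 1, 1], [1, 1, 0, 0, 0]]
```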

11
Q

X & Y in Bag-of-Words

A

X is the input (the vectorized documents) and Y is the output (the target the model learns to predict).

12
Q

4 advantages of Bag-of-Words

A
  1. Straightforward. The BoW algorithm is simple to understand and easy to implement as it involves basic operations.
  2. Minimal preprocessing. Requires minimal preprocessing of text data, making it accessible and quick to deploy.
  3. Does not require knowledge of grammar or language structure to implement, which simplifies its application across different languages.
  4. Computationally efficient. This is because it involves simple counting operations and vector representations.
13
Q

4 disadvantages of Bag-of-Words

A
  1. BoW ignores the order of words in the text, leading to a loss of syntactic and semantic information (how words are organized and what they mean).
  2. No contextual understanding of text, which can be critical for understanding meaning in natural language.
  3. Potentially resource-intensive. On large corpora (for example, those used by large businesses), the vocabulary can become extremely large, leading to high-dimensional vectors and, ultimately, a model that is computationally expensive and memory-intensive.
  4. Sensitive to irrelevant words. High-frequency filler or irrelevant words can dominate the vectors unless they are explicitly removed.
14
Q

Advantages of using GPUs for LLMs

A
  • Multiple cores. Can handle thousands of tasks simultaneously.
  • Fast data transfer between the GPU and its memory, which is essential for the large datasets and computations in deep learning.
  • Large VRAM, reducing latency and enhancing model performance.
  • Programmability. Frameworks like NVIDIA’s CUDA and OpenCL enable custom coding to leverage GPUs’ parallel processing for applications beyond graphics.
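The first and last points can be seen with a short PyTorch sketch that runs the same matrix multiply on the CPU and, when one is present, on a CUDA GPU (the matrix size is arbitrary, and this is an illustration rather than a rigorous benchmark):

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()        # finish pending work before timing
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()        # wait for the asynchronous GPU kernel
    return time.perf_counter() - start

print(f"cpu:  {time_matmul('cpu'):.4f} s")
if torch.cuda.is_available():           # only run if a CUDA GPU is present
    print(f"cuda: {time_matmul('cuda'):.4f} s")
```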
15
Q

Tensor Processing Units (TPU)

A

Custom-designed application-specific integrated circuits (ASICs) developed by Google specifically to accelerate machine learning workloads, particularly deep learning tasks.

16
Q

2 characteristics of TPU

A

  • Each TPU unit has 8 cores.
  • Each core has between 8 and 32 GB of RAM associated with it.

17
Q

6 advantages of TPUs being utilized for machine learning

A
  • They have a specialized architecture for performing common deep learning operations, such as large matrix multiplications, efficiently.
  • They can handle massive amounts of parallel computation, which significantly speeds up the training and inference of large machine learning models.
  • They use high-speed memory to store large amounts of data close to the processing units, reducing latency and increasing throughput.
  • High performance with lower power consumption.
  • Thermal efficiency due to their specialized design.
  • Distribution of tasks across many TPUs (they are designed to work in large-scale clusters called “pods”), supporting the training of extremely large models on massive datasets.
18
Q

Inference

A

This refers to the process of using a trained model to make predictions on new, unseen data.
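A minimal scikit-learn illustration of the train-then-infer split (the toy data is made up):

```python
from sklearn.linear_model import LogisticRegression

# Training: fit the model's parameters on known examples
X_train = [[1.0], [2.0], [3.0], [4.0]]
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# Inference: predict on new, unseen inputs using the trained parameters
X_new = [[1.5], [3.5]]
print(model.predict(X_new))   # likely [0 1]
```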

19
Q

3 usages of TPUs

A
  • Large-scale model training, usually for NLP or computer vision, since they are much faster than GPUs or CPUs.
  • Real-time inference, since they provide very low latency, making them suitable for applications that require real-time decision-making.
  • They are used in R&D by researchers who take advantage of their high computational power to experiment with model architectures and training techniques.
20
Q

Clustering with LLMs, 4 advantages

A
  1. Increased computational power
  2. Scalable
  3. Reduced training time due to distribution of tasks
  4. High throughput
21
Q

Clustering with LLMs 4 Disadvantages

A
  1. Complex setup and management
  2. High cost
  3. Communication overhead introduced from distribution of tasks which can limit efficiency gains
  4. Energy consumption is high
22
Q

How do the complexity of the model and of the training dataset affect processing power?

A
  1. If the model is very large, with many parameters, layers, and sophisticated components such as transformers, the processing power needed increases (a rough rule of thumb is sketched after this list).
  2. Large datasets and datasets that need extensive preprocessing require more processing power.
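A rough worked example of the first point, using the common ~6 FLOPs-per-parameter-per-token approximation for transformer training compute (the model size, token count, and hardware speed below are made-up figures):

```python
params = 7e9      # hypothetical model with 7 billion parameters
tokens = 1e12     # hypothetical training set of 1 trillion tokens

# Common approximation: training compute ~ 6 * parameters * tokens
total_flops = 6 * params * tokens
print(f"~{total_flops:.2e} FLOPs")               # ~4.20e+22 FLOPs

# At a sustained 1e15 FLOPS (hypothetical hardware), that works out to:
days = total_flops / 1e15 / 86400
print(f"~{days:.0f} days of compute")            # ~486 days
```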
23
Q

How does hardware utilization and model architecture in training affect processing power?

A
  1. The more GPUs/TPUs available, the greater the processing power and the faster the training.
  2. Using hardware-specific optimizations and accelerators can reduce processing power requirements.
  3. Different types of models, such as BERT, GPT have varying computational requirements.
  4. Inclusion of specialized layers or operations can increase processing power needed.
24
Q

How do inference latency and throughput requirements affect processing power during deployment?

A
  1. The lower the desired latency of inference, the more processing power required.
  2. The higher the number of inferences the model can handle per second, the more processing power required.
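The tension between the two can be sketched by varying the batch size, with a NumPy matrix multiply standing in for a real network (all sizes here are arbitrary assumptions):

```python
import time
import numpy as np

dim = 1024
weights = np.random.rand(dim, dim)      # stand-in for a trained model

for batch_size in (1, 8, 64):
    batch = np.random.rand(batch_size, dim)
    start = time.perf_counter()
    _ = batch @ weights                 # stand-in for one inference call
    elapsed = time.perf_counter() - start
    print(f"batch={batch_size:3d}  latency={elapsed * 1e3:8.3f} ms  "
          f"throughput={batch_size / elapsed:10.1f} inferences/sec")
```

Larger batches tend to raise throughput but also per-request latency, so meeting both targets at once demands more processing power.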