Processing Power Flashcards

1
Q

3 aspects of processing power

A
  • Computational capacity
  • Memory resources
  • Efficiency & Speed
2
Q

Computational Capacity

A

The ability of the hardware to carry out a large number of complex instructions quickly. This is measured in FLOPS (floating-point operations per second).
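As an illustration, achieved FLOPS can be estimated by timing a dense matrix multiplication, which needs roughly 2·n³ floating-point operations. A minimal Python/NumPy sketch (the matrix size is an arbitrary choice, and a real benchmark would average over many runs):

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b                        # dense matmul: roughly 2 * n^3 floating-point ops
elapsed = time.perf_counter() - start

print(f"~{2 * n**3 / elapsed / 1e9:.1f} GFLOPS achieved")
```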

3
Q

Memory resources

A

The availability of sufficient RAM and VRAM for running large models and handling large volumes of data quickly. These resources ensure smooth processing and quick access to necessary information.

4
Q

Efficiency & Speed

A

The capability to manage high throughput and low latency while optimizing for low energy consumption.

5
Q

Throughput

A

The number of data batches a model can process per unit of time.
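A rough way to measure this in practice, with a NumPy matrix multiply standing in for the model (batch size, dimensions, and batch count are arbitrary assumptions):

```python
import time
import numpy as np

batch_size, dim = 64, 512
weights = np.random.rand(dim, dim)                 # stand-in for a trained model
batches = [np.random.rand(batch_size, dim) for _ in range(100)]

start = time.perf_counter()
for batch in batches:
    _ = batch @ weights                            # stand-in for a forward pass
elapsed = time.perf_counter() - start

print(f"throughput: {len(batches) / elapsed:.1f} batches/sec")
```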

6
Q

Preprocessing

A

One of the main tasks of a NN.

Preparing raw data for training by cleaning and transforming it into a format suitable for the LLM.

7
Q

Training the model

A

One of the main tasks of a NN.

Teaching the LLM to understand and generate human-like text by optimizing its parameters using a large dataset.

8
Q

Deploying the model

A

One of the main tasks of a NN.

Making the trained LLM available for use.

9
Q

Pre-processing Process

A
  1. Cleaning (Removing noise and irrelevant information from the dataset)
  2. Selection (Choosing relevant data and features for model training.)
  3. Transformation (Converting data into a suitable format for model training)
  4. Reduction of data (Decreasing the volume of data while retaining important information.)
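A minimal sketch of the four steps on a toy text dataset (the cleaning regex, the selection rule, and keeping only the 4 most frequent words are illustrative assumptions, not a prescribed recipe):

```python
import re
from collections import Counter

raw_docs = ["  The model <b>trains</b> FAST!!  ", "Noise??? the DATA is big", ""]

# 1. Cleaning: strip markup and punctuation, lowercase, and tokenize
cleaned = [re.sub(r"<[^>]+>|[^a-z\s]", "", d.lower()).split() for d in raw_docs]

# 2. Selection: keep only documents that still contain usable tokens
selected = [tokens for tokens in cleaned if tokens]

# 3. Transformation: represent each document as word counts
transformed = [Counter(tokens) for tokens in selected]

# 4. Reduction: keep only the k most frequent words overall (k=4 is arbitrary)
totals = Counter()
for counts in transformed:
    totals.update(counts)
keep = {word for word, _ in totals.most_common(4)}
reduced = [{w: c for w, c in counts.items() if w in keep} for counts in transformed]

print(reduced)
```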
10
Q

Bag-of-Words Algorithm

A

Type of pre-processing involving 3 steps:

Tokenization - Text is split into individual words (tokens), often removing punctuation and common “stop words” (such as “and”, “the”, etc.)

Vocabulary Creation - A collection of all the unique words in the text (known as the corpus) is created, with each word assigned a unique index.

Vectorization - Each document is represented as a vector of word counts, where the vector length equals the size of the vocabulary, and each element corresponds to the count of a specific word in the document.
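All three steps fit in a few lines of plain Python (the stop-word list and example sentences below are made up for illustration):

```python
# 1. Tokenization: split text into words, dropping punctuation and stop words
stop_words = {"and", "the", "is"}
docs = ["the cat sat on the mat", "the dog and the cat"]
tokenized = [[w for w in d.lower().split() if w not in stop_words] for d in docs]

# 2. Vocabulary creation: assign each unique word in the corpus a unique index
vocab = {w: i for i, w in enumerate(sorted({w for t in tokenized for w in t}))}

# 3. Vectorization: one count vector per document, length = vocabulary size
vectors = []
for tokens in tokenized:
    vec = [0] * len(vocab)
    for w in tokens:
        vec[vocab[w]] += 1
    vectors.append(vec)

print(vocab)    # {'cat': 0, 'dog': 1, 'mat': 2, 'on': 3, 'sat': 4}
print(vectors)  # [[1, 0, 1, 1, 1], [1, 1, 0, 0, 0]]
```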

11
Q

X & Y in Bag-of-Words

A

X is the input (the vectorized documents) and Y is the output (the target the model learns to predict).

12
Q

4 advantages of Bag-of-Words

A
  1. Straightforward. The BoW algorithm is simple to understand and easy to implement as it involves basic operations.
  2. Minimal preprocessing. Requires minimal preprocessing of text data, making it accessible and quick to deploy.
  3. Does not require knowledge of grammar or language structure to implement, which simplifies its application across different languages.
  4. Computationally efficient. This is because it involves simple counting operations and vector representations.
13
Q

4 disadvantages of Bag-of-Words

A
  1. BoW ignores the order of words in the text, leading to a loss of syntactic and semantic information (how words are organized and what they mean).
  2. No contextual understanding of text, which can be critical for understanding meaning in natural language.
  3. Potentially resource-intensive. On large corpora (for example, those used by large businesses), the vocabulary can become extremely large, leading to high-dimensional vectors and, ultimately, a model that is computationally expensive and memory-intensive.
  4. Sensitive to irrelevant words. High-frequency filler or irrelevant words can dominate the vectors unless they are explicitly removed.
14
Q

Advantages of using GPUs for LLMs

A
  • Multiple cores. Can handle thousands of tasks simultaneously.
  • Fast data transfer between the GPU and its memory, which is essential for the large datasets and computations in deep learning.
  • Large VRAM, reducing latency and enhancing model performance.
  • Programmability. Frameworks like NVIDIA’s CUDA and OpenCL enable custom coding to leverage GPUs’ parallel processing for applications beyond graphics.
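The first and last points can be seen with a short PyTorch sketch that runs the same matrix multiply on the CPU and, when one is present, on a CUDA GPU (the matrix size is arbitrary, and this is an illustration rather than a rigorous benchmark):

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()        # finish pending work before timing
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()        # wait for the asynchronous GPU kernel
    return time.perf_counter() - start

print(f"cpu:  {time_matmul('cpu'):.4f} s")
if torch.cuda.is_available():           # only run if a CUDA GPU is present
    print(f"cuda: {time_matmul('cuda'):.4f} s")
```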
15
Q

Tensor Processing Units (TPU)

A

Custom-designed application-specific integrated circuits (ASICs) developed by Google specifically to accelerate machine learning workloads, particularly deep learning tasks.

16
Q

2 characteristics of TPU

A

  • Each TPU unit has 8 cores.
  • Each core has between 8 and 32 GB of RAM associated with it.

17
Q

6 advantages of TPUs being utilized for machine learning

A
  • They have a specialized architecture for performing common deep learning operations, such as large matrix multiplications, efficiently.
  • They can handle massive amounts of parallel computation, which significantly speeds up the training and inference of large machine learning models.
  • They use high-speed memory to store large amounts of data close to the processing units, reducing latency and increasing throughput.
  • High performance with lower power consumption.
  • Thermal efficiency due to their specialized design.
  • Distribution of tasks across many TPUs (they are designed to work in large-scale clusters called “pods”), supporting the training of extremely large models on massive datasets.
18
Q

Inference

A

This refers to the process of using a trained model to make predictions on new, unseen data.
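A minimal scikit-learn illustration of the train-then-infer split (the toy data is made up):

```python
from sklearn.linear_model import LogisticRegression

# Training: fit the model's parameters on known examples
X_train = [[1.0], [2.0], [3.0], [4.0]]
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# Inference: predict on new, unseen inputs using the trained parameters
X_new = [[1.5], [3.5]]
print(model.predict(X_new))   # likely [0 1]
```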

19
Q

3 usages of TPUs

A
  • Large-scale model training, usually for NLP or computer vision, since they are much faster than GPUs or CPUs.
  • Real-time inference, since they provide very low latency, making them suitable for applications that require real-time decision-making.
  • They are used in R&D by researchers who take advantage of their high computational power to experiment with model architectures and training techniques.
20
Q

Clustering with LLMs, 4 advantages

A
  1. Increased computational power
  2. Scalable
  3. Reduced training time due to distribution of tasks
  4. High throughput
21
Q

Clustering with LLMs 4 Disadvantages

A
  1. Complex setup and management
  2. High cost
  3. Communication overhead introduced from distribution of tasks which can limit efficiency gains
  4. Energy consumption is high
22
Q

How do the complexity of the model and of the training dataset affect processing power?

A
  1. If the model is very large, with many parameters, layers, and sophisticated components such as transformers, the processing power needed increases (a rough rule of thumb is sketched after this list).
  2. Large datasets and datasets that need extensive preprocessing require more processing power.
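A rough worked example of the first point, using the common ~6 FLOPs-per-parameter-per-token approximation for transformer training compute (the model size, token count, and hardware speed below are made-up figures):

```python
params = 7e9      # hypothetical model with 7 billion parameters
tokens = 1e12     # hypothetical training set of 1 trillion tokens

# Common approximation: training compute ~ 6 * parameters * tokens
total_flops = 6 * params * tokens
print(f"~{total_flops:.2e} FLOPs")               # ~4.20e+22 FLOPs

# At a sustained 1e15 FLOPS (hypothetical hardware), that works out to:
days = total_flops / 1e15 / 86400
print(f"~{days:.0f} days of compute")            # ~486 days
```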
23
Q

How does hardware utilization and model architecture in training affect processing power?

A
  1. The more GPUs/TPUs available, the greater the processing power and the faster the training.
  2. Using hardware-specific optimizations and accelerators can reduce processing power requirements.
  3. Different types of models, such as BERT, GPT have varying computational requirements.
  4. Inclusion of specialized layers or operations can increase processing power needed.
24
Q

How do inference latency and throughput requirements affect processing power during deployment?

A
  1. The lower the desired latency of inference, the more processing power required.
  2. The higher the number of inferences the model can handle per second, the more processing power required.
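The tension between the two can be sketched by varying the batch size, with a NumPy matrix multiply standing in for a real network (all sizes here are arbitrary assumptions):

```python
import time
import numpy as np

dim = 1024
weights = np.random.rand(dim, dim)      # stand-in for a trained model

for batch_size in (1, 8, 64):
    batch = np.random.rand(batch_size, dim)
    start = time.perf_counter()
    _ = batch @ weights                 # stand-in for one inference call
    elapsed = time.perf_counter() - start
    print(f"batch={batch_size:3d}  latency={elapsed * 1e3:8.3f} ms  "
          f"throughput={batch_size / elapsed:10.1f} inferences/sec")
```

Larger batches tend to raise throughput but also per-request latency, so meeting both targets at once demands more processing power.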