LLMs Flashcards by Arron Lord

What is a corpus?

Corpus (or corpora for plural) is the text/body of texts the AI was trained on.

How well did you know this?

Not at all

Perfectly

What is context?

The section of the prompt the AI uses in their prediction of the next word.

How well did you know this?

Not at all

Perfectly

Explain the Markhov assumption

The future evolution of an object is independent of its history and solely based on the last step.

How well did you know this?

Not at all

Perfectly

Describe the process of a unigram n-gram model?

Counts the number of words in a corpus, assigns weightings to each word based on it’s count, suggests words based on these weightings.

How well did you know this?

Not at all

Perfectly

Describe the process of a bigram n-gram model?

Given the last word of the context, what is likely to be the next word. It counts each word-pair in the corpus and assigns weightings based on the count. It then checks the last word of the context, then suggests a word based on the probabilities of the word pairs with which the first word of the pair is the last word of the prompt.

How well did you know this?

Not at all

Perfectly

Describe the process of a trigram n-gram model?

Given the last two words of the context, what is likely to be the next word. It counts the occurrence of each three-word sequence in the corpus and assigns weightings based on the count. It then checks the last two words of the context, then suggests a word based on the weightings of the three-word sequences with which the first two words of the pair is the last two words of the prompt.

How well did you know this?

Not at all

Perfectly

Why do n-gram models using a higher n number fall short?

A sequence of words in the context needs to appear exactly the same in the corpus in order for the model to recognise it, check a weighting, and suggest the next word. If the exact context is not in the corpus, the model cannot suggest the next word.

How well did you know this?

Not at all

Perfectly

What are the drawbacks of n-gram models?

It cannot link information from different sections of a text.

How well did you know this?

Not at all

Perfectly

What is a deterministic model?

A model with a temperature of zero which gives a predictable result.

How well did you know this?

Not at all

Perfectly

What is an LLM?

Large Language Models are computational neural networks notable for their ability to achieve general-purpose language generation and other natural language processing tasks such as classification.

How well did you know this?

Not at all

Perfectly

What is a limitation of LLM’s and how is this circumvented?

LLM’s can only suggest words that are included in the corpus. To circumvent this, we use large training data sets such as social media or the Internet.

How well did you know this?

Not at all

Perfectly

What is the temperature of an AI model and what does a low and high temperature give? What are the use cases?

Temperature - how random an LLM is.

A temperature of zero gives zero randomness and a temperature of 2 give a high degree of randomness. At a low temperature, only the most likely outcome is selected, whereas at a high temperature, all words are equally likely to be selected.

Different temperatures have different uses. A low temperature is good for predictable text, such as for a cover letter, whereas a higher temperature can give poetry.

How well did you know this?

Not at all

Perfectly

What is an epoch?

An epoch is a training run, or a pass through the corpus.

How well did you know this?

Not at all

Perfectly

What is the training progression of an LLM?

Before training starts and even after the first few epochs, the predictions stored and given by the model are random. Any coherence is random.

Going through training runs increases coherence and relevancy, but a lot of training runs (1000s to 10000s are needed).

How well did you know this?

Not at all

Perfectly

What is the best method of correction and why?

A human can correct the LLMs, but due to the high level of epochs, it’s much more efficient for the model to correct itself based on the original text due to the large amount of training data.

How well did you know this?

Not at all

Perfectly

What is the measure of loss and what is the target figure? What is the word describing an LLM that is overtrained?

During each epoch, the neural network compares its prediction with the original data. The model’s prediction is likely off by some amount. The difference between the predicted and actual values is called theloss.

Each epoch reduces the loss, though how much we can reduce the loss and make the AI more accurate decreases over time.

Although reducing the loss does make an LLM more coherent and give better advice, a loss of zero would mean the outputted text is exactly the same as the original text, defeating the purpose of the LLM. It cannot output novel text. This is called overfitting.

A model isoverfitwhen it replicates patterns in the training data so well that it cannot account for new data or generate new patterns.

To prevent overfitting, we need to monitor training closely once the model begins performing well.

How well did you know this?

Not at all

Perfectly

What is preprocessing and why is it needed?

Need to turn raw data (full of mistakes and inconsistencies) into a clean data set via two processes - tokenisation and preprocessing.

Preprocessing makes all letters lowercase and removes punctuation. As a computer sees M and m as different, making all characters lowercase increases the frequency of words that would have lowercase and uppercase variations.

How well did you know this?

Not at all

Perfectly

What considerations should we make with regards to punctuation and capitalisation during preprocessing?

Study These Flashcards

Sometimes, such as spam filtration, it’s important to take capitalisation into account and not weed this out in preprocessing.

One also needs to account for punctuation, but a punctuation mark can mean multiple tings. It’s not reliable to assume a capital letter and full stop means a new sentence. Isntead, we can add a new special character

What is tokenisation?

Study These Flashcards

Tokenization is breaking a corpus into units that the model can train on. Words and punctuation are split into separate tokens.

What is white-space tokenisation?

Study These Flashcards

Whitespace tokenisation is splitting words by spaces.

What are the drawbacks of using syllables as tokens and what is done to overcome this?

Study These Flashcards

We can also split plural and tense words to isolate the stem (e.g. in started, start!, and starting, start is the stem). This requires knowledge of the grammar of the language.

Instead, one can build an LLm that uses characters as tokens.

What is Byte-Pair Encoding and what is it’s process?

Study These Flashcards

Byte-Pair Encodingis a tokenization algorithm that builds tokens from characters.

A further optimization is to search for the most common word pair, then make that word pair into a single token and replace all instances of the word pair in the corpus. If a single character no longer appears as it is counted for in a word pair token, we can remove it from the vocabulary of the LLM.

What is a neural network?

Study These Flashcards

A neural network is a linked collection of nodes split into layers. It has an input layer, and output layer, and any number of middle hidden layers. Each layer consists of nodes, and the nodes are connected between layers. Each node is assigned a random weight (or zero).

During development, the input nodes pass information to the nodes of the hidden layer. The strength of the signal is adjusted according to the weights and passed onto all nodes of the next layer - this is a probability between 0 and 1.

Each node is looking for a specific feature. Generally speaking, the further along the NN, the more complex the feature the node is looking for is.

Eventually, an output is given. In supervised learning, the human analyses the accuracy of the output. If accurate, positive feedback is given and the weights of the contributing nodes are increased so the result is more likely to be given again. If it is wrong, the weights are decreased so the result is less likely to be given.

The architecture is the neurons and hidden layers, whereas the weights are the calculation.

If the variables are plotted onto a graph, there is a linear regression showing the trend - this is the decision boundary.

How is information inputted into a neural network?

Study These Flashcards

To be inputted into a neural network, data needs to be in numerical form.

What are the positives of neural networks?

NNs are good at finding trends in lots of variables.

What is deep learning and what are the implications for NNs? What does deep learning allow?

Deeper networks have more hidden layers and contribute to deep learning. This needs more compute and the reasoning of the AI model is more complex. The deeper a network gives, the hard it becomes to tell what the AI is doing. If the AI model is giving an important decision (i.e. loan approval), we need to know the reason behind the decision. Deeper neural networks allow us to combine more complex calculations to define more complex criteria and features by added more (deeper) hidden levels.

What considerations should a Developer building a neural network bear in mind?

Calculations require compute. The more calculations needed, the more computer power needed. This determines neura network design

What are the steps to building a neural network?

1. Design the neural network and data collection The NN should be designs to avoid known biases and risks. There are libraries containing premade NNs, but it is important to use a mixed of premade NN and your own. One should also create the training data sets. It may be possible to use existing data sets, but one may need to build one. 2.Training epochs and NN adjustment

What data is needed to train a neural network?5

A training data set and a separate testing data set. The NN is run on the same training data in each epoch, then validated at the end with the testing data.

How can a NN be refined?

If the outputs of a NN are not clear after several hundred testing rounds, new variables or new data is needed.

What is backpropagation ?

The neurons contributing to the error are reduced and those giving good answer are increased.

What is the loss function? What is the error of loss function?

A measure of the accuracy of a AI models output compared to the expected. If high, the model is way off. The error of loss function is the value showing the error level on each epoch

What is the global optimal solution? How can one get to it?

A globally optimal solution is one where there are no other feasible solutions with better objective function values. One can get to the GOS by having multiple agents, starting at multiple points, having a large and robust data set.

What is a locally optimal solution?

A locally optimal solution is one where there are no other feasible solutions "in the vicinity" with better objective function values.

What is the Learning Rate?

The degree of backpropagation with every epoch.

What is overfitting?

Overfitting is an undesirable machine learning behavior that occurs when the machine learning model gives accurate predictions for training data but not for new data. This can also lead to coincidental connections.

What is a Confusion Matrix?

A Confusion Matrix shows correct and incorrect results.

What is the Precision of an AI model?

How much you should trust your programme when it gives a result. right answers / total = precision. % of answers that are correct

What is the recall of an AI model?

Recall - how much an AI can give the thing you are looking for. % if times it will give a correct result

What is one reason AI development is accelerating so quickly?

Efficiency of algorithms is going up, so less computing power. Amount of available computing power (compute) is also increasing, so effectiveness of AI si going up 4 fold

What is Deep Evidential Learning (DEL)?

Deep learning that provides predictions and the uncertainty of those predictions. It can give the confidence of a model’s output. This is important in medical settings and in self-driving cars.

What are the uses of DEL?

Applications include medical diagnosis, autonomous vehicles, and financial forecasting.

What are the two types of uncertainty?

Uncertainty can be divided into two types 1. Aleatoric Uncertainty: This is the inherent noise in the data. For example, variability in measurements or natural randomness 2. Epistemic Uncertainty: This stems from the model's lack of knowledge. For instance, uncertainty due to insufficient training data or model parameters.

LLMs Flashcards

(43 cards)