LLMs Flashcards
What is a corpus?
A corpus (plural: corpora) is the body of texts the AI was trained on.
What is context?
The section of the prompt the AI uses in its prediction of the next word.
Explain the Markov assumption
The future evolution of a process depends only on its current state (the last step), not on its earlier history.
Describe the process of a unigram n-gram model?
It counts the occurrences of each word in the corpus, assigns a weighting to each word based on its count, and suggests words according to these weightings.
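A minimal sketch of a unigram sampler in Python; the toy corpus and variable names are purely illustrative:

```python
import random
from collections import Counter

corpus = "the cat sat on the mat the cat slept".split()

# Count each word; the counts act as the weightings.
counts = Counter(corpus)
words = list(counts.keys())
weights = [counts[w] for w in words]

# Suggest a word by sampling in proportion to its count.
suggestion = random.choices(words, weights=weights, k=1)[0]
print(suggestion)
```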
Describe the process of a bigram n-gram model?
Given the last word of the context, what is the next word likely to be? The model counts each word pair in the corpus and assigns weightings based on the counts. It then looks at the last word of the context and suggests a word based on the weightings of the word pairs whose first word matches that last word of the prompt.
Describe the process of a trigram n-gram model?
Given the last two words of the context, what is the next word likely to be? The model counts the occurrences of each three-word sequence in the corpus and assigns weightings based on the counts. It then looks at the last two words of the context and suggests a word based on the weightings of the three-word sequences whose first two words match those last two words of the prompt.
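A rough sketch of the bigram/trigram idea, generalised to any n; the toy corpus and the suggest_next helper are made up for illustration:

```python
import random
from collections import Counter

def suggest_next(corpus_words, context_words, n=2):
    """Count n-word sequences, then sample a next word whose preceding
    (n - 1) words match the end of the context."""
    ngrams = Counter(
        tuple(corpus_words[i:i + n]) for i in range(len(corpus_words) - n + 1)
    )
    prefix = tuple(context_words[-(n - 1):])  # last n-1 words of the context
    candidates = {g[-1]: c for g, c in ngrams.items() if g[:-1] == prefix}
    if not candidates:  # the exact prefix never appears in the corpus
        return None
    words, weights = zip(*candidates.items())
    return random.choices(words, weights=weights, k=1)[0]

corpus = "the cat sat on the mat and the cat slept on the sofa".split()
print(suggest_next(corpus, ["the"], n=2))         # bigram: conditions on one word
print(suggest_next(corpus, ["on", "the"], n=3))   # trigram: conditions on two words
```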
Why do n-gram models using a higher n number fall short?
The last n−1 words of the context must appear verbatim in the corpus for the model to recognise them, look up a weighting, and suggest the next word. As n grows, exact matches become rarer, so if the exact context never occurs in the corpus, the model cannot suggest anything.
What are the drawbacks of n-gram models?
It cannot link information from different sections of a text.
What is a deterministic model?
A model that always gives the same, predictable result for the same input; for an LLM, this corresponds to a temperature of zero.
What is an LLM?
Large Language Models are computational neural networks notable for their ability to achieve general-purpose language generation and other natural language processing tasks such as classification.
What is a limitation of LLMs and how is this circumvented?
LLMs can only suggest words that appear in the corpus. To circumvent this, we use very large training data sets, such as social media posts or text scraped from the Internet.
What is the temperature of an AI model and what does a low and high temperature give? What are the use cases?
Temperature - how random an LLM's output is.
A temperature of zero gives zero randomness and a temperature of 2 gives a high degree of randomness. At a low temperature, only the most likely word is selected, whereas as the temperature rises the distribution flattens towards all words being equally likely, so unlikely words are chosen far more often.
Different temperatures have different uses. A low temperature is good for predictable text, such as for a cover letter, whereas a higher temperature can give poetry.
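A sketch of how temperature can be applied to rescale a next-word distribution before sampling; the probabilities below are invented for illustration:

```python
import math
import random

def sample_with_temperature(word_probs, temperature):
    """Rescale a next-word distribution by temperature, then sample.
    Temperature near 0 -> almost always the top word; high temperature ->
    the distribution flattens towards uniform."""
    t = max(temperature, 1e-6)  # avoid division by zero at temperature 0
    scaled = {w: math.exp(math.log(p) / t) for w, p in word_probs.items()}
    total = sum(scaled.values())
    words, weights = zip(*((w, s / total) for w, s in scaled.items()))
    return random.choices(words, weights=weights, k=1)[0]

probs = {"mat": 0.7, "sofa": 0.2, "moon": 0.1}  # made-up distribution
print(sample_with_temperature(probs, 0.1))      # nearly always "mat"
print(sample_with_temperature(probs, 2.0))      # much more varied
```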
What is an epoch?
An epoch is a training run, or a pass through the corpus.
What is the training progression of an LLM?
Before training starts, and even after the first few epochs, the predictions stored and given by the model are random; any coherence is coincidental.
Further training runs increase coherence and relevancy, but a lot of training runs (1,000s to 10,000s) are needed.
What is the best method of correction and why?
A human could correct the LLM, but because of the huge number of epochs and the large amount of training data, it is much more efficient for the model to correct itself by comparing its predictions against the original text.
What is the measure of loss and what is the target figure? What is the word describing an LLM that is overtrained?
During each epoch, the neural network compares its prediction with the original data. The model’s prediction is likely off by some amount. The difference between the predicted and actual values is called the loss.
Each epoch reduces the loss, though how much we can reduce the loss and make the AI more accurate decreases over time.
Although reducing the loss does make an LLM more coherent and give better advice, a loss of zero would mean the outputted text is exactly the same as the original text, defeating the purpose of the LLM. It cannot output novel text. This is called overfitting.
A model is overfit when it replicates patterns in the training data so well that it cannot account for new data or generate new patterns.
To prevent overfitting, we need to monitor training closely once the model begins performing well.
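A minimal sketch of measuring loss, assuming the common cross-entropy measure (the negative log of the probability the model assigned to the true next word); the flashcards do not specify which loss is used, and the two epochs' probabilities below are made up:

```python
import math

def cross_entropy_loss(predicted_probs, actual_next_word):
    """Loss for one prediction: the negative log probability the model
    assigned to the word that actually came next. The loss is 0 only when
    the model puts probability 1.0 on the correct word."""
    return -math.log(predicted_probs[actual_next_word])

# Made-up predictions for the same position at two different epochs.
early_epoch = {"mat": 0.2, "sofa": 0.3, "moon": 0.5}
late_epoch = {"mat": 0.9, "sofa": 0.08, "moon": 0.02}

print(cross_entropy_loss(early_epoch, "mat"))  # ~1.61, high loss early on
print(cross_entropy_loss(late_epoch, "mat"))   # ~0.11, lower loss after training
```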
What is preprocessing and why is it needed?
Raw data (full of mistakes and inconsistencies) needs to be turned into a clean data set via two processes - tokenisation and preprocessing.
Preprocessing makes all letters lowercase and removes punctuation. Because a computer sees M and m as different characters, making everything lowercase merges the counts of words that would otherwise be split between uppercase and lowercase variants, increasing their frequency.
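A minimal sketch of this cleaning step in Python; the example sentence and the preprocess helper name are illustrative:

```python
import string

def preprocess(text):
    """Lowercase everything and strip punctuation so that 'Model', 'model'
    and 'model,' all count as the same word."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text

raw = "The Model, the model and THE MODEL."
print(preprocess(raw))          # "the model the model and the model"
print(preprocess(raw).split())  # simple whitespace tokenisation
```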