Week 5 - (part 1) LLMs And Gen AI Flashcards
What sort of data is used to train the model?
Common Crawl, books, Wikipedia, etc.
What is Generative Pretraining?
The model is trained to predict the next token, based on probabilities of co-occurrence learned from the training data.
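A minimal sketch of the idea using bigram co-occurrence counts over a toy corpus (the corpus and function names are illustrative, not from the lecture):

```python
from collections import Counter, defaultdict
import random

# Toy corpus; real pretraining uses web-scale text (Common Crawl, books, Wikipedia).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each token follows each other token (co-occurrence).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Sample the next token in proportion to its co-occurrence probability."""
    followers = counts[token]
    return random.choices(list(followers), weights=list(followers.values()))[0]

print(predict_next("the"))  # "cat" with probability 2/4, else "mat" or "fish"
```

Real LLMs replace the count table with a neural network, but the output is still a probability distribution over possible next tokens.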
Generative language models do sampling + classification over tokens. Do you have to classify over all the words in the dictionary?
No, only over the sub-words. This gives a smaller class size to predict over, which is done for efficiency: a full word vocabulary would mean too many classes, many of which are rarely used.
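A sketch of what classifying over sub-words looks like; the vocabulary and the greedy longest-match tokenizer below are purely illustrative, not any specific library:

```python
# Hypothetical sub-word vocabulary; real ones have ~30k-100k entries,
# far fewer than the millions of distinct surface words in a corpus.
vocab = ["un", "happi", "ness", "play", "ing", "ed", "s", "<unk>"]

def tokenize(word, vocab):
    """Greedy longest-match segmentation into sub-words (illustrative only)."""
    pieces = []
    while word:
        match = next((p for p in sorted(vocab, key=len, reverse=True)
                      if word.startswith(p)), None)
        if match is None:
            return ["<unk>"]
        pieces.append(match)
        word = word[len(match):]
    return pieces

print(tokenize("unhappiness", vocab))  # ['un', 'happi', 'ness']
print(tokenize("playing", vocab))      # ['play', 'ing']
```

The model only ever has to score entries of `vocab`, yet it can still emit any word the sub-words compose.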
Why don’t we use character classification?
With characters, the class size is too small: each word takes many prediction steps, so sequences become very long and generation takes too long.
How do they decide to group the sub-words?
Through frequency and statistics, e.g. by repeatedly merging the most frequently co-occurring symbol pairs.
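The lecture only says "frequency and statistics"; one common scheme that fits this description is byte-pair encoding (BPE). A minimal sketch of a single BPE merge step, assuming BPE-style merging:

```python
from collections import Counter

# Words as character sequences, with toy corpus frequencies.
words = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("n", "e", "w", "e", "s", "t"): 6, ("w", "i", "d", "e", "s", "t"): 3}

def best_pair(words):
    """Count every adjacent symbol pair and return the most frequent one."""
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(words, pair):
    """Replace each occurrence of the pair with a single merged symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if word[i:i + 2] == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

pair = best_pair(words)     # ('e', 's'): occurs 6 + 3 = 9 times
words = merge(words, pair)  # 'newest' -> ('n', 'e', 'w', 'es', 't'), etc.
```

Repeating the merge step grows the sub-word vocabulary until it reaches the desired size.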
Instruction fine-tuning and human feedback
Instruction fine-tuning (step 1): a labeller demonstrates the desired output behaviour by showing the machine the correct answer for a given input.
Human feedback (step 2): a labeller ranks the model's outputs from best to worst.
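A sketch of what the training data for the two steps might look like; the field names and example prompt are illustrative, not from the lecture:

```python
# Step 1 (instruction fine-tuning): a labeller writes the desired output
# and the model is trained to reproduce it directly.
sft_example = {
    "prompt": "Explain the moon landing to a 6-year-old.",
    "demonstration": "Some people flew a rocket to the moon and walked on it.",
}

# Step 2 (human feedback): a labeller only RANKS several model outputs;
# the ranking is then used to steer further fine-tuning.
preference_example = {
    "prompt": "Explain the moon landing to a 6-year-old.",
    "outputs_ranked_best_to_worst": [
        "Some people flew a rocket to the moon and walked on it.",
        "The moon landing took place in 1969 during the Apollo 11 mission.",
        "Moon landing.",
    ],
}
```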
Pros and cons of feedback
[+] the model trains and adapts to come up with better answers
[-] it can become overly compliant
[-] it may not be safe
Why does the model hallucinate?
The model is always trained as if there is one right answer, so it predicts to the best of its ability based on probability. That probability is baked into its weights and parameters.
How can hallucination be mitigated through preferential learning?
Train the model to output sub-word tokens equivalent to "I don't know", and ask it questions with no answer during training. This manipulates the output probabilities.
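A sketch of what such fine-tuning pairs could look like; the examples are invented for illustration:

```python
# Pairs that raise the probability of an "I don't know" continuation
# for unanswerable questions, while keeping real answers where they exist.
idk_examples = [
    {"prompt": "What is the capital of France?",
     "target": "Paris."},                        # answerable: keep the answer
    {"prompt": "What did Napoleon eat on 3 March 1807?",
     "target": "I don't know."},                 # unanswerable: train IDK tokens
    {"prompt": "What number am I thinking of?",
     "target": "I don't know."},
]
```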
Why can’t we just use LLMs for everything and rely on probability?
Even though they are effective and good at synthesising information, they suffer from:
Low efficiency
Lack of updatability
Issues of provenance
What are the benefits of retrieval-based NLP?
It is efficient, updatable, provides provenance, and is effective and good at synthesis (see the sketch below).
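A minimal retrieval-then-generate sketch to make those benefits concrete; the word-overlap scorer and the generate() stub are placeholders, not a real retriever or model API:

```python
# Knowledge lives in an editable document store (updatable), and the
# retrieved text is shown alongside the answer (provenance).
documents = [
    "The Eiffel Tower is 330 metres tall.",
    "Mount Everest is 8,849 metres tall.",
    "The Great Wall of China is over 21,000 km long.",
]

def retrieve(query, docs, k=1):
    """Rank documents by naive word overlap with the query (toy retriever)."""
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def generate(prompt):
    """Stand-in for an LLM call that synthesises an answer from the prompt."""
    return f"<answer conditioned on: {prompt}>"

query = "How tall is the Eiffel Tower?"
context = " ".join(retrieve(query, documents))
print(generate(f"Context: {context}\nQuestion: {query}"))
```

Updating the system's knowledge is just editing `documents`, and the retrieved sentence doubles as the answer's provenance.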
Do autoregressive LMs simply predict the next token?
Yes, that's all they do (to a certain extent). They predict scores over the entire vocabulary at each step, and we then use those scores to choose one token or another. They also represent data in their internal and output representations.
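A sketch of that loop: at every step the model scores the whole vocabulary and we pick one token to append. The five-token vocabulary and the random scoring stub stand in for a real model:

```python
import math
import random

vocab = ["the", "cat", "sat", "mat", "<eos>"]

def logits(context):
    """Stand-in for the model: returns one score per vocabulary entry."""
    random.seed(" ".join(context))  # deterministic toy scores per context
    return [random.uniform(-1, 1) for _ in vocab]

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    return [e / sum(exps) for e in exps]

# The autoregressive loop: score the ENTIRE vocabulary at each step,
# then append the chosen token (greedy here; sampling also works).
context = ["the"]
for _ in range(4):
    probs = softmax(logits(context))
    next_token = vocab[max(range(len(vocab)), key=probs.__getitem__)]
    if next_token == "<eos>":
        break
    context.append(next_token)
print(" ".join(context))
```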
How is generative pre-training different from predictive text?
Generative pre-training looks through far more data to produce more useful outputs; the goal is to learn USEFUL knowledge, not just locally plausible next words.
Emergent abilities: what can AI chatbots do that autocorrect cannot?
Play chess: this requires an internal model of chess rules and strategy, which the chatbot acquires from training rather than having it programmed in in the first place.