11 - LLMs Flashcards
What is attention?
Transformers learn which words are relevant (associated) to each other word.
Attention Is All You Need
The 2017 Google Brain paper (Vaswani et al.) that introduced the Transformer architecture.
How do neural nets process words? Word embeddings.
Words are transformed into vectors that can be fed into neural networks.
Common Word Embeddings
Word2Vec, GloVe and FastText
Each uses between 50 and 300 values (dimensions) to represent a word.
Similar words, similar vectors
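A minimal sketch of the idea, using hypothetical toy 4-dimensional vectors (real embeddings use 50-300 dimensions): words with similar meanings end up with high cosine similarity.

```python
import numpy as np

# Hypothetical toy vectors, not real Word2Vec/GloVe values.
embeddings = {
    "happy":    np.array([0.9, 0.1, 0.3, 0.0]),
    "cheerful": np.array([0.8, 0.2, 0.4, 0.1]),
    "table":    np.array([0.0, 0.9, 0.1, 0.8]),
}

def cosine_similarity(a, b):
    """How aligned two vectors are, in [-1, 1]."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["happy"], embeddings["cheerful"]))  # high
print(cosine_similarity(embeddings["happy"], embeddings["table"]))     # low
```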
Why is word embedding useful?
Two words with similar meanings have similar vectors.
A network performing sentiment analysis will find it easier to learn that sentences such as “I’m happy” or “I’m cheerful” have similar sentiments.
Multi-head attention
The input is processed using three different learned weight matrices: W_Query, W_Key and W_Value.
Attention contains only linear operations; a feedforward layer with a non-linear activation is required afterwards.
Multiple heads allow for different attention mappings.
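A minimal NumPy sketch of multi-head attention (scaled dot-product attention per head, with random matrices standing in for the learned W_Q, W_K, W_V); all sizes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_head, n_heads = 5, 16, 4, 4   # illustrative sizes

X = rng.normal(size=(seq_len, d_model))           # one embedding per token

heads = []
for _ in range(n_heads):
    # Each head has its own learned projections (random stand-ins here).
    W_Q = rng.normal(size=(d_model, d_head))
    W_K = rng.normal(size=(d_model, d_head))
    W_V = rng.normal(size=(d_model, d_head))

    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    # Scaled dot-product attention: how much each token attends to the others.
    weights = softmax(Q @ K.T / np.sqrt(d_head))
    heads.append(weights @ V)

# Concatenate the heads; the whole operation is linear in V, which is why a
# feedforward layer with a non-linear activation must follow.
out = np.concatenate(heads, axis=-1)
print(out.shape)  # (5, 16)
```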
Encoder part
Understands and extracts the relevant information from the input and outputs a continuous representation (embedding).
LLM Evaluation metrics
- ROUGE: used for text summarisation; compares a generated summary to one or more reference summaries.
- BLEU score: used for text translation; compares a generated translation to human-generated translations.
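A minimal sketch of computing both metrics, assuming the third-party packages rouge-score and nltk are installed; the example sentences are made up.

```python
from rouge_score import rouge_scorer                  # pip install rouge-score
from nltk.translate.bleu_score import sentence_bleu   # pip install nltk

reference = "the cat sat on the mat"
candidate = "the cat is on the mat"

# ROUGE: n-gram overlap between the candidate and the reference summary.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(scorer.score(reference, candidate))

# BLEU: n-gram precision against one or more reference translations
# (bigram weights here, since the example sentences are so short).
print(sentence_bleu([reference.split()], candidate.split(), weights=(0.5, 0.5)))
```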
LoRA
Low-Rank Adaptation
LoRA: Main Idea
(Hint: freeze, parameters, weights)
- Freeze the weights of the self-attention module
- Add task-specific knowledge using a small set of tunable parameters
LoRA Steps
- Freeze most of the original LLM weights
- Insert two rank-decomposition matrices
- Train the weights of the smaller matrices
Steps to update model for inference:
1. Matrix-multiply the two low-rank matrices
2. Add the result to the original weights
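A minimal NumPy sketch of those two steps; the hidden size d and rank r are illustrative, not from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                        # illustrative: hidden size d, rank r << d

W = rng.normal(size=(d, d))          # frozen pretrained weight matrix
A = rng.normal(size=(r, d)) * 0.01   # trainable rank-decomposition matrices;
B = np.zeros((d, r))                 # B starts at zero, so the update starts at zero

# 1. Matrix-multiply the low-rank matrices; 2. add the result to W.
W_adapted = W + B @ A

# Only A and B are trained: 2*d*r values instead of d*d.
print(2 * d * r, "trainable vs", d * d, "frozen parameters")
```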
Soft Prompts
Like word embeddings, but instead of representing fixed words they are trainable vectors tuned to improve model performance.
Each soft-prompt vector has the same length as a token embedding; a soft prompt is typically 20-100 tokens long.
Soft Prompts with LoRA
We can have different sets of prompts for different tasks.
(Switch out soft prompt at inference time to change task)
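A minimal NumPy sketch of swapping soft prompts per task; the task names and sizes are hypothetical, and random vectors stand in for tuned prompts.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, prompt_len = 16, 20   # soft prompts are typically 20-100 "tokens"

# One tuned soft prompt per task (hypothetical task names).
soft_prompts = {
    "summarise": rng.normal(size=(prompt_len, d_model)),
    "translate": rng.normal(size=(prompt_len, d_model)),
}

def build_input(task, token_embeddings):
    """Prepend the task's tuned soft prompt to the frozen token embeddings."""
    return np.concatenate([soft_prompts[task], token_embeddings], axis=0)

tokens = rng.normal(size=(5, d_model))         # embeddings for 5 input tokens
print(build_input("summarise", tokens).shape)  # (25, 16)
print(build_input("translate", tokens).shape)  # swap the prompt to switch task
```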
Vanilla Fine-Tuning
Select a subset of parameters to tune and leave the rest frozen
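A minimal PyTorch sketch of that idea, assuming torch is installed; the tiny model and the choice of which layer to tune are illustrative.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 128),   # pretrained layers we want to keep frozen
    nn.ReLU(),
    nn.Linear(128, 2),     # task head: the subset we choose to tune
)

for param in model.parameters():
    param.requires_grad = False     # freeze everything...
for param in model[2].parameters():
    param.requires_grad = True      # ...then unfreeze the chosen subset

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable, "trainable parameters")
```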
Reparametrisation
Add task-specific knowledge with a small set of additional parameters, as in LoRA.