LLM - Prompting, Alignment, Instruction Tuning Flashcards
What is in-context learning?
The LM learns to perform a downstream task by conditioning on input-output examples!
No weight updates: the model is not explicitly pre-trained to learn from examples.
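A minimal sketch of how such a prompt might be assembled (the demonstrations, the Review/Sentiment format, and the query are made-up illustrations):

```python
# Build an in-context prompt: the frozen model conditions on input/output
# demonstrations plus a new query; no weights are updated.
demos = [
    ("The movie was fantastic.", "positive"),
    ("I wasted two hours of my life.", "negative"),
]
query = "The plot dragged, but the acting was great."

prompt = ""
for text, label in demos:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # feed this string to the LM and read off the next token(s)
```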
Why is in-context learning useful?
Because additional fine-tuning of the models is expensive!
For fine-tuning, labeling data is costly: expertise is usually needed (medical, legal, ...), and finding more data is difficult. When something new happens, we need to act quickly and update our model, and training is sensitive to hyperparameter selection.
With in-context learning, it is much easier to change the LM's behavior.
What are some common pitfalls when doing in-context learning?
In-context learning is highly sensitive to prompt format (the choice of demonstrations and patterns/verbalizers, e.g., Input/Sentiment or Review/Stars).
Majority label bias: if we want the model to predict positive/negative sentiment and we provide only positive examples, the model will be biased towards predicting positive.
Recency bias: examples near the end of the prompt dominate predictions. A PPPN ordering (three positive demonstrations followed by one negative) would lead to a negative-sentiment bias.
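A small sketch of how one might probe these biases; the demonstrations are illustrative, and the LM call itself is left as a placeholder:

```python
import itertools

# Illustrative demonstrations with a 3:1 positive/negative skew (PPPN).
demos = [
    ("Great food.", "positive"),
    ("Loved the service.", "positive"),
    ("Wonderful ambiance.", "positive"),
    ("Terrible parking.", "negative"),
]

def build_prompt(ordered_demos, query):
    parts = [f"Input: {t}\nSentiment: {l}" for t, l in ordered_demos]
    parts.append(f"Input: {query}\nSentiment:")
    return "\n\n".join(parts)

# One prompt per demonstration ordering; feeding each to the LM and
# comparing predictions exposes order sensitivity (e.g., PPPN vs. NPPP).
prompts = [build_prompt(p, "The waiter was slow.")
           for p in itertools.permutations(demos)]
```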
What are problems where just question/answer in in-context learning is not enough?
Problems that involve reasoning, e.g., "Take the last letters of the words ... and concatenate them." Models struggle with tasks like this. To solve this, we should add the reasoning to our examples:
Q: “Elon Musk”
A: the last letter of “Elon” is “n”. the last letter of “Musk” is “k”. Concatenating “n”, “k” leads to “nk”. so the output is “nk”.
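A small helper that generates rationale-annotated demonstrations in this format (the helper names and the exact wording are my own sketch):

```python
def last_letter_concat(phrase: str) -> str:
    # Ground-truth answer for the last-letter concatenation task.
    return "".join(word[-1] for word in phrase.split())

def make_cot_demo(phrase: str) -> str:
    # Build a Q/A pair whose answer spells out the reasoning step by step.
    words = phrase.split()
    steps = " ".join(f'the last letter of "{w}" is "{w[-1]}".' for w in words)
    return f'Q: "{phrase}"\nA: {steps} so the output is "{last_letter_concat(phrase)}".'

print(make_cot_demo("Elon Musk"))
# Q: "Elon Musk"
# A: the last letter of "Elon" is "n". the last letter of "Musk" is "k". so the output is "nk".
```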
What is Few-Shot learning?
When we give the LM in-context examples of what the output should look like for some questions, i.e., what we expect the LM to do.
What is CoT?
CoT, or Chain-of-Thought, is when we explicitly ask the LM to produce its reasoning before the final answer. It can be combined with few-shot learning.
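One way to ask for the reasoning explicitly, without worked examples, is the zero-shot CoT trigger phrase "Let's think step by step" (Kojima et al., 2022). A minimal sketch with an illustrative question:

```python
# Zero-shot CoT: elicit reasoning with a trigger phrase instead of
# few-shot exemplars.
question = "I have 3 apples and buy 2 bags of 4 apples each. How many apples do I have?"
prompt = f"Q: {question}\nA: Let's think step by step."
```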
What is self-consistency in LMs?
- We prompt an LM using CoT and few-shot learning
- We query it multiple times to generate a diverse set of reasoning paths
- We choose the most consistent answer using a majority vote
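A minimal sketch of the voting step, assuming a hypothetical sample_answer(prompt) call that returns the final answer from one temperature-sampled CoT completion:

```python
from collections import Counter

def self_consistent_answer(sample_answer, prompt: str, n: int = 10) -> str:
    # Sample n diverse chain-of-thought completions, keep only their
    # final answers, and return the answer chosen by majority vote.
    answers = [sample_answer(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```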
What is the alignment problem with LMs?
There is a mismatch between LLM pre-training and user intents or human values.
Daniel: Hey AI, get me coffee before my class at 8:55am.
Robot: "Coffee Shop" opens at 8:30am and it usually has a line of people. It is unlikely that I can get you your coffee on time.
Daniel: Well, try your best ...
Robot: [tases everyone in line waiting to order]
What is instruction tuning?
Fine-tuning language models on a collection of datasets that involve mapping language instructions to their corresponding desirable generations.
- We need examples of instruction/output pairs across many tasks (sketched below), and then we fine-tune the LM
- Evaluate the LM on unseen tasks
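A sketch of what such instruction/output pairs could look like; the pairs below are made up, and real collections include datasets such as FLAN or Natural Instructions:

```python
# Illustrative instruction-tuning examples spanning several tasks.
instruction_data = [
    {"instruction": "Translate to French: I like apples.",
     "output": "J'aime les pommes."},
    {"instruction": "Summarize in one sentence: The meeting covered Q3 revenue, hiring, and the office move.",
     "output": "The meeting reviewed Q3 revenue, hiring, and the office move."},
    {"instruction": "Is this review positive or negative? 'Great phone, terrible battery.'",
     "output": "negative"},
]
# Fine-tuning then maximizes log p(output | instruction) over such pairs
# with the usual next-token cross-entropy loss.
```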
The problem with this is finding data that represents the variety of tasks. But what is a task?
NLP tasks include classification, translation, summarization, ... but what humans need is more general, and it arises in arbitrary, everyday contexts.
How can we collect a diverse collection of data for instruction tuning?
By taking one task, like summarization, and including it in a variety of contexts:
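For instance, a few illustrative ways the same summarization task can be dressed in different contexts (these templates are my own examples):

```python
# Same underlying task (summarization), varied surface contexts.
summarization_variants = [
    "Summarize this news article in two sentences: {article}",
    "Write a TL;DR for the following forum post: {post}",
    "My boss has 30 seconds. Condense this report for her: {report}",
    "Give a one-line abstract of this paper: {paper}",
]
```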