[NLP] Lecture 2: Large Language Models (Anna Rogers) Flashcards

1
Q

What kinds of LMs do we have?

A

Autoregressive Language Models:

  • Predict the next token based on the previous tokens (left context only), e.g., GPT

Masked Language Models:

  • Predict masked (hidden) tokens within a sequence
  • Can use both left and right context to make predictions, e.g., BERT
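
A minimal sketch of the two objectives using the Hugging Face transformers library (assuming it is installed; the model choices are just illustrative):

```python
from transformers import pipeline

# Autoregressive LM: continues the text left-to-right, one token at a time.
generator = pipeline("text-generation", model="gpt2")
print(generator("The capital of France is", max_new_tokens=5)[0]["generated_text"])

# Masked LM: fills in a hidden token using context on both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of [MASK] is Paris."):
    print(prediction["token_str"], round(prediction["score"], 3))
```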
2
Q

Difference between corpus model and language model?

A

It would help to call it a "corpus model": that makes it more obvious the model is based on a specific corpus, so we do not assume it is unbiased. Calling it a "language model" makes us "forget" that it does not model language in general, only the specific corpus it was trained on.

3
Q

Explain the difference between pre-training and fine-tuning

A

Pre-training: done on unlabelled data with a self-supervised objective, either autoregressive (next-token prediction) or masked (as in BERT); this produces the base model

Fine-tuning: to make the model do something other than predict tokens, we fine-tune it on labelled data for a specific task

The biggest difference is the size of the available data: huge unlabelled corpora for pre-training vs small labelled datasets for fine-tuning

4
Q

What happens during fine-tuning?

A

The final layers change the most; the earlier layers stay comparatively stable.
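
A minimal PyTorch sketch of a common trick that follows from this: freeze the early layers and train only the final layers plus the task head (the model name and layer count are illustrative assumptions):

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze everything first...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze the last two encoder layers and the classification head,
# the parts that change the most during fine-tuning.
for layer in model.bert.encoder.layer[-2:]:
    for param in layer.parameters():
        param.requires_grad = True
for param in model.classifier.parameters():
    param.requires_grad = True

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```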

5
Q

What is pre-fine-tuning?

A

An intermediate training stage between pre-training and fine-tuning.

6
Q

What is instruction tuning?

A

Training the model on many different tasks phrased as text instructions before task-specific fine-tuning (e.g., the T5 model was trained on about 20 different text tasks in a text-to-text format).
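
A sketch of the text-to-text format this refers to (the task prefixes follow the T5 paper; the concrete examples are illustrative):

```python
# Every task is cast as "task prefix + input text -> target text",
# so one model can be trained on many tasks at once.
training_examples = [
    # translation
    ("translate English to German: That is good.", "Das ist gut."),
    # grammatical acceptability (CoLA)
    ("cola sentence: The course is jumping well.", "unacceptable"),
    # summarization
    ("summarize: heavy rain flooded parts of the city overnight, "
     "closing several roads.", "overnight rain floods city roads"),
]

for source, target in training_examples:
    print(f"{source!r}  ->  {target!r}")
```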

7
Q

What is few-shot learning?

A

Giving the model a few examples of the task directly in the prompt; the model's weights are not updated.
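
A sketch of what such a prompt looks like (the task and the `llm.generate` call are illustrative assumptions, not a real API):

```python
# Task examples are placed directly in the input text; the model is
# expected to continue the pattern. No weights are updated.
few_shot_prompt = """\
Translate English to French.

English: cheese
French: fromage

English: bread
French: pain

English: water
French:"""

# completion = llm.generate(few_shot_prompt, max_new_tokens=5)  # hypothetical client
```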

8
Q

What are instruction tuning and RLHF?

A

Instruction tuning focuses on teaching the model to follow instructions

RLHF uses human feedback to refine the model’s understanding of what constitutes a good response

Instruction tuning is about capability; RLHF is about aligning the model with human values and expectations
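
For the RLHF part, a hedged sketch of the pairwise loss used to train the reward model (the form used in InstructGPT; the numbers below are made up for illustration):

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen: torch.Tensor,
                        r_rejected: torch.Tensor) -> torch.Tensor:
    """-log sigmoid(r_chosen - r_rejected): the reward model should score
    the human-preferred response above the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Scalar rewards for three (chosen, rejected) response pairs.
loss = reward_ranking_loss(torch.tensor([2.0, 0.5, 1.2]),
                           torch.tensor([1.0, 0.7, -0.3]))
print(loss.item())
```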

9
Q

Explain the basics of ChatGPT

A
  • Dialogue version of InstructGPT
  • New OpenAI in-house data (humans both writing and rating model responses)
  • New ranking data for RLHF
  • Keeps changing under the hood
  • We don't know anything else about the models
10
Q

What is RAG?

A

Retrieval-augmented generation: the model retrieves relevant documents and generates its answer from them, so we can see where it got the information. Bing does this; it provides sources.
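
A minimal sketch of the retrieve-then-generate loop (`search` and `llm` are hypothetical stand-ins for a retriever and a language model):

```python
def rag_answer(question: str, search, llm, k: int = 3) -> str:
    # 1. Retrieve the k most relevant documents for the question.
    docs = search(question, top_k=k)

    # 2. Put the retrieved text into the prompt, numbered so the answer
    #    can cite its sources.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    prompt = (
        "Answer the question using only the sources below, "
        f"citing them by number.\n\nSources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generate an answer conditioned on the retrieved context.
    return llm(prompt)
```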

11
Q

Why do LMs get better the bigger they are?

A

As long as you keep adding more weights and more training data, performance keeps improving, following the "neural scaling laws".
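
A sketch of the power-law form these laws take: test loss falls predictably as model size grows. The constants roughly follow Kaplan et al. (2020) and should be treated as assumptions:

```python
def loss_from_params(n_params: float,
                     n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """L(N) = (N_c / N) ** alpha: test loss as a power law in parameter count."""
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.2f}")
```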

12
Q

What is data contamination?

A

When the test data (or something very similar to it) was already present in the model's training data, so the evaluation overstates the model's ability.
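
A hedged sketch of a simple contamination check: flag test examples whose n-grams overlap heavily with the training corpus (a common heuristic, not a definitive method; all names are illustrative):

```python
def ngrams(text: str, n: int = 8) -> set:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def looks_contaminated(test_example: str, train_ngrams: set,
                       threshold: float = 0.5) -> bool:
    grams = ngrams(test_example)
    if not grams:
        return False
    # Fraction of the test example's n-grams already seen in training.
    overlap = len(grams & train_ngrams) / len(grams)
    return overlap >= threshold

# Usage: build train_ngrams once over the training corpus, then screen
# every test example before reporting benchmark numbers.
```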

13
Q

What are emergent properties?

A

When a model can do something it was not trained on. This is difficult to establish, because we often don't know what data the model was allowed to see, so we can't say what it has or has not been trained on.

14
Q

What is the Eliza effect?

A

The tendency to unconsciously attribute human-like understanding and intent to a computer program's output, even when it is produced by simple pattern matching (named after Weizenbaum's 1966 ELIZA chatbot).

15
Q

Caveat 2: Fine-tuning vs few-shot performance

A

Since GPT-3, most big models have been presented with few-shot evaluations only, so their results are not directly comparable to earlier fine-tuned models.

16
Q

Caveat 3: The prompt matters

A

Model performance is highly sensitive to how the prompt is phrased: the same model can pass or fail a task depending on the wording, so results from a single prompt may under- or overestimate what the model can do.