[NLP] Lecture 2: Large Language Models (Anna Rogers) Flashcards

1
Q

What kinds of LM do we have?

A

Autoregressive models (predicting the next token, like auto-complete) and masked language models (e.g. BERT).
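A minimal sketch of the two flavours using Hugging Face pipelines (an illustration, not from the lecture; the model names are just common defaults):

```python
# Sketch: autoregressive vs. masked LM via Hugging Face pipelines.
from transformers import pipeline

# Autoregressive LM: continues text left-to-right, one token at a time.
generator = pipeline("text-generation", model="gpt2")
print(generator("Language models are", max_new_tokens=5)[0]["generated_text"])

# Masked LM: fills in a blanked-out token using context from both sides.
filler = pipeline("fill-mask", model="bert-base-uncased")
print(filler("Language models are [MASK] on large corpora.")[0]["token_str"])
```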

2
Q

Difference between a corpus model and a language model?

A

It would help to call it a “corpus model”: that makes it more obvious the model is based on a specific corpus, so we don’t assume it is unbiased. Calling it a “language model” makes us “forget” that it is not just language in general; it is trained on a specific corpus.

3
Q

Explain the difference between pre-training and fine-tuning

A

Pre-training: done on unlabelled text with a self-supervised objective, either autoregressive (next-token prediction) or masked (as in BERT); this produces the base model.
Fine-tuning: to make the model do something other than predicting tokens, we fine-tune it for a specific task on labelled data.

The biggest difference is the size of the available data: huge unlabelled corpora for pre-training, much smaller labelled datasets for fine-tuning.
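A minimal fine-tuning sketch (toy data and hyperparameters are assumptions, not the lecture's setup): a pre-trained BERT body gets a fresh classification head, which is then trained on a small labelled dataset:

```python
# Sketch: fine-tuning a pre-trained model on a tiny labelled dataset.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # pre-trained body + new task head

texts = ["loved every minute of it", "dull and predictable"]  # toy data
labels = torch.tensor([1, 0])                                 # 1 = positive
batch = tok(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                              # a few gradient steps
    loss = model(**batch, labels=labels).loss   # cross-entropy on the head
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```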

4
Q

What happens during fine-tuning?

A

The final layers change the most; earlier layers stay closer to their pre-trained weights.
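One way to see this (a sketch assuming you have both checkpoints; not from the lecture) is to measure how far each layer's weights moved during fine-tuning:

```python
# Sketch: relative weight change per parameter tensor between checkpoints.
import torch

def layer_drift(state_before, state_after):
    """Relative L2 change for every floating-point parameter tensor."""
    return {
        name: (torch.norm(state_after[name] - w0)
               / (torch.norm(w0) + 1e-8)).item()
        for name, w0 in state_before.items()
        if w0.is_floating_point()
    }

# drift = layer_drift(pretrained.state_dict(), finetuned.state_dict())
# Expect the largest drift in the final layers and the new task head.
```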

5
Q

Can we fine-tune on many tasks at once?

A

Yes, this is called “pre-fine-tuning”: the model is fine-tuned on many tasks at once before the target task.

6
Q

What is instruction tuning?

A

Training the model on many different text tasks before task-specific fine-tuning; the T5 model, for example, was trained on 20 different text tasks this way.

7
Q

What is few-shot learning?

A

Giving the model a few examples of the task in the prompt itself; the model's weights are not updated.
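For example (an illustrative prompt, not from the lecture):

```python
# Sketch: a few-shot prompt for sentiment classification. The two solved
# examples are part of the prompt; no training or weight update happens.
few_shot_prompt = """\
Review: The plot was dull and predictable. Sentiment: negative
Review: Absolutely loved every minute of it. Sentiment: positive
Review: The acting was fine but the pacing dragged. Sentiment:"""
# A capable LM should continue this text with " negative".
```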

8
Q

What is instruction tuning and RLHF?

A

Instruction tuning: you take the same data (e.g. sentiment data), but you phrase it as an instruction. RLHF (reinforcement learning from human feedback): the model is further tuned using human preference rankings of its outputs as a reward signal.
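A sketch of that reformulation (the template is illustrative, not the lecture's exact one):

```python
# Sketch: turning an ordinary labelled example into an instruction example.
example = {"text": "Absolutely loved every minute of it.", "label": "positive"}

instruction_example = {
    "input": (
        "Classify the sentiment of the following review as positive or negative.\n"
        f"Review: {example['text']}"
    ),
    "target": example["label"],  # the model learns to answer the instruction
}
```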

9
Q

Explain the basics of ChatGPT

A
  • Dialogue version of InstructGPT
  • New OpenAI in-house data (humans both writing and rating model responses)
  • New ranking data for RLHF
  • Keeps changing under the hood
  • We don’t know anything else about the models
10
Q

What is RAG?

A

Retrieval-augmented generation: the model retrieves relevant documents and generates its answer from them, which is a way to find out where it got the information. Bing does this; it provides sources.
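A minimal sketch of the idea (toy TF-IDF retriever; `llm_generate` is a hypothetical stand-in for any LLM call):

```python
# Sketch: retrieval-augmented generation with a toy retriever.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rag_answer(question, docs, llm_generate, k=2):
    vec = TfidfVectorizer().fit(docs + [question])
    sims = cosine_similarity(vec.transform([question]),
                             vec.transform(docs))[0]
    top = sims.argsort()[::-1][:k]               # k most similar documents
    context = "\n".join(docs[i] for i in top)
    prompt = ("Answer using only the sources below, and cite them.\n"
              f"{context}\n\nQuestion: {question}\nAnswer:")
    return llm_generate(prompt), [docs[i] for i in top]  # answer + sources
```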

11
Q

Why do LMs get better the bigger they are?

A

As long as you add more weights and more training data, models keep getting better; this is described by the “neural scaling laws”.
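One commonly cited form of these laws (from Kaplan et al., 2020, not from the card itself) expresses test loss as a power law in the number of parameters N:

```latex
% Test loss as a power law in parameter count N (when not data/compute-bound);
% N_c and \alpha_N are fitted constants (\alpha_N \approx 0.076 for transformers).
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
```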

12
Q

What is data contamination?

A

When the model has already seen the test data (or something very similar to it) in its training data, so test results overstate its real ability.
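A crude way to check for contamination (a sketch; real audits, e.g. for GPT-3, use similar but more careful long n-gram matching):

```python
# Sketch: flag a test example whose long n-grams also occur in training docs.
def ngrams(text, n=8):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def looks_contaminated(test_example, training_docs, n=8):
    test_grams = ngrams(test_example, n)
    return any(test_grams & ngrams(doc, n) for doc in training_docs)
```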

13
Q

What are emergent properties?

A

When a model can do something it was not trained on. This is difficult to establish, because we don’t know exactly what data the model has seen, so we cannot say what it has or has not been trained on.

14
Q

Can we say that ChatGPT has emergent properties from the way it plays chess?

A
15
Q

What is the Eliza effect?

A

The tendency of people to attribute human-like understanding and intelligence to the output of a computer program (named after Weizenbaum’s ELIZA chatbot).
16
Q

Caveat 2: Fine-tuning vs few-shot performance

A

Since GPT-3, most big models have been presented with few-shot evaluations only, so their results are not directly comparable to models evaluated with fine-tuning.

17
Q

Caveat 3: The prompt matters

A

Few-shot performance can vary a lot with the exact phrasing of the prompt, so results from a single prompt are not a reliable measure of what a model can do.