Large Language Models Flashcards

1
Q

How many parameters does Chinchilla have?

A

70B

2
Q

“Training Compute-Optimal Large Language Models” by Hoffmann et al., March 2022: key conclusion

A

Scaling the number of training tokens (that is, the amount of text data the model is fed) is as important as scaling model size.
Given a fixed compute budget, it should be allocated in similar proportions between model size and number of training tokens to reach the compute-optimal model (measured by minimal training loss): “For every doubling of model size the number of training tokens should also be doubled.”
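
A minimal arithmetic sketch of this rule, in Python, assuming the common approximation C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × tokens) and a roughly 20-tokens-per-parameter compute-optimal ratio; the budget value is approximate and the snippet is illustrative, not taken from the paper's tables.

def compute_optimal(budget_flops, tokens_per_param=20.0):
    """Split a FLOP budget into a compute-optimal (params, tokens) pair.

    With C = 6 * N * D and D = r * N, solving gives N = sqrt(C / (6 * r)).
    Since N and D both grow with sqrt(C), doubling the model size also
    doubles the number of training tokens.
    """
    n_params = (budget_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's budget was roughly 5.76e23 FLOPs (the Gopher budget); this
# recovers approximately 70B parameters and 1.4T tokens.
n, d = compute_optimal(5.76e23)
print(f"params = {n / 1e9:.0f}B, tokens = {d / 1e12:.1f}T")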

3
Q

Chinchilla’s score on MMLU

A

67.6%

4
Q

When was PaLM released relative to Chinchilla?

A

About a week later (late March / early April 2022)

5
Q

PaLM # parameters

A

540B

6
Q

Who made Chinchilla?

A

DeepMind

7
Q

Who made PaLM?

A

Google

8
Q

Estimated PaLM training cost (2022 estimate)

A

$23.1m
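
A rough back-of-the-envelope sketch of how such an estimate can be built, using the same C ≈ 6·N·D approximation: PaLM’s 540B parameters and ~780B training tokens are reported in the PaLM paper, but the hardware throughput, utilization, and hourly price below are illustrative assumptions, not the exact inputs behind the $23.1m figure.

# Total training FLOPs from the 6 * N * D approximation.
n_params = 540e9
n_tokens = 780e9
train_flops = 6 * n_params * n_tokens          # ~2.5e24 FLOPs

# Hardware and pricing assumptions (hypothetical, for illustration only).
peak_flops_per_chip = 275e12   # assumed TPU v4-class peak bf16 throughput
utilization = 0.46             # assumed fraction of peak actually sustained
price_per_chip_hour = 3.00     # assumed cloud price in USD per chip-hour

chip_hours = train_flops / (peak_flops_per_chip * utilization) / 3600
cost_usd = chip_hours * price_per_chip_hour
print(f"{chip_hours / 1e6:.1f}M chip-hours, about ${cost_usd / 1e6:.0f}M")

With these assumptions the estimate lands in the same tens-of-millions-of-dollars range as the published $23.1m figure; different pricing and utilization assumptions move it by a factor of a few.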

9
Q

What is a diffusion LM?

A

A non-autoregressive language model based on continuous diffusion

10
Q

MMLU stands for

A

Massive Multitask Language Understanding
