Large Language Models Flashcards
How many parameters does Chinchilla have?
70B
“Training Compute-Optimal Large Language Models” by Hoffmann et al., Mar 2022: key conclusion
Scaling the number of training tokens (that is, the amount of text data the model is fed) is as important as scaling model size.
Given a fixed compute budget, model size and the number of training tokens should be increased in equal proportions to reach the compute-optimal model (as measured by minimal training loss): “For every doubling of model size, the number of training tokens should also be doubled.”
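As a worked example of that rule of thumb, here is a minimal sketch that scales parameters and tokens together from a reference point. It assumes the common C ≈ 6·N·D training-FLOPs approximation and uses Chinchilla itself (70B parameters, ~1.4T tokens) as the reference; the function name and numbers are illustrative, not taken from these cards.

```python
# Minimal sketch of Chinchilla-style compute-optimal allocation.
# Assumes the common approximation C ~= 6 * N * D training FLOPs and the
# paper's finding that N (params) and D (tokens) scale in equal proportion.

def compute_optimal_allocation(compute_flops, ref_params=70e9, ref_tokens=1.4e12):
    """Scale params and tokens equally from a reference (Chinchilla-like) point."""
    ref_flops = 6 * ref_params * ref_tokens        # C ~= 6*N*D at the reference
    scale = (compute_flops / ref_flops) ** 0.5     # equal split: N, D both ~ C**0.5
    return ref_params * scale, ref_tokens * scale

# Example: 4x the reference compute -> roughly 2x the params and 2x the tokens.
params, tokens = compute_optimal_allocation(4 * 6 * 70e9 * 1.4e12)
print(f"params ~= {params/1e9:.0f}B, tokens ~= {tokens/1e12:.1f}T")
```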
Chinchilla's score on MMLU?
67.6%
When was PaLM released in relation to Chinchilla?
About a week later (Chinchilla: late March 2022; PaLM: early April 2022).
PaLM # parameters
540B
Who made Chinchilla?
DeepMind
Who made PaLM?
Google
Estimated PaLM training cost (2022)?
$23.1m
What is Diffusion-LM?
A non-autoregressive language model based on continuous diffusion.
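A toy numpy sketch of the continuous-diffusion idea behind this: the forward process adds Gaussian noise to token embeddings, and every position is (de)noised in parallel. The noise schedule, shapes, and embedding table here are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the forward (noising) process used by continuous diffusion LMs.
# All positions are noised, and would be denoised, in parallel, which is what
# makes the model non-autoregressive.
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim, num_steps = 8, 16, 50

embed = rng.normal(size=(100, dim))              # stand-in embedding table
x0 = embed[rng.integers(0, 100, size=seq_len)]   # clean embeddings of a "sentence"

betas = np.linspace(1e-4, 0.05, num_steps)       # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

x_t = q_sample(x0, t=25)
# A trained denoiser predicts x0 (or the noise) from (x_t, t) for every position
# at once; decoding then rounds the final estimate back to the nearest tokens.
```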
MMLU stands for
Massive Multitask Language Understanding