Lecture 11: Pretrained Language Models Flashcards

1
Q

ELMo architecture

A
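  • 2-layer bidirectional LSTM
  • word vector: weighted sum of the hidden states H0 (input embedding), H1 and H2 (the two LSTM layers)
  • F(H0, H1, H2) = aH0 + bH1 + cH2, with the weights a, b, c learned for the task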
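
The weighted sum above can be sketched in a few lines. This is a minimal illustration, not ELMo's actual code: the function names are hypothetical, and the softmax normalization of the weights and the scalar `gamma` are assumptions taken from the usual description of ELMo rather than from this card.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a short list of raw weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def elmo_word_vector(layers, raw_weights, gamma=1.0):
    """Combine the per-layer hidden states (H0, H1, H2) of one token."""
    weights = softmax(raw_weights)  # a, b, c: positive and summing to 1 (assumption)
    dim = len(layers[0])
    return [gamma * sum(w * h[i] for w, h in zip(weights, layers))
            for i in range(dim)]

# Toy hidden states for a single token (made-up numbers).
H0 = [1.0, 0.0]
H1 = [0.0, 1.0]
H2 = [1.0, 1.0]
vec = elmo_word_vector([H0, H1, H2], raw_weights=[0.0, 0.0, 0.0])
```

With equal raw weights each layer contributes one third, so `vec` is simply the average of the three layers.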
2
Q

GPT (Generative Pre-trained Transformer) architecture

A
  • Transformer architecture, unidirectional (“decoder”)
  • Each token gets a vector for token-level tasks
  • The whole sequence also gets a vector
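
"Unidirectional" means each position may only attend to itself and earlier positions. A minimal sketch of such a causal mask follows; the helper names are hypothetical, and using the last token's vector as the sequence vector is an assumption about how GPT-style classifiers are commonly set up, not something stated on this card.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def sequence_vector(token_vectors):
    # Assumption: the final position has seen the whole sequence,
    # so its vector can stand in for the sequence as a whole.
    return token_vectors[-1]

mask = causal_mask(4)
seq_vec = sequence_vector([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
```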
3
Q

BERT

A
  • Bidirectional
  • Each token has a bidirectionally contextualized representation at every layer
  • Pretrained with two objectives: masked language modeling (15% of tokens are
    selected and replaced by [MASK], then predicted) and Next Sentence Prediction
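
The masking step can be sketched as follows. This is a simplified illustration: the function name and the fixed seed are hypothetical, and every selected token is replaced by [MASK], whereas the original BERT recipe leaves some selected tokens unchanged or swaps in a random token.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace roughly mask_rate of the tokens by [MASK]; return inputs and targets."""
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    targets = {}  # position -> original token the model must predict
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = "[MASK]"
    return masked, targets

tokens = ["w%d" % i for i in range(20)]  # 20 placeholder tokens
masked, targets = mask_tokens(tokens)
```

Because the model sees context on both sides of each [MASK], predicting the targets forces it to learn bidirectional representations.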
4
Q

NSP (Next Sentence Prediction)

A
  • the input starts with a [CLS] token and the two sentences are separated by [SEP];
    the [CLS] vector is the input to the NSP classifier, which predicts whether the
    second sentence is a plausible continuation of the first
  • C, the final vector of [CLS] -> used to classify whole sequences
  • sequence vectors are useful for: sentiment analysis, NLI, paraphrase detection
  • token vectors are useful for: POS tagging, NER, WSD
  • different layers are useful for different tasks
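
A sketch of how a sentence pair could be laid out for the NSP classifier, in BERT's input format where [CLS] opens the sequence and [SEP] closes each sentence. The function name is hypothetical, and the segment ids (0 for the first sentence, 1 for the second) are an assumption based on the standard BERT input format rather than on this card.

```python
def build_nsp_input(sentence_a, sentence_b):
    """Lay out two tokenized sentences as one BERT-style input sequence."""
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"] + sentence_b + ["[SEP]"]
    # Segment ids tell the model which sentence each position belongs to.
    segment_ids = [0] * (len(sentence_a) + 2) + [1] * (len(sentence_b) + 1)
    return tokens, segment_ids

nsp_tokens, nsp_segments = build_nsp_input(
    ["it", "rained"], ["we", "stayed", "inside"]
)
```

The vector at position 0, the [CLS] position, is what a sequence-level classifier would read.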
5
Q

Ways of using pretrained models

A
  • Freeze: use the embeddings as they are and train only the classifier
  • Fine-tuning: adjust the model’s parameters while training the classifier
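
The difference between the two options comes down to which parameters a training step is allowed to change. The toy gradient step below illustrates this; the parameter names, the `encoder.` prefix convention and the numbers are all hypothetical.

```python
def sgd_step(params, grads, lr, freeze_encoder):
    """One gradient step; pretrained (encoder) parameters are skipped when frozen."""
    updated = {}
    for name, value in params.items():
        if freeze_encoder and name.startswith("encoder."):
            updated[name] = value  # frozen: keep the pretrained value
        else:
            updated[name] = value - lr * grads[name]
    return updated

params = {"encoder.w": 1.0, "classifier.w": 0.5}
grads = {"encoder.w": 0.2, "classifier.w": 0.1}

frozen = sgd_step(params, grads, lr=0.1, freeze_encoder=True)
fine_tuned = sgd_step(params, grads, lr=0.1, freeze_encoder=False)
```

In `frozen` only the classifier weight moves; in `fine_tuned` both do.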
6
Q

Transformer

A

MultiHeadAttention -> addition (residual) -> LayerNorm -> FFN -> addition (residual) ->
LayerNorm
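
The order of operations in one block can be sketched as below, with each addition read as a residual connection that adds a sublayer's input to its output. The attention and feed-forward functions are deliberately crude stand-ins (uniform averaging and a ReLU), all names are hypothetical, and LayerNorm's learned gain and bias are left out.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one vector to zero mean and unit variance (no learned gain/bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def transformer_block(xs, attention, ffn):
    """attention -> add -> LayerNorm -> FFN -> add -> LayerNorm, per position."""
    attended = attention(xs)
    xs = [layer_norm(add(x, a)) for x, a in zip(xs, attended)]
    xs = [layer_norm(add(x, ffn(x))) for x in xs]
    return xs

def uniform_attention(xs):
    # Stand-in for multi-head attention: every position receives the mean vector.
    dim = len(xs[0])
    mean = [sum(x[i] for x in xs) / len(xs) for i in range(dim)]
    return [mean for _ in xs]

def relu_ffn(x):
    # Stand-in for the position-wise feed-forward network.
    return [max(0.0, v) for v in x]

out = transformer_block([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
                        uniform_attention, relu_ffn)
```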

7
Q

Generation as sampling

A
  • Greedy
  • Beam search
  • Random sampling with y = SoftMax(u)
  • Random sampling variants: top-k sampling, top-p sampling, temperature sampling
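
The decoding options listed above, except beam search, can be sketched over a single vector of logits u. Function names and the example logits are hypothetical; a real decoder would apply one of these at every generation step.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # y = SoftMax(u / T); T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [u / temperature for u in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    # Always pick the single most likely token.
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_filter(probs, k):
    # Keep the k most likely tokens and renormalize.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    # Keep the smallest set of most likely tokens whose mass reaches p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def sample(dist, rng):
    # Draw one token id from a {token_id: probability} mapping.
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
rng = random.Random(0)
token = sample(top_k_filter(probs, k=2), rng)
```

Greedy decoding is deterministic, while the filters trade diversity against the risk of sampling very unlikely tokens.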

Scaling laws, but not only size matters

8
Q

Winograd schema resolution

A

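When “he” could refer to either of two people, replace “he” by each candidate name
and calculate the probability P of the resulting sentence; the more probable
substitution is taken as the referent.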
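
The substitution procedure can be sketched as follows. The scoring function here is a hard-coded stand-in for a language model's log-probability, and the example sentence and all names are hypothetical.

```python
def resolve_pronoun(tokens, pronoun, candidates, log_prob):
    """Score the sentence once per candidate and return the best-scoring name."""
    scores = {}
    for name in candidates:
        substituted = [name if t == pronoun else t for t in tokens]
        scores[name] = log_prob(substituted)
    best = max(scores, key=scores.get)
    return best, scores

def toy_log_prob(tokens):
    # Stand-in for a real language model: it simply prefers "... John helped".
    return -1.0 if tokens[-2:] == ["John", "helped"] else -5.0

tokens = "Paul thanked John because he helped".split()
best, scores = resolve_pronoun(tokens, "he", ["Paul", "John"], toy_log_prob)
```

With a real model, `log_prob` would sum the token log-probabilities of the substituted sentence.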
