C5W4 (Transformer Network) Flashcards

1
Q

Bottleneck of plain RNNs (GRU, LSTM)

A

Sequential processing of the input (one timestep at a time): each step depends on the previous hidden state, so computation cannot be parallelized across the sequence.

2
Q

What is Self-Attention?

A

Compute an attention-based representation for each word. For each word x, form a query q = W^Q·x, a key k = W^K·x, and a value v = W^V·x; the output is a weighted sum of values,
A(q, K, V) = Σ_i softmax(q·k⟨i⟩ / √d_k) · v⟨i⟩,
where the weights come from the similarity between the query and each key.
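The formula above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product self-attention, not the course's reference implementation; the shapes and weight names (Wq, Wk, Wv) are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d_model)."""
    Q = X @ Wq                          # queries, shape (n, d_k)
    K = X @ Wk                          # keys,    shape (n, d_k)
    V = X @ Wv                          # values,  shape (n, d_v)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n, n) query-key similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of values, (n, d_v)

# Toy example: 4 words, model dimension 8 (illustrative sizes only).
rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one attention-based representation per word
```

Note that every word attends to every other word in one matrix multiplication, which is what removes the sequential bottleneck of RNNs.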

3
Q

What is Multi-Head attention?

A

Apply self-attention h times in parallel (h, the number of heads, is a hyperparameter), each head with its own weight matrices W_i^Q, W_i^K, W_i^V. Concatenate the outputs of all heads and apply a final projection W^O.
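The description above can be sketched as follows. This is a minimal NumPy illustration assuming per-head weight tuples and an output projection Wo; real implementations batch the heads into single matrices for speed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

def multi_head_attention(X, heads, Wo):
    # heads: list of (Wq, Wk, Wv) tuples, one per head, each with its own weights.
    head_outputs = [attention(X, *w) for w in heads]
    # Concatenate the h head outputs along the feature axis, then project with Wo.
    return np.concatenate(head_outputs, axis=-1) @ Wo

# Toy example: 5 words, model dimension 16, h = 4 heads (illustrative sizes only).
rng = np.random.default_rng(1)
n, d_model, h = 5, 16, 4
d_k = d_model // h  # common choice: split d_model evenly across heads
X = rng.normal(size=(n, d_model))
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3)) for _ in range(h)]
Wo = rng.normal(size=(h * d_k, d_model))
out = multi_head_attention(X, heads, Wo)
print(out.shape)  # (5, 16)
```

Because each head has unique weights, each can learn to attend to a different kind of relationship (e.g. one head for "who", another for "when").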
