C5W4 (Transformer Network) Flashcards
1
Q
Bottleneck of plain RNNs (GRU, LSTM)
A
Sequential processing of the input (one token at a time), which prevents parallelization across the sequence.
2
Q
What is Self-Attention?
A
Compute an attention-based representation for each word.
For each word x^&lt;i&gt;: q^&lt;i&gt; = W^Q·x^&lt;i&gt;, k^&lt;i&gt; = W^K·x^&lt;i&gt;, v^&lt;i&gt; = W^V·x^&lt;i&gt;, then
A(q, K, V) = Σ_i softmax(q·k^&lt;i&gt;) · v^&lt;i&gt;
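A minimal NumPy sketch of the formula above (scaled dot-product self-attention; the weight matrices and shapes here are illustrative assumptions, not from the card):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence.
    X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len)
    # softmax over the key dimension, numerically stabilized
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ V                            # (seq_len, d_k)

# toy example with random weights (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                       # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)               # (4, 8)
```

Each row of `weights` sums to 1, so every output position is a convex combination of the value vectors.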
3
Q
What is MultiHead attention?
A
Apply self-attention h times in parallel (h is a hyperparameter, the number of heads), each time with its own learned weight matrices; the head outputs are concatenated and linearly projected.
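A minimal NumPy sketch of the multi-head idea (per-head weights, concatenation, and the output projection W^O are illustrative assumptions, not from the card):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads, Wo):
    """X: (seq_len, d_model); heads: list of (Wq, Wk, Wv) per head;
    Wo: (h * d_k, d_model) output projection."""
    outs = []
    for Wq, Wk, Wv in heads:                      # each head has unique weights
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        outs.append(softmax(scores) @ V)          # one head's attention output
    return np.concatenate(outs, axis=-1) @ Wo     # concat heads, then project

# toy example: h = 2 heads, d_model = 8, d_k = d_model // h
rng = np.random.default_rng(1)
d_model, h = 8, 2
d_k = d_model // h
X = rng.normal(size=(5, d_model))                 # 5 tokens
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3))
         for _ in range(h)]
Wo = rng.normal(size=(h * d_k, d_model))
out = multi_head_attention(X, heads, Wo)          # (5, 8)
```

Because each head has its own W^Q, W^K, W^V, the heads can attend to different relations (e.g. "who", "when") and run in parallel.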