C5W4 (Transformer Network) Flashcards
1
Q
Bottleneck of plain RNNs (GRU, LSTM)
A
Sequential processing of the input (one token at a time), which prevents parallelization across the sequence.
2
Q
What is Self-Attention?
A
Compute an attention-based representation for each word.
For each word x^&lt;i&gt;: q^&lt;i&gt; = W^Q·x^&lt;i&gt;, k^&lt;i&gt; = W^K·x^&lt;i&gt;, v^&lt;i&gt; = W^V·x^&lt;i&gt;, then
A(q, K, V) = Σ_i softmax(q·k^&lt;i&gt;) · v^&lt;i&gt;
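A minimal NumPy sketch of the formula above (scaled dot-product self-attention; the weight matrices and shapes here are illustrative assumptions, not from the card):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence.
    X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len)
    # softmax over the key dimension, numerically stabilized
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ V                            # (seq_len, d_k)

# toy example with random weights (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                       # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)               # (4, 8)
```

Each row of `weights` sums to 1, so every output position is a convex combination of the value vectors.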
3
Q
What is MultiHead attention?
A
Apply self-attention h times in parallel (h is a hyperparameter, the number of heads), each time with its own learned weight matrices; the head outputs are concatenated and linearly projected.
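A minimal NumPy sketch of the multi-head idea (per-head weights, concatenation, and the output projection W^O are illustrative assumptions, not from the card):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads, Wo):
    """X: (seq_len, d_model); heads: list of (Wq, Wk, Wv) per head;
    Wo: (h * d_k, d_model) output projection."""
    outs = []
    for Wq, Wk, Wv in heads:                      # each head has unique weights
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        outs.append(softmax(scores) @ V)          # one head's attention output
    return np.concatenate(outs, axis=-1) @ Wo     # concat heads, then project

# toy example: h = 2 heads, d_model = 8, d_k = d_model // h
rng = np.random.default_rng(1)
d_model, h = 8, 2
d_k = d_model // h
X = rng.normal(size=(5, d_model))                 # 5 tokens
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3))
         for _ in range(h)]
Wo = rng.normal(size=(h * d_k, d_model))
out = multi_head_attention(X, heads, Wo)          # (5, 8)
```

Because each head has its own W^Q, W^K, W^V, the heads can attend to different relations (e.g. "who", "when") and run in parallel.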