Handout #10 - Attention Mechanism and Transformer Model Flashcards

1
Q

What are the problems with RNNs?

A
  1. Sequential in nature -> can't parallelise across time steps (see the sketch after this list)
  2. Context is computed from the past only
  3. No explicit distinction between short- and long-range dependencies
  4. Training is tricky (e.g. vanishing/exploding gradients)
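
A minimal sketch of point 1, assuming a vanilla RNN cell with made-up sizes: each hidden state depends on the previous one, so the loop over time steps cannot be parallelised.

    import numpy as np

    T, d_in, d_h = 8, 4, 16                  # sequence length, input dim, hidden dim
    rng = np.random.default_rng(0)
    x = rng.normal(size=(T, d_in))           # input sequence
    W_xh = rng.normal(size=(d_in, d_h)) * 0.1
    W_hh = rng.normal(size=(d_h, d_h)) * 0.1

    h = np.zeros(d_h)
    for t in range(T):                       # strictly sequential: step t needs h from step t-1
        h = np.tanh(x[t] @ W_xh + h @ W_hh)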
2
Q

Why can't CNNs and DNNs be used for text processing?

A

The dependencies between different words are not confined to a fixed local window around the current position

-> the verb is not always the word immediately after the subject, so a model with a fixed receptive field can miss the dependency (see the sketch below).
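
A rough illustration, assuming a stack of 1-D convolutions with kernel size k: the receptive field grows only linearly with depth (rf = 1 + num_layers * (k - 1)), so a distant subject-verb pair needs many layers, while a single attention layer relates any pair of positions directly. The helper below is hypothetical, for illustration only.

    def receptive_field(num_layers: int, k: int = 3) -> int:
        # Receptive field of num_layers stacked 1-D convolutions with kernel size k
        return 1 + num_layers * (k - 1)

    # A subject-verb dependency spanning 21 tokens needs 10 conv layers (k=3):
    print(receptive_field(10))  # -> 21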

3
Q

What’s cross-attention?

A

CA lets you work with multiple modalities (e.g. audio, video, images, text): the queries come from one stream while the keys/values come from another

-> it works because attention doesn't depend on the positions of the keys/values, so it can cope with streams of different lengths and with synchronisation issues.
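
A minimal cross-attention sketch using PyTorch's nn.MultiheadAttention, with made-up shapes: text supplies the queries, audio supplies the keys/values, and the two sequence lengths don't have to match.

    import torch
    import torch.nn as nn

    d_model, n_heads = 64, 4
    text = torch.randn(10, 1, d_model)    # (tgt_len, batch, d_model): query stream
    audio = torch.randn(50, 1, d_model)   # (src_len, batch, d_model): key/value stream

    cross_attn = nn.MultiheadAttention(d_model, n_heads)
    out, weights = cross_attn(query=text, key=audio, value=audio)
    print(out.shape)      # torch.Size([10, 1, 64]): one output per query position
    print(weights.shape)  # torch.Size([1, 10, 50]): attention over audio per text token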

4
Q

Give the layman’s definition of the Transformer

A

It's an encoder/decoder network built solely from stacked attention-layer blocks (no recurrence or convolution).
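
A minimal sketch of that definition via PyTorch's built-in nn.Transformer, with made-up sizes (token embeddings, positional encodings and masks omitted):

    import torch
    import torch.nn as nn

    d_model = 64
    model = nn.Transformer(d_model=d_model, nhead=4,
                           num_encoder_layers=2, num_decoder_layers=2)

    src = torch.randn(12, 1, d_model)  # encoder input: (src_len, batch, d_model)
    tgt = torch.randn(7, 1, d_model)   # decoder input: (tgt_len, batch, d_model)
    out = model(src, tgt)
    print(out.shape)                   # torch.Size([7, 1, 64])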
