Transformers: Attention Flashcards

1
Q

Attention formula

A
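
Presumably the standard scaled dot-product attention formula (from "Attention Is All You Need"):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

where Q, K, and V are the query, key, and value matrices and d_k is the key dimension.
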
2
Q

How do the keys, values, and query in attention map to a search process, say a YouTube search?

A

Query = text in search bar
Set of keys = video titles, words in video descriptions, maybe video tags too
Set of values = the actual videos (or perhaps just their video IDs)
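
As a toy sketch of this lookup view (the function name, shapes, and random data are purely illustrative): attention scores the query against every key and returns a softmax-weighted mix of the values.

import numpy as np

def attention_lookup(query, keys, values):
    # query: (d,)   keys: (m, d)   values: (m, dv)
    scores = keys @ query                      # one relevance score per key
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                    # soft mix of the values, weighted by relevance

keys = np.random.randn(3, 4)     # stand-ins for embeddings of titles/descriptions/tags
values = np.random.randn(3, 2)   # stand-ins for the videos (or their IDs) being retrieved
query = np.random.randn(4)       # stand-in for the embedding of the search-bar text
result = attention_lookup(query, keys, values)

Unlike a real search engine, attention returns a weighted blend of all the values rather than a single best match.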

3
Q

In a seq2seq model, we encode the input into a what?

A

Context vector

4
Q

What kind of operation is self attention?

A

A sequence-to-sequence operation: it takes in n vectors and produces n vectors of the same dimension. (Note: a sequence-to-sequence operation has n inputs and n outputs, whereas a sequence-to-sequence model, e.g. an encoder-decoder, does not necessarily have the same number of inputs and outputs; self attention is the former.)
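
In symbols, the shape contract (assuming d-dimensional input and output vectors):

$$\text{self attention}: (x_1, \dots, x_n) \mapsto (y_1, \dots, y_n), \qquad x_i, y_i \in \mathbb{R}^d$$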

5
Q

What is the high level formula for self attention?

A
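
Most likely: each output vector is a weighted sum over all of the input vectors,

$$y_i = \sum_j w_{ij}\, x_j$$

where the weights w_ij are computed from the input vectors themselves (see the later cards for how).
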
6
Q

What is a high level implementation of self attention?

A
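
A minimal NumPy sketch of basic self attention, matching the weighted-sum formula above (the function name and shapes are illustrative, and there are no learned parameters here):

import numpy as np

def basic_self_attention(x):
    # x: (n, d), a sequence of n input vectors of dimension d
    raw = x @ x.T                                     # (n, n) raw weights, dot products x_i . x_j
    w = np.exp(raw - raw.max(axis=1, keepdims=True))  # row-wise softmax (numerically stable)
    w /= w.sum(axis=1, keepdims=True)
    return w @ x                                      # (n, d) outputs y_i = sum_j w_ij x_j

x = np.random.randn(5, 16)    # a sequence of 5 input vectors
y = basic_self_attention(x)   # 5 output vectors of the same dimension
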
7
Q

How would you naively calculate w21 in the self attention formula?

A

It would be the dot product of x2 and x1.
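
In notation consistent with the weighted-sum formula, the naive (raw) weight is just a dot product:

$$w_{21} = x_2^\top x_1$$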

8
Q

What is the only operation in the transformer architecture that propagates information between vectors?

A

The self attention operation

9
Q

How do you fully calculate the weight value in self attention, given the naive calculation (the raw dot product)?

A
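
Take the raw dot-product weights and normalize them with a row-wise softmax, so the weights feeding each output vector are positive and sum to 1:

$$w_{ij} = \frac{\exp(x_i^\top x_j)}{\sum_k \exp(x_i^\top x_k)}$$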