Transformers: Attention Flashcards
Attention formula
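Presumably the scaled dot-product attention from "Attention Is All You Need":

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

where Q, K, V are the query, key, and value matrices and d_k is the key dimension.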
How do the keys, values, and query in attention map to a search process, say a YouTube search?
Query = text in search bar
Set of keys = video titles, words in video descriptions, maybe video tags too
Set of values = the actual videos (or I guess the video id?)
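A minimal NumPy sketch of the analogy (all vectors and numbers here are made up for illustration): the query is scored against every key, the scores are softmaxed into weights, and the result is a weighted mix of the values. A real search would return the top hit rather than a blend; attention keeps the soft weighting.

```python
import numpy as np

query = np.array([1.0, 0.0])                      # text in the search bar (toy embedding)
keys = np.array([[0.9, 0.1],                      # embedded titles / descriptions / tags
                 [0.1, 0.9],
                 [0.7, 0.3]])
values = np.array([[10.0], [20.0], [30.0]])       # stand-ins for the videos (or video ids)

scores = keys @ query                             # similarity of the query to each key
weights = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
result = weights @ values                         # weighted blend of the values
print(weights, result)
```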
In a seq2seq model, we encode the input into a what?
Context vector
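A minimal PyTorch sketch of the idea (the GRU choice and all sizes are hypothetical): the encoder's final hidden state serves as the context vector that summarizes the whole input sequence for the decoder.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, vocab_size = 32, 64, 1000

embedding = nn.Embedding(vocab_size, embed_dim)
encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, 7))   # one sequence of 7 token ids
outputs, hidden = encoder(embedding(tokens))    # hidden: (1, 1, hidden_dim)
context_vector = hidden[-1]                     # (1, hidden_dim) summary of the input
```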
What kind of operation is self attention?
A sequence-to-sequence operation: a sequence of n input vectors goes in and a sequence of n output vectors comes out. (This resolves the earlier confusion: a sequence-to-sequence operation always has the same number of inputs and outputs, whereas a sequence-to-sequence model can map between sequences of different lengths.)
What is the high level formula for self attention?
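Presumably the basic (parameter-free) form: each output vector is a weighted sum over all the input vectors,

$$y_i = \sum_j w_{ij}\, x_j$$

where the weights w_ij are not learned parameters but are derived from the input vectors x_i and x_j themselves (see the cards below).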
What is a high level implementation of self attention?
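A minimal PyTorch sketch of basic, parameter-free self attention (the function name and shapes are illustrative; scaling by sqrt(d_k), masking, and learned query/key/value projections are omitted):

```python
import torch
import torch.nn.functional as F

def basic_self_attention(x):
    """x: (batch, seq_len, embed_dim) -> output of the same shape."""
    # raw weights: dot product between every pair of input vectors
    raw_weights = torch.bmm(x, x.transpose(1, 2))   # (batch, seq_len, seq_len)
    # row-wise softmax turns each row into a weight distribution
    weights = F.softmax(raw_weights, dim=2)
    # each output vector is a weighted sum of all input vectors
    return torch.bmm(weights, x)                    # (batch, seq_len, embed_dim)

y = basic_self_attention(torch.randn(2, 5, 16))     # toy example
```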
How would you naively calculate w21 in the self attention formula?
It would be the dot product of x2 and x1.
What is the only operation in the transformer architecture that propagates information between vectors?
The self attention operation
How to fully calculate the weight value in self attention, given the naive calculation of the weight value?
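Presumably: take the naive (raw) dot-product weights and apply a row-wise softmax so the weights for each output position sum to one,

$$w_{ij} = \frac{\exp\left(w'_{ij}\right)}{\sum_k \exp\left(w'_{ik}\right)}, \qquad w'_{ij} = x_i^\top x_j$$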