w8l2 Flashcards
what is the core idea of beam search
keep track of the k most probable partial translations
k is the beam size (typically around 5 or 10)
beam search is not guaranteed to find the optimal solution, but it is efficient
we search for high-scoring hypotheses
what are the beam search stopping criteria
wait until EOS (end of sentence) is generated
or until we reach a pre-established time step T (a sketch of the full procedure follows)
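A minimal Python sketch of the procedure in these two cards, assuming a hypothetical step_log_probs(tokens) callable that returns (next-token, log-probability) pairs; names and default values are illustrative, not from the lecture.

```python
# Minimal beam search sketch. step_log_probs is an assumed model interface,
# not a real library call: it maps a partial hypothesis to (token, log_prob) pairs.

def beam_search(step_log_probs, bos, eos, k=5, max_steps=50):
    """Keep only the k most probable partial translations at every step."""
    beams = [([bos], 0.0)]          # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_steps):      # stopping criterion 2: pre-established time limit
        candidates = []
        for tokens, score in beams:
            for tok, logp in step_log_probs(tokens):
                candidates.append((tokens + [tok], score + logp))
        # Prune to the k highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:k]:
            if tokens[-1] == eos:   # stopping criterion 1: EOS generated
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:               # every surviving hypothesis has ended
            break
    finished.extend(beams)          # include hypotheses cut off by the time limit
    # Best hypothesis found; efficient, but not guaranteed to be the global optimum.
    return max(finished, key=lambda c: c[1])
```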
what is the attention input
h_1, …, h_N
all encoder hidden states
the decoder hidden state s_t at time step t
what are the attention scores
score(s_t, h_k), k = 1…N
how relevant is source token k for target step t
what are attention weights
we apply a softmax to the scores so that we get a probability distribution over them
so they add up to one
attention output
a weighted sum: multiply each attention weight by its encoder hidden state h_k
do this for every hidden state and add them all up
that sum is the attention output for this decoder step (see the NumPy sketch below)
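Combining the three attention cards, here is a minimal NumPy sketch of one decoder step of (dot-product) attention; array names and shapes are assumptions for illustration.

```python
import numpy as np

def attention_step(s_t, H):
    """s_t: decoder hidden state, shape (d,); H: encoder hidden states h_1..h_N, shape (N, d)."""
    scores = H @ s_t                          # score(s_t, h_k) for k = 1..N (dot product)
    weights = np.exp(scores - scores.max())   # softmax: a probability distribution...
    weights /= weights.sum()                  # ...that adds up to one
    output = weights @ H                      # weighted sum of the encoder hidden states
    return output, weights
```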
what are some ways to compute the attention score
dot-product attention
multiplicative attention
additive attention (each is sketched below)
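Illustrative NumPy versions of the three scoring functions; W, W1, W2 and v stand for learned parameters whose shapes are assumed here, not given in the lecture.

```python
import numpy as np

def dot_product_score(s_t, h_k):
    return s_t @ h_k                            # requires s_t and h_k to have the same size

def multiplicative_score(s_t, h_k, W):
    return s_t @ W @ h_k                        # bilinear form with a learned matrix W

def additive_score(s_t, h_k, W1, W2, v):
    return v @ np.tanh(W1 @ s_t + W2 @ h_k)     # small feed-forward net over both vectors
```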
what inspired the transformer
we need an architecture that provides contextual embeddings
captures semantic and syntactic information like RNNs
can process a sentence in parallel
and is cheap per layer
seq2seq without attention uses what to process within the encoder, within the decoder, and for the decoder-encoder interaction
RNN, RNN, and a static fixed-size vector, respectively
seq2seq with attention uses what to process within the encoder, within the decoder, and for the decoder-encoder interaction
RNN, RNN, and attention, respectively
what does self-attention consist of
● Query (q): the vector from which the attention is looking
● Key (k): the vector at which the query looks to establish context
● Value (v): the value of the word being looked at, weighted based on context
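A single-head scaled dot-product self-attention sketch in NumPy; the projection matrices Wq, Wk, Wv are assumed learned parameters, and the 1/sqrt(d_k) scaling follows the standard Transformer formulation rather than anything stated on this card.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: token embeddings, shape (n, d); Wq/Wk/Wv: projection matrices, shape (d, d_k)."""
    Q = X @ Wq                                        # queries: where each token looks from
    K = X @ Wk                                        # keys: what each token offers to be looked at
    V = X @ Wv                                        # values: the content that gets mixed together
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # relevance of every key to every query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax over the keys
    return weights @ V                                # each output row is a context-weighted sum of values
```

Every output row is computed from the whole sentence at once, which is why self-attention can process a sentence in parallel, unlike an RNN.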