w8l2 Flashcards

1
Q

what is the core idea of beam search

A

keep track of the k most probable partial translations
k is the beam size (around 5-10)

beam search is not guaranteed to find the optimal solution, but it is efficient

we search for high-scoring hypotheses
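
A minimal sketch of one beam-search step (Python), assuming a hypothetical log_prob_next(prefix) hook that returns next-token log-probabilities for a given prefix; only the k most probable partial translations survive each step:

```python
import heapq

def beam_search_step(beams, log_prob_next, k=5):
    """Expand every partial hypothesis and keep only the k best.

    beams: list of (score, tokens) partial translations
    log_prob_next: hypothetical model hook returning {token: log_prob}
    """
    candidates = []
    for score, tokens in beams:
        for token, lp in log_prob_next(tokens).items():
            # hypothesis score = sum of token log-probabilities
            candidates.append((score + lp, tokens + [token]))
    # keep only the k highest-scoring partial translations (the beam)
    return heapq.nlargest(k, candidates, key=lambda c: c[0])
```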

2
Q

what is the beam search stopping criterion

A

wait until EOS (end of sentence) is hit

or until we reach a pre-established time step T
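
A small illustrative check of the stopping rule; the "<eos>" string and max_steps value are placeholders, not fixed by the lecture:

```python
EOS = "<eos>"  # illustrative end-of-sentence token

def should_stop(hypothesis, step, max_steps=50):
    # stop a hypothesis once it ends in EOS, or stop the whole
    # search after a pre-established number of steps T
    ended = len(hypothesis) > 0 and hypothesis[-1] == EOS
    return ended or step >= max_steps
```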

3
Q

what are the attention inputs

A

h_1, …, h_N: all encoder hidden states

s_t: the decoder hidden state at time step t

4
Q

what are attention scores

A

score(s_t, h_k), k = 1…N

how relevant is source token k for target step t

5
Q

what are attention weights

A

we apply softmax to the scores so that we get a probability distribution over them

so the weights add up to one

6
Q

attention output

A

a weighted sum

each encoder hidden state h_k is multiplied by its attention weight, and these products are summed over all k

that sum is the attention output (a context vector for step t)
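
Putting the score, softmax, and weighted-sum steps together, a minimal numpy sketch of one attention step (dot-product scoring assumed):

```python
import numpy as np

def attention(s_t, H):
    """s_t: decoder state at step t, shape (d,); H: encoder states h_1..h_N, shape (N, d)."""
    # scores: score(s_t, h_k) for k = 1…N, here via dot product
    scores = H @ s_t                                # shape (N,)
    # weights: softmax turns the scores into a distribution that sums to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # shape (N,)
    # output: weighted sum of encoder hidden states (the context vector)
    return weights @ H                              # shape (d,)
```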

7
Q

what are some ways to compute attention scores

A

dot product attention

multiplicative attention

additive attention
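
A toy numpy sketch of the three scoring functions; the random matrices stand in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                       # toy hidden size
s_t, h_k = rng.standard_normal(d), rng.standard_normal(d)

# dot-product attention: score = s_t · h_k (no parameters, dims must match)
dot_score = s_t @ h_k

# multiplicative (bilinear) attention: score = s_t^T W h_k, with learned W
W = rng.standard_normal((d, d))
mult_score = s_t @ W @ h_k

# additive attention: score = v^T tanh(W1 h_k + W2 s_t), with learned v, W1, W2
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
v = rng.standard_normal(d)
add_score = v @ np.tanh(W1 @ h_k + W2 @ s_t)
```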

8
Q

what inspired the transformer

A

we need an architecture that provides contextual embeddings

captures semantic and syntactic information like RNNs

can process a sentence in parallel
and is cheap per layer

9
Q

seq2seq without attention uses what to process within the encoder, within the decoder, and for the decoder-encoder interaction

A

RNN, RNN, and a static fixed-size vector, respectively

10
Q

seq2seq with attention uses what to process within the encoder, within the decoder, and for the decoder-encoder interaction

A

RNN, RNN, and attention, respectively

11
Q

what does self-attention consist of

A

● Query (q): vector from which the attention is looking
● Key (k): vector at which the query looks to establish context
● Value (v): value of the word being looked at, weighted based on context
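
A minimal single-head self-attention sketch in numpy (no masking, no multiple heads); Wq, Wk, Wv stand in for the learned projection matrices:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: token embeddings, shape (n, d); Wq/Wk/Wv: projections, shape (d, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # each query scored against every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # values weighted by context
```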
