Networks with memory and attention Flashcards

1
Q

What is location-based addressing?

A

Synonym for direct addressing, i.e. specifying a particular address for a memory cell.

2
Q

What is content-based addressing?

A

A synonym for indirect addressing. Each memory cell stores its content together with a key describing that content. We apply some matching function g to the query and each cell's key to produce a score, and then apply some selection scheme over the scores to pick the memory cell(s) we want.
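
A minimal numpy sketch of this scoring step, assuming for illustration that the matching function g is a plain dot product between the query and each key (keys, values and the query below are made-up toy data):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy memory: each cell has a key (used for matching) and a value (its content).
keys   = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # (cells, key_dim)
values = np.array([[10.0], [20.0], [30.0]])               # (cells, value_dim)
query  = np.array([0.9, 0.1])

scores  = keys @ query      # matching function g: one score per memory cell
weights = softmax(scores)   # normalised scores, used by hard or soft addressing
print(weights)
```

The selection scheme applied to these weights is what distinguishes hard from soft addressing (next cards).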

3
Q

What is hard addressing?

A

Sampling from the softmaxed scores of content-based addressing, i.e. selecting a single memory cell to read.
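
A small sketch of the sampling step, reusing the kind of softmaxed scores from the previous card (toy numbers, not from any particular model):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

values  = np.array([[10.0], [20.0], [30.0]])  # memory cell contents
scores  = np.array([2.0, 0.5, 1.0])           # content-based matching scores
weights = softmax(scores)

# Hard addressing: sample a single cell index and read only that cell.
idx  = np.random.choice(len(values), p=weights)
read = values[idx]
print(idx, read)
```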

4
Q

What is soft addressing?

A

The weighted average (linear combination) of the memory cell contents, weighted by their softmaxed scores. This read is also the expectation of the cell contents under that distribution.
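
The same toy setup as the previous card, but with a soft read, i.e. the expectation of the cell contents under the softmax distribution:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

values  = np.array([[10.0], [20.0], [30.0]])  # memory cell contents
scores  = np.array([2.0, 0.5, 1.0])           # content-based matching scores
weights = softmax(scores)

# Soft addressing: read a weighted average of all cells instead of picking one.
read = weights @ values    # = E[value] under the softmax distribution
print(read)
```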

5
Q

What are the limitations of RNNs/LSTMs with regard to memory?

A

The memory is tied to the computational machinery of the RNN/LSTM (its hidden state and weights), so its capacity is fixed. “External” memory is the idea of increasing the memory capacity of the RNN/LSTM by storing information outside the network itself.

6
Q

Give an intuition for how an encoder-decoder framework works in the context of machine translation

A

First, we encode the meaning of the source sentence into an intermediate representation; then we decode this into a sentence that represents the same meaning in the target language.
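
A minimal PyTorch sketch of this two-stage structure; the module names, sizes and the choice of GRUs are illustrative assumptions, not any specific published model:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_tokens):
        _, h = self.rnn(self.embed(src_tokens))
        return h  # intermediate representation of the source meaning

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, tgt_tokens, h):
        y, _ = self.rnn(self.embed(tgt_tokens), h)
        return self.out(y)  # scores over target-language words at each step

enc, dec = Encoder(src_vocab=1000), Decoder(tgt_vocab=1200)
src = torch.randint(0, 1000, (1, 7))  # one source sentence of 7 token ids
tgt = torch.randint(0, 1200, (1, 5))  # target sentence produced so far
logits = dec(tgt, enc(src))
print(logits.shape)  # torch.Size([1, 5, 1200])
```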

7
Q

How many RNNs would be necessary to translate to-and-from N languages?

A

If we train the different encoders and decoders jointly, so that they all share the same intermediate representation, 2N RNNs suffice: one encoder and one decoder per language (e.g. 20 RNNs for N = 10 languages). Otherwise we need on the order of N^2 models, one per language pair.

8
Q

Why would external memory be desirable in an RNN?

A

The memory of an RNN is finite. With external memory, separate from the computational model, more memory can be added when necessary.

9
Q

Explain the intuition of content-based attention

A

Content-based attention works similarly to content-based addressing; the main differences are that only read operations are relevant and that the “memory” is actually the output of earlier layers (e.g. convolutional feature maps) rather than a separate memory store.
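
A numpy sketch of a single content-based attention read, where the “memory” is a set of feature vectors from an earlier layer (random toy tensors stand in for real activations):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# The "memory" is not a writable store but activations from an earlier layer,
# e.g. one feature vector per spatial location of a convolutional feature map.
features = np.random.randn(49, 128)   # 7x7 locations, 128-dim features
query = np.random.randn(128)          # e.g. the current decoder/controller state

scores  = features @ query            # content-based matching: one score per location
weights = softmax(scores)             # attention distribution over locations
context = weights @ features          # soft read: attended summary of the features
print(context.shape)                  # (128,)
```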

10
Q

Why and in what applications could location-based attention be used?

A

E.g. in image classification: you use attention to select the part of the image to classify, instead of processing the whole image.

11
Q

Explain the intuition of location-based attention

A

Image attention as an analogy: we extract a glimpse (a crop of the image). We also extract a slightly larger part of the same image with the same center, rescale it to the same dimensions as the original crop, and use this low-resolution “context” glimpse to decide where to take the next glimpse. In this way we can select interesting subparts of the image with attention. The location policy (which decides where to “glimpse” next) is trained with RL (policy gradient).
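
A numpy sketch of just the glimpse extraction (the RL-trained location policy itself is omitted); the crop sizes and the average-pooling downscale are arbitrary choices for illustration:

```python
import numpy as np

def crop(img, cy, cx, size):
    h = size // 2
    return img[cy - h:cy + h, cx - h:cx + h]

def downscale(img, factor):
    # Simple average pooling to reduce resolution by an integer factor.
    h, w = img.shape[0] // factor, img.shape[1] // factor
    return img[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

img = np.random.rand(100, 100)   # toy grayscale image
cy, cx = 50, 50                  # glimpse location, chosen by the location policy

glimpse = crop(img, cy, cx, 16)                 # sharp, small crop
context = downscale(crop(img, cy, cx, 32), 2)   # larger crop, rescaled to the same 16x16
print(glimpse.shape, context.shape)             # (16, 16) (16, 16)
```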

12
Q

What is self-attention?

A

Self-attention disregards RNNs completely and uses only attention, everywhere. The Transformer paper (“Attention Is All You Need”) claims that self-attention's success is due to a shorter “path” between words, which improves the handling of long-term dependencies.

Natural language processing application:
Basically, you input both the source sentence and the sentence translated so far. Both sentences are encoded (word embeddings) and attention is applied to each. The encoded translated-so-far sentence then queries the source sentence (which provides the keys and values), and this produces a probability distribution for the next word.

Video with a good explanation:
https://www.youtube.com/watch?v=iDulhoQ2pro

This can also be seen as an extension of content-based attention where, instead of the query coming from a different source than the data, each memory cell queries every other memory cell.
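
A numpy sketch of one scaled dot-product self-attention layer, in which every position of the same sequence queries every other position (the weights and inputs are random toy values):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Each position of the same sequence produces its own query, key and value.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every word attends to every other word
    return softmax(scores) @ V               # per-position weighted mix of values

seq_len, d_model, d_head = 5, 16, 8
X = np.random.randn(seq_len, d_model)              # embedded input sentence
Wq, Wk, Wv = (np.random.randn(d_model, d_head) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (5, 8)
```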
