Networks with memory and attention Flashcards
What is location-based addressing?
Synonym for direct addressing, i.e. specifying a particular address for a memory cell.
What is content-based addressing?
Synonym for indirect addressing. The memory cell contains content (information with context). We apply some matching function g to the memory cell's key and the query to produce a score. Afterwards, some scheme over the scores selects the memory cell(s) we want.
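A rough sketch (my own, not from the cards) of content-based addressing: cosine similarity stands in for the matching function g, and the names (`memory_keys`, `query`) and shapes are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def address_weights(memory_keys, query):
    # memory_keys: (num_cells, key_dim), query: (key_dim,)
    # Cosine similarity plays the role of the matching function g.
    g = memory_keys @ query / (
        np.linalg.norm(memory_keys, axis=1) * np.linalg.norm(query) + 1e-8
    )
    return softmax(g)  # one probability per memory cell

keys = np.random.randn(8, 16)           # 8 memory cells with 16-dim keys
query = np.random.randn(16)
weights = address_weights(keys, query)  # sums to 1 over the 8 cells
```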
What is hard addressing?
Sampling from the softmaxed scores of content-based addressing (i.e. selecting a single memory cell).
What is soft addressing?
The weighted average (linear combination) of the memory cell contents with respect to their softmaxed probabilities. This is also the expectation of the memory contents under that distribution.
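Given addressing weights like the ones above, hard and soft reads look roughly like this (again a sketch with made-up shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
memory = rng.standard_normal((8, 32))   # 8 cells with 32-dim contents
weights = rng.dirichlet(np.ones(8))     # stand-in for softmaxed addressing scores

# Hard addressing: sample a single cell index from the distribution.
idx = rng.choice(len(memory), p=weights)
hard_read = memory[idx]

# Soft addressing: weighted average of the contents, i.e. the expectation
# of the cell contents under the distribution.
soft_read = weights @ memory            # (32,)
```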
What are the limitations of RNNs/LSTMs with regard to memory?
Memory capacity is fixed by the computational capacity of the RNN/LSTM (the size of its hidden state). "External" memory is the idea of increasing the memory capacity independently of the RNN/LSTM itself.
Give an intuition for how an encoder-decoder framework works in the context of machine translation
First, we encode the meaning of the source language into an intermediate representation, then
we decode this into a sentence which represents this meaning in the target language.
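A minimal sketch of such an encoder-decoder (seq2seq) setup in PyTorch; the vocabulary sizes, dimensions, and class names are all made up for illustration:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_tokens):
        # src_tokens: (batch, src_len) integer ids
        _, h = self.rnn(self.embed(src_tokens))
        return h  # (1, batch, hid_dim): the intermediate representation

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt_tokens, h):
        # tgt_tokens: (batch, tgt_len) shifted target ids; h: encoder state
        y, _ = self.rnn(self.embed(tgt_tokens), h)
        return self.out(y)  # (batch, tgt_len, vocab_size) next-word logits

# Usage: encode the source sentence, then decode conditioned on that state.
enc, dec = Encoder(vocab_size=1000), Decoder(vocab_size=1200)
src = torch.randint(0, 1000, (2, 7))   # two source sentences of length 7
tgt = torch.randint(0, 1200, (2, 5))   # target sentences so far
logits = dec(tgt, enc(src))
```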
How many RNNs would be necessary to translate to-and-from N languages?
If we train the different encoders and decoders jointly, so that they share the same intermediate representation, 2N RNNs suffice (N encoders plus N decoders). Otherwise, on the order of N^2 (a separate encoder-decoder pair for every ordered language pair).
Why would external memory be desirable in a RNN?
The memory of an RNN is finite. With external memory, separate from the computational model, more memory could be added when necessary.
Explain the intuition of content-based attention
Content-based attention works similarly to content-based addressing, the main differences being that only read operations are relevant and that the "memory" is actually the output of earlier (e.g. convolutional) layers.
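As a sketch of that (shapes and names assumed): flatten the spatial positions of a convolutional feature map into "memory cells" and take a read-only soft read with respect to a query.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

feat = np.random.randn(64, 14, 14)      # (channels, H, W) output of a conv layer
memory = feat.reshape(64, -1).T         # (H*W, channels): one "cell" per spatial location
query = np.random.randn(64)             # e.g. derived from a decoder/controller state

attn = softmax(memory @ query)          # attention weight per location
context = attn @ memory                 # (64,) read-only soft read over the feature map
```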
Why and in what applications could location-based attention be used?
E.g. in image classification: attention is used to select a part of the image to classify, instead of processing the whole image.
Explain the intuition of location-based attention
Image attention as an analogy: we extract a glimpse (a crop of the image). We also extract a slightly larger part of the same image with the same center, rescale it to the same dimensions as the original crop, and use this low-resolution "context" glimpse to decide where to select the next glimpse. This way we can attend to interesting sub-parts of the image. The location policy (which decides where to "glimpse" next) is trained with reinforcement learning (policy gradient).
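A sketch of the glimpse extraction only (the RL-trained location policy is not shown; the image size, glimpse size, and interior center are assumptions):

```python
import numpy as np

def glimpse(image, cy, cx, size=16):
    # Fine crop around the chosen center (cy, cx); assumes the center is far
    # enough from the border that both crops fit inside the image.
    half = size // 2
    patch = image[cy - half:cy + half, cx - half:cx + half]
    # Twice-as-large crop with the same center, averaged down to the same
    # resolution: the low-resolution "context" glimpse.
    big = image[cy - size:cy + size, cx - size:cx + size]
    context = big.reshape(size, 2, size, 2).mean(axis=(1, 3))
    return patch, context

img = np.random.rand(64, 64)
patch, context = glimpse(img, cy=32, cx=32)  # (cy, cx) is what the location policy would choose
```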
What is self-attention?
Never mind the earlier explanation. Self-attention is a whole new thing where you disregard RNNs completely and do only attention, everywhere. The authors claim in their paper ("Attention Is All You Need") that self-attention's success is due to a shorter "path" between words, thus improving long-term dependencies.
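A minimal sketch of scaled dot-product self-attention (a single head, no masking; the weight shapes are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); every position queries every other position.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) word-to-word scores
    return softmax(scores) @ V                # each output is one attention step from any input

d = 32
X = np.random.randn(10, d)                    # 10 word embeddings
out = self_attention(X, *(np.random.randn(d, d) for _ in range(3)))
```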
Natural language processing application:
Basically, you input both the source sentence and the sentence translated so far. Both sentences are encoded (word embeddings) and attention is applied to each. The encoded translation-so-far then queries the source sentence (which provides the keys and values), and this produces a probability distribution over the next word.
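Roughly, that encoder-decoder attention step looks like this (a sketch; the output projection and all shapes are invented):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d, vocab = 32, 1000
src = np.random.randn(7, d)        # encoded source sentence (keys and values)
tgt = np.random.randn(4, d)        # encoded translation-so-far (queries)
W_out = np.random.randn(d, vocab)  # hypothetical output projection

attn = softmax(tgt @ src.T / np.sqrt(d))        # (4, 7) target-to-source attention weights
context = attn @ src                            # (4, d) source information per target position
next_word_probs = softmax(context[-1] @ W_out)  # distribution over the next word
```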
Video with a good explanation:
https://www.youtube.com/watch?v=iDulhoQ2pro
An extension of the content-based attention approach where, instead of the query coming from a different source than the data, each memory cell queries every other memory cell. (Not entirely sure of this explanation.)