RNNs Flashcards
How do Hidden Markov Models input words?
They process the words of a sentence one at a time
What are some limits around state based models?
Standard supervised ML techniques expect fixed-length inputs, but sentences vary in length
What is the workaround to sentences not having a fixed length?
We use a sliding window of words
What is a problem using a sliding window of words?
It is hard to learn semantic patterns that involve long-range dependencies, because words that fall outside the window cannot influence the prediction
Why can a single sentence generate lots of inputs?
This is because of the sliding window - if we have the sentence “and thanks for all the fish” and a window size of 3, we can have inputs of “and thanks for”, “thanks for all”, “for all the” and “all the fish”.
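The sliding-window idea from the card above can be sketched in a few lines of Python (the sentence and window size are taken from the example; the code itself is illustrative):

```python
# Generate all sliding-window inputs of a given size from a sentence.
sentence = "and thanks for all the fish".split()
window_size = 3

# One window starts at each position that leaves room for a full window.
windows = [sentence[i:i + window_size]
           for i in range(len(sentence) - window_size + 1)]

for w in windows:
    print(" ".join(w))
# → and thanks for / thanks for all / for all the / all the fish
```

A 6-word sentence with window size 3 yields 4 training inputs, which is why a single sentence can generate many examples.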
In the image, what is the size of the input?
It is 3 times the dimension of the embedding, as the three word embeddings in the window are concatenated together
What are Recurrent Neural Networks based on?
Elman Networks
What is different about RNNs compared to NNs?
We don’t just use the current input; we also factor in the hidden layer values from the previous time step.
What happens to the values of the hidden layer at time t-1 when an input is received at time t?
The values are provided as input in addition to the current input vector
What type of network does the image show?
A simple RNN
Explain what the image shows?
It shows how a simple RNN works: the current input vector is multiplied by the input weights W, and the hidden layer values from the previous time step are multiplied by the recurrent weights U. These are summed and passed through an activation function to give the new hidden layer values, which are then multiplied by the output weights V to produce the output.
How are the hidden layer values computed?
An activation function g (commonly tanh) is applied to the sum of the weighted previous hidden layer and the weighted current input: ht = g(U·ht−1 + W·xt)
Explain what the image shows
It shows that to get the hidden layer values ht, we multiply the previous hidden layer by weights U, add the current input multiplied by weights W, and apply an activation function g: ht = g(U·ht−1 + W·xt). To get the output yt, we multiply ht by the output weights V and apply a function f, usually softmax: yt = f(V·ht)
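A minimal sketch of one RNN step in NumPy, following the equations on this card. The layer sizes and random weights are illustrative assumptions, with g = tanh and f = softmax:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 5, 3             # illustrative dimensions (assumptions)
W = rng.standard_normal((d_h, d_in))   # input-to-hidden weights
U = rng.standard_normal((d_h, d_h))    # hidden-to-hidden (recurrent) weights
V = rng.standard_normal((d_out, d_h))  # hidden-to-output weights

def softmax(z):
    # Numerically stable softmax over a vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, h_prev):
    # h_t = g(U h_{t-1} + W x_t), with g = tanh
    h_t = np.tanh(U @ h_prev + W @ x_t)
    # y_t = f(V h_t), with f = softmax
    y_t = softmax(V @ h_t)
    return h_t, y_t

# Run a toy sequence of 3 random input vectors through the recurrence.
h = np.zeros(d_h)
for x in rng.standard_normal((3, d_in)):
    h, y = rnn_step(x, h)
```

Note how the same weight matrices are reused at every time step; only the hidden state h carries information forward.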
How does the loss function work in an RNN?
Computing the loss at time t requires ht, which depends on ht−1, which in turn depends on ht−2, and so on back to the start of the sequence; gradients therefore flow backwards through all earlier time steps (backpropagation through time)
When using an RNN for language models, what is the input?
The input is a sequence of L words from vocabulary V, where L is the length of the sequence so far; each word is one-hot encoded, giving an input of size L × |V|
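The one-hot encoding described above can be sketched as follows; the toy vocabulary and sequence are assumptions for illustration:

```python
import numpy as np

# Toy vocabulary (assumption): in practice |V| is tens of thousands of words.
vocab = ["and", "thanks", "for", "all", "the", "fish"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

sequence = "thanks for all".split()
L, V_size = len(sequence), len(vocab)

# One row per word in the sequence, one column per vocabulary entry.
one_hot = np.zeros((L, V_size))
for t, word in enumerate(sequence):
    one_hot[t, word_to_idx[word]] = 1.0
```

Each row has exactly one 1, marking which vocabulary word appears at that position, so the whole sequence becomes an L × |V| matrix.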