RNNs Flashcards
How do Hidden Markov Models input words?
They process words one at a time
What are some limitations of state-based models?
Supervised ML techniques take fixed-length inputs, but sentences vary in length
What is the workaround for sentences not having a fixed length?
We use a sliding window of words
What is a problem with using a sliding window of words?
It is hard to learn semantic patterns involving long-range dependencies, since any dependency longer than the window falls outside the input
Why can a single sentence generate lots of inputs?
This is because of the sliding window - if we have the sentence “and thanks for all the fish” and a window size of 3, we can have inputs of “and thanks for”, “thanks for all”, “for all the” and “all the fish”.
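A minimal sketch of generating these windows in Python (the sentence and window size are from the example above):

```python
# Minimal sketch: generate fixed-size sliding windows over a sentence.
sentence = "and thanks for all the fish".split()
window_size = 3

windows = [sentence[i:i + window_size]
           for i in range(len(sentence) - window_size + 1)]
print(windows)
# [['and', 'thanks', 'for'], ['thanks', 'for', 'all'],
#  ['for', 'all', 'the'], ['all', 'the', 'fish']]
```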
In the image, what is the size of the input?
It is 3 times the dimension of the embedding (window size × embedding dimension), as the three word embeddings in the window are concatenated together
What are Recurrent Neural Networks based on?
Elman Networks
What is different about RNNs compared to NNs?
We don't just take the immediate input; we also factor in the hidden layer values from the previous time step.
What happens to the values of the hidden layer at time t-1 when an input is received at time t?
The values are provided as input in addition to the current input vector
What type of network does the image show?
A simple RNN
Explain what the image shows
It shows how a simple RNN works: the current input vector is multiplied by the input weights W, the hidden layer values from the previous time step are multiplied by the weights U, and the two are combined to give the new hidden layer values. These are multiplied by the output weights V to produce the output.
How are the hidden layer values computed?
An activation function g is applied to the weighted sum of the current input and the previous hidden layer values
Explain what the image shows
It shows that to get the hidden layer values ht, we multiply the previous hidden layer ht-1 by weights U, add the current input xt multiplied by weights W, and pass the result through an activation function g. Then to get the output yt, we multiply ht by the output weights V and apply a function f, usually softmax
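A minimal NumPy sketch of this update; the dimensions and the choice of tanh for g are illustrative assumptions:

```python
import numpy as np

d_in, d_h, d_out = 4, 5, 3                 # illustrative dimensions
rng = np.random.default_rng(0)
W = rng.normal(size=(d_h, d_in))           # input -> hidden weights
U = rng.normal(size=(d_h, d_h))            # previous hidden -> hidden weights
V = rng.normal(size=(d_out, d_h))          # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, h_prev):
    h_t = np.tanh(U @ h_prev + W @ x_t)    # ht = g(U ht-1 + W xt)
    y_t = softmax(V @ h_t)                 # yt = f(V ht)
    return h_t, y_t

h = np.zeros(d_h)
for x in rng.normal(size=(6, d_in)):       # a toy sequence of 6 inputs
    h, y = rnn_step(x, h)
```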
How does the loss function work in an RNN?
Computing the loss at time t needs ht, which depends on ht-1, which in turn depends on ht-2, and so on back to the start of the sequence; training therefore uses backpropagation through time
When using an RNN for language models, what is the input?
The input is a sequence of L words from vocabulary V, where L is the length of the sequence so far; each word is one-hot encoded as a vector of size |V|, so the input has size L × |V|
What is a one-hot vector in regards to a language model?
It is a vector of size |V| filled with 0s, apart from a 1 at the index of that word in the vocabulary
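A minimal sketch of one-hot encoding against a toy vocabulary:

```python
import numpy as np

vocab = ["and", "thanks", "for", "all", "the", "fish"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))        # vector of size |V|, all 0s...
    v[word_to_idx[word]] = 1.0      # ...except a 1 at the word's index
    return v

print(one_hot("for"))               # [0. 0. 1. 0. 0. 0.]
```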
When using an RNN as a language model, what is the output Y?
It is a probability distribution over the vocabulary V, from which the predicted next word in the sequence is chosen
What does cross-entropy measure?
It measures how well a set of estimated class probabilities matches the true (target) class
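A minimal sketch: with a one-hot target, cross-entropy reduces to the negative log of the probability assigned to the correct class:

```python
import numpy as np

probs = np.array([0.1, 0.7, 0.2])   # estimated distribution over classes
target = 1                          # index of the true class
loss = -np.log(probs[target])
print(loss)                         # ~0.357; lower when probs[target] is higher
```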
What is teacher forcing?
During training, when making the next prediction, the model is fed the ground-truth token from the previous step rather than its own prior prediction; this keeps training on track and helps the model converge
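A minimal sketch of teacher forcing; `rnn_step(token_id, h)` is a stand-in for one RNN step returning (new hidden state, probabilities over the vocabulary), and is an assumption for illustration:

```python
import numpy as np

def train_on_sequence(rnn_step, gold, h0):
    h, loss = h0, 0.0
    for t in range(len(gold) - 1):
        # Teacher forcing: the input is the ground-truth token gold[t],
        # not the model's own prediction from the previous step.
        h, probs = rnn_step(gold[t], h)
        loss += -np.log(probs[gold[t + 1]])  # cross-entropy vs next gold token
    return loss
```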
How does sequence labelling with RNNs work (e.g. POS tagging)?
The input X is a sequence of words
The output Y is a probability distribution over POS tags (the most likely tag is chosen by argmax)
Pre-trained word embeddings can be used
The loss function is a cross-entropy loss; a sketch of this setup follows below
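A minimal PyTorch sketch of RNN sequence labelling; the vocabulary size, tagset size, and dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

vocab_size, tagset_size, emb_dim, hidden_dim = 1000, 17, 50, 64

class Tagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # could be pre-trained
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tagset_size)

    def forward(self, x):              # x: (batch, seq_len) word indices
        h, _ = self.rnn(self.emb(x))   # h: (batch, seq_len, hidden_dim)
        return self.out(h)             # tag scores for every position

model = Tagger()
words = torch.randint(0, vocab_size, (1, 6))    # one toy sentence
tags = torch.randint(0, tagset_size, (1, 6))    # its gold tags
logits = model(words)
loss = nn.CrossEntropyLoss()(logits.view(-1, tagset_size), tags.view(-1))
pred = logits.argmax(dim=-1)           # most likely tag per word (argmax)
```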
How does autoregressive generation using an RNN work (e.g. text generator)?
Input X is the sequence of words so far, starting with a start token
Output Y is the next word to be added to X
Pre-trained word embeddings can be used
The loss function is a cross-entropy loss; a sketch of the generation loop follows below
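A minimal sketch of autoregressive generation; `model` is assumed to be a next-word predictor shaped like the tagger above (scoring the vocabulary instead of tags), and the start/end token ids are illustrative assumptions:

```python
import torch

def generate(model, start_id, end_id, max_len=20):
    seq = [start_id]                       # start with the start token
    for _ in range(max_len):
        x = torch.tensor([seq])            # the sequence of words so far
        logits = model(x)                  # (1, len(seq), vocab_size)
        next_id = logits[0, -1].argmax().item()  # most likely next word
        seq.append(next_id)                # add it to X and repeat
        if next_id == end_id:
            break
    return seq
```

Sampling from the softmax distribution could be used in place of argmax for more varied output.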
How does sequence classification work with an RNN (e.g. sentence/document classifier)?
Input X is a sequence of words in sentence/document
Output Y is a class probability
An RNN is combined with an MLP: the RNN's final hidden state is fed to an MLP classifier
A cross-entropy loss function based on the classification result; a sketch follows below
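A minimal PyTorch sketch of the RNN + MLP combination; sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim, n_classes = 1000, 50, 64, 2

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 32), nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, x):
        _, h_n = self.rnn(self.emb(x))   # h_n: final hidden state
        return self.mlp(h_n[-1])         # classify from the last hidden state

model = Classifier()
doc = torch.randint(0, vocab_size, (1, 12))   # one toy document
loss = nn.CrossEntropyLoss()(model(doc), torch.tensor([1]))
```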
How do stacked RNNs work?
The entire output sequence of one RNN is used as an input for another RNN
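In PyTorch this stacking is the `num_layers` argument; each layer's full output sequence feeds the next layer (a minimal sketch, with illustrative sizes):

```python
import torch.nn as nn

# Three RNN layers stacked: layer 1's output sequence is layer 2's
# input sequence, and so on.
stacked = nn.RNN(input_size=50, hidden_size=64, num_layers=3,
                 batch_first=True)
```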
What is a positive of using a stacked RNN?
It encodes different levels of abstraction, which allows more sophisticated patterns to be encoded
What is a drawback of using a stacked RNN?
Adding more RNN layers increases training time
What are stacked RNNs an example of?
Deep Learning
How does a bi-directional RNN work?
We have one RNN layer that processes the sequence in a forward pass, a separate RNN layer that processes it in a backward pass, and then we concatenate the two sets of hidden layer values for each position t in the sequence
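A minimal PyTorch sketch; with `bidirectional=True` the forward and backward hidden states are concatenated at each position (sizes are illustrative):

```python
import torch
import torch.nn as nn

birnn = nn.RNN(input_size=50, hidden_size=64, batch_first=True,
               bidirectional=True)
x = torch.randn(1, 6, 50)      # a toy sequence of 6 embeddings
out, _ = birnn(x)
print(out.shape)               # (1, 6, 128): 64 forward + 64 backward
```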
In an RNN, how many sets of weights do we have to update?
3: W, the weights from the input layer to the hidden layer; U, the weights from the previous hidden layer to the current hidden layer; and V, the weights from the hidden layer to the output layer