RNN Flashcards
For what kind of problems could we use a many-to-one RNN?
Sentence classification, speech recognition (sound waves to one word), video classification
For what kind of problems could we use a one-to-many RNN?
Image captioning, music composition, natural text generation.
For what kind of problems could we use a many-to-many RNN?
Frame-level video classification, speech enhancement, continuous emotion prediction
What makes an RNN different from feedforward (FW) networks?
It has feedback: hidden states from earlier timesteps are fed back as inputs.
What is the formula for updating the hidden state and calculating the output at a timestep in a simple RNN cell?
h_t = tanh(W_{hh} * h_{t-1} + W_{xh} * x_t)
y_t = W_{hy} * h_t
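A minimal NumPy sketch of one such timestep (function and variable names are illustrative, and bias terms are omitted):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_hh, W_xh, W_hy):
    """One timestep of a simple RNN cell."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # update hidden state
    y_t = W_hy @ h_t                           # compute output
    return h_t, y_t

# Illustrative shapes: input size 3, hidden size 4, output size 2
rng = np.random.default_rng(0)
h_t, y_t = rnn_step(rng.normal(size=3), np.zeros(4),
                    rng.normal(size=(4, 4)), rng.normal(size=(4, 3)),
                    rng.normal(size=(2, 4)))
```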
What is direct feedback?
The hidden state is used in the same cell at the next timestep
What is indirect feedback?
The hidden state is connected to a previous cell at the next timestep
What is lateral feedback?
A cell is connected to a cell in the same layer.
How does lateral feedback often affect the output of a layer?
Cells strengthen themselves while weakening others, so the strongest cell becomes active (winner-take-all).
What is an RNN with symmetrical connections to all other cells called?
A Hopfield network
What is the main challenge of using deep RNNs?
Vanishing/exploding gradients. Batch normalization and dropout layers help.
What is a bidirectional RNN?
Cells see inputs both from the past and the future.
What is an LSTM (long short-term memory) cell?
A cell with a separate path for the cell state, ensuring better gradient propagation.
What is the forget gate in an LSTM cell?
Controls how much of the previous cell state to remember.
What is the input gate in an LSTM cell?
Controls how much to write to the cell state.
What is the output gate in an LSTM cell?
Controls how much to output from the cell.
How are the values for the different gates calculated in an LSTM cell?
i = sigmoid(W_i * [h_{t-1}, x_t])
f = sigmoid(W_f * [h_{t-1}, x_t])
o = sigmoid(W_o * [h_{t-1}, x_t])
g = tanh(W_g * [h_{t-1}, x_t])
c_t = f * c_{t-1} + i * g
h_t = o * tanh(c_t)
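A minimal NumPy sketch of one LSTM step following these equations (weight names and the omitted biases are simplifications):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_i, W_f, W_o, W_g):
    """One LSTM timestep on the concatenated input [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W_i @ z)        # input gate: how much to write
    f = sigmoid(W_f @ z)        # forget gate: how much of c_{t-1} to keep
    o = sigmoid(W_o @ z)        # output gate: how much to reveal
    g = np.tanh(W_g @ z)        # candidate cell content
    c_t = f * c_prev + i * g    # elementwise only: no matrix multiply on c
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

Note that the c_t update uses only elementwise operations, which is exactly why gradients propagate well along the cell-state path.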
How do the weight matrices in an RNN differ for different timesteps?
They are the same for all timesteps.
Why is the gradient flow improved in an LSTM?
The cell state is updated with elementwise operations only (no matrix multiplication), so gradients can flow back through time without being repeatedly multiplied by a weight matrix.
What are peephole connections in LSTM?
c_{t-1} is connected to the forget, input and output gates
What is a GRU (Gated Recurrent Unit)?
Compared to an LSTM, the cell state is eliminated and only two gates are used, reset and update. The hidden state path still avoids matrix multiplication, allowing efficient gradient propagation.
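A sketch of one GRU step, assuming the common update/reset formulation (conventions for z versus 1 - z vary between sources; biases omitted):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU timestep: update gate z, reset gate r, no separate cell state."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                                      # update gate
    r = sigmoid(W_r @ hx)                                      # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))  # candidate state
    h_t = (1 - z) * h_prev + z * h_cand  # direct h_prev -> h_t path is elementwise
    return h_t
```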
What is the main advantage of GRU compared to LSTM?
GRU cells have fewer parameters and often perform comparably to LSTMs.
What is pooling over time?
The output can be an average, max, sum, etc. over time of the outputs of the individual cells (for many-to-one problems).
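For example, with the per-timestep outputs stacked into an array (shapes are dummy values for illustration):

```python
import numpy as np

outputs = np.random.randn(10, 4)    # (T, d): one d-dimensional output per timestep

mean_pooled = outputs.mean(axis=0)  # average over time
max_pooled = outputs.max(axis=0)    # max over time
sum_pooled = outputs.sum(axis=0)    # sum over time
```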
How can we solve problems with a high input dimension at each timestep, over many timesteps?
CNN + RNN: use a CNN to extract compact features from each timestep's input, then feed them into the RNN.
How can the vanishing/exploding gradient in a basic RNN be alleviated?
Exploding: clipping the gradient
Vanishing: changing to LSTM (or GRU)
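A common form of gradient clipping is rescaling by the global norm; a sketch (the threshold 5.0 is an arbitrary illustrative value):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale the gradient if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```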
What is BPTT (backpropagation through time)?
In BPTT we run the forward phase for the entire sequence (over time) without updating the weights. We then calculate the loss over all outputs of the RNN and let the gradient propagate back to earlier states (through time).
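A compact NumPy sketch of BPTT for the simple tanh RNN above, assuming a squared-error loss at each timestep (loss choice and names are illustrative; biases omitted):

```python
import numpy as np

def bptt(xs, ys, W_hh, W_xh, W_hy, h0):
    """Forward over the whole sequence, then backpropagate through time."""
    T = len(xs)
    hs, y_hats = {-1: h0}, {}
    for t in range(T):                          # forward phase, no weight updates
        hs[t] = np.tanh(W_hh @ hs[t - 1] + W_xh @ xs[t])
        y_hats[t] = W_hy @ hs[t]
    dW_hh, dW_xh, dW_hy = (np.zeros_like(W) for W in (W_hh, W_xh, W_hy))
    dh_next = np.zeros_like(h0)
    for t in reversed(range(T)):                # backward phase, through time
        dy = y_hats[t] - ys[t]                  # gradient of squared error at t
        dW_hy += np.outer(dy, hs[t])
        dh = W_hy.T @ dy + dh_next              # from output and from the future
        dz = (1 - hs[t] ** 2) * dh              # through tanh
        dW_hh += np.outer(dz, hs[t - 1])
        dW_xh += np.outer(dz, xs[t])
        dh_next = W_hh.T @ dz                   # propagate to earlier states
    return dW_hh, dW_xh, dW_hy
```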
What is sequence representation learning with RNNs?
Use an RNN "encoder" to get a single output from a sequence, and an RNN "decoder" to turn that output into a sequence. One can also use "attention": a combination of the outputs from each cell in the encoder, used as additional input to each cell in the decoder.
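A minimal sketch of the encoder-decoder idea without attention (real decoders usually also feed back the previous output; this is a simplification):

```python
import numpy as np

def encode(xs, W_hh, W_xh, h0):
    """RNN encoder: compress an input sequence into a single hidden state."""
    h = h0
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t)
    return h  # the sequence representation

def decode(h_enc, W_hh, W_hy, steps):
    """RNN decoder: unroll the encoded state into an output sequence."""
    h, outputs = h_enc, []
    for _ in range(steps):
        h = np.tanh(W_hh @ h)       # simplified: no external decoder input
        outputs.append(W_hy @ h)
    return outputs
```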
What is the motivation behind CTC (Connectionist Temporal Classification)?
If we want to translate audio into e.g. text with a normal RNN, we need a frame-level alignment between the audio frames and the output characters, which is usually not available. CTC removes this requirement by introducing a blank token and summing over all possible alignments during training.