RNN Flashcards

1
Q

For what kind of problems could we use a many-to-one RNN?

A

Sentence classification, speech recognition (sound waves to one word), video classification

2
Q

For what kind of problems could we use a one-to-many RNN?

A

Image captioning, music composition, natural text generation.

3
Q

For what kind of problems could we use a many-to-many RNN?

A

Video classification at the frame level, speech enhancement, continuous emotion prediction.

4
Q

What makes an RNN different from feedforward (FW) networks?

A

It has feedback connections (cell outputs are fed back as inputs at later timesteps).

5
Q

What is the formula for updating the hidden state and calculating the output at a timestep in a simple RNN cell?

A
h_t = tanh( W_{hh} * h_{t-1} + W_{xh} * x_t )
y_t = W_{hy} * h_t
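
A minimal NumPy sketch of this recurrence (sizes, initialization, and the example input are illustrative assumptions, not from the card):

import numpy as np

hidden_size, input_size, output_size = 4, 3, 2
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1   # hidden-to-hidden
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1    # input-to-hidden
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1   # hidden-to-output

def rnn_step(h_prev, x_t):
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t);  y_t = W_hy h_t
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):   # a 5-step input sequence
    h, y = rnn_step(h, x_t)

Note how the same weight matrices are reused at every timestep (compare card 18).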
6
Q

What is direct feedback?

A

The hidden state is fed back into the same cell at the next timestep.

7
Q

What is indirect feedback?

A

The hidden state is connected to a cell in a previous layer at the next timestep.

8
Q

What is lateral feedback?

A

A cell is connected to a cell in the same layer.

9
Q

How does lateral feedback often affect the output of a layer?

A

Cells strengthen themselves while weakening others, so the strongest cell becomes active.

10
Q

What is an RNN with symmetrical connections to all other cells called?

A

A Hopfield network

11
Q

What is the main challenge of using deep RNNs?

A

Gradient vanishing/explosion. Batch normalization and dropout layers help.

12
Q

What is a bidirectional RNN?

A

Cells see inputs both from the past and the future.

13
Q

What is an LSTM (long short-term memory) cell?

A

A long short-term memory cell has a separate path for the cell state, ensuring better gradient propagation.

14
Q

What is the forget gate in an LSTM cell?

A

Controls how much of the previous cell state to remember.

15
Q

What is the input gate in an LSTM cell?

A

Controls how much to write to the cell state.

16
Q

What is the output gate in an LSTM cell?

A

Controls how much of the cell state to output.

17
Q

How are the values for the different gates calculated in an LSTM cell?

A
i = sigmoid( W_i * [h_{t-1}, x_t] )
o = sigmoid( W_o * [h_{t-1}, x_t] )
f = sigmoid( W_f * [h_{t-1}, x_t] )
g = tanh( W_g * [h_{t-1}, x_t] )
c_t = f * c_{t-1} + i * g
h_t = o * tanh(c_t)
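
A minimal NumPy sketch of one LSTM timestep implementing these equations (sizes, initialization, and the use of one weight matrix per gate on the concatenation [h_{t-1}, x_t] are assumptions):

import numpy as np

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, applied to the concatenation [h_{t-1}, x_t].
W_i, W_o, W_f, W_g = [rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
                      for _ in range(4)]

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    i = sigmoid(W_i @ z)                # input gate
    o = sigmoid(W_o @ z)                # output gate
    f = sigmoid(W_f @ z)                # forget gate
    g = np.tanh(W_g @ z)                # candidate cell update
    c_t = f * c_prev + i * g            # cell state updated elementwise only
    h_t = o * np.tanh(c_t)
    return h_t, c_t

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h, c = lstm_step(h, c, x_t)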
18
Q

How do the weight matrices in an RNN differ for different timesteps?

A

They are the same for all timesteps.

19
Q

Why is the gradient flow improved in LSTM?

A

There is no matrix multiplication on the cell-state path; it is updated only elementwise, so gradients can flow back without repeated multiplication by a weight matrix.

20
Q

What are peephole connections in LSTM?

A

c_{t-1} is connected to the forget, input, and output gates.

21
Q

What is a GRU (gated recurrent unit)?

A

Compared to the LSTM, the separate cell state is eliminated and only two gates are used: reset and update. The hidden-state path still avoids matrix multiplication, allowing efficient gradient propagation.
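
A minimal NumPy sketch of one GRU timestep (the standard two-gate formulation; sizes and initialization are assumptions, and the sign convention for the update gate varies between references):

import numpy as np

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W_z, W_r, W_h = [rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
                 for _ in range(3)]

def gru_step(h_prev, x_t):
    z = sigmoid(W_z @ np.concatenate([h_prev, x_t]))            # update gate
    r = sigmoid(W_r @ np.concatenate([h_prev, x_t]))            # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate state
    # The carry-over term (1 - z) * h_prev is purely elementwise, which is what
    # keeps the direct hidden-state path free of matrix multiplications.
    return (1.0 - z) * h_prev + z * h_cand

h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h = gru_step(h, x_t)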

22
Q

What is the main advantage of GRU compared to LSTM?

A

A GRU has fewer parameters (fewer gates) and often performs comparably to an LSTM.

23
Q

What is pooling over time?

A

The output can be an average, max, sum, etc. over time of the outputs of the individual cells (for …-to-one problems).
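
A minimal sketch of pooling over time, assuming `outputs` stands for the per-timestep RNN outputs (the values here are placeholders):

import numpy as np

outputs = np.random.default_rng(0).standard_normal((10, 8))   # 10 timesteps, 8 features

mean_pooled = outputs.mean(axis=0)   # average over time
max_pooled = outputs.max(axis=0)     # max over time
sum_pooled = outputs.sum(axis=0)     # sum over time
# Any of these fixed-size vectors can then feed a classifier for the whole sequence.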

24
Q

How can we solve problems with a high input dimension at each timestep, spanning several timesteps?

A

CNN + RNN: use a CNN to extract a compact feature vector at each timestep and feed the sequence of features to an RNN.

25
Q

How can the vanishing/exploding gradient in a basic RNN be alleviated?

A

Exploding: gradient clipping.
Vanishing: changing to LSTM (or GRU) cells.
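
A minimal sketch of gradient clipping by global norm (the threshold and the example gradients are arbitrary assumptions):

import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    # Rescale all gradients so their combined L2 norm never exceeds max_norm.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

grads = [np.full((3, 3), 10.0), np.full(3, -10.0)]   # made-up exploding gradients
clipped = clip_by_global_norm(grads, max_norm=5.0)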

26
Q

What is BPTT (backpropagation through time)?

A

In BPTT we run the forward phase for the entire sequence (over time) without updating the weights. We then calculate the loss over all outputs of the RNN and let the gradient propagate back to earlier states (through time).
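
A minimal NumPy sketch of BPTT for a simple RNN with a squared-error loss at every timestep (all sizes, the data, and the loss choice are illustrative assumptions):

import numpy as np

hidden_size, input_size, output_size, T = 4, 3, 2, 5
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1
xs = rng.standard_normal((T, input_size))        # input sequence
targets = rng.standard_normal((T, output_size))  # per-timestep targets

# Forward phase over the whole sequence, without any weight updates.
hs = {-1: np.zeros(hidden_size)}
ys, loss = {}, 0.0
for t in range(T):
    hs[t] = np.tanh(W_hh @ hs[t - 1] + W_xh @ xs[t])
    ys[t] = W_hy @ hs[t]
    loss += 0.5 * np.sum((ys[t] - targets[t]) ** 2)

# Backward phase: propagate the gradient back through time (to earlier states).
dW_hh, dW_xh, dW_hy = np.zeros_like(W_hh), np.zeros_like(W_xh), np.zeros_like(W_hy)
dh_next = np.zeros(hidden_size)
for t in reversed(range(T)):
    dy = ys[t] - targets[t]                 # dLoss/dy_t
    dW_hy += np.outer(dy, hs[t])
    dh = W_hy.T @ dy + dh_next              # gradient reaching h_t
    da = (1.0 - hs[t] ** 2) * dh            # back through the tanh
    dW_xh += np.outer(da, xs[t])
    dW_hh += np.outer(da, hs[t - 1])
    dh_next = W_hh.T @ da                   # passed on to timestep t-1
# The accumulated dW_* would now be used for a single weight update.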

27
Q

What is sequence representation learning with RNNs?

A

Use an RNN “encoder” to get a single output from a sequence, and an RNN decoder to turn that output into a sequence. Can also use “attention”: a combination of the outputs from each cell in the encoder that is used as additional input to each cell in the decoder.
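
A minimal NumPy sketch of the encoder–decoder idea (without attention), using simple RNN cells; all sizes, weights, and the fixed output length are illustrative assumptions:

import numpy as np

hidden_size, input_size, output_size = 4, 3, 2
rng = np.random.default_rng(0)
W_hh_e = rng.standard_normal((hidden_size, hidden_size)) * 0.1   # encoder weights
W_xh_e = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh_d = rng.standard_normal((hidden_size, hidden_size)) * 0.1   # decoder weights
W_hy_d = rng.standard_normal((output_size, hidden_size)) * 0.1

def encode(xs):
    # Run the encoder over the whole input sequence; its final hidden state
    # is the single fixed-size representation of the sequence.
    h = np.zeros(hidden_size)
    for x_t in xs:
        h = np.tanh(W_hh_e @ h + W_xh_e @ x_t)
    return h

def decode(h, steps):
    # Unroll the decoder from the encoder representation to produce a sequence.
    # (A real decoder would also feed back its previous output at each step.)
    ys = []
    for _ in range(steps):
        h = np.tanh(W_hh_d @ h)
        ys.append(W_hy_d @ h)
    return ys

summary = encode(rng.standard_normal((6, input_size)))   # 6-step input sequence
outputs = decode(summary, steps=4)                        # 4-step output sequence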

28
Q

What is the motivation behind CTC (Connectionist Temporal Classification)?

A

If we want to translate audio into e.g. text with a normal RNN we need to