LSTM Flashcards

1
Q

Define LSTM

A

Long Short-Term Memory

It is a more complex form of recurrent unit: instead of simply adding the effects of the current input and the past state, an LSTM has gating units that can turn these effects on or off depending on the input.

These gates have their own parameters that are trained during backpropagation.
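A minimal sketch of the gating idea, assuming a single made-up gate (the names W_g, U_g, b_g are illustrative):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  # One hypothetical gate. Its output lies in (0, 1), so multiplying by it scales
  # an effect between "fully off" (0) and "fully on" (1). W_g, U_g, b_g are the
  # gate's own trainable parameters.
  def gate(x_t, h_prev, W_g, U_g, b_g):
      return sigmoid(W_g @ x_t + U_g @ h_prev + b_g)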

2
Q

What are the three different gates?

A
  1. Forget Gate - removes information from the cell state
  2. Input Gate - adds information to the cell state
  3. Output Gate - computes the hidden state from the cell state

The forget gate and input gate together update the cell state (see the sketch below).
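A sketch of the three gate computations for one time step, assuming sigmoid activations and illustrative weight names (the W, U, b dicts are not from the card):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def lstm_gates(x_t, h_prev, W, U, b):
      # W, U, b are dicts keyed by 'f', 'i', 'o' (illustrative names)
      f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate
      i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate
      o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate
      return f_t, i_t, o_t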

3
Q

What is the hyperparameter in an LSTM?

A

The number of units (nodes) in the neural network inside each gate, i.e. the hidden size.

The dimensions of the cell state, the hidden state, and all the gate outputs are the same and equal this number of units.

However, the dimension of the input may be different.
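A quick shape check with PyTorch, assuming torch is installed; hidden_size is the hyperparameter in question, and the sizes below are made up:

  import torch
  import torch.nn as nn

  lstm = nn.LSTM(input_size=50, hidden_size=128)   # input dim can differ from hidden dim
  x = torch.randn(10, 4, 50)                       # (seq_len, batch, input_size)
  out, (h_n, c_n) = lstm(x)

  print(out.shape)   # torch.Size([10, 4, 128]) - hidden state at every step
  print(h_n.shape)   # torch.Size([1, 4, 128])  - final hidden state
  print(c_n.shape)   # torch.Size([1, 4, 128])  - final cell state (same size as hidden)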

4
Q

Why is ft * ct-1 called the forget step?

A

Since ft is produced by a sigmoid, its values lie in [0, 1]; the element-wise product ft * ct-1 is therefore bounded between 0 and ct-1, so each component of the old cell state can be kept, scaled down, or forgotten.
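A tiny numeric illustration with made-up values:

  import numpy as np

  c_prev = np.array([2.0, -1.5, 0.8])   # previous cell state
  f_t    = np.array([1.0,  0.5, 0.0])   # sigmoid outputs in [0, 1]

  print(f_t * c_prev)   # [ 2.0, -0.75, 0.0 ]: keep, shrink, forget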

5
Q

How is ct (cell state) calculated?

A

ct = ft * ct-1 + it * c̃t, where c̃t is the candidate cell state (a tanh layer over xt and ht-1).
Gating is element-wise multiplication, and the two terms are combined by element-wise addition.
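A sketch of the update in code, assuming the candidate cell state comes from a tanh layer (weight names are illustrative):

  import numpy as np

  def cell_state_update(f_t, i_t, c_prev, x_t, h_prev, W_c, U_c, b_c):
      c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)  # candidate cell state
      return f_t * c_prev + i_t * c_tilde                 # element-wise gate and add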

6
Q

How is ht (hidden state) calculated?

A

ht = ot * tanh(ct)
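Putting the previous cards together, a sketch of one full LSTM step (illustrative weight names, not an optimized implementation):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def lstm_step(x_t, h_prev, c_prev, W, U, b):
      # W, U, b are dicts keyed by 'f', 'i', 'o', 'c' (illustrative)
      f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
      i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate
      o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
      c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate cell state
      c_t = f_t * c_prev + i_t * c_tilde                          # new cell state
      h_t = o_t * np.tanh(c_t)                                    # new hidden state
      return h_t, c_t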

7
Q

What is GRU and why was it needed?

A

Gated Recurrent Unit is a simpler version of LSTM with fewer gates and less computation.
It was needed because the LSTM's extra gates and parameters make it complex and computationally expensive to train.

A GRU has two gates:
* Reset gate
* Update gate
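A sketch of one GRU step under one common formulation (weight names are illustrative; some references swap the roles of z and 1 - z):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def gru_step(x_t, h_prev, W, U, b):
      # W, U, b are dicts keyed by 'z', 'r', 'h' (illustrative)
      z_t = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])              # update gate
      r_t = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])              # reset gate
      h_tilde = np.tanh(W['h'] @ x_t + U['h'] @ (r_t * h_prev) + b['h'])  # candidate
      return z_t * h_prev + (1.0 - z_t) * h_tilde                          # new hidden state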

8
Q

What is the difference between LSTM and GRU?

A

LSTM: Three gates - forget, input, output
GRU: Two gates - reset, update

LSTM: Two states - cell & hidden
GRU: One state - Hidden

LSTM has more parameters than GRU.

LSTMs are more computationally expensive due to the extra gate and the separate cell state.

GRU is preferred for simpler tasks.
LSTM is preferred for more complex tasks.
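A quick parameter-count comparison in PyTorch, assuming the same (made-up) input and hidden sizes for both units:

  import torch.nn as nn

  lstm = nn.LSTM(input_size=100, hidden_size=128)
  gru  = nn.GRU(input_size=100, hidden_size=128)

  count = lambda m: sum(p.numel() for p in m.parameters())
  print(count(lstm))   # 4 sets of weights/biases (f, i, o, candidate)
  print(count(gru))    # 3 sets of weights/biases (r, z, candidate) - roughly 3/4 of the LSTM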

9
Q

When to use deep RNNs?

A

A deep RNN stacks multiple recurrent layers on top of one another. Use it:

  1. For advanced tasks like speech recognition and machine translation
  2. When we have a large dataset
  3. When we have enough computational power

A deep RNN captures a hierarchical structure: the lower layers model word-level dependencies, while deeper layers capture sentence- and paragraph-level dependencies.
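In PyTorch, stacking is just the num_layers argument; sizes below are assumed for illustration:

  import torch
  import torch.nn as nn

  deep_lstm = nn.LSTM(input_size=100, hidden_size=128, num_layers=3)  # 3 stacked layers
  x = torch.randn(20, 8, 100)             # (seq_len, batch, input_size)
  out, (h_n, c_n) = deep_lstm(x)

  print(out.shape)   # torch.Size([20, 8, 128]) - hidden states of the top layer
  print(h_n.shape)   # torch.Size([3, 8, 128])  - final hidden state of each layer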

10
Q

What is BiRNN/BiLSTM/BiGRU?

A

Bi-directional RNN/LSTM/GRU: the sequence is processed both forward and backward, so each position sees past and future context.
Used when the meaning of a word depends on the words that come after it.
Ex - "I love Amazon, the website" vs. "I love the Amazon, the river": the following words disambiguate "Amazon".
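In PyTorch, bidirectionality is a flag, and the output concatenates the forward and backward directions (sizes assumed for illustration):

  import torch
  import torch.nn as nn

  bilstm = nn.LSTM(input_size=100, hidden_size=128, bidirectional=True)
  x = torch.randn(20, 8, 100)              # (seq_len, batch, input_size)
  out, (h_n, c_n) = bilstm(x)

  print(out.shape)   # torch.Size([20, 8, 256]) - forward and backward concatenated
  print(h_n.shape)   # torch.Size([2, 8, 128])  - one final hidden state per direction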

11
Q

Applications of BiRNN/BiLSTM/BiGRU

A

NER, POS tagging, Machine Translation, Sentiment Analysis, Time Series Forecasting

12
Q

Disadvantages of BiRNN/BiLSTM/BiGRU

A
  1. Overfitting (due to the increased number of parameters)
  2. Latency issues in real-time applications such as streaming speech recognition, because the backward pass needs the whole sequence before any output can be produced