RNN Flashcards
What are Recurrent Neural Networks?
RNNs are neural networks designed for processing sequential data.
Application areas for RNNs?
RNNs are widely used in NLP, speech recognition, time series prediction, and other tasks where the input or output data has a temporal relationship.
How does an RNN work?
An RNN can be unfolded over time: at each time step, the recurrent unit takes an input together with the hidden state from the previous time step, processes them, and produces an output and a new hidden state.
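A minimal sketch of this unrolling in Python/NumPy (the layer sizes, weight names, and random data are illustrative assumptions, not taken from any particular library):

```python
import numpy as np

# Minimal vanilla RNN forward pass; all sizes are made-up illustration values.
input_size, hidden_size, output_size = 4, 8, 3
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (recurrence)
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_forward(inputs):
    """Unfold the RNN over time: each step consumes x_t and the previous hidden state."""
    h = np.zeros(hidden_size)                        # initial hidden state
    outputs = []
    for x_t in inputs:                               # one time step per sequence element
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)     # new hidden state
        outputs.append(W_hy @ h + b_y)               # output at this time step
    return outputs, h

sequence = [rng.normal(size=input_size) for _ in range(5)]
outputs, final_h = rnn_forward(sequence)
print(len(outputs), final_h.shape)                   # 5 outputs, final hidden state of size 8
```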
Advantages of RNN
1) Ability to Process Sequential Data
2) Memory over time
3) Flexibility in Input and Output Length
4) A summary of the sequence so far is encoded in the hidden state and available as context
Disadvantages of RNN
1) Difficulty in Capturing Long-Term Dependencies due to the vanishing gradient problem
2) Computational Complexity for long sequences and large-scale datasets, as each time step must be processed sequentially
3) Vanishing and Exploding Gradients lead to instability in optimization
4) Fixed-Length Representations may restrict the ability to capture long-term dependencies in sequences
What is vanishing gradient problem?
The vanishing gradient problem occurs when the gradients of the loss function with respect to the parameters become very small as they propagate backward through the layers of the network during training.
When gradients become very small, the network parameters are updated slowly or not at all, leading to slow convergence or stagnation in learning.
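A toy illustration of the mechanism (the numbers are assumed): backpropagation through time multiplies the gradient by the recurrent Jacobian once per step, so a factor below 1 shrinks it exponentially.

```python
import numpy as np

# Toy demonstration of the vanishing gradient effect; values are illustrative.
T = 50                               # sequence length
jacobian = 0.5 * np.eye(8)           # recurrent Jacobian with norm < 1

grad = np.ones(8)                    # gradient arriving at the last time step
for _ in range(T):
    grad = jacobian.T @ grad         # one step of backpropagation through time

print(np.linalg.norm(grad))          # ~ 0.5**50 * sqrt(8): effectively zero
```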
What is exploding gradient and what might cause it?
The opposite of the vanishing gradient problem: gradients grow very large as they propagate backward through the network.
This can happen due to exploding activations, unstable weight initialization, or high learning rates.
What happens as a result of exploding gradients?
Drastic updates to model parameters, leading to oscillation and non-convergence.
Numerical overflow (gradients become inf or NaN).
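A toy sketch of the overflow case (the amplification factor of 3.0 is an assumed value):

```python
import numpy as np

# Repeated amplification by a recurrent weight drives values to inf in float32.
h = np.ones(4, dtype=np.float32)
W = np.float32(3.0) * np.eye(4, dtype=np.float32)   # amplifying recurrent weight

for t in range(300):
    h = W @ h                                        # values grow as 3**t
    if not np.all(np.isfinite(h)):
        print(f"overflow to inf at step {t}")        # happens around step 80
        break
```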
Mitigation strategies for Vanishing/Exploding Gradients?
Batch Normalisation, Proper Initialization, Reducing the learning rate, Changing the architecture (e.g. to LSTM), Gradient clipping (for exploding gradients), Reducing model complexity
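A minimal sketch of one of these mitigations, gradient-norm clipping (the threshold of 5.0 is an arbitrary assumed value):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale the gradient so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])            # norm = 50, far above the threshold
print(clip_gradient(g))               # rescaled to norm 5 -> [3. 4.]
```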
What are bidirectional RNNs?
Bi-RNNs consist of two RNNs, one processing the input sequence forward in time and the other backward, so the representation at each position combines past and future context.
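A minimal usage sketch, assuming PyTorch as the framework (the sizes are illustrative): with bidirectional=True the layer runs one RNN per direction and concatenates their hidden states at each time step.

```python
import torch
import torch.nn as nn

# Bidirectional RNN: forward and backward passes over the same sequence.
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True, bidirectional=True)

x = torch.randn(2, 10, 4)             # (batch, sequence length, input features)
output, h_n = rnn(x)

print(output.shape)                   # (2, 10, 16): forward + backward states concatenated
print(h_n.shape)                      # (2, 2, 8): final hidden state for each direction
```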
What is LSTM?
Long Short-Term Memory.
Why was LSTM introduced?
To address the vanishing gradient problem and to capture long-range dependencies.
What is the idea of LSTM?
It addresses the vanishing gradient problem by introducing a memory cell with several gating mechanisms that control the flow of information.
What are the components of an LSTM unit?
Cell State and Hidden state
Gates:
Input Gate: controls the flow of information into the memory cell.
Forget Gate: controls which information is discarded from the memory cell.
Output Gate: controls the flow of information out of the LSTM and into the output.
How do the gates work?
The input gate decides which information to store in the memory cell. It is trained to open when the input is important and close when it is not.
The forget gate decides which information to discard from the memory cell. It is trained to open when the information is no longer important and close when it is.
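A sketch of a single LSTM step with the gates written out explicitly (the weight names and shapes are assumed for illustration, not taken from a specific library):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b hold parameters for gates i, f, o and candidate g."""
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate: what to write
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate: what to keep or discard
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate: what to expose
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate cell content

    c = f * c_prev + i * g          # cell state: forget part of the old, add part of the new
    h = o * np.tanh(c)              # hidden state: gated view of the cell state
    return h, c

# Tiny usage example with random parameters (sizes are arbitrary).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in "ifog"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "ifog"}
b = {k: np.zeros(n_hid) for k in "ifog"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)             # (8,) (8,)
```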