DL Part 2 Flashcards
Name the advantages of the LSTM cell/GRU compared to the Elman cell
- ability to capture dependencies at different time scales
- control information flow via gates
- additive update of the cell state preserves the error signal during backpropagation → addresses the vanishing/exploding gradient problem
Why are RNNs suitable for problems with time series?
They maintain hidden states as “short-term memories” that carry information across time steps.
What role does the hidden state play in RNNs?
- memory of the network
- models temporal dynamics/dependencies
- context capturing in sequence data
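A minimal sketch (NumPy, made-up sizes) of the vanilla/Elman recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b): the hidden state is the only thing carried forward, so it acts as the network's memory of the sequence so far.
```python
# Minimal sketch of how the hidden state carries context across time steps.
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 8, 5

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden weights
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

x_seq = rng.standard_normal((seq_len, input_size))  # one input sequence
h = np.zeros(hidden_size)                           # initial "memory"

for x_t in x_seq:
    # h depends on the current input AND the previous hidden state,
    # so information from earlier time steps influences later outputs.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # (8,) -- the context accumulated over the whole sequence
```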
What are the pros and cons of a typical RNN architecture?
(+) * Can process inputs of any length
* Model size does not grow with the length of the input
* Computation takes historical information into account
* Weights are shared across time
(-) * Computation is slow
* Difficult to access information from many time steps back
* Cannot consider any future input for the current state
What are some RNN basic architectures? Name three applications where many-to-one and one-to-many RNNs would be beneficial.
1-to-1: Classic feed-forward for image classification
1-to-many: image captioning
many-to-1: sentiment analysis
many-to-many: 1. machine translation 2. video classification
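A rough sketch (PyTorch, hypothetical sizes) contrasting two of these layouts: many-to-one keeps only the final hidden state, while one-to-many unrolls several steps from a single input and emits an output at every step.
```python
import torch
import torch.nn as nn

feat, hidden, steps, batch = 16, 32, 5, 2
cell = nn.RNNCell(feat, hidden)
readout = nn.Linear(hidden, feat)
score_head = nn.Linear(hidden, 1)

# Many-to-one (e.g. sentiment analysis): consume the sequence, use the last h.
x_seq = torch.randn(steps, batch, feat)
h = torch.zeros(batch, hidden)
for x_t in x_seq:
    h = cell(x_t, h)
score = score_head(h)                    # single prediction for the whole sequence

# One-to-many (e.g. image captioning): one input, a sequence of outputs.
img_feat = torch.randn(batch, feat)      # stands in for an encoded image
h = torch.zeros(batch, hidden)
outputs = []
for _ in range(steps):
    h = cell(img_feat, h)                # simplified: same conditioning input each step
    outputs.append(readout(h))           # one output per step
```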
Describe what an element of a batch would be for a recurrent network, e.g. by giving an example.
An element of a batch represents one sequence of data points.
E.g., in a language modeling task where the input is a sequence of words, an element of a batch would be a sentence or a paragraph.
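Illustrative sketch (PyTorch, hypothetical vocabulary and sizes): a batch for a recurrent network is a tensor of shape (batch_size, seq_len, …), so each batch element is one whole (possibly padded) sequence.
```python
import torch
import torch.nn as nn

batch_size, seq_len, vocab_size, embed_dim, hidden_size = 3, 6, 100, 16, 32

token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))  # 3 sentences, 6 tokens each
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)

outputs, h_n = rnn(embed(token_ids))
print(outputs.shape)  # (3, 6, 32): hidden state at every time step of every sequence
print(h_n.shape)      # (1, 3, 32): final hidden state per sequence
```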
Why does the required memory space increase with higher batch sizes during training?
- more activation tensors
- larger gradient tensors stored until used to update parameters
- various intermediate calculations
What is the difference between BPTT and TBPTT?
- BPTT: One update requires backpropagation through a complete sequence
- TBPTT (truncated BPTT): the sequence is still processed in order, but backpropagation is truncated to manageable fixed-size segments
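A minimal sketch of TBPTT (PyTorch, hypothetical model and data): the hidden state is passed along between chunks but detached, so gradients only flow within each fixed-size chunk.
```python
import torch
import torch.nn as nn

seq_len, chunk, batch, feat, hidden = 100, 20, 4, 8, 16
x = torch.randn(batch, seq_len, feat)
y = torch.randn(batch, seq_len, 1)

rnn = nn.RNN(feat, hidden, batch_first=True)
head = nn.Linear(hidden, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

h = torch.zeros(1, batch, hidden)
for t in range(0, seq_len, chunk):
    x_chunk, y_chunk = x[:, t:t + chunk], y[:, t:t + chunk]
    out, h = rnn(x_chunk, h)
    loss = nn.functional.mse_loss(head(out), y_chunk)

    opt.zero_grad()
    loss.backward()          # backprop only through this chunk
    opt.step()

    h = h.detach()           # cut the graph: no gradient flow into earlier chunks
```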
What are the main challenges of training RNNs?
- maintaining long-term dependencies due to the problem known as vanishing gradients.
- hard to capture long-term dependencies because the hidden state is overwritten at each time step
- Short-term dependencies work fine
What is the problem with deep RNNs?
- Gradients prone to vanishing or exploding
+ Inputs and weights with magnitude > 1 multiplied repeatedly when N is large (deep/unrolled NN) → exploding gradients
+ Activation functions that tend to produce small gradients, e.g. sigmoid or tanh: the output is bounded (tanh between -1 and 1) and the derivative is at most 1, so gradients shrink each time they are backpropagated through the nonlinearity → vanishing gradients
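A toy numerical illustration of both effects: backpropagating through N steps multiplies the gradient by roughly the same per-step factor N times.
```python
# Repeated multiplication by a per-step factor over N steps.
N = 100
print(0.9 ** N)   # ~2.7e-05 -> factors < 1 shrink the gradient (vanishing)
print(1.1 ** N)   # ~13780   -> factors > 1 blow the gradient up (exploding)
# tanh'(x) = 1 - tanh(x)**2 <= 1, so stacking tanh steps pushes the factors below 1.
```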
Give several applications where a recurrent neural network can be useful and explain why.
RNNs are useful here due to their ability to process sequential data and capture temporal dependencies:
- NLP, e.g. language translation, sentiment analysis, text generation, speech recognition
- Time Series Analysis: analyze historical data trends, forecasting future patterns of stock prices/market demand
- speech and audio processing
- image and video analysis
What is the main idea behind LSTMs?
introduction of gates that control writing to and reading from a “memory” held in an additional cell state
What is the role of LSTM cell state?
- memory unit that carries information across time steps
- allows the network to remember relevant information from the past and use it when needed for making predictions or processing sequential data
How are the internal states updated in an LSTM unit?
1) Forget gate: Forgetting old information in the cell state
2) Input gate: Deciding on new input for the cell state
3) Computing the updated cell state
4) Computing the updated hidden state
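Written out in one common formulation (σ is the logistic sigmoid, ⊙ the elementwise product, [h_{t-1}, x_t] the concatenation of the previous hidden state and the current input):
```latex
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{input gate} \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{additive cell-state update} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{new hidden state}
\end{aligned}
```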