Handout #8 - Recurrent Neural Networks Flashcards
Explain what can be used to train an RNN
Back-Propagation Through Time (BPTT)
- Unroll the network to expand it into a standard feedforward network and then apply back-propagation as usual (see the sketch below).
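A minimal NumPy sketch of what "unrolling" means, with a toy loss on the final hidden state; the sizes, names and loss are assumptions for illustration, not from the handout:

```python
import numpy as np

T, n_in, n_h = 4, 3, 5                          # sequence length, input size, hidden size
rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.1, size=(n_h, n_in))    # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(n_h, n_h))     # hidden-to-hidden (recurrent) weights
xs = rng.normal(size=(T, n_in))                 # one toy input sequence
hs = [np.zeros(n_h)]                            # initial hidden state h_0

# Forward pass: "unroll" the recurrence into T feedforward steps.
for t in range(T):
    hs.append(np.tanh(Wx @ xs[t] + Wh @ hs[-1]))

loss = 0.5 * np.sum(hs[-1] ** 2)                # toy loss on the final hidden state

# Backward pass: ordinary back-propagation applied to the unrolled graph.
dWx, dWh = np.zeros_like(Wx), np.zeros_like(Wh)
dh = hs[-1]                                     # dLoss/dh_T for this toy loss
for t in reversed(range(T)):
    dz = dh * (1.0 - hs[t + 1] ** 2)            # back through tanh
    dWx += np.outer(dz, xs[t])                  # gradient w.r.t. the shared weights
    dWh += np.outer(dz, hs[t])
    dh = Wh.T @ dz                              # gradient flows to the previous time step
```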
What’s the problem with BPTT?
The unrolled network can grow very large and might be hard to fit into GPU memory.
The process is sequential -> it can’t be parallelised.
What’s the problem with the Simple RNN layer?
The unrolled RNN can grow very deep -> gradients can vanish (or explode) very quickly (see the numerical sketch below).
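A rough numerical illustration (weight scales and activations are assumed values, not from the handout): back-propagating through many time steps multiplies the gradient by one Jacobian per step, so its norm tends to shrink or grow geometrically.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, T = 5, 50
for scale in (0.1, 1.0):                        # small recurrent weights vs large ones
    Wh = rng.normal(scale=scale, size=(n_h, n_h))
    grad = np.ones(n_h)                         # stand-in for dLoss/dh_T
    for _ in range(T):
        h = rng.uniform(-0.5, 0.5, size=n_h)    # pretend (unsaturated) hidden activations
        grad = Wh.T @ (grad * (1.0 - h ** 2))   # one BPTT step through tanh and Wh
    print(f"weight scale {scale}: |grad| after {T} steps = {np.linalg.norm(grad):.1e}")
```

With small weights the gradient all but vanishes after 50 steps; with larger weights it explodes.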
What type of data are RNNs used for?
They’re used on sequential data -> any data with a time-series structure (e.g. audio signals, stock market prices, machine translation).
Is an RNN a feedforward network?
No, it’s cyclic: the hidden state is fed back into the network at the next time step.
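A minimal sketch of the recurrence that makes an RNN cyclic (sizes and names are assumptions): the same cell is applied at every step, and its output feeds back in as the next hidden state.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One step of a simple (Elman-style) RNN cell."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

rng = np.random.default_rng(0)
n_in, n_h, T = 3, 4, 6
Wx = rng.normal(scale=0.3, size=(n_h, n_in))
Wh = rng.normal(scale=0.3, size=(n_h, n_h))
b = np.zeros(n_h)

h = np.zeros(n_h)                       # initial hidden state
for t in range(T):
    x_t = rng.normal(size=n_in)         # toy input at time t
    h = rnn_step(x_t, h, Wx, Wh, b)     # the output is fed back in: this is the cycle
print("final hidden state:", h)
```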
Explain why LSTM is useful
It deals with the exploding and vanishing gradient problem (which arises when unrolling the network).
LSTM has three gates (sketched below): the forget gate, the input gate and the output gate.
- Forget gate: forgets irrelevant information
- Input gate: adds/updates new information
- Output gate: passes the updated information onward
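A compact sketch of one LSTM step showing the three gates named above (the variable names, sizes and single stacked weight matrix are assumptions made for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x_t] to the pre-activations of f, i, o and the candidate g."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    n_h = h_prev.shape[0]
    f = sigmoid(z[0 * n_h:1 * n_h])     # forget gate: what to drop from the cell state
    i = sigmoid(z[1 * n_h:2 * n_h])     # input gate: what new information to add
    o = sigmoid(z[2 * n_h:3 * n_h])     # output gate: what to expose as the hidden state
    g = np.tanh(z[3 * n_h:4 * n_h])     # candidate values to write
    c = f * c_prev + i * g              # update the cell state
    h = o * np.tanh(c)                  # pass the (gated) updated information onward
    return h, c

rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = rng.normal(scale=0.3, size=(4 * n_h, n_h + n_in))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```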
Explain GRU
A simpler alternative to the LSTM -> faster to train.
Instead of a plain linear combination (w1*u1 + w2*u2), the gating mechanism is based on an element-wise multiplication of the two inputs (see the sketch below).
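A sketch of one GRU step (names, sizes and the omitted biases are assumptions): note how the gates z and r act by element-wise multiplication rather than by a plain weighted sum.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step; each W maps the concatenated [h_prev; x_t] to a hidden-sized vector."""
    z = sigmoid(Wz @ np.concatenate([h_prev, x_t]))           # update gate
    r = sigmoid(Wr @ np.concatenate([h_prev, x_t]))           # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state (r gates h_prev)
    return (1.0 - z) * h_prev + z * h_cand                    # multiplicative blend of old and new

rng = np.random.default_rng(0)
n_in, n_h = 3, 4
Wz, Wr, Wh = (rng.normal(scale=0.3, size=(n_h, n_h + n_in)) for _ in range(3))
h = gru_step(rng.normal(size=n_in), np.zeros(n_h), Wz, Wr, Wh)
print("new hidden state:", h)
```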
What is the critical issue with RNNs?
They aren’t suitable for transfer learning.
Processing can’t be parallelised (each step depends on the previous one).