Lecture 10 Flashcards
Recurrent Neural Network (RNN) and the Vanishing Gradient
Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a type of artificial neural network used for processing sequential data such as time series, speech, and text. Unlike feedforward neural networks, RNNs have loops that allow information to persist. This makes them well suited for tasks that require context or memory, such as language modeling, machine translation, and speech recognition. RNNs can be trained using backpropagation through time (BPTT), a variant of the backpropagation algorithm used to train feedforward neural networks.
Limitations:
* Accept only a fixed-size vector as input and produce a fixed-size vector as output (e.g., probabilities of different classes).
* Use a fixed number of computational steps (e.g., the number of layers in the model).
Recurrent Neural Networks
Recurrent Neural Networks are networks with loops, allowing information to persist.
Formula
h_t = f_W(h_{t-1}, x_t)
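As a concrete sketch (assuming a tanh activation; the weight names W_xh, W_hh, and b are illustrative, not from the slides), one vanilla RNN step in NumPy:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    """One vanilla RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

# Toy dimensions (illustrative): input size 3, hidden size 4
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))
W_hh = rng.normal(size=(4, 4))
b = np.zeros(4)

h = np.zeros(4)                    # initial hidden state h_0
x = rng.normal(size=3)             # input vector x_t
h = rnn_step(x, h, W_xh, W_hh, b)  # new hidden state h_t
```

The same weights (W_xh, W_hh, b) are reused at every time step, which is what the single parameter set W in f_W expresses.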
What about parameters for the Dense layer?
Output dimension: y
Hidden state dimension: h
Bias dimension: y
Parameters = shape(y) × shape(h) + shape(y) = (y × h) + y
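For example, a sketch assuming TensorFlow/Keras and illustrative sizes (hidden state h = 64, output y = 10), where the Dense layer then has 10 × 64 + 10 = 650 parameters:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None, 8)),           # sequences of 8-dimensional inputs
    layers.SimpleRNN(64),                    # hidden state dimension h = 64
    layers.Dense(10, activation="softmax"),  # output dimension y = 10
])

# Dense parameters = y * h + y = 10 * 64 + 10 = 650
model.summary()
```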
Backpropagation in RNNs
A recurrent neural network can be imagined as multiple copies of the same network, each passing a message to a successor. Unrolling the loop in this way gives a schematic view of the computation, and backpropagation through time (BPTT) pushes gradients backward through each unrolled copy in turn.
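A minimal sketch of the unrolled loop (same illustrative NumPy setup as the step above; every intermediate hidden state is kept because BPTT needs it):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # The same cell, with the same shared weights, is applied at every step
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

rng = np.random.default_rng(0)
W_xh, W_hh, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

xs = rng.normal(size=(5, 3))  # a sequence of 5 input vectors
h = np.zeros(4)               # initial hidden state
hidden_states = []            # BPTT later walks back through these "copies"
for x_t in xs:                # the unrolled loop: one copy of the cell per step
    h = rnn_step(x_t, h, W_xh, W_hh, b)
    hidden_states.append(h)
```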
Vanishing Gradient Problem
Words from time steps far in the past are no longer as influential as they should be: the gradient signal shrinks as it is propagated back through many time steps.
Example:
Michael and Jane met last Saturday. It was a nice sunny day when they saw each other in the park. Michael just saw the doctor two weeks ago. Jane came back from Norway last Monday. Jane offered her best wish to _________.
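A rough numeric sketch of why this happens: each backward step through a tanh RNN multiplies the gradient by a factor that is typically below 1 (roughly the tanh derivative times the recurrent weight), so the contribution of distant words shrinks geometrically. The factor 0.9 below is purely illustrative:

```python
# Gradient contribution surviving after k backward steps,
# if each step multiplies it by an illustrative factor of 0.9
per_step_factor = 0.9
for k in [1, 5, 10, 50]:
    print(k, per_step_factor ** k)
# 1 -> 0.9, 5 -> ~0.59, 10 -> ~0.35, 50 -> ~0.005: far-away words barely matter
```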
Networks with Memory
- Vanilla RNN operates in a “multiplicative” way (repeated tanh or sigmoid activations) to remember previous inputs
- This can work OK if we only need short-term memory
- Using ReLU can alleviate the vanishing gradient (VG) problem (derivative = 1 for positive inputs)
Networks with Memory
To extend memory beyond the short term:
* Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997)
* Gated Recurrent Unit (GRU) (Cho et al., 2014)
* Both designs process information in an “additive” way with gates to control information flow (see the GRU sketch below).
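A minimal NumPy sketch of a GRU step to show the gated, “additive” update (weight names are illustrative and biases are omitted for brevity; this is a sketch, not a production implementation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate: how much to rewrite
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate: how much old state to use
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))  # candidate new state
    return (1.0 - z) * h_prev + z * h_tilde          # additive blend of old and new state

# Toy usage with illustrative sizes (input 3, hidden 4)
rng = np.random.default_rng(0)
Wz, Wr, Wh = (rng.normal(size=(4, 3)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(4, 4)) for _ in range(3))
h = gru_step(rng.normal(size=3), np.zeros(4), Wz, Uz, Wr, Ur, Wh, Uh)
```

Because the new state is a gated sum of the old state and a candidate, gradients can flow through the (1 - z) path largely unchanged, which is what makes long-range memory easier than in the purely multiplicative vanilla RNN.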
Text Generation with RNNs
Text generation is a natural candidate for sequential learning: “Based on what was said before, what’s the next thing that will be (or should be) said?” Because RNNs are good at using variable-length, sequential inputs to predict the output, they are well suited to text generation tasks where initial “seed” text is used to generate new text.
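A sketch of the seed-then-sample loop (the helpers predict_next_dist and idx_to_char are hypothetical stand-ins for a trained character-level model and its vocabulary; temperature is a common but optional sampling knob):

```python
import numpy as np

def generate(seed_text, predict_next_dist, idx_to_char, length=200, temperature=1.0):
    """Grow text one character at a time, feeding each prediction back in."""
    text = seed_text
    for _ in range(length):
        probs = predict_next_dist(text)                   # model's distribution over next char
        logits = np.log(probs + 1e-9) / temperature       # temperature < 1: safer, > 1: riskier
        probs = np.exp(logits) / np.sum(np.exp(logits))
        next_idx = np.random.choice(len(probs), p=probs)  # sample instead of always taking argmax
        text += idx_to_char[next_idx]
    return text
```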
Different Varieties of Sequence Modeling
Input: Scalar
Output: Scalar
“Standard” classification/regression problems: this is not sequence modeling.
Different Varieties of Sequence Modeling
Input: Scalar
Output: Sequence
Example: image to text; question answering; skip-gram analysis
Different Varieties of Sequence Modeling
Input: Sequence
Output: Scalar
Example: sentence classification, multiple-choice question answering
Different Varieties of Sequence Modeling
Input: Sequence
Output: Sequence
Example: machine translation, video captioning, open-ended question answering, video question answering
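For the two sequence-input cases, a sketch assuming TensorFlow/Keras (sizes are illustrative): return_sequences=False gives one output per sequence (sequence to scalar), while return_sequences=True gives one output per time step (aligned sequence to sequence). Machine translation usually adds an encoder-decoder structure on top of the second pattern.

```python
from tensorflow.keras import layers, models

# Sequence -> scalar (e.g., sentence classification)
seq_to_one = models.Sequential([
    layers.Input(shape=(None, 8)),
    layers.SimpleRNN(32),                         # return_sequences=False: final state only
    layers.Dense(2, activation="softmax"),
])

# Sequence -> sequence (one prediction per time step)
seq_to_seq = models.Sequential([
    layers.Input(shape=(None, 8)),
    layers.SimpleRNN(32, return_sequences=True),  # one hidden state per step
    layers.TimeDistributed(layers.Dense(5, activation="softmax")),
])
```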
Bigram Language Model vs. RNN
- Practical bigram language models require the simplifying Markov assumption: the prediction of the next token depends only on the last predicted token
- The probability of a sequence Y is then simply a chain of bigram factors, p(y1) p(y2|y1) p(y3|y2) ...
- Long-range dependencies are lost
- In contrast, an RNN conditions each prediction on the current input and the entirety of the foregoing sequence:
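One standard way to write this factorization (a common formulation with an assumed output projection W_hy; the exact formula from the slide is not reproduced here):

p(y_t | y_1, ..., y_{t-1}) = softmax(W_hy h_t), where h_t = f_W(h_{t-1}, x_t)
p(y_1, ..., y_T) = p(y_1) p(y_2 | y_1) p(y_3 | y_1, y_2) ... p(y_T | y_1, ..., y_{T-1})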