Lecture 10 Flashcards
Recurrent Neural Network (RNN) and the Vanishing Gradient
Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a type of artificial neural network used for processing sequential data such as time series, speech, and text. Unlike feedforward neural networks, RNNs have loops that allow information to persist. This makes them well suited for tasks that require context or memory, such as language modeling, machine translation, and speech recognition. RNNs can be trained using backpropagation through time (BPTT), a variant of the backpropagation algorithm used to train feedforward neural networks.
Limitations of feedforward networks:
* They accept only a fixed-size vector as input and produce a fixed-size vector as output (e.g., probabilities of different classes).
* They use a fixed number of computational steps (e.g., the number of layers in the model).
Recurrent Neural Networks
Recurrent Neural Networks are networks with loops, allowing information to persist.
Formula
h_t = f_W(h_{t-1}, x_t), where f_W is a function with weights W applied to the previous hidden state h_{t-1} and the input x_t at time step t.
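As a minimal sketch (not the lecture's code), one recurrence step with f_W chosen as a tanh cell and hypothetical weight matrices W_hh and W_xh could look like:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b_h):
    # One step of the recurrence h_t = f_W(h_{t-1}, x_t), with f_W taken to be
    # tanh(W_hh @ h_{t-1} + W_xh @ x_t + b_h) (assumed form, not from the slide).
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# Toy dimensions for illustration
h_dim, x_dim = 4, 3
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(h_dim, h_dim))
W_xh = rng.normal(size=(h_dim, x_dim))
b_h = np.zeros(h_dim)

h = np.zeros(h_dim)                      # initial hidden state
for x_t in rng.normal(size=(5, x_dim)):  # a toy sequence of 5 input vectors
    h = rnn_step(h, x_t, W_hh, W_xh, b_h)
print(h)                                 # final hidden state after the sequence
```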
What about parameters for the Dense (output) layer?
Output dimension: y
Hidden state dimension: h
Bias dimension: y
Parameters = shape(y) * shape(h) + shape(y)
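A quick check of that count with hypothetical sizes h = 128 and y = 10, giving 10 * 128 + 10 = 1290 parameters:

```python
from tensorflow.keras import layers, models

h, y = 128, 10  # hidden-state and output dimensions, chosen only for illustration
model = models.Sequential([layers.Input(shape=(h,)), layers.Dense(y)])
print(model.count_params())  # 1290 = shape(y) * shape(h) + shape(y)
```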
Backpropagation in RNNs
A recurrent neural network can be imagined as multiple copies of the same network, each passing a message to a successor. Unrolling the loop into this chain of copies gives a schematic view of how backpropagation through time is applied step by step across the whole sequence.
Vanishing Gradient Problem
Words from time steps far in the past are no longer as influential as they should be.
Example:
Michael and Jane met last Saturday. It was a nice sunny day when they saw each other in the park. Michael just saw the doctor two weeks ago. Jane came back from Norway last Monday. Jane offered her best wish to _________.
Networks with Memory
- A vanilla RNN operates in a “multiplicative” way (repeated tanh or sigmoid activations) to remember previous inputs
- This can work OK if we only need short-term memory
- Using ReLU can alleviate the vanishing gradient problem (derivative = 1 for positive inputs); a numeric sketch follows below
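A minimal numeric sketch (toy values, not from the lecture) of how repeated tanh squashing shrinks the backpropagated signal: each step multiplies the gradient by tanh'(z), which is at most 1.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50                        # number of unrolled time steps (toy value)
z = rng.normal(size=T)        # pre-activations at each step (toy values)

grad = 1.0
for z_t in z:
    grad *= 1.0 - np.tanh(z_t) ** 2   # tanh'(z) <= 1, so the product keeps shrinking
print(grad)                   # typically a vanishingly small number after 50 steps

# A ReLU on positive pre-activations contributes a factor of exactly 1 per step,
# which is why it can alleviate this particular source of shrinkage.
```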
Networks with Memory
To extend memory beyond the short term:
* Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997)
* Gated Recurrent Unit (GRU) (Cho et al., 2014)
* Both designs process information in an “additive” way, with gates to control information flow.
Text Generation with RNNs
Text generation is a natural candidate for sequential learning: “Based on what was said before, what’s the next thing that will be (or should be) said?” Because RNNs are good for using variable-length, sequential inputs to predict the output, they are well suited to text generation tasks where initial “seed” text is used to generate new text.
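A minimal sketch of that seed-and-extend loop, assuming a hypothetical character-level Keras model `model` trained on one-hot windows of length `seq_len`, with lookup tables `char_to_idx` and `idx_to_char` (all of these names are assumptions, not from the lecture):

```python
import numpy as np

def generate(model, seed, char_to_idx, idx_to_char, seq_len, n_chars=100):
    # Repeatedly predict a distribution over the next character from the last
    # seq_len characters of the running text, sample from it, and append.
    text = seed
    for _ in range(n_chars):
        window = text[-seq_len:]
        x = np.zeros((1, seq_len, len(char_to_idx)), dtype="float32")
        for t, ch in enumerate(window):
            x[0, t, char_to_idx[ch]] = 1.0             # one-hot encode the window
        probs = model.predict(x, verbose=0)[0]          # next-character probabilities
        probs = probs / probs.sum()                     # guard against float32 rounding
        next_idx = np.random.choice(len(probs), p=probs)  # sample rather than argmax
        text += idx_to_char[next_idx]
    return text
```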
Different Varieties of Sequence Modeling
Input: Scalar
Output: Scalar
“Standard” classification / regression problems: this is not sequence modeling.
Different Varieties of Sequence Modeling
Input: Scalar
Output: Sequence
Example: image to text; question answering; skip-gram analysis
Different Varieties of Sequence Modeling
Input: Sequence
Output: Scalar
Example: sentence classification, multiple-choice question answering
Different Varieties of Sequence Modeling
Input: Sequence
Output: Sequence
Example: machine translation, video captioning, open-ended question answering, video question answering
Bigram Language Model vs. RNN
- Practical bigram language models require the simplifying Markov assumption: the prediction of the next token depends only on the last predicted token
- The probability of a sequence Y is simply a chain of terms such as p(y2|y1)p(y1)
- Long-range dependencies are lost
- In contrast, an RNN conditions each prediction on the current input and the entirety of the foregoing sequence, as the factorizations below make explicit
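In symbols (a standard way to write the contrast, not copied from the slide):

```latex
% Bigram (Markov) model: each token depends only on the previous token.
P(Y) = p(y_1) \prod_{t=2}^{T} p(y_t \mid y_{t-1})

% RNN language model: each prediction is conditioned on the entire preceding
% sequence, summarized in the hidden state carried forward through time.
P(Y) = \prod_{t=1}^{T} p(y_t \mid y_1, \ldots, y_{t-1})
```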
Keras RNN Modules
This week we have examined the vanilla RNN, which is implemented in Keras as the SimpleRNN class. Although this vanilla RNN is rarely used in production, for some problems it can produce comparable results to LSTM and GRU while training fewer weights.
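A minimal sketch of a SimpleRNN classifier in Keras (the sizes here, a 1000-token vocabulary, 64-dimensional embeddings, 32 units, and 10 classes, are illustrative assumptions):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None,)),                      # variable-length sequences of token ids
    layers.Embedding(input_dim=1000, output_dim=64),  # token ids -> dense vectors
    layers.SimpleRNN(32),                             # vanilla RNN; returns the last hidden state
    layers.Dense(10, activation="softmax"),           # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```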
Keras Models Including Today’s Class
- Recurrent networks are the third type of supervised deep learning model we have created with Keras so far
- The models all show the same basic underlying architectural approach, with enhancements to support convolution and recurrence
- Today’s class covered “vanilla” RNNs; next week we will consider LSTMs and GRUs
Simple RNN (From Keras Doc)
Key Features
* Vanilla RNN, susceptible to the vanishing or exploding gradient problem as previously discussed
* Uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far
* Can process an input sequence in reverse, via the go_backwards argument
* Supports loop unrolling (which can lead to a large speedup when processing short sequences on CPU), via the unroll argument
* By default, the output of an RNN layer contains a single vector per sample. This vector is the RNN cell output corresponding to the last timestep, containing information about the entire input sequence (see the shape check below)
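A minimal shape check of that default behaviour versus return_sequences=True (toy sizes: batch of 2, 5 timesteps, 8 features, 16 units):

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.random((2, 5, 8)).astype("float32")        # (batch, timesteps, features)

last_only = layers.SimpleRNN(16)(x)                       # default: only the last timestep
per_step = layers.SimpleRNN(16, return_sequences=True)(x)

print(last_only.shape)  # (2, 16)
print(per_step.shape)   # (2, 5, 16)
```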
Simple RNN (From Keras Doc)
Key Arguments
* units: Positive integer, dimensionality of the output space.
* activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. “linear” activation: a(x) = x).
* use_bias: Boolean (default True), whether the layer uses a bias vector.
* kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: glorot_uniform.
* recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. Default: orthogonal.
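For reference, a SimpleRNN layer with these arguments spelled out explicitly (units is chosen arbitrarily; the other values are the documented defaults):

```python
from tensorflow.keras import layers

rnn = layers.SimpleRNN(
    units=32,                              # dimensionality of the output space
    activation="tanh",                     # default activation
    use_bias=True,                         # include a bias vector
    kernel_initializer="glorot_uniform",   # weights for the input transformation
    recurrent_initializer="orthogonal",    # weights for the recurrent transformation
)
```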
Keras RNN Architectures
- Having selected a recurrent approach, there are still a variety of decisions to be made about configuring a model
- This lab demonstrates comparisons between a few of these options on a prediction task
- The diagnostics demonstrated here are good to include in your own project as a way of showing that the model configuration fits what you intended to do