Lecture 11 Flashcards
Long Short-Term Memory and Gated Recurrent Units for NLP
Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to handle sequential data such as time series, speech, and text. An LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. LSTM networks process data sequentially and carry their hidden state through time, and they are applied to tasks such as classification, speech recognition, machine translation, and healthcare applications.
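As a concrete illustration, here is a minimal sketch of an LSTM-based text classifier, assuming PyTorch; the vocabulary size, embedding and hidden dimensions, and class count are placeholder values, not from the lecture.

```python
# Minimal sketch of an LSTM text classifier (assumed library: PyTorch).
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        x = self.embed(token_ids)              # (batch, seq_len, embed_dim)
        output, (h_n, c_n) = self.lstm(x)      # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])                # logits from the final hidden state

logits = LSTMClassifier()(torch.randint(0, 10000, (4, 20)))  # batch of 4 sequences
```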
Feedforward
Simple, unidirectional predictive structures connecting input arrays to output arrays
Convolutional
Sliding window moving across time or multi-dimensional structures to capture features
Recurrent
Neurons with feedback loops creating memory structures with limited persistence
Gated
Cell units containing multiple neurons and providing long-term memory
Backpropagation in RNNs
A recurrent neural network can be imagined as multiple copies of the same network, each passing a message to a successor. Backpropagation through time applies the chain rule across these unrolled copies, multiplying gradients at every time step.
Vanishing Gradient Problem
Because the gradient shrinks as it is multiplied back through many time steps, words from distant time steps are no longer as influential as they should be.
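The effect can be seen in a toy NumPy sketch of the backward pass: the gradient is repeatedly multiplied by the transposed recurrent weights and a factor standing in for the tanh derivative (both the matrix scale and the 0.9 factor are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.05, size=(50, 50))    # small recurrent weights (illustrative)
grad = rng.normal(size=50)                   # gradient at the final time step

for t in range(1, 31):
    grad = W.T @ grad * 0.9                  # 0.9 stands in for tanh'(.) <= 1
    if t % 10 == 0:
        print(f"after {t} steps back: ||grad|| = {np.linalg.norm(grad):.2e}")
# The norm shrinks exponentially, so early time steps receive almost no learning signal.
```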
Forget gate:
How much information from the previous cell state C_{t-1} will be kept?
Input gate:
Which values will be updated, and what are the new candidate values?
Sigmoid function: outputs a number between 0 and 1
Tanh function (hyperbolic tangent function): outputs a number between -1 and 1
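In the standard formulation, both gates and the candidate values are computed from the previous hidden state and the current input; a sketch of the usual equations (the weight and bias names W_f, b_f, W_i, b_i, W_C, b_C follow common convention and do not come from the lecture):

```latex
f_t = \sigma\big(W_f\,[h_{t-1}, x_t] + b_f\big)          % forget gate: 0 = erase, 1 = keep
i_t = \sigma\big(W_i\,[h_{t-1}, x_t] + b_i\big)          % input gate: which values to update
\tilde{C}_t = \tanh\big(W_C\,[h_{t-1}, x_t] + b_C\big)   % candidate values in (-1, 1)
```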
Cell state:
Update the old cell state C_{t-1} into the new cell state C_t.
* The new cell state C_t combines information kept from the past, f_t ⊙ C_{t-1}, with valuable new information from the input gate, i_t ⊙ C̃_t:
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
Elementwise multiplication (example):
[8, 3, 2, 4, 2] ⊙ [0, 1, 0.5, 1, 4] = [0, 3, 1, 4, 8]
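The example can be checked with a few lines of NumPy (arrays taken from the example above):

```python
import numpy as np

a = np.array([8, 3, 2, 4, 2])
g = np.array([0, 1, 0.5, 1, 4])
print(a * g)  # [0. 3. 1. 4. 8.] -- elementwise product, not matrix multiplication
```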
Output gate:
Based on the cell state, we decide what the output will be.
- The tanh function filters the new cell state to characterize the stored information
- Significant information in C_t -> close to ±1
- Minor details -> close to 0
- The output gate selects which of this filtered information to emit: h_t = o_t ⊙ tanh(C_t)
- h_t serves as the hidden state for the next time step
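Putting the pieces together, one full LSTM time step can be sketched in NumPy; the stacked-weight layout, variable names, and sizes are illustrative assumptions rather than the lecture's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step. W maps the concatenated [h_prev, x_t] to the stacked
    pre-activations of the forget, input, candidate, and output parts."""
    z = np.concatenate([h_prev, x_t])
    H = h_prev.size
    pre = W @ z + b                          # shape (4 * H,)
    f = sigmoid(pre[0:H])                    # forget gate
    i = sigmoid(pre[H:2*H])                  # input gate
    C_tilde = np.tanh(pre[2*H:3*H])          # candidate values
    o = sigmoid(pre[3*H:4*H])                # output gate
    C = f * C_prev + i * C_tilde             # new cell state
    h = o * np.tanh(C)                       # new hidden state / output
    return h, C

H, D = 4, 3                                  # hidden size, input size (illustrative)
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4 * H, H + D)), np.zeros(4 * H)
h, C = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```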
Gated Recurrent Units (GRU)
In 2014, Cho and his colleagues published a paper titled "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation." In this paper, the researchers introduced a simplified LSTM-style model, which later became known as the GRU. They evaluated their approach on the English/French translation task of the WMT’14 workshop. In later papers, the GRU has often performed as well as the LSTM, even though it is simpler.
Gated Recurrent Unit (GRU)
GRU is a variation of LSTM that also adopts the gated design.
* Differences:
* GRU uses an update gate z_t in place of the separate input and forget gates
* GRU merges the LSTM cell state C_t and hidden state h_t into a single hidden state h_t
* GRU obtains performance similar to LSTM with fewer parameters and faster convergence (Cho et al., 2014)
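For comparison, a single GRU time step can be sketched the same way; the weight names and sizes are illustrative, and the sign convention for the update gate varies across presentations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU time step: the update gate z interpolates between the previous
    hidden state and the candidate; the reset gate r gates h_prev."""
    z = sigmoid(Wz @ np.concatenate([h_prev, x_t]))            # update gate
    r = sigmoid(Wr @ np.concatenate([h_prev, x_t]))            # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde                      # new hidden state

H, D = 4, 3                                                    # illustrative sizes
rng = np.random.default_rng(0)
Wz, Wr, Wh = (rng.normal(size=(H, H + D)) for _ in range(3))
h = gru_step(rng.normal(size=D), np.zeros(H), Wz, Wr, Wh)
```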