Lecture 11 Flashcards
Long Short-Term Memory and Gated Recurrent Units for NLP
Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to handle sequential data such as time series, speech, and text. It is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. LSTM networks process data sequentially and keep their hidden state through time. They are used in tasks such as classification, speech recognition, machine translation, and healthcare applications.
Feedforward: simple, unidirectional predictive structures connecting input arrays to output arrays
Convolutional: a sliding window moving across time or multi-dimensional structures to capture features
Recurrent: neurons with feedback loops creating memory structures with limited persistence
Gated: cell units containing multiple neurons and providing long-term memory
Backpropagation in RNNs
A recurrent neural network can be imagined as multiple copies of the same network, each passing a message to its successor. Backpropagation through time then applies the chain rule across this unrolled chain of copies.
Vanishing Gradient Problem
Words from time steps far away no longer influence the current prediction as much as they should, because the gradient shrinks as it is propagated back through many time steps.
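A tiny numeric illustration of why this happens (the per-step factor of 0.9 below is an assumed, illustrative value, not derived from any real network):

```python
# Backpropagation through T time steps multiplies roughly one factor per step;
# when those factors are below 1, the product shrinks toward zero.
per_step_factor = 0.9
for T in (5, 20, 50, 100):
    print(f"T={T:3d}  gradient scale ~ {per_step_factor ** T:.2e}")
# T=100 gives ~2.7e-05: a word 100 steps back barely affects the weight update.
```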
Forget gate: how much information from the previous time step will be kept?
Input gate: which values will be updated, and what the new candidate values are
Sigmoid function: outputs a number between 0 and 1
Tanh function (hyperbolic tangent function): outputs a number between -1 and 1
Cell state: update the old cell state Ct-1 into the new cell state Ct.
* The new cell state Ct combines information kept from the past, ft * Ct-1, with valuable new information from the gated candidate values, it * C̃t: Ct = ft * Ct-1 + it * C̃t
Elementwise multiplication example: [8, 3, 2, 4, 2] ⊙ [0, 1, 0.5, 1, 4] = [0, 3, 1, 4, 8]
Based on the cell state, we will decide what the output will be
- tanh function filters the new cell state to characterize stored information
- Significant information in Ct -> ±1
- Minor details -> 0
- ht serves as the hidden state for the next time step (a forward-pass sketch of these gate computations follows below)
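A minimal NumPy sketch of one LSTM time step as described above. The weight layout (dicts keyed by gate name), sizes, and helper names are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, U, b):
    """One LSTM time step. W, U, b are assumed dicts of weights keyed by gate: 'f', 'i', 'c', 'o'."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate: how much of C_prev to keep
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate: which values to update
    C_cand = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])   # new candidate values
    C_t = f_t * C_prev + i_t * C_cand                           # Ct = ft * Ct-1 + it * C̃t
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
    h_t = o_t * np.tanh(C_t)                                    # hidden state for the next time step
    return h_t, C_t

# Tiny usage example with random weights (sizes are illustrative).
d_x, d_h = 4, 3
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(d_h, d_x)) for g in "fico"}
U = {g: rng.normal(size=(d_h, d_h)) for g in "fico"}
b = {g: np.zeros(d_h) for g in "fico"}
h, C = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_x)):   # a 5-step input sequence
    h, C = lstm_step(x_t, h, C, W, U, b)
print(h)
```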
Gated Recurrent Units (GRU)
In 2014, Cho and his colleagues posted a paper entitled, “Learning
phrase representations using RNN encoder-decoder for statistical
machine translation.” In this paper, the researchers introduced a
simplified LSTM model, which later became referred to as a GRU.
They evaluated their approach on the English/French translation task
of the WMT’14 workshop. In later papers, the GRU has often
performed as well as LSTM, even though it is simpler.
Gated Recurrent Unit (GRU)
GRU is a variation of LSTM that also adopts the gated design.
* Differences:
* GRU uses an update gate z to substitute for the input and forget gates
* GRU combines the cell state Ct and hidden state ht of the LSTM into a single state ht
* GRU obtains performance similar to LSTM with fewer parameters and faster convergence (Cho et al., 2014)
Update gate: controls the composition of the new state
Reset gate: determines how much old information is needed in the alternative state h̃t
Alternative state: contains new information
New state: replaces selected old information with new information (see the step-by-step sketch below)
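For comparison with the LSTM sketch above, a minimal NumPy sketch of one GRU step under the same assumptions (weights as dicts keyed by gate; names and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step. W, U, b are assumed dicts of weights keyed by 'z', 'r', 'h'."""
    z_t = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])             # update gate: composition of the new state
    r_t = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])             # reset gate: how much old information to use
    h_cand = np.tanh(W['h'] @ x_t + U['h'] @ (r_t * h_prev) + b['h'])  # alternative (candidate) state
    h_t = (1 - z_t) * h_prev + z_t * h_cand                            # new state: mix of old and new information
    return h_t
```

There is no separate cell state and one gate fewer than in the LSTM, which is where the parameter savings come from.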
Text Summarization Using LSTM-CNN (Song et al., 2018, Multimedia Tools & Apps)
- Abstractive text summarization generates readable summaries without being constrained to phrases from the original text
- Training data: human-generated abstractive summary bullets from CNN and DailyMail stories
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation) toolkit was used for evaluation
- LSTM-CNN outperformed four previous models by 1-4%
Extracting Temporal Relations from Korean Text
Lim & Choi, 2018, IEEE Big Data/Smart Computing
- From the article: it is “difficult to correctly recognize the temporal relations from Korean text owing to the inherent linguistic characteristics of the Korean language”
- Dataset: Korean TimeBank - 2,393 annotated documents and 6,190 Korean sentences
- F1 scores ranged from 0.46 to 0.90 across the various temporal relation types
Emotion Recognition in Online Comments (Li & Xiao, 2020)
- The model consists of (see the sketch after this list):
  - an embedding layer
  - a bidirectional LSTM layer
  - a feedforward attention layer
  - a concatenation layer
  - an output layer
- Training data: emotion-labelled Twitter data and blog data
- F1 measure: 62.78%
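A hedged Keras sketch of a model with these five layer types. The hyperparameters, the attention formulation, and what exactly gets concatenated are illustrative assumptions; the paper's actual design details are not given in these notes:

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

# Hypothetical hyperparameters (vocabulary, embedding size, sequence length, units, classes).
VOCAB, EMB, MAXLEN, UNITS, CLASSES = 20000, 100, 60, 128, 6

tokens = layers.Input(shape=(MAXLEN,), dtype="int32")
emb = layers.Embedding(VOCAB, EMB)(tokens)                                   # embedding layer
seq = layers.Bidirectional(layers.LSTM(UNITS, return_sequences=True))(emb)   # bidirectional LSTM layer

# Feedforward attention layer: a small dense net scores each time step,
# softmax turns the scores into weights, and the weighted sum is a context vector.
scores = layers.Dense(1, activation="tanh")(seq)                 # (batch, MAXLEN, 1)
weights = layers.Softmax(axis=1)(scores)
context = layers.Flatten()(layers.Dot(axes=1)([weights, seq]))   # weighted sum over time

# Concatenation layer: the notes do not say what is concatenated; this sketch joins
# the attention context with a pooled summary of the BiLSTM outputs as one plausible choice.
merged = layers.Concatenate()([context, layers.GlobalMaxPooling1D()(seq)])

outputs = layers.Dense(CLASSES, activation="softmax")(merged)    # output layer
model = Model(tokens, outputs)
model.summary()
```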
LSTM
Key Features
- Long Short-Term Memory layer - Hochreiter & Schmidhuber, 1997.
- Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation.
- When processing very long sequences (possibly infinite), you may want to use the pattern of cross-batch statefulness (see the sketch below).
- Normally, the internal state of an RNN layer is reset every time it sees a new batch (i.e., every sample seen by the layer is assumed to be independent of the past). The layer will only maintain a state while processing a given sample.
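A minimal usage sketch of cross-batch statefulness; the layer size, batch shapes, and random data are illustrative:

```python
import numpy as np
import tensorflow as tf

# With stateful=True the layer keeps its internal state across batches, so consecutive
# batches can be fed as consecutive chunks of the same long sequences.
lstm = tf.keras.layers.LSTM(32, stateful=True)

chunk1 = np.random.random((16, 10, 8)).astype("float32")   # 16 sequences, first 10 steps
chunk2 = np.random.random((16, 10, 8)).astype("float32")   # the next 10 steps of the same sequences

out1 = lstm(chunk1)     # state is kept after this call ...
out2 = lstm(chunk2)     # ... and used as the starting state here
lstm.reset_states()     # reset explicitly when a new set of sequences begins
```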
LSTM
Key Arguments
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e., "linear" activation: a(x) = x).
- recurrent_activation: Activation function to use for the recurrent step. Default: sigmoid. If you pass None, no activation is applied (i.e., "linear" activation: a(x) = x).
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: glorot_uniform.
- unit_forget_bias: Boolean (default True). If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". (A construction sketch with these arguments follows below.)
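A sketch constructing tf.keras.layers.LSTM with the arguments listed above; the concrete values and input shapes are illustrative, not recommendations:

```python
import tensorflow as tf

lstm = tf.keras.layers.LSTM(
    units=64,                          # dimensionality of the output space
    activation="tanh",                 # default activation
    recurrent_activation="sigmoid",    # default recurrent activation
    kernel_initializer="glorot_uniform",
    unit_forget_bias=True,             # adds 1 to the forget-gate bias at initialization
)

x = tf.random.normal((4, 12, 8))       # (batch, time steps, features)
print(lstm(x).shape)                   # (4, 64)
```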
GRU
Key Features
- Gated Recurrent Unit based on Cho et al. (2014).
- There are two variants of the GRU implementation. The default one is based on v3 and has the reset gate applied to the hidden state before the matrix multiplication. The other one is based on the original paper and has the order reversed.
- The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU. Thus it has separate biases for kernel and recurrent_kernel. To use this variant, set reset_after=True and recurrent_activation='sigmoid' (see the sketch below).
- In TensorFlow 2.0, the built-in LSTM and GRU layers have been updated to leverage cuDNN kernels by default when a GPU is available. With this change, the prior CuDNNLSTM/CuDNNGRU layers have been deprecated, and you can build your model without worrying about the hardware it will run on.
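A sketch configuring the two implementation variants described above; the layer size and input shapes are illustrative:

```python
import tensorflow as tf

# reset_after=True with recurrent_activation="sigmoid" is the cuDNN-compatible
# configuration; reset_after=False selects the original-paper ordering.
gru_cudnn_ok = tf.keras.layers.GRU(64, reset_after=True, recurrent_activation="sigmoid")
gru_original = tf.keras.layers.GRU(64, reset_after=False)

x = tf.random.normal((4, 12, 8))                       # (batch, time steps, features)
print(gru_cudnn_ok(x).shape, gru_original(x).shape)    # both (4, 64)
```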
GRU
Key Arguments
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e., "linear" activation: a(x) = x).
- recurrent_activation: Activation function to use for the recurrent step. Default: sigmoid. If you pass None, no activation is applied (i.e., "linear" activation: a(x) = x).
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: glorot_uniform.
- reset_after: GRU convention - whether to apply the reset gate after or before the matrix multiplication. False = "before", True = "after" (default, and required for the cuDNN-compatible variant described above). (A construction sketch follows below.)
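A sketch constructing tf.keras.layers.GRU with the key arguments listed above; the concrete values are illustrative:

```python
import tensorflow as tf

gru = tf.keras.layers.GRU(
    units=64,                          # dimensionality of the output space
    activation="tanh",                 # default activation
    recurrent_activation="sigmoid",    # default recurrent activation
    kernel_initializer="glorot_uniform",
    reset_after=True,                  # cuDNN-compatible convention (default)
)
print(gru(tf.random.normal((4, 12, 8))).shape)   # (batch, time, features) -> (4, 64)
```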