Sequence Models Flashcards

Notes on Sequence Models that may help with the exam.

1
Q

What is a one-line summary of Sequential Learning?

A

Each data item depends on the items that come before or after it; the data are not independently and identically distributed.

2
Q

What are some common applications of Sequence Learning?

A

Speech/Voice Recognition
Weather forecasting
Language translation
DNA Sequence Analysis

3
Q

What are the four types of application scenarios with regard to Recurrent Neural Networks?

A

One-to-one
One-to-many
Many-to-one
Many-to-many

4
Q

What is a one-line summary of One-to-one in the context of Recurrent Neural Networks?

A

A classical feed-forward neural network with one input and one output, e.g. image classification

5
Q

What is a one-line summary of One-to-many in the context of Recurrent Neural Networks?

A

A single input (e.g. an image) and an output of variable length (a sequence of words), e.g. Image Captioning

6
Q

What are some common applications of Many-to-one in the context of Recurrent Neural Networks?

A

Sentiment Classification
Share Price Predictions

7
Q

What are some common applications of Many-to-many in the context of Recurrent Neural Networks?

A

Language Translation - input and output have variable lengths

Video Clip Classification - input and output have the same length

8
Q

What is the equation for a Basic Recurrent Neural Network unit?

A

Current state = activation function applied to the previous state and the input vector at the present time step, i.e. h_t = f(W_hh * h_(t-1) + W_xh * x_t + b), where f is typically tanh.
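
A minimal NumPy sketch of this update (the weight names W_hh and W_xh, the bias b, the tanh activation and the toy sizes are illustrative assumptions, not fixed by the card):

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b):
    """One basic RNN update: the new state is a function of the previous state
    and the input vector at the present time step."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b)

# Toy example: 4-dimensional hidden state, 3-dimensional inputs
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(4, 4))
W_xh = rng.normal(size=(4, 3))
b = np.zeros(4)
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):   # a sequence of 5 input vectors
    h = rnn_step(h, x_t, W_hh, W_xh, b)
```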

9
Q

What challenge do Recurrent Neural Networks face?

A

They suffer from the vanishing/exploding gradient problem: as the gradient is back-propagated through many time steps, it either shrinks towards zero or grows without bound.
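
A toy illustration of why this happens: back-propagation through time multiplies the gradient by the recurrent Jacobian once per time step, so it scales exponentially with sequence length (the factors 0.9 and 1.1 below are made-up stand-ins for that Jacobian):

```python
for scale in (0.9, 1.1):
    g = 1.0
    for _ in range(100):   # 100 time steps of back-propagation
        g *= scale         # stand-in for multiplying by the recurrent Jacobian
    print(scale, g)        # 0.9 -> ~2.7e-05 (vanishes), 1.1 -> ~1.4e+04 (explodes)
```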

10
Q

What are Long Short Term Memory Networks?

A

They are variants of Recurrent Neural Networks specifically designed to capture long-term dependencies.

11
Q

What are Long Short Term Memory networks better at doing compared to Recurrent Neural Networks?

A

They back-propagate the gradient much more efficiently than the standard Recurrent Neural Network.

12
Q

What are the two main components of a Long Short Term Memory unit?

A

Long-term memory cell states, which are non-learnable.
Short-term memory hidden states, which are learnable.

13
Q

What connects the two main blocks of a Long Short Term Memory Network, and determines whether information passes through to either side?

A

Multiple sigmoid functions act as gates that switch the flow of information on and off.

14
Q

What three types of gates exist within the Long Short Term Memory Network Unit?

A

Forget Gate
Input Gate
Output Gate

15
Q

What does the Forget Gate determine in LSTMs?

A

The Forget Gate determines how much of the long-term memory (cell state) is retained (a percentage).

16
Q

What does the Input Gate determine in LSTMs?

A

The Input Gate determines how much of the short-term information should be contributed to the long-term memory (a percentage).

17
Q

What does the Output Gate determine in LSTMs?

A

The Output Gate determines how much of the new long-term memory should be contributed to the current output (a percentage).
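
Putting the three gates together, a minimal NumPy sketch of one LSTM step (the parameter names W, U, b, the stacked layout and the toy sizes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t, W, U, b):
    """One LSTM update. W, U, b stack the parameters for the forget (f),
    input (i) and output (o) gates and the candidate memory (g)."""
    z = W @ x_t + U @ h_prev + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # gate values: fractions in [0, 1]
    c_t = f * c_prev + i * np.tanh(g)   # forget part of the old memory, add new memory
    h_t = o * np.tanh(c_t)              # output gate decides how much memory is emitted
    return h_t, c_t

hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, inputs))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(h, c, rng.normal(size=inputs), W, U, b)
```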

18
Q

What is the challenge with LSTMs?

A

LSTMs are able to capture long-term memory, but often not over a long enough range for it to be useful.

19
Q

What is a Transformer especially good at?

A

Transformers are able to capture dependencies between all input words in a sentence, and between the input words and the outputs.

20
Q

What is the overall architecture of a Transformer?

A

It is modelled as an Encoder-Decoder architecture:
Encoder - The output is a continuous vector representation of the inputs
Decoder - Takes the continuous vectors from the Encoder and generates the output one element at a time

21
Q

What are some common applications of Transformers?

A

Language Translation
Chatbot

22
Q

What is the overall pipeline for a Transformer?

A

It starts with two chains:
- Encoder self-attention -> Feed-forward network
- Decoder self-attention

These two chains then combine into a single pipeline:
- Decoder-Encoder Attention -> Feed-Forward Network

23
Q

What are the details surrounding the Input Embedding step in Transformers?

A

Each word is mapped to a vector
We then add a positional encoding to the vector using sine/cosine functions
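
A sketch of the sine/cosine positional encoding (the 10000 base follows the original Transformer paper; seq_len=10 and d_model=8 are arbitrary toy sizes, and d_model is assumed even):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(same)."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Added to the word-embedding vectors before they enter the encoder
embeddings = np.random.default_rng(0).normal(size=(10, 8))   # 10 tokens, d_model = 8
encoder_input = embeddings + positional_encoding(10, 8)
```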

24
Q

What are the details surrounding the Encoder Training area of the Transformer?

A

The multi-head attention calculates the relationships between all pairs of inputs
The Feed-Forward network maps the attention vectors into a form that can be fed to the Transformer decoder.

25
Q

What are the details surrounding the Multi-Head Attention area of the Transformer?

A

Input - Each word has a Query, Key and Value vector assigned to it
Scaled Dot-Product Attention takes the dot product of the Queries and Keys, scales it, and feeds the result into a Softmax function to produce attention weights.

For each input, compute multiple attention vectors (one per head) and use their weighted average as the final attention vector for each word.

The output vector is the dot product between the attention weights and the Values. Multiple words can be processed in parallel.
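
A minimal NumPy sketch of the scaled dot-product attention described above, for a single attention head (multi-head attention runs several such heads on different learned projections; the sizes below are arbitrary):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed for all words in parallel."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise Query-Key similarities, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of the Values

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))   # 5 words, d_k = 8 (illustrative sizes)
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(Q, K, V)   # (5, 8): one attention vector per word
```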

26
Q

What are the details surrounding the Decoder Training area of the Transformer?

A

During training, the target sentence is input to the Masked Multi-Head Attention, which masks out relationships to future words.

Then another Multi-Head Attention learns the interactions between input words and target words

The output is a one-hot encoding over the vocabulary, e.g. a 1000-word vector
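
A sketch of how the masking in the Masked Multi-Head Attention can be implemented: attention scores for future target words are set to -inf before the softmax, so each position can only attend to earlier (and current) target words (the 4x4 size is arbitrary):

```python
import numpy as np

def causal_mask(seq_len):
    """-inf above the diagonal blocks attention from a word to future target words."""
    mask = np.zeros((seq_len, seq_len))
    mask[np.triu_indices(seq_len, k=1)] = -np.inf
    return mask

scores = np.random.default_rng(0).normal(size=(4, 4))   # raw Query-Key scores
masked_scores = scores + causal_mask(4)
# After the softmax, row t has zero weight on target positions later than t
```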

27
Q

What does Model Inference mean with regard to the Transformer?

A

To estimate the second word and onwards, the model uses the whole input sentence and all previously generated target words to infer the result.
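
A sketch of that inference loop (greedy decoding; toy_model, START and END are hypothetical stand-ins for a trained Transformer and its special tokens, not a real API):

```python
# Hypothetical stand-ins so the sketch runs; a real system would call a trained Transformer.
START, END = "<s>", "</s>"

def toy_model(source_tokens, target_so_far):
    """Stand-in for the Transformer: it sees the whole input sentence and all
    previously generated target words, and predicts the next target word."""
    i = len(target_so_far) - 1
    return source_tokens[i] if i < len(source_tokens) else END

def generate(model, source_tokens, max_len=50):
    target = [START]
    for _ in range(max_len):
        next_word = model(source_tokens, target)   # uses input + all previous target words
        target.append(next_word)
        if next_word == END:
            break
    return target[1:]

print(generate(toy_model, ["hello", "world"]))   # ['hello', 'world', '</s>']
```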