DL-07 - Sequence models Flashcards
DL-07 - Sequence models
What is a sequence model?
A model that handles sequential data, where the order of the data matters.
DL-07 - Sequence models
What is another name for sequence models?
Seq2seq (short for sequence-to-sequence).
DL-07 - Sequence models
What is another name for seq2seq?
Sequence models.
DL-07 - Sequence models
What is the definition of a sequence model?
A machine learning model where the input and/or output is a sequence of data (e.g. text, audio, time series).
DL-07 - Sequence models
What are the different types of sequence models called (abstract)? (4)
- One-to-one
- One-to-many
- Many-to-one
- Many-to-many
DL-07 - Sequence models
Describe what a one to one model looks like.
(See image)
DL-07 - Sequence models
Describe what a one to many model looks like.
(See image)
DL-07 - Sequence models
Describe what a many to one model looks like.
(See image)
DL-07 - Sequence models
Describe what a many to many model looks like.
(See image)
DL-07 - Sequence models
What is a named entity recognition task?
Determining which words in a sentence are entities, generally names of things (e.g. people, places, organizations).
(See image)
DL-07 - Sequence models
What task is this an example of? (See image)
Named entity recognition.
DL-07 - Sequence models
What is sentiment analysis?
Predict the sentiment of some input, e.g. positive or negative. (See image)
DL-07 - Sequence models
What task is this an example of? (See image)
Sentiment analysis.
DL-07 - Sequence models
What is activity recognition?
A task where you label the activity in e.g. an image or a video. (See image)
DL-07 - Sequence models
What task is this? (See image)
Activity recognition.
DL-07 - Sequence models
What are some popular sequence models? (3)
- RNN
- LSTM
- Transformers
DL-07 - Sequence models
What is the main idea behind RNNs?
RNNs process sequential data by maintaining an internal state and iteratively updating it with each input in the sequence.
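A minimal numpy sketch of this idea; the sizes and weight names are illustrative assumptions, not from the course:

```python
import numpy as np

# Illustrative sizes (assumptions)
input_size, hidden_size = 8, 16

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights
b_h = np.zeros(hidden_size)                                    # bias

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Process a sequence by iteratively updating the internal state
h = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))  # 5 time steps
for x_t in sequence:
    h = rnn_step(x_t, h)
```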
DL-07 - Sequence models
What model can you think of as a sequence of neural networks that are trained one after another?
RNN (and LSTM)
DL-07 - Sequence models
Describe how we typically draw RNNs.
(See image)
DL-07 - Sequence models
What model is depicted?
RNN
DL-07 - Sequence models
Describe what x, t, h and y are in the image. (See image)
- t is the time step
- x are the inputs
- h are the hidden states
- y are the predicted outputs
DL-07 - Sequence models
What parameters does an RNN layer have?
- Weights
- Biases
- Recurrent (hidden-state) weights, applied to the previous time step's hidden state/output
DL-07 - Sequence models
In an RNN, what are T_x and T_y?
The number of input time steps and the number of output time steps, respectively.
DL-07 - Sequence models
What is BPTT short for?
Backpropagation through time
DL-07 - Sequence models
How does backpropagation through time work?
By unrolling the recurrent neural network through time and applying standard backpropagation to compute gradients for updating weights.
DL-07 - Sequence models
When is loss backpropagated in BPTT?
The loss is backpropagated from the last time step to the first, which allows the weights to be updated.
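A hedged PyTorch sketch of the unroll-then-backpropagate pattern; the model and data here are placeholders:

```python
import torch
import torch.nn as nn

# Toy setup: batch of 4 sequences, 10 time steps, 8 features (assumed sizes)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
x = torch.randn(4, 10, 8)
targets = torch.randn(4, 10, 1)

# Forward pass: the RNN is effectively unrolled over all 10 time steps
outputs, _ = rnn(x)            # (4, 10, 16): hidden state at every step
preds = head(outputs)          # per-time-step predictions

# The loss covers every time step; backward() propagates gradients
# through the unrolled graph, from the last time step to the first
loss = nn.functional.mse_loss(preds, targets)
loss.backward()
```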
DL-07 - Sequence models
What is NLP short for?
Natural language processing.
DL-07 - Sequence models
What are the two steps of text sequence representation in Natural Language Processing (NLP)?
The two steps are:
- creation of vocabulary
- numeric representation of text/words.
DL-07 - Sequence models
In NLP, what is a vocabulary?
A dictionary of unique words of interest (e.g. Norwegian or English words).
DL-07 - Sequence models
In NLP, what do we call a dictionary of unique words of interest (e.g. Norwegian or English words)?
A vocabulary.
DL-07 - Sequence models
How are sentences tokenized?
Generally word by word.
(Advanced: See subword tokenization)
DL-07 - Sequence models
How do you create a vocabulary? (5)
Take your input data.
Perform:
- Remove punctuation
- Remove stop words
- Stem words (transporting -> transport)
- Add start/end-of-sentence tokens
- Add other special tokens as necessary (e.g. <UNKNOWN>, <DIGIT>).
Select the unique words (a minimal sketch follows below).
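A minimal Python sketch of these steps; the stop-word list and stemmer are crude toy stand-ins (a real pipeline might use e.g. NLTK):

```python
import string

STOP_WORDS = {"the", "a", "is", "i"}                 # toy stop-word list (assumption)
stem = lambda w: w[:-3] if w.endswith("ing") else w  # crude stand-in for a stemmer

def build_vocabulary(sentences):
    vocab = {"<SOS>", "<EOS>", "<UNKNOWN>", "<DIGIT>"}  # special tokens
    for sentence in sentences:
        # Remove punctuation and lowercase
        text = sentence.lower().translate(str.maketrans("", "", string.punctuation))
        for word in text.split():
            if word in STOP_WORDS:        # remove stop words
                continue
            vocab.add(stem(word))         # stem, then keep unique words
    return sorted(vocab)

print(build_vocabulary(["I love transporting AI.", "AI is cool"]))
```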
DL-07 - Sequence models
What are the commonly used techniques for text representation? (3)
- One-hot encoding
- Bag-of-words
- Word embeddings
DL-07 - Sequence models
What are these examples of in NLP?
- One-hot encoding
- Bag-of-words
- Word embeddings
Text representation techniques.
DL-07 - Sequence models
How do you use one-hot encoding to represent text in NLP?
Convert each unique character or word into a binary vector with a 1 at the position corresponding to that character or word and 0s elsewhere.
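A short Python sketch with a toy vocabulary (the words are assumptions for illustration):

```python
import numpy as np

vocab = ["AI", "cool", "I", "is", "love"]           # toy vocabulary (assumption)
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Binary vector: 1 at the word's position, 0 elsewhere."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[index[word]] = 1
    return vec

print(one_hot("cool"))  # [0 1 0 0 0]
```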
DL-07 - Sequence models
What is the issue with one-hot encoding in NLP?
Curse of dimensionality! The vectors are as long as the vocabulary, so they become huge and extremely sparse.
DL-07 - Sequence models
How can you solve the problem of curse of dimensionality with one-hot encoded data?
One-hot encoded vectors can be transformed to a lower dimensional space using an embedding technique.
DL-07 - Sequence models
In NLP, what is bag-of-words?
Bag-of-words is a representation technique where a text is described by the frequency of its words, disregarding grammar and word order but maintaining multiplicity.
DL-07 - Sequence models
What is BOW short for?
Bag-of-words representation.
DL-07 - Sequence models
What does the bag-of-words (BOW) representation do with words in a text?
The bag-of-words representation puts words in a “bag” and scores them based on their counts or frequencies in the text.
DL-07 - Sequence models
What could the BOW representation for this sentence look like?
input text: “I love AI. AI is cool”
BoW representation: [2, 1, 1, 1, 1 ] corresponding to the vocabulary: [AI, cool, I, is, love].
The representation is a vector where each index corresponds to a particular vocabulary word, and the number at that position is how many times that word occurred in the text.
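The same example, computed with a short Python sketch:

```python
from collections import Counter

vocab = ["AI", "cool", "I", "is", "love"]

def bag_of_words(text):
    # Strip the period and count word occurrences (order is discarded)
    counts = Counter(text.replace(".", "").split())
    return [counts[word] for word in vocab]

print(bag_of_words("I love AI. AI is cool"))  # [2, 1, 1, 1, 1]
```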
DL-07 - Sequence models
What are some problems with BOW word frequency?
Highly frequent words dominate the document representation (larger scores), even if they carry little "informational content" (e.g. words like I, the, a).
DL-07 - Sequence models
What is TF-IDF short for?
Term Frequency-Inverse Document Frequency
DL-07 - Sequence models
What does TF-IDF do?
It rescales the frequency of words by how often they appear in all documents.
DL-07 - Sequence models
What is the formula for TF-IDF?
(See image)
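In standard notation (the slide's exact variant may differ), the score of term t in document d, given a corpus D of N documents, is:

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\lvert\{\, d' \in D : t \in d' \,\}\rvert}
```

where tf(t, d) is the count (or frequency) of t in d, and the denominator counts the documents containing t.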
DL-07 - Sequence models
What are word embeddings in NLP?
Word embedding is a technique that maps words or phrases to dense numerical vectors of a given size.
DL-07 - Sequence models
What does the word embedding technique do? (2)
- Maps words or phrases to dense numerical vectors of a given size.
- Reduces the dimensionality of word/sentence representations.
DL-07 - Sequence models
What are some popular word embedding techniques? (3)
- GloVe
- Word2Vec
- NN embedding layer
DL-07 - Sequence models
What are GloVe and Word2Vec examples of?
Word embedding techniques (or models).
DL-07 - Sequence models
How does an NN embedding layer work?
(See image)
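Roughly, an embedding layer is a trainable lookup table mapping word indices to dense vectors. A hedged PyTorch sketch with assumed sizes:

```python
import torch
import torch.nn as nn

# Lookup table: 10,000-word vocabulary, 64-dimensional vectors (assumed sizes)
embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=64)

word_indices = torch.tensor([2, 17, 5])  # e.g. token IDs for three words
vectors = embedding(word_indices)        # shape (3, 64): one dense vector per word

# The table's weights are ordinary parameters, learned jointly with the network
print(vectors.shape)
```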
DL-07 - Sequence models
What can happen to gradients in RNNs?
Gradients can vanish or explode, causing the model to stop learning or to train far too slowly.
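The intuition in two lines of Python: backpropagating through T time steps multiplies the gradient by the recurrent weight repeatedly, so factors below/above 1 shrink/blow up (a deliberately simplified scalar picture):

```python
T = 50           # time steps
print(0.9 ** T)  # ~0.005: the gradient signal vanishes
print(1.1 ** T)  # ~117:   the gradient signal explodes
```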
DL-07 - Sequence models
What do traditional sequence models struggle with, in terms of relating to past information?
They cannot effectively relate to the past beyond the most recent inputs; the influence of earlier inputs fades quickly.
DL-07 - Sequence models
What is a solution for improving sequence models to better remember distant inputs?
Add memory and make efficient use of it, possibly by forgetting less relevant information.
DL-07 - Sequence models
What are two improved Seq2Seq models that incorporate memory?
- GRU (Gated Recurrent Unit)
- LSTM (Long Short-Term Memory).
DL-07 - Sequence models
What is GRU short for?
Gated Recurrent Unit
DL-07 - Sequence models
What is LSTM short for?
Long Short-Term Memory
DL-07 - Sequence models
Label the parts that are masked out. (See image)
- Forget
- Update
- Input
- Output (Result)
DL-07 - Sequence models
What is the main purpose of LSTM networks in deep learning?
LSTM networks extend the memory of RNNs to learn from important experiences with long time steps in between.
DL-07 - Sequence models
What is one advantage of using LSTM networks over traditional RNNs?
LSTM networks enable short-term memory to last for a longer time.
DL-07 - Sequence models
What issue with sequence model training do LSTMs help mitigate?
LSTM networks help mitigate the problematic issue of vanishing gradients.
DL-07 - Sequence models
What are the gates in an LSTM called? (4)
- Input
- Output
- Update
- Forget
DL-07 - Sequence models
What is the purpose of the input gate in an LSTM?
The input gate determines how much of the new input should be added to the cell state.
DL-07 - Sequence models
What is the purpose of the forget gate in an LSTM?
The forget gate decides what information to discard from the cell state.
DL-07 - Sequence models
What is the purpose of the output gate in an LSTM?
The output gate selects which values from the updated cell state will be passed to the next hidden state.
DL-07 - Sequence models
What is the purpose of the update gate in an LSTM?
The update gate computes candidate values to be added to the cell state, based on the current input and previous hidden state.
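Taken together, the four gates implement the standard LSTM equations (one common notation; here the "update gate" is the candidate-value computation):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\!\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{update (candidate values)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{new cell state} \\
o_t &= \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{new hidden state}
\end{aligned}
```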
DL-07 - Sequence models
Describe the LSTM model’s architecture.
(See image)
DL-07 - Sequence models
What are the inputs of the LSTM cell called? (3)
- Input
- Hidden state
- Cell state
DL-07 - Sequence models
What are the outputs of the LSTM cell called? (3)
- Hidden state
- Cell state
- Output
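These inputs and outputs map directly onto PyTorch's LSTM API, sketched here with assumed sizes:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)   # input: batch of 4 sequences, 10 time steps
h0 = torch.zeros(1, 4, 16)  # initial hidden state
c0 = torch.zeros(1, 4, 16)  # initial cell state

# Outputs: per-step hidden states, plus the final hidden and cell states
output, (h_n, c_n) = lstm(x, (h0, c0))
print(output.shape, h_n.shape, c_n.shape)  # (4, 10, 16), (1, 4, 16), (1, 4, 16)
```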
DL-07 - Sequence models
What optimizers have worked well with LSTMs for text data? (2)
- Adam
- Adagrad
DL-07 - Sequence models
What activation function and loss should you use for LSTM with text data?
- Softmax (predict prob for word)
- Cross-entropy loss
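In PyTorch the two are usually fused: nn.CrossEntropyLoss applies a log-softmax over the vocabulary internally. A small sketch with assumed sizes:

```python
import torch
import torch.nn as nn

vocab_size = 10_000                       # assumed vocabulary size
logits = torch.randn(4, vocab_size)       # raw scores for 4 predicted words
targets = torch.tensor([42, 7, 999, 3])   # correct word indices

# CrossEntropyLoss = log-softmax over the vocabulary + negative log-likelihood
loss = nn.CrossEntropyLoss()(logits, targets)
probs = torch.softmax(logits, dim=-1)     # explicit per-word probabilities
```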
DL-07 - Sequence models
What metrics would you use for LSTM with text data?
Accuracy, precision, and recall. Think of the outputs as the probability of producing the correct word.
DL-07 - Sequence models
What is a bidirectional RNN?
A bidirectional RNN is a type of recurrent neural network that processes input data in both forward and backward directions, capturing information from both past and future contexts.
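In PyTorch a bidirectional variant is one flag; the forward and backward hidden states are concatenated per time step (sizes are assumptions):

```python
import torch
import torch.nn as nn

bi_lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True,
                  bidirectional=True)

x = torch.randn(4, 10, 8)
output, _ = bi_lstm(x)
# 32 = 16 forward + 16 backward features per time step
print(output.shape)  # (4, 10, 32)
```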