Recurrent Neural Networks (RNNs) & NLP (Natural Language Processing) Flashcards
NLP
The field in which AI is taught to understand the rules and syntax of human language and uses language-related algorithms to carry out certain tasks.
5 types of tasks in NLP
Language generation
Answering Questions
Text Classification
Sentiment Analysis
Machine Translation
Difference between RNN and a normal NN (feed-forward neural networks)?
RNNs can remember information that they have previously processed and learnt, which allows them to make future predictions based on inputs they have already seen.
Why is the fact that RNNs can remember previous inputs so crucial in text generation?
Essentially, it can generate each word based on the words before it (which provide context)
3 use cases of RNNs
Autocomplete - Can predict what the next words in a sentence are based on the context of the previous words
Machine Translation - Used in neural machine translation systems which allow for translation from one language to another by understanding the meaning and sequences of words in both languages
Chatbots - Used in conversational agents to facilitate a human-like conversation based on the context of the conversation history.
Tokenization
Turning a sentence into an array of words (with each word in the array representing a token)
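A minimal sketch of what tokenization looks like (a naive whitespace split; real tokenizers handle punctuation and casing too):

```python
# Naive tokenization: split a sentence into word tokens.
sentence = "the cat sat on the mat"
tokens = sentence.split()            # one token per word
unique_tokens = sorted(set(tokens))  # the vocabulary of unique words

print(tokens)         # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(unique_tokens)  # ['cat', 'mat', 'on', 'sat', 'the']
```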
So let’s say after tokenization, we have 12 unique tokens. What happens next?
Those 12 tokens correspond to 12 unique words, so the RNN has 12 output neurons. When it produces an output, it assigns a probability to every word in the set of unique words, predicting the likelihood of that word being the next one.
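A sketch of how the 12 output neurons turn into probabilities (the raw scores below are made-up numbers; the softmax function shown is the standard way to convert scores into a probability per word):

```python
import math

# Hypothetical raw scores (logits) from the 12 output neurons,
# one per unique token in the vocabulary.
logits = [2.0, 0.5, 0.1, -1.0, 0.3, 0.0, 1.2, -0.5, 0.8, 0.2, -0.3, 0.4]

# Softmax: exponentiate each score and normalize so they sum to 1.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]  # one probability per unique word

# The predicted next word is the one with the highest probability.
best = probs.index(max(probs))     # here, index 0 (the largest logit)
```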
Encode
This is turning each unique word into an array of values that represents the word, based on the list of possible words.
e.g.
“The” - x(1) = [1, 0, 0, 0, 0]
The position of the 1 depends on the position of the word in the array of unique words.
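A small sketch of this one-hot encoding (assuming a made-up 5-word vocabulary):

```python
vocab = ["the", "cat", "sat", "on", "mat"]  # assumed list of unique words

def one_hot(word, vocab):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1  # put the 1 at the word's position
    return vec

print(one_hot("the", vocab))  # [1, 0, 0, 0, 0]
print(one_hot("mat", vocab))  # [0, 0, 0, 0, 1]
```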
Both feed-forward NNs and RNNs have _____ & ______
weights and biases
RNNs have _____ which feed-forward NNs do not have
(one sentence explanation after)
vectors
(these represent a hidden state, basically a place where data from previous iterations is stored)
Time step
Unique to RNNs. This is when the RNN reads one element (one word) from a sequence (sentence), updates its hidden state, and produces an output
Hidden State
It is a vector that is updated after each time step using the input and the previous hidden state
How is the new hidden state created in a RNN?
It combines the following:
- Current input
- Previous hidden state
- A bias
There is some mathematical process behind this but it does not matter too much for the test.
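For the curious, the combining step above can be sketched with scalar values (the weights and inputs below are made-up numbers, and real RNNs use weight matrices, not single numbers):

```python
import math

# One RNN time step, scalar version for clarity:
#   h_new = tanh(w_x * x + w_h * h_prev + b)
# i.e. combine the current input, the previous hidden state, and a bias.
w_x, w_h, b = 0.5, 0.8, 0.1  # illustrative weights and bias

def step(x, h_prev):
    return math.tanh(w_x * x + w_h * h_prev + b)

h = 0.0                       # initial hidden state
for x in [1.0, 0.2, -0.5]:    # a short input sequence
    h = step(x, h)            # the hidden state carries context forward
```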
RNN
Recurrent Neural Networks are designed to process sequences of data by maintaining a hidden state that captures information from previous time steps
Embedding why?
Predicting the next word is not as simple as just generating a probability for each candidate, since words and sentences carry many linguistic nuances. Embedding is what we use to handle this.
Embedding
Embedding and embedding layers give words meaning. Just like with the probabilities from before, each word is assigned values based on factors like similarity to other words in the training set, syntax, contextual meaning, sentiment, and prefixes/suffixes.
Process from tokenization to vectorization
- Tokenize the string by creating an array of words based off of it
- Encode the word into an array of values that represent the word, based on the list of possible words (e.g. [1, 0, 0, 0, 0])
- Embed the words to give them more meaning by assigning this word a value based on factors like similarity to other words in the training set, syntax, contextual meaning etc.
- Vectorization - Once embedding is done for all words, each word is represented by something like [.345, .912, .665]. This set of values is a vector, meaning each word can be graphed relative to other words in the training set.
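A sketch of what "graphed relative to other words" means in practice (the embeddings below are invented values; cosine similarity is one common way to measure how close two word vectors are):

```python
import math

# Hypothetical 3-dimensional word embeddings (values are illustrative only).
embeddings = {
    "cat": [0.345, 0.912, 0.665],
    "dog": [0.310, 0.870, 0.700],
    "car": [0.900, 0.100, 0.050],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Related words end up close together in the vector space:
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # near 1
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # much lower
```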
Embedding layers are often _______
pre-trained
The embedding layer comes after the ______ layer but before ________ layer
input, hidden
Backpropagation through time (BPTT)
- Forward pass: process the sequence and store hidden states and outputs (data gets processed through the RNN basically)
- Unroll the RNN’s layers across time steps, essentially turning all time steps into one long feed-forward network
- Calculate the loss at each time step and sum them up
- Calculate gradients of the loss with respect to outputs, hidden states, weights, and biases across time steps. Sum the gradients up.
- Adjust the weights and biases by feeding the summed-up gradients into a gradient descent update
Computing Requirements Backpropagation vs. BPTT
Backpropagation:
Less memory and processing power required
BPTT:
More memory and processing power required
There is ____________ dependency in BPTT
explain briefly as well
temporal
Essentially, gradients must be propagated backwards through the time steps in order: each step’s gradient depends on the step after it.
Vanishing Gradient Problem in RNN
BPTT suffers from this problem more severely than normal backpropagation, since gradients are repeatedly multiplied by smaller and smaller values over many time steps. We are not just going through layers; we are effectively going through layers multiplied by the number of time steps.
As the sigmoid function ___________, the derivative _______
saturates (its output approaches 0 or 1), approaches zero
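A quick check of this using the standard sigmoid derivative σ'(x) = σ(x)(1 − σ(x)):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)  # peaks at 0.25 when x = 0

# The derivative shrinks as the input moves away from zero (saturation):
for x in [0, 2, 5, 10]:
    print(x, round(sigmoid_deriv(x), 6))
# Multiplying many such small derivatives across time steps drives the
# overall gradient towards zero -- the vanishing gradient problem.
```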
3 pros of an RNN
- Can handle sequential data (e.g. sentences)
- Has capability to remember information from previous inputs due to its internal state
- Flexibility to handle different types of sequential data like text, audio, video and time series data
3 cons of an RNN
- More prone to vanishing gradient problem since it has to multiply gradients over all time steps.
- Can be memory intensive for a computer to train
- Time-consuming to train, particularly for long sequences or large datasets. This is due to the sequential nature of data processing in RNNs.