Recurrent Neural Networks (RNNs) & NLP (Natural Language Processing) Flashcards
NLP
The field in which AI is taught to understand the rules and syntax of human language and uses language-related algorithms to carry out certain tasks.
5 types of tasks in NLP
Language generation
Answering Questions
Text Classification
Sentiment Analysis
Machine Translation
Difference between RNN and a normal NN (feed-forward neural networks)?
RNNs can remember information that they have previously processed and learnt, which allows them to make future predictions based on inputs they have already seen.
Why is the fact that RNNs can remember previous inputs so crucial in text generation?
Essentially, it can generate each word based on the words before it (which provide context)
3 use cases of RNNs
Autocomplete - Can predict what the next words in a sentence are based on the context of the previous words
Machine Translation - Used in neural machine translation systems which allow for translation from one language to another by understanding the meaning and sequences of words in both languages
Chatbots - Used in conversational agents to facilitate a human-like conversation based on the context of the conversation history.
Tokenization
Turning a sentence into an array of words (with each word in the array representing a token)
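A minimal sketch of what tokenization looks like (a naive whitespace split; real tokenizers handle punctuation and casing too):

```python
# Naive tokenization: split a sentence into word tokens.
sentence = "the cat sat on the mat"
tokens = sentence.split()            # one token per word
unique_tokens = sorted(set(tokens))  # the vocabulary of unique words

print(tokens)         # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(unique_tokens)  # ['cat', 'mat', 'on', 'sat', 'the']
```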
So let’s say after tokenization, we have 12 unique tokens. What happens next?
Those 12 tokens correspond to 12 unique words, so the RNN has 12 output neurons. When it produces an output, it assigns a probability to every word in the set of unique words, predicting the likelihood of that word being the next one.
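A sketch of how the 12 output neurons turn into probabilities (the raw scores below are made-up numbers; the softmax function shown is the standard way to convert scores into a probability per word):

```python
import math

# Hypothetical raw scores (logits) from the 12 output neurons,
# one per unique token in the vocabulary.
logits = [2.0, 0.5, 0.1, -1.0, 0.3, 0.0, 1.2, -0.5, 0.8, 0.2, -0.3, 0.4]

# Softmax: exponentiate each score and normalize so they sum to 1.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]  # one probability per unique word

# The predicted next word is the one with the highest probability.
best = probs.index(max(probs))     # here, index 0 (the largest logit)
```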
Encode
This is turning each unique word into an array of values that represents the word, based on the list of possible words.
e.g.
“The” - x(1) = [1, 0, 0, 0, 0]
The position of the 1 depends on the position of the word in the array of unique words.
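A small sketch of this one-hot encoding (assuming a made-up 5-word vocabulary):

```python
vocab = ["the", "cat", "sat", "on", "mat"]  # assumed list of unique words

def one_hot(word, vocab):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1  # put the 1 at the word's position
    return vec

print(one_hot("the", vocab))  # [1, 0, 0, 0, 0]
print(one_hot("mat", vocab))  # [0, 0, 0, 0, 1]
```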
Both feed-forward NNs and RNNs have _____ & ______
weights and biases
RNNs have _____ which feed-forward NNs do not have
(one sentence explanation after)
vectors
(these represent a hidden state, basically a place where data from previous iterations is stored)
Time step
Unique to RNNs. This is when the RNN reads one element (one word) from a sequence (sentence), updates its hidden state, and produces an output
Hidden State
It is a vector that is updated after each time step using the input and the previous hidden state
How is the new hidden state created in a RNN?
It combines the following:
- Current input
- Previous hidden state
- A bias
There is some mathematical process behind this but it does not matter too much for the test.
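For the curious, the combining step above can be sketched with scalar values (the weights and inputs below are made-up numbers, and real RNNs use weight matrices, not single numbers):

```python
import math

# One RNN time step, scalar version for clarity:
#   h_new = tanh(w_x * x + w_h * h_prev + b)
# i.e. combine the current input, the previous hidden state, and a bias.
w_x, w_h, b = 0.5, 0.8, 0.1  # illustrative weights and bias

def step(x, h_prev):
    return math.tanh(w_x * x + w_h * h_prev + b)

h = 0.0                       # initial hidden state
for x in [1.0, 0.2, -0.5]:    # a short input sequence
    h = step(x, h)            # the hidden state carries context forward
```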
RNN
Recurrent Neural Networks are designed to process sequences of data by maintaining a hidden state that captures information from previous time steps
Embedding why?
Predicting the next word is not as simple as just generating a probability for each candidate, since words and sentences carry many linguistic nuances. Embedding is what we use to handle this.
Embedding
Embedding and embedding layers give words meaning. Just like with the probabilities from before, each word is assigned values based on factors like similarity to other words in the training set, syntax, contextual meaning, sentiment, and prefixes/suffixes.
Process from tokenization to vectorization
- Tokenize the string by creating an array of words based off of it
- Encode the word into an array of values that represent the word, based on the list of possible words (e.g. [1, 0, 0, 0, 0])
- Embed the words to give them more meaning by assigning this word a value based on factors like similarity to other words in the training set, syntax, contextual meaning etc.
- Vectorization - Once embedding is done for all words, each word is represented by something like [.345, .912, .665]. This set of values is a vector, meaning each word can be graphed relative to other words in the training set.
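A sketch of what "graphed relative to other words" means in practice (the embeddings below are invented values; cosine similarity is one common way to measure how close two word vectors are):

```python
import math

# Hypothetical 3-dimensional word embeddings (values are illustrative only).
embeddings = {
    "cat": [0.345, 0.912, 0.665],
    "dog": [0.310, 0.870, 0.700],
    "car": [0.900, 0.100, 0.050],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Related words end up close together in the vector space:
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # near 1
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # much lower
```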
Embedding layers are often _______
pre-trained
The embedding layer comes after the ______ layer but before ________ layer
input, hidden
Backpropagation through time (BPTT)
- Forward pass: process the sequence and store hidden states and outputs (data gets processed through the RNN basically)
- Unroll the RNN’s layers across time steps, essentially turning all time steps into one long feed-forward network
- Calculate the loss at each time step and sum them up
- Calculate gradients of the loss with respect to outputs, hidden states, weights, and biases across time steps. Sum the gradients up.
- Adjust the weights and biases by feeding the summed-up gradients into a gradient descent update
Computing Requirements Backpropagation vs. BPTT
Backpropagation:
Less memory and processing power required
BPTT:
More memory and processing power required
There is ____________ dependency in BPTT
explain briefly as well
temporal
Essentially, gradients must be propagated backwards through the time steps in order: each step’s gradient depends on the step after it.
Vanishing Gradient Problem in RNN
BPTT suffers from this problem more severely than normal backpropagation, since gradients are repeatedly multiplied by smaller and smaller values over many time steps. We are not just going through layers; we are effectively going through layers multiplied by the number of time steps.
As the sigmoid function ___________, the derivative _______
saturates (its output approaches 0 or 1), approaches zero
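A quick check of this using the standard sigmoid derivative σ'(x) = σ(x)(1 − σ(x)):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)  # peaks at 0.25 when x = 0

# The derivative shrinks as the input moves away from zero (saturation):
for x in [0, 2, 5, 10]:
    print(x, round(sigmoid_deriv(x), 6))
# Multiplying many such small derivatives across time steps drives the
# overall gradient towards zero -- the vanishing gradient problem.
```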
3 pros of an RNN
- Can handle sequential data (e.g. sentences)
- Has capability to remember information from previous inputs due to its internal state
- Flexibility to handle different types of sequential data like text, audio, video and time series data
3 cons of an RNN
- More prone to vanishing gradient problem since it has to multiply gradients over all time steps.
- Can be memory intensive for a computer to train
- Time-consuming to train, particularly for long sequences or large datasets. This is due to the sequential nature of data processing in RNNs.