Recurrent Neural Networks (RNNs) & NLP (Natural Language Processing) Flashcards

1
Q

NLP

A

The field in which AI is taught to understand the rules and syntax of language and to use language-related algorithms to carry out certain tasks.

2
Q

5 types of tasks in NLP

A

Language Generation
Question Answering
Text Classification
Sentiment Analysis
Machine Translation

3
Q

Difference between an RNN and a normal NN (feed-forward neural network)?

A

RNNs can remember information they have previously processed and learnt, which allows them to make future predictions based on input they have already seen. A feed-forward network has no such memory and treats each input independently.

4
Q

Why is the fact that RNNs can remember previous inputs so crucial in text generation?

A

Essentially, it can generate each word based on the words before it (which provide context)

5
Q

3 use cases of RNNs

A

Autocomplete - Can predict what the next words in a sentence are based on the context of the previous words

Machine Translation - Used in neural machine translation systems, which translate from one language to another by understanding the meaning and sequences of words in both languages

Chatbots - Used in conversational agents to facilitate a human-like conversation based on the context of the conversation history.

6
Q

Tokenization

A

Turning a sentence into an array of words (with each word in the array representing a token).
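
A minimal sketch of this in Python (splitting on whitespace is an assumption; real tokenizers also handle punctuation and casing):

sentence = "the cat sat on the mat"
tokens = sentence.lower().split()   # naive whitespace tokenization
vocab = sorted(set(tokens))         # the unique tokens
print(tokens)   # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(vocab)    # ['cat', 'mat', 'on', 'sat', 'the']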

7
Q

So let’s say after tokenization, we have 12 unique tokens. What happens next?

A

Those 12 tokens correspond to 12 unique words, so the RNN has 12 output neurons. When it produces an output, it gives a probability for every word in that set of unique words, predicting how likely each one is to be the next word.
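
A rough illustration of that output step in Python, assuming a made-up 12-word vocabulary and a softmax over the 12 raw output scores:

import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "a", "dog", "ran", "is", "big", "and", "small"]
logits = np.random.randn(12)                     # raw scores from the 12 output neurons
probs = np.exp(logits) / np.exp(logits).sum()    # softmax: one probability per word
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")                    # likelihood of being the next word
# the probabilities sum to 1; the highest one is the predicted next word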

8
Q

Encode

A

This is turning each unique word into an array of values that represents the word, based on the list of possible words.

e.g.

“The” - x(1) = [1, 0, 0, 0, 0]
The position of the 1 depends on the position of the word in the array of unique words (this is called one-hot encoding).
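
A minimal one-hot encoding sketch in Python (the vocabulary and its order are assumptions for illustration):

vocab = ["the", "cat", "sat", "on", "mat"]

def one_hot(word, vocab):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1   # the 1 goes in the slot matching the word's position
    return vec

print(one_hot("the", vocab))   # [1, 0, 0, 0, 0]
print(one_hot("sat", vocab))   # [0, 0, 1, 0, 0]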

9
Q

Both feed-forward NNs and RNNs have _____ & ______

A

weights and biases

10
Q

RNNs have _____ which feed-forward NNs do not have

(one sentence explanation after)

A

vectors

(these represent a hidden state, basically a place where data from previous time steps is stored)

11
Q

Time step

A

Unique to RNNs. This is when the RNN reads one element (one word) from a sequence (sentence), updates its hidden state, and produces an output

12
Q

Hidden State

A

It is a vector that is updated after each time step using the input and the previous hidden state

13
Q

How is the new hidden state created in an RNN?

A

It combines the following:

  • Current input
  • Previous hidden state
  • A bias

There is a mathematical process behind this, but it does not matter too much for the test (a rough sketch follows below).
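
For reference, a minimal numpy sketch of that combination, assuming a tanh activation and made-up sizes (each part is weighted by the network's weight matrices):

import numpy as np

input_size, hidden_size = 5, 3
W_x = np.random.randn(hidden_size, input_size)    # input-to-hidden weights
W_h = np.random.randn(hidden_size, hidden_size)   # hidden-to-hidden weights
b = np.zeros(hidden_size)                         # bias

x_t = np.random.randn(input_size)                 # current input (e.g. a word vector)
h_prev = np.zeros(hidden_size)                    # previous hidden state
h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b)       # new hidden state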

14
Q

RNN

A

Recurrent Neural Networks are designed to process sequences of data by maintaining a hidden state that captures information from previous time steps

15
Q

Embedding - why?

A

Predicting the next word is not as easy as just generating a probability for each candidate word, since there are many linguistic nuances in words and sentences. Embedding is what we use to handle this.

16
Q

Embedding

A

Embeddings and embedding layers give words meaning. Just as with the probabilities from before, each word is assigned values based on factors like similarity to other words in the training set, syntax, contextual meaning, sentiment, and prefixes/suffixes.

17
Q

Process from tokenization to vectorization

A
  1. Tokenize the string by splitting it into an array of words
  2. Encode each word into an array of values that represents the word, based on the list of possible words (e.g. [1, 0, 0, 0, 0])
  3. Embed the words to give them more meaning by assigning each word values based on factors like similarity to other words in the training set, syntax, contextual meaning etc.
    (once done for all words, it will produce something like: [.345, .912, .665])
  4. Vectorization - the above set of values represents a vector, meaning that each word can be graphed relative to other words in the training set (see the short sketch after this list).
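
A minimal end-to-end sketch of those steps in Python; the vocabulary, embedding size, and random embedding matrix are assumptions (in practice the embedding layer is learned or pre-trained):

import numpy as np

sentence = "the cat sat"
tokens = sentence.split()                          # 1. tokenize
vocab = ["the", "cat", "sat", "on", "mat"]
indices = [vocab.index(t) for t in tokens]         # 2. encode (position in the vocab)

embedding_matrix = np.random.rand(len(vocab), 3)   # 3. embedding layer: 3 values per word
vectors = embedding_matrix[indices]                # 4. each word is now a dense vector
print(vectors)   # something like [[0.345, 0.912, 0.665], ...]
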
18
Q

Embedding layers are often _______

A

pre-trained

19
Q

The embedding layer comes after the ______ layer but before the ________ layer

A

input, hidden

20
Q

Backpropagation through time (BPTT)

A
  1. Forward pass: process the sequence and store the hidden states and outputs (data gets processed through the RNN)
  2. Unroll the RNN across time steps, essentially treating all of the time steps as one deep network
  3. Calculate the loss at each time step and sum them up
  4. Calculate the gradients of the loss with respect to the outputs, hidden states, weights, and biases across time steps, and sum them up
  5. Adjust the weights and biases by feeding the summed gradients into a gradient descent update (see the sketch after this list)
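
A rough PyTorch sketch of the idea on a tiny next-word task (sizes and data are made up); summing the per-step losses and calling backward() is what makes the gradients flow back through every time step:

import torch
import torch.nn as nn

vocab_size, hidden_size, seq_len = 12, 8, 5
rnn = nn.RNN(vocab_size, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, vocab_size)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(1, seq_len, vocab_size)            # one toy input sequence
targets = torch.randint(0, vocab_size, (1, seq_len))

outputs, _ = rnn(x)                                # 1. forward pass over all time steps
logits = head(outputs)                             # one prediction per time step
loss = sum(loss_fn(logits[:, t], targets[:, t])    # 3. loss at each step, summed
           for t in range(seq_len))
loss.backward()                                    # 4. gradients summed across time steps
optimizer.step()                                   # 5. gradient descent update
optimizer.zero_grad()
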
21
Q

Computing requirements: backpropagation vs. BPTT

A

Backpropagation:
Less memory and processing power required

BPTT:
More memory and processing power required

22
Q

There is ____________ dependency in BPTT

explain briefly as well

A

temporal

Essentially, gradients must be propagated backwards through the time steps in order, because each hidden state depends on the one before it.

23
Q

Vanishing Gradient Problem in RNN

A

BPTT suffers from this problem more severely than normal backpropagation, since gradients are repeatedly multiplied by small values over many time steps. So we are not backpropagating through just the layers; we are backpropagating through the layers multiplied by the number of time steps.
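
A quick numeric illustration of that repeated multiplication (0.9 is an arbitrary per-step gradient factor, chosen just to show the shrinking):

grad_factor = 0.9
for steps in (10, 50, 100):
    print(steps, grad_factor ** steps)
# 10   ~0.35
# 50   ~0.005
# 100  ~0.00003  -> the gradient signal has effectively vanished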

24
Q

As the sigmoid function ___________, the derivative _______

A

saturates (its output gets close to 0 or 1), approaches zero
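
For reference, the sigmoid's derivative can be written in terms of the sigmoid itself (a standard identity, not part of the original card):

\sigma'(x) = \sigma(x)\,(1 - \sigma(x)) \le 0.25

so the derivative peaks at 0.25 when x = 0 and shrinks towards zero as the sigmoid saturates at either 0 or 1.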

25
Q

3 pros of an RNN

A
  1. Can handle sequential data (e.g. sentences)
  2. Has capability to remember information from previous inputs due to its internal state
  3. Flexibility to handle different types of sequential data like text, audio, video and time series data
26
Q

3 cons of an RNN

A
  1. More prone to the vanishing gradient problem since gradients have to be multiplied over all time steps.
  2. Can be memory intensive for a computer to train
  3. Time-consuming to train, particularly for long sequences or large datasets. This is due to the sequential nature of data processing in RNNs.