NLP 2 Flashcards
What is an RNN?
A neural network with a loop, i.e. one whose hidden state is fed back as input at the next step, is called a Recurrent Neural Network (RNN).
The 3 layers of an RNN
Input layer (an embedding that maps each token in the vocabulary to a hidden-size vector), hidden layer (fully connected, applied recurrently), output layer (projects the hidden state to the vocabulary to predict the target token)
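A minimal sketch of these three layers in PyTorch (the sizes vocab_size=1000 and hidden_size=64 are illustrative, not from the cards):

import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, vocab_size=1000, hidden_size=64):
        super().__init__()
        self.i2h = nn.Embedding(vocab_size, hidden_size)  # input layer: token id -> embedding
        self.h2h = nn.Linear(hidden_size, hidden_size)    # hidden layer: fully connected recurrence
        self.h2o = nn.Linear(hidden_size, vocab_size)     # output layer: predict the target token

    def forward(self, x):                     # x: (batch, seq_len) of token ids
        h = torch.zeros(x.size(0), self.h2h.in_features)
        for t in range(x.size(1)):            # the "loop" that makes it recurrent
            h = torch.tanh(self.i2h(x[:, t]) + self.h2h(h))
        return self.h2o(h)                    # logits over the vocabulary

logits = SimpleRNN()(torch.randint(0, 1000, (8, 16)))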
Why is BPTT needed?
RNNs process sequential data by maintaining a hidden state that is updated at each time step. However, training an RNN is challenging because the network’s loss depends not just on the current input but also on all the previous inputs due to the recurrence relationship. This sequential dependency means that the network’s weights must be updated based on errors accumulated over multiple time steps, not just a single layer.
What is BPTT?
Backpropagation Through Time: the recurrent network is unrolled across its time steps and gradients are propagated back through the whole unrolled graph, so errors from every step update the shared weights.
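A minimal sketch of BPTT, assuming PyTorch and illustrative sizes; the Python loop builds one computation graph across all time steps, and a single backward() call propagates gradients through every step:

import torch
import torch.nn as nn
import torch.nn.functional as F

emb = nn.Embedding(1000, 64)
rnn = nn.Linear(64, 64)
out = nn.Linear(64, 1000)

x = torch.randint(0, 1000, (8, 16))     # (batch, seq_len) of token ids
h = torch.zeros(8, 64)
loss = 0.0
for t in range(x.size(1) - 1):
    h = torch.tanh(emb(x[:, t]) + rnn(h))
    loss = loss + F.cross_entropy(out(h), x[:, t + 1])  # predict the next token
loss.backward()   # gradients flow back through all time steps: BPTT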
Concept of a multilayer RNN and why it's used
The outputs of a first RNN are used as the input for a second RNN (see the sketch below). Motivation: while the unrolled model is very deep in principle, in a single-layer RNN each predicted token depends on only one linear layer between input and output, so stacking RNNs adds real depth per prediction.
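A minimal sketch of stacking, assuming PyTorch's nn.RNN and illustrative sizes:

import torch
import torch.nn as nn

rnn1 = nn.RNN(input_size=64, hidden_size=64, batch_first=True)
rnn2 = nn.RNN(input_size=64, hidden_size=64, batch_first=True)

x = torch.randn(8, 16, 64)   # (batch, seq_len, features)
out1, _ = rnn1(x)            # outputs of the first RNN at every time step
out2, _ = rnn2(out1)         # fed as the input to the second RNN

# Equivalent shortcut: nn.RNN(64, 64, num_layers=2, batch_first=True)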
Why is Long Short-Term Memory (LSTM) needed?
It helps to separately learn (1) the information required to predict the next token and (2) contextual information carried across the tokens already seen (e.g. remembering the subject's gender for later words).
How does an LSTM work?
It maintains a second hidden state, the "cell state", which carries the longer-term contextual memory, while the usual hidden state focuses on predicting the next token.
The four main networks (gates) in an LSTM
Forget gate (what to erase from the cell state), input gate (which cell-state entries to update, e.g. the "gender" slot), cell gate (the candidate values to write, e.g. "female"), output gate (what to expose as the new hidden state)
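A minimal sketch of one LSTM step with the four gates spelled out, assuming PyTorch and illustrative sizes:

import torch
import torch.nn as nn

hidden_size = 64
forget_gate = nn.Linear(2 * hidden_size, hidden_size)
input_gate  = nn.Linear(2 * hidden_size, hidden_size)
cell_gate   = nn.Linear(2 * hidden_size, hidden_size)
output_gate = nn.Linear(2 * hidden_size, hidden_size)

def lstm_step(x, h, c):
    hx = torch.cat([h, x], dim=1)
    f = torch.sigmoid(forget_gate(hx))  # what to erase from the cell state
    i = torch.sigmoid(input_gate(hx))   # which cell entries to update (e.g. the "gender" slot)
    g = torch.tanh(cell_gate(hx))       # candidate values to write (e.g. "female")
    o = torch.sigmoid(output_gate(hx))  # what to expose as the new hidden state
    c = f * c + i * g                   # updated cell state
    h = o * torch.tanh(c)               # updated hidden state
    return h, c

x = torch.randn(8, hidden_size)
h = c = torch.zeros(8, hidden_size)
h, c = lstm_step(x, h, c)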
What is dropout?
During each training iteration, neurons are randomly deactivated with probability p. To compensate, the surviving activations are multiplied by 1/(1 - p) during training.
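A minimal sketch of (inverted) dropout matching this description, assuming PyTorch:

import torch

def dropout(activations, p=0.5, training=True):
    if not training or p == 0.0:
        return activations                      # at inference, dropout is a no-op
    mask = (torch.rand_like(activations) > p).float()   # drop with probability p
    return activations * mask / (1 - p)         # rescale so the expected value is unchanged

x = torch.ones(4, 4)
print(dropout(x, p=0.5))   # ~half the entries are 0, the survivors are 2.0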
Drawbacks of dropout?
Less input is used to calculate the output, since the deactivated neurons contribute nothing during that iteration.
Other regularisation techniques
Weight decay, Activation regularisation (AR), Temporal activation regularisation (TAR)
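A minimal sketch of the AR and TAR penalty terms, assuming PyTorch; acts stands in for the LSTM activations, and the coefficients alpha and beta are illustrative hyperparameters:

import torch

acts = torch.randn(8, 16, 64, requires_grad=True)   # (batch, seq_len, hidden)
alpha, beta = 2.0, 1.0

ar  = alpha * acts.pow(2).mean()                          # AR: penalise large activations
tar = beta * (acts[:, 1:] - acts[:, :-1]).pow(2).mean()   # TAR: penalise big jumps between time steps

loss = ar + tar   # added on top of the usual cross-entropy loss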
Weight tying?
The mapping from input to hidden (the embedding matrix) and the mapping from hidden to output (the decoder) share the same weight matrix. Used in AWD-LSTM.
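A minimal sketch of weight tying in PyTorch, with illustrative sizes:

import torch.nn as nn

vocab_size, hidden_size = 1000, 64
embedding = nn.Embedding(vocab_size, hidden_size)  # input -> hidden
decoder = nn.Linear(hidden_size, vocab_size)       # hidden -> output
decoder.weight = embedding.weight                  # same parameter tensor for both mappings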
In an RNN, the embedding of a token does not depend on its position in the sequence.
True. The token embedding is determined by a lookup table (the embedding matrix), which assigns a fixed vector representation to each token based on its vocabulary index, independent of position.
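A quick check of this in PyTorch (illustrative sizes):

import torch
import torch.nn as nn

emb = nn.Embedding(1000, 64)
seq = torch.tensor([[5, 7, 5]])              # token 5 appears at positions 0 and 2
vecs = emb(seq)
print(torch.equal(vecs[0, 0], vecs[0, 2]))   # True: same token, same vector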
Which factors influence the number of parameters in an RNN for next-token prediction?
Size of the embedding, Number of tokens in the vocab
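A back-of-the-envelope count under the SimpleRNN layout sketched earlier, with illustrative sizes:

vocab_size, hidden_size = 1000, 64

embedding_params = vocab_size * hidden_size                 # input -> hidden lookup table
hidden_params    = hidden_size * hidden_size + hidden_size  # recurrent linear layer (+ bias)
output_params    = hidden_size * vocab_size + vocab_size    # hidden -> vocab logits (+ bias)

print(embedding_params + hidden_params + output_params)     # 133160 here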
Techniques used in AWD-LSTM?
Activation regularisation (AR), temporal activation regularisation (TAR), several forms of dropout (including weight dropout on the recurrent weights), weight tying, and weight decay
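A minimal sketch of weight dropout (DropConnect applied to the hidden-to-hidden weights rather than to activations), assuming PyTorch; this is illustrative, not the exact AWD-LSTM implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

h2h = nn.Linear(64, 64)
p = 0.5

def forward_with_weight_dropout(h):
    w = F.dropout(h2h.weight, p=p, training=True)  # drop weights, not activations
    return h @ w.t() + h2h.bias

h = torch.randn(8, 64)
out = forward_with_weight_dropout(h)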