Recurrent Neural Networks Flashcards
Why do we need RNNs that accept variable-length input?
Many real-world sequences (like sentences) vary in length, so fixed-size models cannot handle them directly; RNNs process inputs one step at a time, so they can handle any length while maintaining temporal information.
What is the Seq2Seq architecture used for?
It is used for transforming one sequence into another, such as translating a sentence from one language to another.
What are the components of a Seq2Seq model?
An encoder RNN and a decoder RNN.
What does the encoder in a Seq2Seq model do?
It processes the input sequence and produces a context vector summarizing the sequence.
What does the decoder in a Seq2Seq model do?
It generates the output sequence using the context vector.
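A minimal sketch of this encoder/decoder split, assuming PyTorch and GRU cells (the cards don't prescribe a framework or cell type); module and variable names are illustrative.

```python
# Minimal Seq2Seq sketch: encoder summarizes the source, decoder generates step by step.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        outputs, hidden = self.rnn(self.embed(src))
        return outputs, hidden                   # hidden serves as the context vector

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt_token, hidden):        # tgt_token: (batch, 1)
        output, hidden = self.rnn(self.embed(tgt_token), hidden)
        return self.out(output), hidden          # logits over the target vocabulary

# Usage: encode the source once, then decode one token at a time from the context.
enc, dec = Encoder(vocab_size=100), Decoder(vocab_size=100)
src = torch.randint(0, 100, (2, 7))              # batch of 2 source sequences
_, context = enc(src)
logits, _ = dec(torch.zeros(2, 1, dtype=torch.long), context)
```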
What problem arises from using a fixed-size context vector in Seq2Seq?
It may not be sufficient to capture all the information for long input sequences.
How is this problem addressed?
With attention mechanisms that allow the decoder to access all encoder hidden states, not just the final one.
What does the attention mechanism do?
It computes a context vector dynamically for each decoder step by focusing on different parts of the input sequence.
What are the three components involved in attention computation?
Query (decoder hidden state), Keys (encoder hidden states), and Values (encoder hidden states).
How is the attention weight computed?
By computing a similarity score between the query and each key and normalizing the scores, usually with a softmax.
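A sketch of that computation, assuming dot-product scoring (shapes and names are illustrative): score the query against each key, softmax the scores, then use the weights to average the values.

```python
# Attention weights: similarity scores between the query and each key, then a softmax.
import torch
import torch.nn.functional as F

hid_dim, src_len = 64, 7
query = torch.randn(1, hid_dim)        # current decoder hidden state
keys = torch.randn(src_len, hid_dim)   # encoder hidden states
values = keys                          # keys and values are both the encoder states here

scores = query @ keys.T                # (1, src_len) similarity scores
weights = F.softmax(scores, dim=-1)    # attention weights sum to 1
context = weights @ values             # (1, hid_dim) context vector
```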
What are common types of attention score functions?
Dot-product, multiplicative (general), and additive (Bahdanau) attention.
How is dot-product attention computed?
It’s the dot product of the query and key vectors.
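In code (shapes illustrative); the 1/sqrt(d) scaling at the end is the common "scaled" variant, an addition beyond this card.

```python
# Dot-product scoring: one score per encoder state via a dot product with the query.
import math
import torch

d = 64
query = torch.randn(d)
keys = torch.randn(7, d)

scores = keys @ query             # plain dot-product attention scores, shape (7,)
scaled = scores / math.sqrt(d)    # scaled variant, keeps the softmax well-behaved for large d
```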
What is additive attention (Bahdanau attention)?
It uses a feedforward network with a single hidden layer to combine the query and key.
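A sketch of the Bahdanau score, assuming the usual form score(q, k) = vᵀ tanh(W_q q + W_k k); layer names and dimensions are illustrative.

```python
# Additive (Bahdanau) scoring: a one-hidden-layer feedforward net over query and key.
import torch
import torch.nn as nn

hid_dim, attn_dim, src_len = 64, 32, 7
W_q = nn.Linear(hid_dim, attn_dim, bias=False)
W_k = nn.Linear(hid_dim, attn_dim, bias=False)
v = nn.Linear(attn_dim, 1, bias=False)

query = torch.randn(1, hid_dim)          # decoder hidden state
keys = torch.randn(src_len, hid_dim)     # encoder hidden states

# score(q, k) = v^T tanh(W_q q + W_k k), one scalar per encoder position
scores = v(torch.tanh(W_q(query) + W_k(keys))).squeeze(-1)   # (src_len,)
weights = torch.softmax(scores, dim=-1)
```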
How does the decoder use attention in each time step?
It calculates attention weights, computes a context vector, and uses it along with the decoder hidden state to generate output.
What is concatenated in the decoder before generating the output token?
The context vector and the current decoder hidden state.
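A sketch of one decoder step that puts these two cards together, assuming dot-product scoring; layer names are illustrative.

```python
# One decoder step with attention: weights -> context vector -> concatenate with
# the decoder hidden state -> project to the vocabulary.
import torch
import torch.nn as nn
import torch.nn.functional as F

hid_dim, vocab_size, src_len = 64, 100, 7
out_proj = nn.Linear(hid_dim * 2, vocab_size)

enc_states = torch.randn(src_len, hid_dim)   # keys/values from the encoder
dec_hidden = torch.randn(1, hid_dim)         # query: current decoder hidden state

weights = F.softmax(dec_hidden @ enc_states.T, dim=-1)        # (1, src_len)
context = weights @ enc_states                                # (1, hid_dim)
logits = out_proj(torch.cat([context, dec_hidden], dim=-1))   # (1, vocab_size)
next_token = logits.argmax(dim=-1)
```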
What is teacher forcing?
A training technique where the decoder receives the ground truth token from the previous time step as input rather than its own previous prediction.
What is the advantage of teacher forcing?
It speeds up convergence and reduces error propagation during training.
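A sketch of teacher forcing with a mixing ratio; the ratio and the decoder interface (e.g. the Decoder sketched earlier) are illustrative.

```python
# Teacher forcing: during training, feed the ground-truth previous token to the
# decoder instead of its own prediction, here with probability teacher_forcing_ratio.
import random
import torch

def decode_with_teacher_forcing(decoder, hidden, tgt, teacher_forcing_ratio=0.5):
    batch_size, tgt_len = tgt.shape
    inp = tgt[:, :1]                              # start token
    all_logits = []
    for t in range(1, tgt_len):
        logits, hidden = decoder(inp, hidden)     # logits: (batch, 1, vocab)
        all_logits.append(logits)
        use_teacher = random.random() < teacher_forcing_ratio
        # ground-truth token if teacher forcing, otherwise the model's own prediction
        inp = tgt[:, t:t+1] if use_teacher else logits.argmax(dim=-1)
    return torch.cat(all_logits, dim=1)
```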
Name some applications of Seq2Seq with attention.
Machine translation, summarization, speech recognition, and question answering.
What is a Bidirectional RNN?
An RNN that processes the sequence in both forward and backward directions.
What is the benefit of Bidirectional RNNs?
They capture both past and future context for each time step.
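A sketch using PyTorch's built-in bidirectional flag; dimensions are illustrative.

```python
# Bidirectional RNN: each time step's output concatenates the forward and backward states.
import torch
import torch.nn as nn

birnn = nn.GRU(input_size=32, hidden_size=64, batch_first=True, bidirectional=True)
x = torch.randn(2, 10, 32)                 # (batch, seq_len, features)
outputs, hidden = birnn(x)

print(outputs.shape)   # (2, 10, 128): forward and backward states concatenated per step
print(hidden.shape)    # (2, 2, 64): final hidden state for each direction
```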
What is the vanishing gradient problem in RNNs?
Gradients become too small to update weights effectively, especially over long sequences.
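A tiny numeric illustration: backpropagation through time multiplies one per-step derivative per time step, so magnitudes below 1 shrink the gradient exponentially (the 0.9 factor is illustrative).

```python
# Vanishing gradients: a product of many per-step derivatives with magnitude < 1
# shrinks toward zero, so early time steps receive almost no learning signal.
per_step_derivative = 0.9
for steps in (10, 50, 100):
    print(steps, per_step_derivative ** steps)
# 10 -> ~0.35, 50 -> ~0.005, 100 -> ~2.7e-05
```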
How is this issue addressed?
Using LSTM or GRU architectures that include gating mechanisms.
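A simplified LSTM-style cell step showing the gating idea (biases omitted, weights random, names illustrative): the additive cell-state update gives gradients a path that is not repeatedly squashed.

```python
# Simplified LSTM-style cell: forget/input/output gates plus an additive cell update.
import torch

def lstm_cell_step(x, h, c, W):
    z = torch.cat([x, h], dim=-1)
    f = torch.sigmoid(z @ W["f"])              # forget gate
    i = torch.sigmoid(z @ W["i"])              # input gate
    o = torch.sigmoid(z @ W["o"])              # output gate
    g = torch.tanh(z @ W["g"])                 # candidate cell update
    c = f * c + i * g                          # additive cell-state update
    h = o * torch.tanh(c)
    return h, c

dim = 8
W = {k: torch.randn(2 * dim, dim) * 0.1 for k in "fiog"}
h = c = torch.zeros(1, dim)
x = torch.randn(1, dim)
h, c = lstm_cell_step(x, h, c, W)
```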