GRU MLM Flashcards

1
Q

Gated Recurrent Unit (GRU)

A

A Gated Recurrent Unit (GRU) is a type of recurrent neural network architecture used extensively in deep learning, especially in tasks involving sequential data, such as natural language processing, speech recognition, and time series prediction.

2
Q
  1. Introduction
A

The Gated Recurrent Unit (GRU) is a recurrent neural network (RNN) architecture introduced by Cho et al. in 2014. A GRU is similar to a long short-term memory (LSTM) unit with a forget gate, but it has fewer parameters than an LSTM because it lacks an output gate.

3
Q
  2. Structure
A

A GRU has two gates: a reset gate and an update gate. The reset gate determines how to combine the new input with the previous memory, and the update gate determines how much of the previous memory to keep. If we set the reset gate to all ones and the update gate to all zeros, we recover the vanilla RNN model.
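The gate equations can be sketched as a single GRU step in NumPy. This follows the convention in which the update gate z multiplies the previous state (some references swap z and 1 - z); all parameter names are illustrative, not taken from any library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU step. Convention: the update gate z scales the previous
    state, so z = 0 means "replace" and z = 1 means "keep"."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return z * h_prev + (1.0 - z) * h_tilde              # interpolate old vs. new
```

With the reset gate saturated at one and the update gate at zero, the step reduces to `tanh(Wh @ x + Uh @ h_prev)`, the vanilla RNN update mentioned above.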

4
Q
  3. Advantages over Vanilla RNNs
A

GRUs mitigate the vanishing gradient problem found in traditional RNNs. The gating units adaptively capture dependencies over different time scales, which helps GRUs remember long-term dependencies in a sequence, an area where simple RNNs struggle.
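A simplified scalar view of why the gating helps: when the update gate mostly keeps the old state, the gradient path through time is close to an identity map, while a vanilla RNN multiplies by (recurrent weight × tanh derivative) at every step. The numbers below are illustrative only.

```python
# Toy comparison of gradient magnitude through T time steps.
z = 0.99            # assumed near-saturated update gate (mostly keep old state)
u = 0.5             # assumed recurrent weight of a vanilla tanh RNN
T = 100
grad_gru = z ** T   # product of update-gate values along the "copy" path
grad_rnn = u ** T   # best case, taking tanh' at its maximum of 1
print(grad_gru, grad_rnn)
```

Even in this generous setting the vanilla RNN's gradient underflows toward zero after 100 steps, while the gated path still carries a usable signal.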

5
Q
  4. Comparison with LSTM
A

Both GRU and LSTM use gating units to modulate the flow of information inside the unit. However, the GRU has only two gates (reset and update), while the LSTM has three (input, forget, and output) plus a separate cell state. This makes the GRU lighter and faster to train than the LSTM. On the other hand, LSTM units, with their higher capacity and expressive power, have proven effective on a wider range of tasks.

6
Q
  5. Use Cases
A

GRUs are widely used in tasks such as language modeling, machine translation, speech recognition, and time series prediction.

7
Q
  6. Training
A

Like most neural networks, GRUs are trained with gradient-based optimizers such as stochastic gradient descent (SGD), Adam, or RMSprop. For sequence models like the GRU, the gradients are computed with the backpropagation through time (BPTT) algorithm.
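The loop below is a deliberately tiny training sketch: a scalar GRU (update gate scaling the previous state) fit with plain SGD on an invented toy target. Finite differences stand in for BPTT here purely to keep the example short; real frameworks compute the same gradients analytically by unrolling the recurrence. All names and values are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(params, xs, target):
    """Run a scalar GRU over xs and score the final state against target."""
    wz, uz, wr, ur, wh, uh = params
    h = 0.0
    for x in xs:
        z = sigmoid(wz * x + uz * h)            # update gate
        r = sigmoid(wr * x + ur * h)            # reset gate
        h_tilde = np.tanh(wh * x + uh * (r * h))
        h = z * h + (1.0 - z) * h_tilde
    return (h - target) ** 2

rng = np.random.default_rng(0)
params = rng.normal(scale=0.5, size=6)
xs, target = [0.5, -0.3, 0.8], 0.5
eps, lr = 1e-5, 0.2
losses = [loss(params, xs, target)]
for step in range(300):
    base = loss(params, xs, target)
    grad = np.zeros_like(params)
    for i in range(len(params)):                # finite differences in place of BPTT
        p = params.copy(); p[i] += eps
        grad[i] = (loss(p, xs, target) - base) / eps
    params = params - lr * grad                 # plain SGD update
    losses.append(loss(params, xs, target))
print(losses[0], losses[-1])
```

The loss on the toy task should drop over the 300 SGD steps, illustrating the gradient-based training the card describes.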

8
Q
  7. Variations
A

Variants of the GRU include the minimal gated unit (MGU) and the recurrent additive network (RAN). The MGU merges the reset and update gates into a single forget gate, further reducing the model's complexity. The RAN keeps input and forget gates but replaces the nonlinear candidate computation with a purely additive update, a gated weighted sum of the current input and the previous state, which makes it simpler to analyze while still capturing long-range dependencies.
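To show how much the MGU pares down the GRU, here is one MGU step in NumPy following the single-gate formulation of Zhou et al. (2016); the parameter names are illustrative, not from any library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_cell(x, h_prev, Wf, Uf, bf, Wh, Uh, bh):
    """One minimal-gated-unit step: a single forget gate f plays the
    roles of both the GRU's reset and update gates."""
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)               # the one and only gate
    h_tilde = np.tanh(Wh @ x + Uh @ (f * h_prev) + bh)   # candidate state
    return (1.0 - f) * h_prev + f * h_tilde              # gated interpolation
```

Compared with the GRU's three weight pairs for two gates plus a candidate, the MGU needs only two, cutting the parameter count by roughly a third.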

9
Q
  8. Strengths and Limitations
A

GRUs are a powerful tool for modeling sequences: they capture long-range dependencies and train more efficiently than LSTMs. However, they remain sensitive to parameter initialization, their sequential computation is hard to parallelize, and they can struggle with very long sequences, which LSTMs sometimes handle slightly better thanks to their separate cell state.
