Class Eleven Flashcards
What is Long Short-Term Memory (LSTM)?
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) architecture that is specifically designed to address the vanishing gradient problem in traditional RNNs. It introduces memory cells and gating mechanisms to effectively capture long-term dependencies in sequential data.
What are the advantages of using LSTM?
Advantages of using LSTM include its ability to capture long-term dependencies and to mitigate the vanishing gradient problem, which in a plain RNN causes information and error signals to decay over long sequences. (Exploding gradients are usually handled separately, for example with gradient clipping.)
How does an LSTM cell work?
An LSTM cell consists of a memory cell, an input gate, a forget gate, and an output gate. The input gate controls the flow of information into the memory cell, the forget gate controls the retention or removal of information from the cell, and the output gate controls the output of information from the cell.
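A minimal NumPy sketch of one LSTM step may make the gating concrete (the weight layout, variable names, and toy dimensions are illustrative assumptions, not any particular library's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4 * hidden, input_dim + hidden)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b     # all gate pre-activations at once
    i = sigmoid(z[0 * hidden:1 * hidden])         # input gate
    f = sigmoid(z[1 * hidden:2 * hidden])         # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])         # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])         # candidate cell content
    c_t = f * c_prev + i * g                      # update the memory cell
    h_t = o * np.tanh(c_t)                        # gated output becomes the new hidden state
    return h_t, c_t

# Toy usage with random parameters (dimensions are arbitrary for the example).
rng = np.random.default_rng(0)
input_dim, hidden = 3, 5
W = rng.normal(size=(4 * hidden, input_dim + hidden)) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(7, input_dim)):         # a sequence of 7 time steps
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)                            # (5,) (5,)
```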
What is the purpose of the memory cell in an LSTM?
The memory cell in an LSTM is responsible for storing and updating the information over time. It allows the network to selectively retain or forget information based on the input and gating mechanisms.
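In the standard LSTM equations (notation follows the common convention, with ⊙ denoting element-wise multiplication), the cell state is updated additively: the forget gate scales the old content and the input gate scales the new candidate content:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$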
What is the role of the input gate in an LSTM?
The input gate in an LSTM determines how much of the new candidate information should be written into the memory cell. It is computed from the current input and the previous hidden state and selectively updates the cell state.
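In the usual formulation (σ is the logistic sigmoid; the weight and bias names are conventional, not tied to any library), the input gate and the candidate content it scales are both computed from the current input x_t and the previous hidden state h_{t-1}:

$$i_t = \sigma(W_i\,[h_{t-1}, x_t] + b_i), \qquad \tilde{c}_t = \tanh(W_c\,[h_{t-1}, x_t] + b_c)$$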
What is the purpose of the forget gate in an LSTM?
The forget gate in an LSTM controls the amount of information retained in the memory cell. It decides which information from the previous cell state should be discarded based on the input and the previous hidden state.
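Under the same conventions, the forget gate is a sigmoid of the current input and previous hidden state; entries near 0 erase the corresponding cell contents and entries near 1 keep them:

$$f_t = \sigma(W_f\,[h_{t-1}, x_t] + b_f)$$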
What is the function of the output gate in an LSTM?
The output gate in an LSTM determines how much of the memory cell's content is exposed as the new hidden state. The cell state is passed through a tanh non-linearity and multiplied element-wise by the output gate, and the result becomes the hidden state passed to the next time step.
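Under the same conventions, the output gate scales a tanh-squashed copy of the cell state to produce the new hidden state:

$$o_t = \sigma(W_o\,[h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(c_t)$$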
How does Gated Recurrent Unit (GRU) differ from LSTM?
Gated Recurrent Unit (GRU) is a gated RNN closely related to LSTM that also addresses the vanishing gradient problem but has a simpler architecture. GRU combines the input gate and the forget gate into a single update gate and merges the cell state and hidden state into one state vector, reducing the number of parameters compared to LSTM.
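A minimal NumPy sketch of one GRU step, under the same illustrative assumptions as the LSTM sketch above, shows that there is no separate memory cell, only the hidden state h:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU time step; each W* has shape (hidden, input_dim + hidden)."""
    xh = np.concatenate([x_t, h_prev])
    z = sigmoid(Wz @ xh + bz)                     # update gate: how much to overwrite
    r = sigmoid(Wr @ xh + br)                     # reset gate: how much history feeds the candidate
    h_cand = np.tanh(Wh @ np.concatenate([x_t, r * h_prev]) + bh)  # candidate state
    return (1 - z) * h_prev + z * h_cand          # interpolate between old state and candidate

# Toy usage with random parameters (dimensions are arbitrary for the example).
rng = np.random.default_rng(0)
input_dim, hidden = 3, 5
Wz, Wr, Wh = (rng.normal(size=(hidden, input_dim + hidden)) * 0.1 for _ in range(3))
bz = br = bh = np.zeros(hidden)
h = np.zeros(hidden)
for x in rng.normal(size=(7, input_dim)):
    h = gru_step(x, h, Wz, Wr, Wh, bz, br, bh)
print(h.shape)                                    # (5,)
```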
What are the advantages of using GRU?
Advantages of using GRU include its simpler architecture, which results in fewer parameters and faster training compared to LSTM. GRU is particularly useful when dealing with less complex sequential data or when computational resources are limited.
How does the update gate in GRU work?
The update gate in GRU determines how much of the previous hidden state is carried over and how much is replaced by the new candidate state at the current time step. It combines the roles of the input gate and forget gate in LSTM.
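In one common GRU formulation (notation illustrative), the update gate z_t interpolates between the previous hidden state and the new candidate state, playing the combined role of the LSTM's forget and input gates:

$$z_t = \sigma(W_z\,[h_{t-1}, x_t] + b_z), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$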
What is the reset gate in GRU?
The reset gate in GRU controls how much of the previous hidden state should be forgotten or retained in the current time step. It allows the model to selectively reset or preserve the memory of past information.
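Under the same conventions, the reset gate r_t controls how much of the previous hidden state feeds into the candidate state:

$$r_t = \sigma(W_r\,[h_{t-1}, x_t] + b_r), \qquad \tilde{h}_t = \tanh(W_h\,[r_t \odot h_{t-1},\, x_t] + b_h)$$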
How are LSTMs and GRUs trained?
LSTMs and GRUs are trained using backpropagation through time (BPTT): the network is unrolled over the sequence, gradients of a loss function are computed with respect to all parameters, and a gradient-based optimizer updates the parameters to reduce the prediction error. Gradient clipping is commonly used alongside BPTT to keep exploding gradients in check.
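As a hedged illustration, here is a tiny PyTorch training loop for an LSTM sequence classifier; the model, data shapes, and hyperparameters are made up for the example, but the backward pass through the unrolled sequence is exactly BPTT, and gradient clipping is a common companion:

```python
import torch
import torch.nn as nn

# Toy sequence classifier; all sizes, names, and hyperparameters are illustrative assumptions.
class SeqClassifier(nn.Module):
    def __init__(self, input_size=8, hidden_size=32, num_classes=2):
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                  # x: (batch, time, features)
        _, (h_n, _) = self.rnn(x)          # h_n: final hidden state, (num_layers, batch, hidden)
        return self.head(h_n[-1])          # classify from the last layer's final hidden state

model = SeqClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 20, 8)                 # random batch: 16 sequences, 20 steps, 8 features
y = torch.randint(0, 2, (16,))             # random class labels for the toy example

for _ in range(5):                         # a few BPTT updates on the same toy batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                        # backpropagation through all 20 time steps (BPTT)
    nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip to guard against exploding gradients
    optimizer.step()
```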
In which domains or applications are LSTMs and GRUs commonly used?
LSTMs and GRUs are commonly used in natural language processing (NLP) tasks such as language translation, sentiment analysis, text generation, and speech recognition. They are also utilized in time series analysis, anomaly detection, and other sequence-based applications.
What are some limitations of LSTMs and GRUs?
Limitations of LSTMs and GRUs include the potential for overfitting, the requirement of large amounts of data for effective training, and the computational complexity, which can make them slower to train and deploy compared to simpler models.
How do LSTMs and GRUs help address the vanishing gradient problem?
LSTMs and GRUs address the vanishing gradient problem by using gating mechanisms that selectively retain or discard information over time. In particular, the LSTM's additive cell-state update and the GRU's interpolation between old and candidate states create paths along which gradients can flow with little attenuation, so error signals can be propagated effectively over long sequences.
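One way to see this (a sketch, not a full derivation): along the LSTM's cell-state path the update is additive, so the direct Jacobian between consecutive cell states is simply the forget gate,

$$\frac{\partial c_t}{\partial c_{t-1}} = \mathrm{diag}(f_t) \quad \text{(ignoring the indirect dependence through } h_{t-1}\text{)},$$

and as long as the forget gate stays close to 1, the error signal passes along this path nearly unchanged instead of being repeatedly squashed through saturating non-linearities as in a vanilla RNN.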