Pros and Cons of different NNs Flashcards
1
Q
3 cons of an RNN
A
- Prone to the vanishing gradient problem, since gradients are backpropagated not only through every layer but also through every time step, so they are multiplied many times over and can shrink towards zero (see the sketch below)
- Memory intensive to train
- Its sequential processing of data means training can take a long time, especially with longer sequences or larger amounts of data
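A minimal NumPy sketch of the first point, using an illustrative scalar recurrence and weight value: the gradient flowing from the last hidden state back to the first is a product of one local derivative per time step, so it shrinks geometrically as the sequence gets longer.

```python
# Why BPTT gradients can vanish: for a scalar recurrence h_t = tanh(w*h_{t-1} + x_t),
# dh_T/dh_0 is a product of T local derivatives, so it shrinks geometrically when |w| < 1.
import numpy as np

np.random.seed(0)
w = 0.5                      # illustrative recurrent weight
x = np.random.randn(100)     # illustrative input sequence

for T in (5, 20, 50, 100):
    h, hs = 0.0, []
    for t in range(T):       # forward pass through T time steps
        h = np.tanh(w * h + x[t])
        hs.append(h)
    grad = 1.0               # backpropagate through time, one factor per step
    for h_t in reversed(hs):
        grad *= w * (1.0 - h_t ** 2)   # d h_t / d h_{t-1}
    print(f"T={T:3d}  |dh_T/dh_0| ~ {abs(grad):.2e}")
```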
2
Q
3 pros of an RNN
A
- Can handle sequential data flexibly
- Simpler architecture that is easier to implement (see the sketch below)
- Can remember information that it has previously processed, allowing for a better contextual understanding
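As a rough illustration of how simple the architecture is, a vanilla RNN forward pass can be written as a single recurrence in a few lines; the weight names, shapes and random inputs below are illustrative and not tied to any particular library.

```python
# A vanilla RNN forward pass is one recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b):
    """xs: (T, input_dim). Returns the hidden state after each time step."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x_t in xs:                       # process the sequence one step at a time
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 8, 4, 16      # works for any sequence length T
hs = rnn_forward(rng.normal(size=(T, input_dim)),
                 rng.normal(size=(hidden_dim, input_dim)) * 0.1,
                 rng.normal(size=(hidden_dim, hidden_dim)) * 0.1,
                 np.zeros(hidden_dim))
print(hs.shape)                          # (8, 16): one hidden state per step
```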
3
Q
3 pros of an LSTM NN
A
- Better at capturing long-term dependencies compared to RNNs due to their gating mechanisms
- Cell state and controlled information flow mean LSTMs maintain more stable gradients during BPTT, making the vanishing/exploding gradient problem less likely (see the sketch below)
- Generally perform better at NLP tasks compared to RNNs because of their controlled information flow (better contextual understanding)
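A minimal sketch of a single LSTM step, assuming NumPy and illustrative weight names and shapes: the input, forget and output gates control what is written, kept and exposed, and the cell state is updated largely additively, which is what keeps gradients more stable than in a plain RNN.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W: (4H, D), U: (4H, H), b: (4H,) with the four gates stacked as [i, f, o, g]."""
    z = W @ x_t + U @ h_prev + b
    H = h_prev.shape[0]
    i = sigmoid(z[0*H:1*H])        # input gate: how much new candidate to write
    f = sigmoid(z[1*H:2*H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2*H:3*H])        # output gate: how much cell state to expose
    g = np.tanh(z[3*H:4*H])        # candidate values
    c_t = f * c_prev + i * g       # controlled, largely additive cell-state update
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 4, 8
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c,
                 rng.normal(size=(4*H, D)) * 0.1,
                 rng.normal(size=(4*H, H)) * 0.1,
                 np.zeros(4*H))
print(h.shape, c.shape)            # (8,) (8,)
```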
4
Q
3 cons of an LSTM NN
A
- More complex architecture (compared to an RNN), making them harder to implement
- This added complexity means they require more time and computational resources to train (see the parameter-count comparison below)
- For some tasks like processing very long sequences, newer models like Transformers often outperform LSTMs
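One quick way to see the extra cost, assuming PyTorch is available and using illustrative layer sizes: an LSTM layer carries roughly four times the parameters of a vanilla RNN layer of the same width, because each gate has its own set of weights.

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

rnn = nn.RNN(input_size=128, hidden_size=256, batch_first=True)
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
print("RNN params: ", n_params(rnn))    # ~99k
print("LSTM params:", n_params(lstm))   # ~395k, roughly 4x
```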
5
Q
3 pros of Transformers
A
- Parallel processing of sequences enables faster training and more efficient use of modern GPUs (see the attention sketch below)
- Self-attention mechanism allows for understanding relationships between distant elements, improving performance on tasks requiring contextual understanding
- Transformers can be scaled effectively and are foundational to powerful pre-trained models like GPT, which can be fine-tuned for other downstream tasks
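A minimal sketch of scaled dot-product self-attention, assuming NumPy and illustrative names and sizes: there is no loop over time steps, so every position attends to every other position in a couple of matrix multiplications, which is what allows parallel training and direct modelling of long-range relationships.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (T, d_model). Returns one attention 'head' of outputs, shape (T, d_k)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # each output mixes all positions

rng = np.random.default_rng(0)
T, d_model, d_k = 10, 32, 16
X = rng.normal(size=(T, d_model))
out = self_attention(X,
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)))
print(out.shape)   # (10, 16)
```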
6
Q
3 cons of Transformers
A
- Resource intensive: self-attention cost grows quadratically with sequence length, so inference on longer sequences is especially expensive
- Usually require large amounts of data to be trained effectively which can be a barrier for smaller datasets
- Transformers lack an inherent sense of sequence order, which can negatively affect performance on certain tasks (this is mitigated with positional encoding, sketched below)
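A minimal sketch of the sinusoidal positional encoding scheme from the original Transformer paper, assuming NumPy and illustrative sizes: these vectors are added to the token embeddings so the otherwise order-agnostic model can tell positions apart.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings of shape (seq_len, d_model); d_model assumed even."""
    pos = np.arange(seq_len)[:, None]                     # (T, 1) positions
    i = np.arange(d_model // 2)[None, :]                  # (1, d_model/2) dimension indices
    angles = pos / np.power(10000.0, 2 * i / d_model)     # (T, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                          # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                          # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=50, d_model=64)
print(pe.shape)   # (50, 64): added element-wise to the (50, 64) token embeddings
```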