Pros and Cons of different NNs Flashcards
1
Q
3 cons of an RNN
A
- Prone to the vanishing gradient problem, since gradients are backpropagated not only through every layer but also through every time step, so they are multiplied many times over and can shrink towards zero (see the sketch below)
- Memory intensive to train
- Its sequential processing of data means training can take a long time, especially with longer sequences or larger amounts of data
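A minimal NumPy sketch of the first point, using an illustrative scalar recurrence and weight value: the gradient flowing from the last hidden state back to the first is a product of one local derivative per time step, so it shrinks geometrically as the sequence gets longer.

```python
# Why BPTT gradients can vanish: for a scalar recurrence h_t = tanh(w*h_{t-1} + x_t),
# dh_T/dh_0 is a product of T local derivatives, so it shrinks geometrically when |w| < 1.
import numpy as np

np.random.seed(0)
w = 0.5                      # illustrative recurrent weight
x = np.random.randn(100)     # illustrative input sequence

for T in (5, 20, 50, 100):
    h, hs = 0.0, []
    for t in range(T):       # forward pass through T time steps
        h = np.tanh(w * h + x[t])
        hs.append(h)
    grad = 1.0               # backpropagate through time, one factor per step
    for h_t in reversed(hs):
        grad *= w * (1.0 - h_t ** 2)   # d h_t / d h_{t-1}
    print(f"T={T:3d}  |dh_T/dh_0| ~ {abs(grad):.2e}")
```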
2
Q
3 pros of an RNN
A
- Can handle sequential data flexibly
- Simpler architecture that is easier to implement (see the sketch below)
- Can remember information that it has previously processed, allowing for a better contextual understanding
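As a rough illustration of how simple the architecture is, a vanilla RNN forward pass can be written as a single recurrence in a few lines; the weight names, shapes and random inputs below are illustrative and not tied to any particular library.

```python
# A vanilla RNN forward pass is one recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b):
    """xs: (T, input_dim). Returns the hidden state after each time step."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x_t in xs:                       # process the sequence one step at a time
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 8, 4, 16      # works for any sequence length T
hs = rnn_forward(rng.normal(size=(T, input_dim)),
                 rng.normal(size=(hidden_dim, input_dim)) * 0.1,
                 rng.normal(size=(hidden_dim, hidden_dim)) * 0.1,
                 np.zeros(hidden_dim))
print(hs.shape)                          # (8, 16): one hidden state per step
```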
3
Q
3 pros of an LSTM NN
A
- Better at capturing long-term dependencies compared to RNNs due to their gating mechanisms
- Cell state and controlled information flow mean LSTMs maintain more stable gradients during BPTT, making the vanishing/exploding gradient problem less likely (see the sketch below)
- Generally perform better at NLP tasks compared to RNNs because of their controlled information flow (better contextual understanding)
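A minimal sketch of a single LSTM step, assuming NumPy and illustrative weight names and shapes: the input, forget and output gates control what is written, kept and exposed, and the cell state is updated largely additively, which is what keeps gradients more stable than in a plain RNN.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W: (4H, D), U: (4H, H), b: (4H,) with the four gates stacked as [i, f, o, g]."""
    z = W @ x_t + U @ h_prev + b
    H = h_prev.shape[0]
    i = sigmoid(z[0*H:1*H])        # input gate: how much new candidate to write
    f = sigmoid(z[1*H:2*H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2*H:3*H])        # output gate: how much cell state to expose
    g = np.tanh(z[3*H:4*H])        # candidate values
    c_t = f * c_prev + i * g       # controlled, largely additive cell-state update
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 4, 8
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c,
                 rng.normal(size=(4*H, D)) * 0.1,
                 rng.normal(size=(4*H, H)) * 0.1,
                 np.zeros(4*H))
print(h.shape, c.shape)            # (8,) (8,)
```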
4
Q
3 cons of an LSTM NN
A
- More complex architecture (compared to an RNN), making them harder to implement
- This added complexity means they require more time and computational resources to train (see the parameter-count comparison below)
- For some tasks like processing very long sequences, newer models like Transformers often outperform LSTMs
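One quick way to see the extra cost, assuming PyTorch is available and using illustrative layer sizes: an LSTM layer carries roughly four times the parameters of a vanilla RNN layer of the same width, because each gate has its own set of weights.

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

rnn = nn.RNN(input_size=128, hidden_size=256, batch_first=True)
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
print("RNN params: ", n_params(rnn))    # ~99k
print("LSTM params:", n_params(lstm))   # ~395k, roughly 4x
```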
5
Q
3 pros of Transformers
A
- Parallel processing of sequences enables faster training and more efficient use of modern GPUs (see the attention sketch below)
- Self-attention mechanism allows for understanding relationships between distant elements, improving performance on tasks requiring contextual understanding
- Transformers can be scaled effectively and are foundational to powerful pre-trained models like GPT, which can be fine-tuned for other downstream tasks
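A minimal sketch of scaled dot-product self-attention, assuming NumPy and illustrative names and sizes: there is no loop over time steps, so every position attends to every other position in a couple of matrix multiplications, which is what allows parallel training and direct modelling of long-range relationships.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (T, d_model). Returns one attention 'head' of outputs, shape (T, d_k)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # each output mixes all positions

rng = np.random.default_rng(0)
T, d_model, d_k = 10, 32, 16
X = rng.normal(size=(T, d_model))
out = self_attention(X,
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)))
print(out.shape)   # (10, 16)
```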
6
Q
3 cons of Transformers
A
- Resource intensive: self-attention cost grows quadratically with sequence length, so inference on longer sequences is especially expensive
- Usually require large amounts of data to be trained effectively which can be a barrier for smaller datasets
- Transformers lack an inherent sense of sequence order, which can negatively affect performance on certain tasks (this is mitigated with positional encoding, sketched below)
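A minimal sketch of the sinusoidal positional encoding scheme from the original Transformer paper, assuming NumPy and illustrative sizes: these vectors are added to the token embeddings so the otherwise order-agnostic model can tell positions apart.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings of shape (seq_len, d_model); d_model assumed even."""
    pos = np.arange(seq_len)[:, None]                     # (T, 1) positions
    i = np.arange(d_model // 2)[None, :]                  # (1, d_model/2) dimension indices
    angles = pos / np.power(10000.0, 2 * i / d_model)     # (T, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                          # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                          # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=50, d_model=64)
print(pe.shape)   # (50, 64): added element-wise to the (50, 64) token embeddings
```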