TNNs vs RNNs | Adrian Flashcards

1
Q

What is the primary difference in how RNNs and TNNs process sequential data?

A

RNNs (recurrent neural networks) process sequential data step-by-step, maintaining a memory of previous inputs through hidden states, while TNNs (transformer neural networks) use a self-attention mechanism to process the entire sequence at once, capturing relationships between all words in the sequence.
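
A minimal NumPy sketch of the contrast (all names and dimensions are illustrative, not taken from any particular library): the RNN must update its hidden state one token at a time, while the attention version relates every token to every other token in a single matrix operation.

import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                      # sequence length, model dimension
x = rng.normal(size=(T, d))      # a toy sequence of T token vectors

# RNN: sequential, each step depends on the previous hidden state
W_x, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(T):               # must run in order: step t needs h from step t-1
    h = np.tanh(x[t] @ W_x + h @ W_h)

# Transformer-style self-attention: all pairwise relations at once
scores = x @ x.T / np.sqrt(d)                   # (T, T) similarity between every pair of tokens
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
out = weights @ x                               # every position attends to the whole sequence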

2
Q

How does the memory mechanism differ between RNNs and TNNs?

A

RNNs use hidden states to maintain memory of previous inputs, while TNNs rely on self-attention to weigh the importance of each word in the sequence relative to others, without needing hidden states.

3
Q

What is the vanishing gradient problem, and which architecture is more prone to it?

A

The vanishing gradient problem occurs when gradients become too small during training, making it difficult to learn long-term dependencies. RNNs are more prone to this issue, especially when trained using backpropagation through time (BPTT).
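
A toy numeric illustration (the per-step factor is purely illustrative): in BPTT the gradient reaching an early timestep is a product of one factor per step, so if each factor is a bit below 1 the gradient shrinks exponentially with sequence length.

import numpy as np

# Each backward step through time multiplies the gradient by roughly
# |W_h| * |tanh'(.)|; assume that factor is 0.9 per step for illustration.
per_step_factor = 0.9
for T in (10, 50, 100):
    grad_scale = per_step_factor ** T   # gradient scale reaching the first timestep
    print(f"T={T:4d}  gradient scale ~ {grad_scale:.2e}")
# T=  10  gradient scale ~ 3.49e-01
# T=  50  gradient scale ~ 5.15e-03
# T= 100  gradient scale ~ 2.66e-05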

4
Q

How do TNNs overcome the vanishing gradient problem?

A

TNNs overcome the vanishing gradient problem by using self-attention mechanisms, which allow them to directly capture relationships between distant words in a sequence without relying on long chains of hidden states.

5
Q

What is the self-attention mechanism in TNNs, and how does it work?

A

The self-attention mechanism in TNNs computes attention weights for each word in a sequence based on its similarity to other words, allowing the model to focus on the most relevant parts of the input when making predictions.
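
A minimal sketch of scaled dot-product self-attention in NumPy (single head, no batching or masking; the random projection matrices stand in for learned weights):

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, W_q, W_k, W_v):
    """x: (T, d) token vectors -> (T, d) contextualized vectors."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every token to every other
    weights = softmax(scores, axis=-1)        # each row sums to 1: how much to attend where
    return weights @ V                        # weighted mix of value vectors

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, W_q, W_k, W_v).shape)   # (4, 8)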

6
Q

How do RNNs handle long-term dependencies in sequential data?

A

RNNs handle long-term dependencies using hidden states, but they struggle with long sequences due to the vanishing gradient problem. Variants like LSTMs and GRUs were developed to mitigate this issue.

7
Q

What is LSTM, and how does it improve upon traditional RNNs?

A

LSTM (Long Short-Term Memory) is a type of RNN that uses gates (input, forget, and output gates) to control the flow of information, allowing it to better retain or forget information over long sequences and mitigate the vanishing gradient problem.
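
A sketch of a single LSTM cell step in NumPy, following the standard gate equations (weight shapes, names, and the packed-weight layout here are illustrative choices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep. W: (4*d, d_in + d), b: (4*d,)."""
    d = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0*d:1*d])          # input gate: how much new information to write
    f = sigmoid(z[1*d:2*d])          # forget gate: how much old cell state to keep
    o = sigmoid(z[2*d:3*d])          # output gate: how much cell state to expose
    g = np.tanh(z[3*d:4*d])          # candidate cell update
    c = f * c_prev + i * g           # cell state: additive path eases gradient flow
    h = o * np.tanh(c)               # new hidden state
    return h, c

rng = np.random.default_rng(0)
d_in, d = 3, 5
W = rng.normal(size=(4 * d, d_in + d)) * 0.1
b = np.zeros(4 * d)
h, c = np.zeros(d), np.zeros(d)
for x_t in rng.normal(size=(7, d_in)):       # run a 7-step toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)                      # (5,) (5,)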

8
Q

How do TNNs compare to RNNs in terms of parallelization?

A

TNNs are highly parallelizable because they process the entire sequence at once using self-attention, while RNNs process sequences step-by-step, making them less efficient for parallel computation.
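
A rough sketch of the difference in practice (illustrative only; actual speedups depend on hardware and implementation): the attention output for all positions is a few large matrix multiplications, whereas the RNN must loop over timesteps because each step needs the previous hidden state.

import time
import numpy as np

rng = np.random.default_rng(0)
T, d = 2000, 64
x = rng.normal(size=(T, d))
W_x, W_h = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1

t0 = time.perf_counter()
h = np.zeros(d)
for t in range(T):                     # inherently sequential: cannot be parallelized
    h = np.tanh(x[t] @ W_x + h @ W_h)
rnn_time = time.perf_counter() - t0

t0 = time.perf_counter()
scores = x @ x.T / np.sqrt(d)          # all T*T interactions computed in one shot
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
out = weights @ x
attn_time = time.perf_counter() - t0

print(f"RNN loop: {rnn_time:.3f}s   attention (batched matmuls): {attn_time:.3f}s")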

9
Q

What is the critical path in RNNs, and how does it affect latency?

A

The critical path in RNNs is the chain of hidden-state updates that must be computed one timestep after another before an output can be produced. Because each step depends on the result of the previous one, this path grows with sequence length and increases latency, especially for long sequences.

10
Q

How do TNNs reduce latency compared to RNNs?

A

TNNs reduce latency by processing the entire sequence in parallel using self-attention, eliminating the need for sequential processing and reducing the time required to generate responses.

11
Q

What is the training complexity of RNNs compared to TNNs?

A

RNNs are computationally intensive to train due to their sequential nature and the need for backpropagation through time (BPTT). TNNs, while also complex, are more efficient to train because of their parallelizable architecture.
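
A minimal sketch of why BPTT is costly (a hypothetical toy model, not a full training loop): the forward pass has to store every hidden state, and the backward pass walks the sequence in reverse, one step at a time.

import numpy as np

rng = np.random.default_rng(0)
T, d = 20, 4
x = rng.normal(size=(T, d))
W_x, W_h = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1

# Forward: keep every hidden state, since the backward pass needs them all.
hs = [np.zeros(d)]
for t in range(T):
    hs.append(np.tanh(x[t] @ W_x + hs[-1] @ W_h))

# Backward through time: gradient of a loss on the final state w.r.t. W_h.
grad_h = 2 * hs[-1]                 # d(loss)/dh_T for loss = ||h_T||^2
grad_Wh = np.zeros_like(W_h)
for t in reversed(range(T)):        # another strictly sequential loop over the sequence
    pre = x[t] @ W_x + hs[t] @ W_h
    grad_pre = grad_h * (1 - np.tanh(pre) ** 2)
    grad_Wh += np.outer(hs[t], grad_pre)
    grad_h = grad_pre @ W_h.T       # pass the gradient back one more timestep
print(grad_Wh.shape)                # (4, 4)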

12
Q

What is an example of a large language model that uses TNN architecture?

A

GPT-3 (Generative Pre-trained Transformer 3) is a large language model that uses the transformer architecture with self-attention mechanisms.

13
Q

How do TNNs handle contextual understanding compared to RNNs?

A

TNNs excel at contextual understanding because the self-attention mechanism allows them to consider the entire context of a sequence at once, while RNNs rely on hidden states, which can lose context over long sequences.

14
Q

What is the main advantage of RNNs over TNNs?

A

RNNs are simpler and more efficient for short sequences or tasks where sequential processing is sufficient, while TNNs are better suited for long sequences and tasks requiring deep contextual understanding.

15
Q

How do TNNs and RNNs differ in terms of hardware requirements?

A

TNNs require more powerful hardware (e.g., GPUs or TPUs) due to their parallel processing and large-scale computations, while RNNs can run on less powerful hardware but may struggle with long sequences.

16
Q

What is Wesley Hardy Ballard's name?

A

Wesley Hardy Ballard