TNNs vs RNNs | Adrian Flashcards

1
Q

What is the primary difference in how RNNs and TNNs process sequential data?

A

RNNs (recurrent neural networks) process sequential data step-by-step, maintaining a memory of previous inputs through hidden states, while TNNs (transformer neural networks) use a self-attention mechanism to process the entire sequence at once, capturing relationships between all words in the sequence.
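
A minimal NumPy sketch of the contrast (all names and dimensions are illustrative, not taken from any particular library): the RNN must update its hidden state one token at a time, while the attention version relates every token to every other token in a single matrix operation.

import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                      # sequence length, model dimension
x = rng.normal(size=(T, d))      # a toy sequence of T token vectors

# RNN: sequential, each step depends on the previous hidden state
W_x, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(T):               # must run in order: step t needs h from step t-1
    h = np.tanh(x[t] @ W_x + h @ W_h)

# Transformer-style self-attention: all pairwise relations at once
scores = x @ x.T / np.sqrt(d)                   # (T, T) similarity between every pair of tokens
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
out = weights @ x                               # every position attends to the whole sequence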

2
Q

How does the memory mechanism differ between RNNs and TNNs?

A

RNNs use hidden states to maintain memory of previous inputs, while TNNs rely on self-attention to weigh the importance of each word in the sequence relative to others, without needing hidden states.

3
Q

What is the vanishing gradient problem, and which architecture is more prone to it?

A

The vanishing gradient problem occurs when gradients become too small during training, making it difficult to learn long-term dependencies. RNNs are more prone to this issue, especially when trained using backpropagation through time (BPTT).
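
A toy numeric illustration (the per-step factor is purely illustrative): in BPTT the gradient reaching an early timestep is a product of one factor per step, so if each factor is a bit below 1 the gradient shrinks exponentially with sequence length.

import numpy as np

# Each backward step through time multiplies the gradient by roughly
# |W_h| * |tanh'(.)|; assume that factor is 0.9 per step for illustration.
per_step_factor = 0.9
for T in (10, 50, 100):
    grad_scale = per_step_factor ** T   # gradient scale reaching the first timestep
    print(f"T={T:4d}  gradient scale ~ {grad_scale:.2e}")
# T=  10  gradient scale ~ 3.49e-01
# T=  50  gradient scale ~ 5.15e-03
# T= 100  gradient scale ~ 2.66e-05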

4
Q

How do TNNs overcome the vanishing gradient problem?

A

TNNs overcome the vanishing gradient problem by using self-attention mechanisms, which allow them to directly capture relationships between distant words in a sequence without relying on long chains of hidden states.

5
Q

What is the self-attention mechanism in TNNs, and how does it work?

A

The self-attention mechanism in TNNs computes attention weights for each word in a sequence based on its similarity to other words, allowing the model to focus on the most relevant parts of the input when making predictions.
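
A minimal sketch of scaled dot-product self-attention in NumPy (single head, no batching or masking; the random projection matrices stand in for learned weights):

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, W_q, W_k, W_v):
    """x: (T, d) token vectors -> (T, d) contextualized vectors."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every token to every other
    weights = softmax(scores, axis=-1)        # each row sums to 1: how much to attend where
    return weights @ V                        # weighted mix of value vectors

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, W_q, W_k, W_v).shape)   # (4, 8)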

6
Q

How do RNNs handle long-term dependencies in sequential data?

A

RNNs handle long-term dependencies using hidden states, but they struggle with long sequences due to the vanishing gradient problem. Variants like LSTMs and GRUs were developed to mitigate this issue.

7
Q

What is LSTM, and how does it improve upon traditional RNNs?

A

LSTM (Long Short-Term Memory) is a type of RNN that uses gates (input, forget, and output gates) to control the flow of information, allowing it to better retain or forget information over long sequences and mitigate the vanishing gradient problem.
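
A sketch of a single LSTM cell step in NumPy, following the standard gate equations (weight shapes, names, and the packed-weight layout here are illustrative choices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep. W: (4*d, d_in + d), b: (4*d,)."""
    d = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0*d:1*d])          # input gate: how much new information to write
    f = sigmoid(z[1*d:2*d])          # forget gate: how much old cell state to keep
    o = sigmoid(z[2*d:3*d])          # output gate: how much cell state to expose
    g = np.tanh(z[3*d:4*d])          # candidate cell update
    c = f * c_prev + i * g           # cell state: additive path eases gradient flow
    h = o * np.tanh(c)               # new hidden state
    return h, c

rng = np.random.default_rng(0)
d_in, d = 3, 5
W = rng.normal(size=(4 * d, d_in + d)) * 0.1
b = np.zeros(4 * d)
h, c = np.zeros(d), np.zeros(d)
for x_t in rng.normal(size=(7, d_in)):       # run a 7-step toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)                      # (5,) (5,)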

8
Q

How do TNNs compare to RNNs in terms of parallelization?

A

TNNs are highly parallelizable because they process the entire sequence at once using self-attention, while RNNs process sequences step-by-step, making them less efficient for parallel computation.
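
A rough sketch of the difference in practice (illustrative only; actual speedups depend on hardware and implementation): the attention output for all positions is a few large matrix multiplications, whereas the RNN must loop over timesteps because each step needs the previous hidden state.

import time
import numpy as np

rng = np.random.default_rng(0)
T, d = 2000, 64
x = rng.normal(size=(T, d))
W_x, W_h = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1

t0 = time.perf_counter()
h = np.zeros(d)
for t in range(T):                     # inherently sequential: cannot be parallelized
    h = np.tanh(x[t] @ W_x + h @ W_h)
rnn_time = time.perf_counter() - t0

t0 = time.perf_counter()
scores = x @ x.T / np.sqrt(d)          # all T*T interactions computed in one shot
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
out = weights @ x
attn_time = time.perf_counter() - t0

print(f"RNN loop: {rnn_time:.3f}s   attention (batched matmuls): {attn_time:.3f}s")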

9
Q

What is the critical path in RNNs, and how does it affect latency?

A

The critical path in RNNs is the chain of hidden-state updates that must be computed one timestep after another before an output can be produced. Because each step depends on the result of the previous one, this path grows with sequence length and increases latency, especially for long sequences.

10
Q

How do TNNs reduce latency compared to RNNs?

A

TNNs reduce latency by processing the entire sequence in parallel using self-attention, eliminating the need for sequential processing and reducing the time required to generate responses.

11
Q

What is the training complexity of RNNs compared to TNNs?

A

RNNs are computationally intensive to train due to their sequential nature and the need for backpropagation through time (BPTT). TNNs, while also complex, are more efficient to train because of their parallelizable architecture.
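
A minimal sketch of why BPTT is costly (a hypothetical toy model, not a full training loop): the forward pass has to store every hidden state, and the backward pass walks the sequence in reverse, one step at a time.

import numpy as np

rng = np.random.default_rng(0)
T, d = 20, 4
x = rng.normal(size=(T, d))
W_x, W_h = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1

# Forward: keep every hidden state, since the backward pass needs them all.
hs = [np.zeros(d)]
for t in range(T):
    hs.append(np.tanh(x[t] @ W_x + hs[-1] @ W_h))

# Backward through time: gradient of a loss on the final state w.r.t. W_h.
grad_h = 2 * hs[-1]                 # d(loss)/dh_T for loss = ||h_T||^2
grad_Wh = np.zeros_like(W_h)
for t in reversed(range(T)):        # another strictly sequential loop over the sequence
    pre = x[t] @ W_x + hs[t] @ W_h
    grad_pre = grad_h * (1 - np.tanh(pre) ** 2)
    grad_Wh += np.outer(hs[t], grad_pre)
    grad_h = grad_pre @ W_h.T       # pass the gradient back one more timestep
print(grad_Wh.shape)                # (4, 4)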

12
Q

What is an example of a large language model that uses TNN architecture?

A

GPT-3 (Generative Pre-trained Transformer 3) is a large language model that uses the transformer architecture with self-attention mechanisms.

13
Q

How do TNNs handle contextual understanding compared to RNNs?

A

TNNs excel at contextual understanding because the self-attention mechanism allows them to consider the entire context of a sequence at once, while RNNs rely on hidden states, which can lose context over long sequences.

14
Q

What is the main advantage of RNNs over TNNs?

A

RNNs are simpler and more efficient for short sequences or tasks where sequential processing is sufficient, while TNNs are better suited for long sequences and tasks requiring deep contextual understanding.

15
Q

How do TNNs and RNNs differ in terms of hardware requirements?

A

TNNs require more powerful hardware (e.g., GPUs or TPUs) due to their parallel processing and large-scale computations, while RNNs can run on less powerful hardware but may struggle with long sequences.

16
Q

What is Wesley Hardy Ballard's name?

A

Wesley Hardy Ballard