Transformers Flashcards

1
Q

Explain how self-attention helps solve the problem of understanding pronoun references in the example sentence ‘The dog ran so fast that it looked like a brown dot as it ran away.’

A

Self-attention lets the model attend to ‘the dog’ when processing each occurrence of ‘it,’ assigning that phrase a high attention weight so the pronoun is correctly linked to its referent.
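A minimal sketch of the mechanism, with toy hand-picked embeddings (the tokens, vector values, and the absence of learned Q/K/V projections are all simplifying assumptions, not part of the original card):

import numpy as np

# Toy embeddings for a shortened version of the sentence; values are illustrative only.
tokens = ["the", "dog", "ran", "it"]
X = np.array([
    [0.1, 0.0, 0.2],   # the
    [0.9, 0.8, 0.1],   # dog
    [0.0, 0.3, 0.9],   # ran
    [0.8, 0.7, 0.2],   # it (deliberately embedded near "dog")
])

def self_attention(X):
    # Scaled dot-product self-attention: single head, no learned projections.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                  # pairwise token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X, weights

_, weights = self_attention(X)
# The row for "it" puts its largest weight on "dog", resolving the pronoun.
print(dict(zip(tokens, weights[tokens.index("it")].round(3))))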

2
Q

What are the main challenges with using RNNs for processing text?

A

RNNs struggle with long sentences because gradients vanish during backpropagation through time; they process tokens one step at a time, which makes training slow; and their sequential nature prevents them from taking full advantage of parallel hardware.
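A quick numeric sketch of the vanishing-gradient point (the recurrent weight of 0.5 and the 50-step sequence length are assumed purely for illustration):

w = 0.5        # assumed recurrent weight factor with magnitude below 1
steps = 50     # assumed sequence length

# Backpropagation through time multiplies one such factor per step,
# so the gradient signal decays exponentially over long sequences.
gradient = w ** steps
print(f"gradient factor after {steps} steps: {gradient:.1e}")  # ~8.9e-16, effectively zero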

3
Q

What are the benefits of using transformers for processing text?

A

Transformers handle long sentences and long-range dependencies better, train faster because all positions are processed in parallel, and use self-attention to focus on the words that matter most in context.
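A rough contrast of the two dependency patterns (shapes and weights are arbitrary assumptions; this illustrates the sequential bottleneck, not a real transformer layer):

import numpy as np

T, d = 128, 64                    # assumed sequence length and hidden size
X = np.random.randn(T, d)         # token representations
W = np.random.randn(d, d)

# RNN-style: step t cannot start until step t-1 has finished.
h = np.zeros(d)
for t in range(T):
    h = np.tanh(X[t] + h @ W)

# Transformer-style: every position is handled in one batched matrix
# multiply, which parallelizes across the whole sequence on a GPU.
H = np.tanh(X @ W)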

4
Q

Explain the role of encoders and decoders in the sequence-to-sequence architecture. How do they work together to process information?

A

The encoder converts the input sequence into a compact summary representation (the context), and the decoder generates the output sequence from that context, e.g., producing a translated sentence token by token.
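A toy sketch of that division of labor (the mean-pooling encoder, dot-product scoring, and state update below are simplifications invented for brevity, not the real seq2seq mechanics):

import numpy as np

def encode(embeddings):
    # Toy encoder: compress the whole input sequence into one context vector.
    return embeddings.mean(axis=0)

def decode(context, vocab, max_len=5):
    # Toy decoder: emit one output token at a time, conditioned on the context.
    out = []
    used = np.zeros(len(vocab), dtype=bool)
    for _ in range(max_len):
        scores = vocab @ context                      # score each candidate token
        scores[used] = -np.inf                        # crude: don't repeat tokens
        idx = int(scores.argmax())
        used[idx] = True
        out.append(idx)
        context = 0.5 * context + 0.5 * vocab[idx]    # fold the choice back into state
    return out

rng = np.random.default_rng(0)
source = rng.normal(size=(4, 8))   # 4 input tokens, 8-dim embeddings
vocab = rng.normal(size=(10, 8))   # 10 candidate output-token embeddings
print(decode(encode(source), vocab))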

5
Q

What role does scaling (in terms of data, parameters, and compute) play in transformer performance? Is continuous scaling a sustainable path forward?

A

Scaling up data, parameters, and compute has reliably improved transformer performance, but continual scaling may not be sustainable because of rising training costs, energy demands, and environmental impact.

6
Q

The slides suggest that being ‘next word prediction machines’ might not be sufficient for human-like intelligence. What are the implications of this observation?

A

It suggests that predicting the next word, however accurately, may not amount to genuine understanding or reasoning, implying that human-like intelligence could require capabilities beyond next-word prediction.

7
Q

Transformers can do ‘in context learning.’ Explain what in context learning means. Provide an example if necessary.

A

In-context learning is when a transformer performs a task based on examples or instructions provided directly in the prompt, without any retraining or weight updates.
Example:
For a sentiment classification task, the prompt contains a labeled demonstration followed by the new, unlabeled input:
Input:
“Classify the sentiment:
‘That film was a waste of time.’ -> Negative
‘I love this movie!’ ->”
Output:
Positive
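A minimal sketch of how such a prompt is assembled; `complete` below is a hypothetical placeholder for whatever text-completion call is available, not a real library function:

# Hypothetical placeholder: substitute any text-completion model call.
def complete(prompt: str) -> str:
    raise NotImplementedError("plug in a real completion API here")

# All the 'teaching' lives in the prompt: a labeled demonstration followed
# by the new input. The model's weights are never updated.
prompt = (
    "Classify the sentiment:\n"
    "'That film was a waste of time.' -> Negative\n"
    "'I love this movie!' ->"
)
label = complete(prompt)   # a capable model should complete with "Positive"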
