Lecture 10 Flashcards
What major AI breakthrough occurred in November 2022?
The introduction of ChatGPT.
What is the fundamental technology behind ChatGPT?
The Transformer model.
When was the Transformer model introduced?
In 2017, in the paper "Attention Is All You Need".
What makes Transformer models powerful?
They use self-attention and scale effectively with large datasets.
What are sequence models used for?
Processing sequential data like language, time series, and speech.
What are two common sequence models before Transformers?
Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs).
What is a limitation of RNNs?
They require sequential processing, making them slow.
What is a limitation of CNNs for sequences?
They have a limited receptive field, so they struggle to capture long-range dependencies without stacking many layers.
What advantage does self-attention provide?
It allows parallel processing and captures long-range dependencies.
What is the basic function of self-attention?
Each output is computed as a weighted sum of all input values.
How are self-attention weights determined?
They are computed dynamically from the input itself.
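A minimal sketch of this idea in NumPy (not the full Transformer formulation, which adds learned Q/K/V projections): the weights come from dot products between the inputs themselves, and each output is a weighted sum of all inputs.

```python
import numpy as np

# Toy input: 4 tokens, each a 3-dimensional vector.
x = np.random.randn(4, 3)

# Raw similarity scores between every pair of tokens (dot products).
scores = x @ x.T                      # shape (4, 4)

# Normalize each row into weights that sum to 1.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each output is a weighted sum of all input vectors.
output = weights @ x                  # shape (4, 3)
```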
What is the primary benefit of self-attention?
It captures dependencies between all elements in a sequence efficiently.
What does the term ‘transformer’ refer to in deep learning?
A model architecture that relies on self-attention and feedforward layers.
What is a key benefit of Transformers over RNNs?
Transformers allow parallel computation, reducing training time.
What operation is central to self-attention?
Computing similarity between input elements to determine their importance.
What mathematical operation is used in self-attention?
Dot-product attention.
What mechanism normalizes attention weights?
The softmax function.
What does the softmax function do in self-attention?
It converts raw scores into probabilities that sum to 1.
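A quick sketch of this behavior (illustrative, not library code):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)         # approx. [0.659, 0.242, 0.099]
print(probs.sum())   # 1.0
```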
What are the three main components of self-attention?
Query, Key, and Value matrices.
What does the Query (Q) matrix represent?
It represents what each position is looking for; queries are matched against keys to decide where to attend.
What does the Key (K) matrix represent?
It represents what each position offers; keys are matched against queries to determine how much attention that position receives.
What does the Value (V) matrix represent?
It holds the actual information that will be aggregated.
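A sketch of how Q, K, and V are typically produced: each is a separate learned linear projection of the same input (the weight matrices below are random stand-ins for learned parameters).

```python
import numpy as np

seq_len, d_model = 4, 8
x = np.random.randn(seq_len, d_model)     # input sequence

# Stand-ins for learned projection matrices.
W_q = np.random.randn(d_model, d_model)
W_k = np.random.randn(d_model, d_model)
W_v = np.random.randn(d_model, d_model)

Q = x @ W_q   # what each position is looking for
K = x @ W_k   # what each position offers for matching
V = x @ W_v   # the content that gets aggregated
```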
What is scaled dot-product attention?
Dot-product attention in which the scores are divided by the square root of the key dimension, so large dot products do not push the softmax into regions with tiny gradients.
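A sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)V, continuing the Q/K/V example above:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    # Similarity scores, scaled down by sqrt(d_k) for stability.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values.
    return weights @ V
```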
What is the benefit of multi-head attention?
It allows the model to focus on different aspects of the sequence simultaneously.
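A compact sketch of the idea behind multi-head attention: the model dimension is split across several heads, each runs attention independently, and the results are concatenated (projections are random stand-ins; a real implementation adds a final output projection).

```python
import numpy as np

def multi_head_attention(x, num_heads=2):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Each head has its own (stand-in) Q/K/V projections into a smaller space.
        W_q, W_k, W_v = (np.random.randn(d_model, d_head) for _ in range(3))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        scores = Q @ K.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        heads.append(weights @ V)
    # Concatenate head outputs back to the model dimension.
    return np.concatenate(heads, axis=-1)

out = multi_head_attention(np.random.randn(4, 8))   # shape (4, 8)
```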
What is positional encoding in Transformers?
A technique to introduce order information into the input sequence.
Why is positional encoding necessary?
Because self-attention does not inherently preserve word order.
What type of functions are used for positional encoding?
Sine and cosine functions with different frequencies.
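A sketch of the sinusoidal positional encoding from the original Transformer paper: even dimensions use sine and odd dimensions use cosine, with frequencies that decrease across dimensions.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]              # even dimension indices
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)   # cosine on odd dimensions
    return pe

# Added to the token embeddings so the model can tell positions apart.
pe = positional_encoding(seq_len=10, d_model=8)
```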
What is the role of feedforward layers in Transformers?
They apply transformations to each position independently after self-attention.
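A sketch of a position-wise feedforward layer: the same two-layer network (with, for example, a ReLU in between) is applied to every position independently; the weights below are random stand-ins for learned parameters.

```python
import numpy as np

def position_wise_ffn(x, d_ff=32):
    seq_len, d_model = x.shape
    W1, b1 = np.random.randn(d_model, d_ff), np.zeros(d_ff)
    W2, b2 = np.random.randn(d_ff, d_model), np.zeros(d_model)
    # Same weights applied to every position; no mixing across positions.
    hidden = np.maximum(0, x @ W1 + b1)   # ReLU
    return hidden @ W2 + b2

out = position_wise_ffn(np.random.randn(4, 8))   # shape (4, 8)
```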
What is the key takeaway from Transformers?
They revolutionized sequence processing by enabling efficient parallelization and modeling of long-range dependencies.