lecture 10 Flashcards

1
Q

What major AI breakthrough occurred in November 2022?

A

The introduction of ChatGPT.

2
Q

What is the fundamental technology behind ChatGPT?

A

The Transformer model.

3
Q

When was the Transformer model introduced?

A

In 2017, in the paper "Attention Is All You Need".

4
Q

What makes Transformer models powerful?

A

They use self-attention and scale effectively with large datasets.

5
Q

What are sequence models used for?

A

Processing sequential data like language, time series, and speech.

6
Q

What are two common sequence models before Transformers?

A

Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs).

7
Q

What is a limitation of RNNs?

A

They require sequential processing, making them slow.

8
Q

What is a limitation of CNNs for sequences?

A

They have a limited receptive field, so they cannot capture long-range dependencies effectively.

9
Q

What advantage does self-attention provide?

A

It allows parallel processing and captures long-range dependencies.

10
Q

What is the basic function of self-attention?

A

Each output is computed as a weighted sum of all input values.

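
The weighted-sum idea above can be sketched in a few lines of NumPy (illustrative only; the weights here come from raw dot-product similarity between the inputs themselves, with no learned parameters):

```python
import numpy as np

def simple_self_attention(X):
    """Each output row is a weighted sum of all rows of X.
    The weights are computed dynamically from X via dot-product similarity."""
    scores = X @ X.T                                   # similarity of every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # each row of weights sums to 1
    return weights @ X                                 # weighted sum of input values

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out = simple_self_attention(X)
```

Note that every output position mixes information from every input position in one step, which is what lets self-attention reach across the whole sequence.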
11
Q

How are self-attention weights determined?

A

They are computed dynamically from the input itself.

12
Q

What is the primary benefit of self-attention?

A

It captures dependencies between all elements in a sequence efficiently.

13
Q

What does the term ‘transformer’ refer to in deep learning?

A

A model architecture that relies on self-attention and feedforward layers.

14
Q

What is a key benefit of Transformers over RNNs?

A

Transformers allow parallel computation, reducing training time.

15
Q

What operation is central to self-attention?

A

Computing similarity between input elements to determine their importance.

16
Q

What mathematical operation is used in self-attention?

A

Dot-product attention.

17
Q

What mechanism normalizes attention weights?

A

The softmax function.

18
Q

What does the softmax function do in self-attention?

A

It converts raw scores into probabilities that sum to 1.
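
A quick NumPy sketch of that normalization (the max-subtraction is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(scores):
    """Convert raw attention scores into probabilities that sum to 1."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(np.array([2.0, 1.0, 0.1]))
```

Larger scores map to larger probabilities, so the positions with the highest similarity scores receive the most attention.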

19
Q

What are the three main components of self-attention?

A

Query, Key, and Value matrices.

20
Q

What does the Query (Q) matrix represent?

A

It captures how much attention an input should give to others.

21
Q

What does the Key (K) matrix represent?

A

It determines how much an input should be attended to by others.

22
Q

What does the Value (V) matrix represent?

A

It holds the actual information that will be aggregated.

23
Q

What is scaled dot-product attention?

A

A modification of dot-product attention that scales down large values for stability.
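
Putting the Q/K/V cards together, the whole operation is softmax(QKᵀ/√d_k)V. A hedged NumPy sketch (shapes and variable names are my own, not from the lecture):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core self-attention operation."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # scaled similarities
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # aggregate the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
```

Dividing by √d_k keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions with vanishingly small gradients.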

24
Q

What is the benefit of multi-head attention?

A

It allows the model to focus on different aspects of the sequence simultaneously.
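
The multi-head idea can be sketched as follows (NumPy; illustrative only -- real Transformers also apply learned projection matrices per head, which are omitted here):

```python
import numpy as np

def attention(Q, K, V):
    s = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(X, num_heads):
    """Split the feature dimension into heads, attend within each head
    independently, then concatenate the results."""
    seq_len, d_model = X.shape
    head_dim = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        heads.append(attention(X[:, sl], X[:, sl], X[:, sl]))
    return np.concatenate(heads, axis=-1)

X = np.random.default_rng(1).normal(size=(5, 8))
out = multi_head_attention(X, num_heads=2)
```

Each head computes its own attention pattern, so different heads can specialize in different relationships (e.g., local syntax vs. long-range reference).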

25
Q

What is positional encoding in Transformers?

A

A technique to introduce order information into the input sequence.

26
Q

Why is positional encoding necessary?

A

Because self-attention does not inherently preserve word order.

27
Q

What type of functions are used for positional encoding?

A

Sine and cosine functions with different frequencies.
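
These sinusoids can be generated directly; a NumPy sketch of the standard scheme (even dimensions use sine, odd dimensions use cosine, with geometrically spaced frequencies):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: pe[pos, 2i]   = sin(pos / 10000^(2i/d)),
                                       pe[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    pos = np.arange(seq_len)[:, None]         # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]      # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)              # even dimensions
    pe[:, 1::2] = np.cos(angles)              # odd dimensions
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
```

The encoding is added to the input embeddings, giving each position a distinct signature that the attention layers can use.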

28
Q

What is the role of feedforward layers in Transformers?

A

They apply transformations to each position independently after self-attention.
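
"Position-wise" means the same small two-layer network is applied to every position separately. A minimal NumPy sketch (dimensions and ReLU activation are illustrative assumptions):

```python
import numpy as np

def feed_forward(X, W1, b1, W2, b2):
    """Position-wise FFN: relu(X @ W1 + b1) @ W2 + b2, applied row by row,
    so each position is transformed independently of the others."""
    return np.maximum(0.0, X @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(2)
d_model, d_ff, seq_len = 8, 32, 5
W1 = rng.normal(size=(d_model, d_ff)); b1 = np.zeros(d_ff)
W2 = rng.normal(size=(d_ff, d_model)); b2 = np.zeros(d_model)
X = rng.normal(size=(seq_len, d_model))
out = feed_forward(X, W1, b1, W2, b2)
```

Because no information moves between positions here, all cross-position mixing in a Transformer happens in the self-attention layers.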

29
Q

What is the key takeaway from Transformers?

A

They revolutionized sequence processing by enabling efficient parallelization and long-range dependencies.