Lecture 3 Flashcards

1
Q

What are the building blocks of AI?

A

addition and multiplication

2
Q

There are two sides to the “is AI intelligent” debate. What are they?

A

It’s just math (so no) vs. look at the performance (so yes)

i.e., how AI works vs. what it can do

3
Q

What is the XOR problem?

A

Very simple models (like a single logistic regression) cannot learn the exclusive-or function: “either this or that, but not both”

4
Q

What is the solution to the XOR problem?

A

More complicated neural networks with intermediate steps (hidden layers)
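
A minimal sketch of the point (not from the lecture), assuming scikit-learn is available: logistic regression alone fails on XOR, while a network with a hidden layer can solve it.

```python
# XOR: a single logistic regression vs. a small neural network
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR: output is 1 when exactly one input is 1

logreg = LogisticRegression().fit(X, y)
print(logreg.score(X, y))   # at most 0.75: a single linear boundary cannot get all four right

mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)
print(mlp.score(X, y))      # can reach 1.0 once there is a hidden layer
```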

5
Q

What is a neural network?

A

A neural network is really just stacked logistic regression (with intermediate steps)

Look at figure
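
A toy sketch of the idea, assuming NumPy; the weights are invented for illustration. Each layer is a logistic regression whose outputs feed the next one.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.2, 0.7])          # input predictors

# First "logistic regression" layer: predictors -> intermediate steps
W1 = np.array([[1.0, -2.0],
               [0.5,  1.5]])
b1 = np.array([0.1, -0.3])
hidden = sigmoid(W1 @ x + b1)     # intermediate predictions

# Second "logistic regression" layer: intermediate steps -> outcome
W2 = np.array([2.0, -1.0])
b2 = 0.05
outcome = sigmoid(W2 @ hidden + b2)
print(outcome)                    # a probability between 0 and 1
```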

6
Q

Why do neural networks work better than logistic regression?

A

Because they can do multiple things at once via intermediate steps

(e.g., the lecture’s example of using genre as an intermediate predictor, i.e., taking other relevant factors into account)

7
Q

What is the trade-off of using neural networks vs. logistic regression?

A

Neural networks need large amounts of data to work

8
Q

Do the intermediate steps in neural networks have to be specified by hand?

A

No, the model learns these intermediate predictors itself, based on the training data

9
Q

What is the universal approximation theorem?

A

Any function mapping predictors to an outcome can be approximated by a (large enough) neural network

10
Q

Does an AI like ChatGPT work on this neural network model?

A

No, but it is one of the building blocks

11
Q

How do language models work (simply)?

A

They predict the next word (based on the data they were trained on)
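
A toy illustration of next-word prediction; the words and probabilities are invented, not from the lecture.

```python
# The model assigns a probability to every possible next word, given the text so far
next_word_probs = {
    "prize": 0.45,
    "game":  0.30,
    "cake":  0.05,
    # ... every other word in the vocabulary gets some probability
}

prompt = "She wins the"
prediction = max(next_word_probs, key=next_word_probs.get)
print(prediction)  # "prize": the most likely continuation under this toy model
```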

12
Q

Why do neural networks not work for language models?

A

They disregard the order of the words

13
Q

What is the solution for neural networks not working for language models?

A

Transformer models

14
Q

What are transformer models used for?

A

Language models, but nowadays most AI systems use transformers as their basis (non-language applications included)

15
Q

There are two sides to the transformer model. What are they for?

A

The left side transforms the input words into numbers, and the right side turns the numbers back into words

16
Q

What are skip-connections?

A

Basically the number strings (word representations) are copied and carried forward through the model, so as not to forget good info from early in the model

Basically you combine the early, simple representation with the later, complicated one to get the best result

17
Q

Skip-connections are the “add & norm” part of the model. What do the “add” and “norm” parts individually do?

A

The “add” part is really the skip-connection: simple (early) + complicated (later)

The “norm” part makes sure the sum is not blown out of proportion, i.e., the values are rescaled rather than taken at full size
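
A rough sketch of “add & norm”, assuming NumPy; real transformers use layer normalization, simplified here to mean/std rescaling.

```python
import numpy as np

def add_and_norm(early, later):
    combined = early + later                       # "add": the skip-connection
    # "norm": rescale so the combined values don't blow out of proportion
    return (combined - combined.mean()) / (combined.std() + 1e-6)

early_simple  = np.array([1.0, 0.3, -0.5])         # representation from early in the model
later_complex = np.array([0.4, -0.2, 0.9])         # output of the later, complicated sub-layer
print(add_and_norm(early_simple, later_complex))
```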

18
Q

What is the positionwise FFN?

A

The “basic” neural network introduced in the lecture (stacked logistic regression), applied to each word position separately
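
A small sketch of the “positionwise” part, assuming NumPy and invented weights: the same little network is applied to every word position separately.

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    # a tiny two-layer network applied to one position's numbers
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

words = np.random.rand(5, 4)                       # 5 positions, 4 numbers each
W1, b1 = np.random.rand(4, 8), np.zeros(8)
W2, b2 = np.random.rand(8, 4), np.zeros(4)

out = np.array([ffn(w, W1, b1, W2, b2) for w in words])  # one call per position
print(out.shape)                                   # (5, 4): same shape, one output per position
```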

19
Q

What are embeddings?

A

Embeddings are numerical representations of word meaning, based on a word’s associations with other words

20
Q

How do embeddings work?

A

Basically a basic neural network that takes a single word as input and outputs the word’s associations with other words in the language (based on prior training)

21
Q

[0.4, 0.4, 0] is a simplistic version of an embedding string. What do these numbers mean?

A

Probabilities of the word’s association with other words
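
A toy illustration, assuming NumPy; the words, numbers, and the cosine-similarity comparison are added here for illustration, not taken from the lecture.

```python
import numpy as np

# Each word is a string of numbers: its association with "sports", "money", "food"
embeddings = {
    "prize": np.array([0.4, 0.4, 0.0]),
    "wins":  np.array([0.5, 0.1, 0.0]),
    "cake":  np.array([0.0, 0.1, 0.9]),
}

def similarity(a, b):
    # cosine similarity: how alike two words' association patterns are
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity(embeddings["prize"], embeddings["wins"]))  # relatively high
print(similarity(embeddings["prize"], embeddings["cake"]))  # much lower
```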

22
Q

What is positional encoding?

A

Each word gets a new string of numbers that encodes its position in the sentence

23
Q

How does positional encoding work?

A

Wave lines on a graph: look up which position the word is in, read off the values of the waves at that position, and assign the resulting string to the word

24
Q

Between which numbers is positional encoding always?

A

-1 and 1
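
A sketch of the wave lines, assuming the sinusoidal encoding from the original transformer paper (which is where the “between -1 and 1” property comes from).

```python
import numpy as np

def positional_encoding(position, dim):
    # one sine/cosine wave pair per pair of dimensions; values stay in [-1, 1]
    pe = np.zeros(dim)
    for i in range(0, dim, 2):
        freq = 1 / (10000 ** (i / dim))
        pe[i] = np.sin(position * freq)
        if i + 1 < dim:
            pe[i + 1] = np.cos(position * freq)
    return pe

print(positional_encoding(position=0, dim=4))  # [0., 1., 0., 1.]
print(positional_encoding(position=3, dim=4))  # every value between -1 and 1
```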

25
Q

How do positional encoding and embedding combine?

A

By adding them together element-wise: [1, 0.3] and [-0.2, 0.8] become [0.8, 1.1]

26
Q

What is multi-head attention?

A

Enriching a word’s numbers with the context of the other words in the sentence (e.g., whether “prize” or “price” is meant depends on context)

27
Q

How does multi-head attention work? (short answer)

A

Check the word’s association with the other words in this specific sentence and put these associations into an equation

28
Q

How does multi-head attention work? (long answer)

A

The previous string is taken and 3 new strings are made from it (query, key and value). You take the query from a word and the key from the word you want to check its association with (their match is the correlation that goes into the equation). The value string you have left is what you put into the equation as the word meaning

Example: wins = correlation * “wins value” + correlation * “prize value” + correlation * “home value”
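
A stripped-down sketch of a single attention head, assuming NumPy and invented query/key/value strings; real multi-head attention runs several of these in parallel with learned weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# made-up query/key/value strings for the words "wins", "prize", "home"
queries = np.array([[1.0, 0.2], [0.3, 0.8], [0.5, 0.5]])
keys    = np.array([[0.9, 0.1], [0.2, 0.9], [0.4, 0.6]])
values  = np.array([[0.7, 0.0], [0.1, 0.6], [0.3, 0.3]])

# enrich the first word ("wins") with context:
weights  = softmax(queries[0] @ keys.T)   # how strongly "wins" attends to each word
new_wins = weights @ values               # weight * "wins value" + weight * "prize value" + ...
print(weights, new_wins)
```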

29
Q

The transformer model described here is a base model, but in practice a chat model is usually used. What is the difference?

A

The chat model has an “end token” that makes sure the input is treated as a complete statement, so the chat generates new text in response

30
Q

What is an evaluation type that models often depend on?

A

Human evaluation