Lecture 3 Flashcards
What are the building blocks of AI?
Addition and multiplication
There are two sides to the “is AI intelligent” debate. What are they?
“It’s just math” (so no) vs. “look at the performance” (so yes)
i.e., how AI works vs. what it can do
What is the XOR problem?
Very simple (single-layer) models cannot learn the logical XOR function: “either this or that, but not both”
What is the solution to the XOR problem?
More complicated neural networks with a hidden layer (intermediate steps), as in the sketch below
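A minimal sketch with hand-picked weights (illustrative, not from the lecture) showing that one hidden layer is enough to compute XOR, which no single linear unit can:

```python
# Minimal sketch: hand-picked weights (illustrative, not from the lecture)
# showing a two-layer network computing XOR, which no single linear unit can.

def step(x):
    """Threshold activation: fires (1) when the weighted input is positive."""
    return 1 if x > 0 else 0

def xor_network(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # hidden unit 1: fires if x1 OR x2
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: fires if x1 AND x2
    return step(h_or - h_and - 0.5)  # output: OR but not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_network(a, b))  # prints 0, 1, 1, 0
```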
What is a neural network?
A neural network is really just stacked logistic regressions plus intermediate steps (see the figure from the lecture and the sketch below)
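A minimal sketch of that idea, with made-up weights: each layer is a logistic regression run on the previous layer’s outputs.

```python
# Minimal sketch of "stacked logistic regression": each layer applies a
# weighted sum followed by a sigmoid, exactly like logistic regression,
# using the previous layer's outputs as inputs. Weights here are made up.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    """One 'logistic regression' step: weighted sum + sigmoid."""
    return sigmoid(W @ x + b)

x = np.array([0.5, 1.0])                   # two input predictors
W1, b1 = np.array([[0.8, -0.4], [0.3, 0.9]]), np.array([0.1, -0.2])
W2, b2 = np.array([[1.2, -0.7]]), np.array([0.0])

hidden = layer(x, W1, b1)        # intermediate predictors the model learns
output = layer(hidden, W2, b2)   # final prediction, a probability in (0, 1)
print(hidden, output)
```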
Why do neural networks work better than logistic regression?
Because they can do multiple things at once via intermediate steps
(e.g., the lecture example of first inferring genre and then using it as a predictor, i.e., considering other factors that have an effect)
What is the trade-off of using neural networks vs. logistic regression?
Neural networks need large amounts of data to work
Do the intermediate steps in a neural network have to be specified by hand?
No, the model learns these intermediate predictors itself, based on prior (training) data
What is the universal approximation theorem?
Any function mapping predictors to an outcome can be approximated by a (large enough) neural network
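A standard formal statement, for reference (more precise than the lecture’s informal phrasing; σ is a non-linear activation such as the sigmoid):

```latex
% Universal approximation theorem (one common form): a network with a single
% hidden layer of N units can get within any tolerance \varepsilon of a
% continuous function f on a bounded region K.
\forall \varepsilon > 0 \;\; \exists N,\, \alpha_i,\, w_i,\, b_i :\quad
\left| f(x) - \sum_{i=1}^{N} \alpha_i\, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
\quad \text{for all } x \in K
```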
Does an AI like ChatGPT work on this neural network model?
No, but it is one of the building blocks
How do language models work (simply)?
They predict the next word (based on the data they were trained on)
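A toy sketch of next-word prediction over a made-up corpus: count which word follows which, then predict the most frequent follower. Real language models use transformers, but the objective is the same.

```python
# Toy sketch of "predict the next word": a bigram counter over a made-up
# corpus. Illustrative only; real models predict from far richer context.
from collections import Counter, defaultdict

corpus = "the team wins the prize the team plays at home".split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1          # count what follows each word

def predict_next(word):
    """Return the most frequent word seen after `word` in training data."""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))   # -> "team" (seen twice after "the")
```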
Why do neural networks not work for language models?
They disregard the order of the words
What is the solution to neural networks not working for language?
Transformer models
What are transformer models used for?
Language-based models, but nowadays most AI systems (non-language ones included) have transformers as their basis
There are two sides to a transformer model. What are they for?
The left side (the encoder) turns the input words into numbers; the right side (the decoder) turns the numbers back into words
What are skip-connections?
The number strings (word representations) are copied and carried forward through the model, so that good information from early layers is not forgotten
In effect, you combine the early, simple representation with the later, complicated one to get the best result
The skip-connections are the add & norm part of the model. What do the add and norm individually do, though?
The add part is really the skip-connection: the simple and the complicated representations are added together
The norm part makes sure the result is not blown out of proportion: the combined values are rescaled (normalized) rather than kept at their full, ever-growing size
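A minimal numpy sketch of the add & norm step, assuming a generic stand-in sublayer: add the skip copy back in, then normalize so the values stay on a stable scale.

```python
# Minimal sketch of "add & norm" on one word's number string.
# `sublayer` stands in for the complicated part (attention or FFN).
import numpy as np

def layer_norm(x, eps=1e-5):
    """Rescale to mean 0 and unit variance so values stay in proportion."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def add_and_norm(x, sublayer):
    # add: the skip-connection combines the early (x) with the complicated
    # (sublayer(x)); norm: keep the sum from blowing up across many layers.
    return layer_norm(x + sublayer(x))

x = np.array([0.8, 1.1, -0.3])
print(add_and_norm(x, lambda v: 0.5 * v))   # toy sublayer for illustration
```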
What is the positionwise FFN?
The “basic” neural network introduced in the lecture (stacked logistic regression), applied to each word position separately
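A minimal sketch with made-up weights: the same small two-layer network is applied to each word’s string independently.

```python
# Minimal sketch of a position-wise feed-forward network: the same two-layer
# network is applied at each position (word) independently. Weights made up.
import numpy as np

def ffn(x, W1, b1, W2, b2):
    """FFN(x) = W2 @ relu(W1 @ x + b1) + b2, per position."""
    hidden = np.maximum(0, W1 @ x + b1)   # ReLU nonlinearity
    return W2 @ hidden + b2

d_model, d_hidden = 2, 4
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(d_hidden, d_model)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_model, d_hidden)), np.zeros(d_model)

sentence = [np.array([0.8, 1.1]), np.array([0.2, -0.5])]     # two word strings
outputs = [ffn(word, W1, b1, W2, b2) for word in sentence]   # one per position
print(outputs)
```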
What are embeddings?
Embeddings are numerical representations of word meanings, based on a word’s associations with other words
How do embeddings work?
Essentially a basic neural network that takes a single word as input and outputs multiple numbers giving the word’s association with other words in the language (based on previous learning)
[0.4, 0.4, 0] is a simplistic version of an embedding string, what do these numbers mean?
Strengths (probabilities) of the word’s association with other words
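A toy sketch with made-up numbers: an embedding is just a lookup from word to number string, and words with similar meanings end up with similar strings.

```python
# Toy sketch of embeddings (numbers made up): each word maps to a number
# string; words used in similar contexts end up with similar strings.
import numpy as np

embeddings = {
    "prize": np.array([0.4, 0.4, 0.0]),
    "price": np.array([0.1, 0.5, 0.7]),
    "award": np.array([0.5, 0.3, 0.1]),   # close to "prize" in meaning
}

def similarity(a, b):
    """Cosine similarity: higher means more associated meanings."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity(embeddings["prize"], embeddings["award"]))  # high (~0.96)
print(similarity(embeddings["prize"], embeddings["price"]))  # lower (~0.49)
```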
What is positional encoding?
Each word gets an additional number string that encodes its position in the sentence
How does positional encoding work?
Wave lines on a graph: find the position the word occupies in the sentence, read off the wave values at that position on the graph, and assign the resulting string to the word
Between which numbers is positional encoding always?
-1 and 1 (the values are read off sine/cosine waves, which stay within this range)
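A minimal sketch of the standard sine/cosine scheme (the “wave lines” from the lecture); since sine and cosine only output values in [-1, 1], the encoding always stays between -1 and 1.

```python
# Minimal sketch of sinusoidal positional encoding (the "wave lines"):
# each position reads its values off sine/cosine waves of varying frequency,
# so every value lands between -1 and 1.
import numpy as np

def positional_encoding(position, d_model):
    pe = np.zeros(d_model)
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))   # slower waves as i grows
        pe[i] = np.sin(position * freq)
        if i + 1 < d_model:
            pe[i + 1] = np.cos(position * freq)
    return pe

for pos in range(3):                      # first three sentence positions
    print(pos, positional_encoding(pos, d_model=4))
```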
How do positional encoding and embedding combine?
By element-wise addition: [1, 0.3] + [-0.2, 0.8] = [0.8, 1.1]
What is multi-head attention?
Enriching the numbers with the context of the other words in the sentence (e.g., whether “prize” or “price” is meant depends on context)
How does multi-head attention work? (short answer)
Check the word’s association with the other words in this specific sentence and put those associations into an equation
How does multi-head attention work? (long answer)
The word’s previous string is taken and three new strings are made from it: a query, a key, and a value. To score a word’s association with another word, you take the first word’s query and the other word’s key; that score goes into the equation as a weight. The value string you have left goes into the equation as the word’s meaning
Example: new “wins” = weight × (“wins” value) + weight × (“prize” value) + weight × (“home” value)
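A minimal numpy sketch of one attention head, with made-up query/key/value numbers for “wins”, “prize”, and “home”: query·key scores become weights via softmax, and the new string is the weighted sum of value strings, just like the equation above.

```python
# Minimal sketch of one attention head, with made-up query/key/value strings
# for the words "wins", "prize", "home". The new "wins" string is a weighted
# sum of value strings, with weights from query-key matches.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

Q = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.4]])  # queries, one per word
K = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # keys, one per word
V = np.array([[0.7, 0.3], [0.1, 0.9], [0.5, 0.5]])  # values (word meanings)

d_k = K.shape[1]
scores = Q[0] @ K.T / np.sqrt(d_k)   # "wins" query against every key
weights = softmax(scores)            # how much "wins" attends to each word
new_wins = weights @ V               # weighted sum of value strings
print(weights, new_wins)
```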
The transformer model described so far is usually called a base model, when in reality a chat model is usually used. What is the difference?
The chat model has an “end token” that makes sure the input is seen as a whole statement, so the model generates new text (a reply) rather than just continuing the input
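A tiny illustration of the idea; “<end_of_input>” below is a hypothetical token name, since each real chat model defines its own special tokens.

```python
# Illustration only: "<end_of_input>" is a hypothetical token name; real chat
# models each define their own special tokens for marking turns.
user_message = "Explain transformers in one sentence."

base_model_input = user_message
# A base model may simply continue the text:
# "...in one sentence. Also explain embeddings in one sentence. Also..."

chat_model_input = user_message + " <end_of_input>"
# The end token marks the input as a complete statement, so the chat model
# generates a new reply instead of continuing the user's sentence.
```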
What is an evaluation type that models often depend on?
Human evaluation