10 - The Algorithm that Put Paid to a Persistent Myth Flashcards

1
Q

What did Minsky and Papert prove about single-layer perceptrons?

A

They proved that single-layer perceptrons could not solve the XOR problem

This proof is often cited as a turning point in neural network research.

2
Q

Who is Geoffrey Hinton?

A

A key figure behind the modern deep learning revolution

Hinton became interested in neural networks in the mid-1960s.

3
Q

What influenced Hinton’s interest in how brains learn?

A

A mathematician friend exploring how memories are stored in the brain

This led Hinton to study the mind and neural networks.

4
Q

What did Hinton study at university?

A

Physics and physiology

However, he found the curriculum insufficient for understanding how the brain works.

5
Q

What book deeply influenced Hinton?

A

The Organization of Behavior by Donald Hebb

This book impacted Hinton’s thinking on neural networks and learning.

6
Q

What was Hinton’s doctoral focus?

A

Solving constrained optimization problems using neural networks

Hinton believed multi-layer networks could eventually learn.

7
Q

What was the key limitation of single-layer perceptrons according to Minsky and Papert?

A

They could not solve the XOR problem, a specific instance of the broader class of linearly non-separable problems

This limitation led to skepticism about neural networks for some time.

8
Q

What is back-propagation?

A

A method for training multi-layer neural networks by propagating error corrections back through the network

Rosenblatt coined the related term "back-propagating error correction" in his work on neural networks, but a working algorithm came later.

9
Q

What issue arises when initializing all weights in a neural network to zero?

A

All neurons produce the same output, leading to symmetry and ineffective learning

This problem prevents the network from detecting different features.
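
A minimal Python sketch of the symmetry problem; the input values are illustrative:

    # Two neurons initialized with identical (zero) weights.
    w1, w2 = [0.0, 0.0], [0.0, 0.0]
    x = [1.0, 0.5]  # an arbitrary input

    # Both compute the same weighted sum, hence the same output...
    z1 = w1[0] * x[0] + w1[1] * x[1]
    z2 = w2[0] * x[0] + w2[1] * x[1]
    assert z1 == z2

    # ...so they also receive identical gradient updates, and no amount
    # of training can ever make them detect different features.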

10
Q

What did Rosenblatt suggest for updating weights in a neural network?

A

A stochastic process that introduces randomness to weight updates

This approach aimed to break symmetry in the network.

11
Q

What was Hinton’s belief about the nature of neurons in neural networks?

A

Neurons had to be stochastic to ensure different learning outcomes

This belief was based on Rosenblatt’s argument about non-deterministic procedures.

12
Q

What was Hinton’s experience in academia post-Ph.D.?

A

He faced rejection in the UK and eventually found a position in the US

This move was significant for his career in neural networks.

13
Q

What is the gradient descent method?

A

A technique to minimize error by updating weights in the opposite direction of the error gradient

Used in training neural networks to find optimal weight values.
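
A minimal gradient descent sketch in Python; the quadratic loss and the learning rate value are illustrative choices, not from the source:

    # Minimize loss(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
    w = 0.0          # initial weight
    alpha = 0.1      # learning rate
    for _ in range(100):
        grad = 2 * (w - 3)    # gradient of the loss at w
        w -= alpha * grad     # step opposite to the gradient
    print(w)  # approaches the minimum at w = 3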

14
Q

What is a major challenge with the error function in neural networks?

A

It is not convex and can have multiple local minima

This complexity makes finding the global minimum more difficult.

15
Q

What phenomenon can occur with hill climbing algorithms?

A

The mesa phenomenon, where the algorithm gets stuck in flat regions of the error space

This can impede finding better solutions in optimization tasks.

16
Q

What is the hill-climbing technique?

A

An optimization method that makes repeated small changes to the controls, keeping any change that improves performance, until it reaches a local optimum where no small change yields further improvement.

17
Q

What phenomenon can hill climbing encounter according to Minsky and Selfridge?

A

The mesa phenomenon.

18
Q

What is the mesa phenomenon?

A

A situation in which small tweaks to the parameters mostly make no difference to performance, leaving the algorithm stranded on a flat "mesa", while the occasional tweak causes abrupt, large changes.

19
Q

What was Minsky and Papert’s view of multi-layer neural networks?

A

They were dismissive, conjecturing that the extension to multi-layer networks would be sterile; some researchers read this as a deliberate attempt to stifle research into neural networks.

20
Q

Who independently developed methods relevant to the backpropagation algorithm in 1960-61?

A

Henry J. Kelley and Arthur E. Bryson.

21
Q

What contribution did Stuart Dreyfus make in 1962?

A

He derived formulas based on the chain rule to augment the Kelley-Bryson method.

22
Q

Who demonstrated techniques for using stochastic gradient descent in 1967?

A

Shun’ichi Amari.

23
Q

What did Seppo Linnainmaa develop in 1970?

A

The reverse mode of automatic differentiation, together with computer code implementing it; this is the technique underlying efficient backpropagation.

24
Q

What was the title of Paul Werbos’s 1974 Ph.D. thesis?

A

Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.

25
Q

Who developed the modern version of the backpropagation algorithm in the early 1980s?

A

Rumelhart, Hinton, and Williams.

26
Q

What does the equation y = wx + b represent?

A

The output of a neuron given a weight w, bias b, and scalar input x.

27
Q

What is the delta rule used for?

A

Iteratively adjusting a neuron's weight and bias to reduce the error between its output and the desired output.

28
Q

What is the formula for calculating the error in the delta rule?

A

e = y - ŷ.

29
Q

What does loss represent in the context of the delta rule?

A

loss = (y - ŷ)².

30
Q

What does MSE stand for?

A

Mean Squared Error.
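
For a dataset of N examples, the mean squared error averages the squared error over all of them:

    MSE = (1/N) * Σ (yᵢ - ŷᵢ)²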

31
Q

In the context of the delta rule, what is the learning rate denoted by?

A

α (alpha).

32
Q

What happens to the weight and bias during the update process?

A

Each is adjusted by a small fraction (the learning rate α) of the corresponding gradient, in the direction that reduces the loss.
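
A sketch of the delta rule for a single linear neuron ŷ = wx + b with squared loss; the training data and learning rate are illustrative:

    # Learn y = 2x + 1 from three samples.
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    w, b, alpha = 0.1, 0.0, 0.05   # initial guesses and learning rate
    for _ in range(2000):
        for x, y in data:
            y_hat = w * x + b
            e = y - y_hat              # error e = y - ŷ
            # loss = e**2, so dL/dw = -2*e*x and dL/db = -2*e.
            w += alpha * 2 * e * x     # move opposite to the gradient
            b += alpha * 2 * e
    print(w, b)  # converges toward w ≈ 2, b ≈ 1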

33
Q

What is a hyperplane in machine learning?

A

A decision boundary that separates classes: a line in two dimensions, a plane in three, and the higher-dimensional analogue beyond that.

34
Q

What is the XOR problem in neural networks?

A

A classification problem in which the data points cannot be separated by a single straight line; the XOR data are not linearly separable.

35
Q

What is required to solve the XOR problem?

A

At least two layers of neurons.

36
Q

What do the neurons in the first layer of a neural network for XOR do?

A

They find two lines to separate the data.

37
Q

What is the output of a neuron that takes in two inputs x1 and x2?

A

y = w1x1 + w2x2 + b.

38
Q

What is the purpose of the second layer in a neural network for XOR?

A

To create a weighted sum of the outputs of the first layer’s neurons.

39
Q

What is the output of a simple linear neuron?

A

y = w1 * x1 + w2 * x2 + b

40
Q

What is the role of the activation function in a neuron?

A

It transforms the neuron's weighted-sum input into the neuron's output, typically nonlinearly.

41
Q

What is a threshold function?

A

A function that outputs 1 if z > 0 and 0 otherwise.

42
Q

True or False: The threshold function is differentiable everywhere.

A

False. It jumps discontinuously at z = 0, so it has no derivative there.

43
Q

What is the sigmoid function used for?

A

To create a smooth, differentiable activation function.
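
A quick Python comparison of the two activation functions from cards 41 and 43:

    import math

    def threshold(z):
        # Steps from 0 to 1 at z = 0; not differentiable there.
        return 1.0 if z > 0 else 0.0

    def sigmoid(z):
        # Smooth and differentiable everywhere: 1 / (1 + e^(-z)).
        return 1.0 / (1.0 + math.exp(-z))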

44
Q

As z tends to infinity, what does the sigmoid function approach?

A

1

45
Q

As z tends to minus infinity, what does the sigmoid function approach?

A

0

46
Q

What is the structure of a simple neural network for the XOR problem?

A

Three layers: input layer, hidden layer with two neurons, output layer with one neuron.
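
A forward-pass sketch of such a network in Python, with hand-picked weights that happen to implement XOR; the specific values are illustrative, chosen so the hidden neurons act roughly as OR and AND gates:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def xor_net(x1, x2):
        h1 = sigmoid(20 * x1 + 20 * x2 - 10)   # roughly x1 OR x2
        h2 = sigmoid(20 * x1 + 20 * x2 - 30)   # roughly x1 AND x2
        # Fire when OR is on but AND is off: exactly XOR.
        return sigmoid(20 * h1 - 20 * h2 - 10)

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, round(xor_net(a, b)))   # prints 0, 1, 1, 0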

47
Q

Fill in the blank: The output neuron takes a weighted sum of the outputs of the two hidden neurons and passes that through a _______.

A

sigmoid activation function

48
Q

What is the loss function defined as in this context?

A

L = e², where e is the error (y - ŷ).

49
Q

What technique was developed for calculating partial derivatives in neural networks?

A

Backpropagation

50
Q

Who were the key researchers in developing the backpropagation algorithm?

A

Werbos, Rumelhart, Hinton, Williams

51
Q

What does backpropagation allow us to compute?

A

The gradients of the loss function with respect to weights and biases.

52
Q

What is required for the chain rule to be applied in backpropagation?

A

Every operation must be differentiable.
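
A minimal chain-rule sketch for a single sigmoid neuron in Python; the input, target, and parameter values are illustrative. With z = wx + b, ŷ = σ(z), and L = (y - ŷ)², every step is differentiable, so the chain rule multiplies the local derivatives together:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    x, y = 1.5, 1.0      # input and target
    w, b = 0.4, -0.2     # current parameters

    z = w * x + b
    y_hat = sigmoid(z)
    dL_dyhat = -2 * (y - y_hat)      # derivative of (y - y_hat)**2 w.r.t. y_hat
    dyhat_dz = y_hat * (1 - y_hat)   # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    dL_dw = dL_dyhat * dyhat_dz * x  # chain rule, with dz/dw = x
    dL_db = dL_dyhat * dyhat_dz      # chain rule, with dz/db = 1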

53
Q

What is the significance of breaking symmetry in neural networks?

A

To ensure that neurons learn different features and do not produce the same output.

54
Q

How can symmetry be broken during the initialization of weights?

A

By setting initial weights to small random values.
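
A common way to do this in Python; the layer width and the scale of 0.01 are typical illustrative choices:

    import random

    num_inputs = 2   # illustrative layer width
    # Small random weights break the symmetry; the bias can start at zero.
    weights = [random.gauss(0, 0.01) for _ in range(num_inputs)]
    bias = 0.0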

55
Q

What is the role of the output layer in a neural network for classifying digits?

A

It has one neuron for each digit class, firing the corresponding neuron for the detected digit.

56
Q

What is a multi-layer perceptron?

A

A feed-forward neural network with one or more hidden layers in which each layer is fully connected to the next.

57
Q

What does a fully connected neural network mean?

A

Each neuron in a layer receives inputs from all neurons in the previous layer.

58
Q

Fill in the blank: The first layer in a neural network for image recognition has _______ neurons, one for each pixel.

59
Q

What type of activation function was initially used in binary threshold neurons?

A

Threshold activation function

60
Q

What is the main advantage of using a sigmoid function over a threshold function?

A

It is differentiable everywhere.

61
Q

What does the backpropagation algorithm enable networks to learn?

A

Useful internal representations of the data, discovered by the hidden units rather than designed by hand.

62
Q

True or False: Neural networks require predefined features from the data.

A

False. They learn to represent features internally during training.

63
Q

What happens to the output of neurons in a well-trained network for digit recognition?

A

The correct digit neuron fires significantly more than others.

64
Q

What is the primary challenge mentioned regarding complex networks with many layers?

A

Computing the partial derivatives of the loss with respect to every weight and bias directly becomes impractical.

65
Q

What is the main advantage of neural networks over traditional algorithms like support vector machines?

A

Neural networks can learn to represent features internally without needing predefined features.

66
Q

What are the features needed to separate circles from triangles in a two-dimensional dataset?

A

Nonlinear features such as [x1, x2, x1x2].
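
A quick Python check that the product feature makes XOR-like data linearly separable; the coefficients are illustrative:

    # In the augmented space [x1, x2, x1*x2], one linear rule suffices:
    # fire when x1 + x2 - 2*(x1*x2) > 0.5, which is impossible
    # using x1 and x2 alone.
    def xor_linear(x1, x2):
        return 1 if x1 + x2 - 2 * (x1 * x2) > 0.5 else 0

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, xor_linear(a, b))   # prints 0, 1, 1, 0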

67
Q

What is the role of hidden units in a neural network according to Rumelhart, Hinton, and Williams?

A

Hidden units represent important features of the task domain.

68
Q

What distinguishes backpropagation from earlier methods like the perceptron-convergence procedure?

A

Backpropagation allows for the creation of useful new features automatically.

69
Q

Who are the authors of the influential paper on backpropagation?

A

Rumelhart, Hinton, and Williams.

70
Q

What significant event occurred in 1987 related to Rumelhart’s career?

A

Rumelhart moved to Stanford University.

71
Q

What is the sigmoid function represented by?

A

σ(z) = 1 / (1 + e^{-z})

72
Q

What must the output of each neuron in a neural network layer pass through?

A

An activation function.

73
Q

Fill in the blank: The output of layer 1 after activation can be expressed as _______.

A

a1 = σ(z1)

74
Q

What is the formula for calculating the error in a neural network?

A

e = (y - ŷ)

75
Q

What is the loss function represented by in the context of a neural network?

A

L = e², where e is the error (y - ŷ).

76
Q

What is the purpose of calculating the gradient of the loss function?

A

To update the weights and biases in the neural network.

77
Q

True or False: More hidden layer neurons always result in a rougher decision boundary.

A

False. More hidden neurons allow more complex decision boundaries, but do not always make them rougher.

78
Q

What type of functions can be used as activation functions in neural networks?

A

Any differentiable function.

79
Q

What does the term ‘gradient’ refer to in the context of neural networks?

A

The rate of change of the loss function with respect to weights and biases.

80
Q

What is the output of the final layer in a simple neural network with one output neuron?

A

ŷ = σ(z4), the sigmoid applied to the output neuron's weighted-sum input z4.

81
Q

What condition must be satisfied for the activation function in a neural network?

A

It must be differentiable.

82
Q

What is the significance of learning features automatically in neural networks?

A

It allows for more effective and flexible modeling of complex data.

83
Q

Fill in the blank: The ability to create useful new features distinguishes _______ from earlier methods.

A

backpropagation

84
Q

What does the term ‘delta rule’ refer to in neural networks?

A

A method used to update the weights based on the gradient.

85
Q

Who independently developed an algorithm achieving similar results to backpropagation?

A

Yann LeCun.

86
Q

What illness did Rumelhart suffer from before his retirement?

A

Pick’s disease.