9 - The Man Who Set Back Machine Learning (Not Really) Flashcards
What is deep learning?
The process of training neural networks that have three or more layers (one input layer, one output layer, and one or more hidden layers).
Who is George Cybenko?
A professor of engineering at Dartmouth College known for his work on neural networks.
What significant event occurred in 2017 related to deep learning?
A summer school on deep learning in Bilbao, Spain, attended by nearly thirteen hundred people.
What is the universal approximation theorem?
A theorem showing that a neural network with just one hidden layer, given enough neurons, can approximate any continuous function to any desired accuracy.
What did Cybenko’s landmark paper demonstrate?
It proved that a neural network with a single hidden layer of sigmoidal neurons can approximate any continuous function, given enough hidden neurons.
What did Minsky and Papert’s book ‘Perceptrons’ conclude?
That single-layer perceptrons cannot solve certain simple problems (such as XOR), and that extending them to multiple layers was, in the authors' view, unlikely to be fruitful.
What is backpropagation?
An algorithm that propagates the network's error backward, layer by layer, to compute how each weight should change; it makes training networks with hidden layers practical.
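A minimal sketch of the idea, assuming a tiny one-hidden-layer network with sigmoid activations trained on XOR with NumPy (an illustration, not the 1986 paper's notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 1.0

for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)        # hidden-layer outputs
    out = sigmoid(h @ W2 + b2)      # network output
    # backward pass: propagate the error gradient layer by layer
    d_out = (out - y) * out * (1 - out)   # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden layer
    # gradient-descent updates
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # typically approaches [[0], [1], [1], [0]]
```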
Who were the authors of the seminal 1986 backpropagation paper?
David Rumelhart, Geoffrey Hinton, and Ronald Williams.
What is the role of a single-layer perceptron?
It takes input values, calculates weighted sums, and produces an output based on a thresholding function.
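A minimal sketch of that computation, with weights picked by hand purely for illustration:

```python
import numpy as np

def perceptron_output(x, w, b):
    z = np.dot(w, x) + b        # weighted sum of the inputs plus bias
    return 1 if z > 0 else 0    # thresholding (step) function

# example: a perceptron computing logical AND with hand-picked weights
w = np.array([1.0, 1.0]); b = -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron_output(np.array(x, dtype=float), w, b))
```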
What is the perceptron training algorithm used for?
To train a single-layer neural network by iteratively adjusting its weights and biases until the training examples are classified correctly.
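A minimal sketch of the mistake-driven update rule, assuming a small linearly separable dataset (logical OR):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if np.dot(w, x_i) + b > 0 else 0
            error = y_i - pred        # 0 if correct, +1 or -1 on a mistake
            w += lr * error * x_i     # weights change only on mistakes
            b += lr * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])            # logical OR, linearly separable
w, b = train_perceptron(X, y)
print(w, b)
```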
What differentiates a deep neural network from a single-layer network?
A deep neural network has multiple weight matrices due to the presence of hidden layers.
True or False: The perceptron training algorithm can be used for networks with hidden layers.
False.
What does training a neural network involve?
Finding optimal values for the weight matrices to approximate a desired function.
Fill in the blank: Cybenko’s work is often associated with _______.
delaying deep learning by twenty years.
What can a function represented by a neural network achieve?
It can represent a decision boundary or perform regression.
What is an activation function?
A function that determines the output of a neuron based on its input.
What is the sigmoid activation function?
A smooth function that transitions from almost 0 to almost 1.
What type of neuron did Cybenko use in his proof?
A nonlinear neuron based on the sigmoid activation function.
What is the significance of having multiple weight matrices in a network?
Having more than one weight matrix means the network contains hidden layers, which is what makes it a deep neural network.
What does the term ‘AI winter’ refer to?
A period of reduced funding and interest in artificial intelligence research.
What is a hidden layer in a neural network?
A layer of neurons between the input and output layers; its outputs are not directly observed from outside the network.
What is the output of a bipolar neuron?
+1 or -1.
What activation function is used in Cybenko’s neurons?
The sigmoid activation function, σ(z) = 1 / (1 + e^(-z))
The sigmoid function smoothly transitions from almost 0 to almost 1.
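A minimal sketch of the function, evaluated at a few points to show the smooth 0-to-1 transition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # sigma(z) = 1 / (1 + e^(-z))

for z in (-10, -1, 0, 1, 10):
    print(z, round(float(sigmoid(z)), 4))   # ~0.0, 0.2689, 0.5, 0.7311, ~1.0
```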
What does the equation z = wx + b represent in the context of a neuron?
The weighted sum of the inputs plus the bias term
Here, z is calculated using the weight vector w and bias b.
How can the shape and position of the sigmoid function be controlled?
By changing the values of w (weights) and b (bias)
This affects the steepness and midpoint of the sigmoid curve.
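A minimal one-input sketch with illustrative values of w and b: the weight sets the steepness and the bias shifts the midpoint, which sits at x = -b/w.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    return sigmoid(w * x + b)          # y = sigma(wx + b)

x = np.linspace(-2, 2, 5)
print(neuron(x, w=1.0, b=0.0))         # gentle slope, midpoint at x = 0
print(neuron(x, w=20.0, b=0.0))        # much steeper, nearly a step at x = 0
print(neuron(x, w=20.0, b=-20.0))      # same steepness, midpoint shifted to x = 1
```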
What is the output equation for the neuron?
y = σ(z)
σ(z) is the sigmoid activation function applied to z = wx + b.
What is the significance of the output neuron in a hidden layer network?
It performs a linear combination of the outputs of the hidden neurons
This involves multiplying each output by a weight and summing them.
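A minimal sketch for a one-input network; the hidden weights, biases, and output weights below are illustrative, not values from Cybenko's paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer_net(x, w_hidden, b_hidden, w_out):
    h = sigmoid(w_hidden * x + b_hidden)   # outputs of the hidden neurons
    return np.dot(w_out, h)                # linear combination at the output neuron

w_hidden = np.array([5.0, 5.0])    # one weight per hidden neuron (single input)
b_hidden = np.array([0.0, -5.0])   # one bias per hidden neuron
w_out = np.array([1.0, -1.0])      # output weights may be positive or negative
print(hidden_layer_net(0.5, w_hidden, b_hidden, w_out))
```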
What is the main goal of Cybenko’s analysis on neural networks?
To prove that a weighted sum of hidden neurons' outputs can approximate any desired function f(x)
This holds provided the network has enough hidden neurons.
What does the term ‘one-dimensional version’ refer to in Cybenko’s analysis?
Both input and output vectors have only one element each
This simplifies the understanding of the network’s operations.
How does the output neuron generate an approximately rectangular output?
By performing a linear combination of the outputs of two hidden neurons
The weights used can be positive or negative, affecting the final shape.
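A minimal sketch of the construction, assuming two steep sigmoids combined with output weights +1 and -1; the interval [0.25, 0.75] and the steepness are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bump(x, left=0.25, right=0.75, steepness=50.0):
    # one hidden neuron steps up near x = left, the other near x = right;
    # the output neuron combines them with weights +1 and -1
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))

for x in np.linspace(0, 1, 11):
    # roughly 1 well inside the interval, roughly 0 well outside it
    print(round(float(x), 1), round(float(bump(x)), 3))
```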
What happens when the number of sigmoidal neurons is increased?
The approximation of the desired function improves
More neurons allow for better representation of complex functions.
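A minimal sketch of that effect: tile [0, 1] with rectangular bumps (each bump uses two steep sigmoidal neurons) whose heights sample an assumed target function, sin(2*pi*x), and watch the worst-case error fall as neurons are added.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def approximate(f, x, n_bumps, steepness=200.0):
    edges = np.linspace(0, 1, n_bumps + 1)
    y = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        height = f((lo + hi) / 2)                        # sample f at the bump centre
        y += height * (sigmoid(steepness * (x - lo))
                       - sigmoid(steepness * (x - hi)))  # one bump = two hidden neurons
    return y

f = lambda t: np.sin(2 * np.pi * t)
x = np.linspace(0, 1, 1000)
for n in (5, 20, 100):
    err = np.max(np.abs(approximate(f, x, n) - f(x)))
    print(f"{2 * n} hidden neurons -> max error {err:.3f}")
```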
What is the relationship between weights, biases, and the outputs of hidden neurons?
Weights and biases determine the shape and position of the hidden neuron outputs
This affects the final output through linear combinations.
What did Cybenko prove about neural networks in 1988?
That a network with two hidden layers can approximate any function
He believed it should also be possible with just one hidden layer.
What is the significance of considering functions as vectors?
It allows us to analyze functions in higher-dimensional spaces
Functions can be represented as points in infinite-dimensional spaces.
What does the term ‘vector space’ refer to?
A collection of objects (vectors, matrices, or functions, for example) that can be added together and multiplied by scalars while remaining in the collection
These objects can be manipulated mathematically.
What was Cybenko’s proof method?
Proof by contradiction
He assumed that a neural network could not approximate all functions and showed this assumption led to a contradiction.
What misconception did Cybenko’s proof create among researchers?
That one hidden layer was sufficient, so there was little reason to explore deeper networks
This encouraged a focus on shallow networks and, some argue, delayed the deep learning revolution.
What factors contributed to the revolution in deep learning around 2010?
Increased number of hidden layers, massive training data, and computing power
These elements were not widely available in the 1990s.
Fill in the blank: Cybenko believed a neural network with one hidden layer could approximate _______.
any function
This belief was significant for the development of neural network theory.
What significant development in deep learning occurred in 2010?
Researchers began to increase the number of hidden layers in neural networks beyond one.
This marked a pivotal shift in the capabilities of deep learning.
What was Cybenko’s contribution to neural networks?
Cybenko proved the approximating properties of neural networks in 1989.
His work laid the groundwork for future advancements in deep learning.
What two key ingredients were necessary for the deep learning revolution to take off in the 2010s?
- Massive amounts of training data
- Computing power
These factors were not available in the 1990s.
What did Cybenko speculate about the number of neurons required for approximation?
He speculated that astronomical numbers of terms would be required for most approximation problems.
This was influenced by the curse of dimensionality in multidimensional approximation theory.
What is the curse of dimensionality?
The curse of dimensionality refers to the phenomenon where the feature space becomes increasingly sparse as dimensionality increases, making it difficult for models to generalize.
This concept poses challenges in statistics and approximation theory.
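A minimal sketch of the effect: covering the unit hypercube with a grid of spacing 0.1 takes exponentially many cells as the dimension grows, so any fixed dataset becomes vanishingly sparse.

```python
# number of grid cells of side 0.1 needed to cover the unit hypercube
for d in (1, 2, 3, 10, 100):
    print(f"{d:>3} dimensions -> {10 ** d:.0e} cells")
```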
How do deep neural networks challenge traditional machine learning theories?
They are not as susceptible to the curse of dimensionality and do not overfit the data as expected, despite having enormous numbers of parameters.
This challenges existing beliefs about model complexity and data fitting.
What algorithm allowed researchers to start training deep neural networks?
Backpropagation.
This algorithm is essential for optimizing the weights in neural networks during training.