9 - The Man Who Set Back Machine Learning (Not Really) Flashcards
What is deep learning?
The process of training neural networks that have three or more layers (one input layer, one output layer, and one or more hidden layers).
Who is George Cybenko?
A professor of engineering at Dartmouth College known for his work on neural networks.
What significant event occurred in 2017 related to deep learning?
A summer school on deep learning in Bilbao, Spain, attended by nearly thirteen hundred people.
What is the universal approximation theorem?
A theorem showing that a neural network with just one hidden layer, given enough neurons, can approximate any continuous function to any desired accuracy.
What did Cybenko’s landmark paper demonstrate?
It proved that a neural network with a single hidden layer of sigmoidal neurons can approximate any continuous function, given enough hidden neurons.
What did Minsky and Papert’s book ‘Perceptrons’ conclude?
That single-layer perceptrons cannot solve certain simple problems (such as XOR), and that extending them to multiple layers was, in the authors' view, unlikely to be fruitful.
What is backpropagation?
An algorithm that propagates the network's error backward, layer by layer, to compute how each weight should change; it makes training networks with hidden layers practical.
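A minimal sketch of the idea, assuming a tiny one-hidden-layer network with sigmoid activations trained on XOR with NumPy (an illustration, not the 1986 paper's notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 1.0

for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)        # hidden-layer outputs
    out = sigmoid(h @ W2 + b2)      # network output
    # backward pass: propagate the error gradient layer by layer
    d_out = (out - y) * out * (1 - out)   # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden layer
    # gradient-descent updates
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # typically approaches [[0], [1], [1], [0]]
```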
Who were the authors of the seminal 1986 backpropagation paper?
David Rumelhart, Geoffrey Hinton, and Ronald Williams.
What is the role of a single-layer perceptron?
It takes input values, calculates weighted sums, and produces an output based on a thresholding function.
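A minimal sketch of that computation, with weights picked by hand purely for illustration:

```python
import numpy as np

def perceptron_output(x, w, b):
    z = np.dot(w, x) + b        # weighted sum of the inputs plus bias
    return 1 if z > 0 else 0    # thresholding (step) function

# example: a perceptron computing logical AND with hand-picked weights
w = np.array([1.0, 1.0]); b = -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron_output(np.array(x, dtype=float), w, b))
```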
What is the perceptron training algorithm used for?
To train a single-layer neural network by iteratively adjusting its weights and biases until the training examples are classified correctly.
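A minimal sketch of the mistake-driven update rule, assuming a small linearly separable dataset (logical OR):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if np.dot(w, x_i) + b > 0 else 0
            error = y_i - pred        # 0 if correct, +1 or -1 on a mistake
            w += lr * error * x_i     # weights change only on mistakes
            b += lr * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])            # logical OR, linearly separable
w, b = train_perceptron(X, y)
print(w, b)
```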
What differentiates a deep neural network from a single-layer network?
A deep neural network has multiple weight matrices due to the presence of hidden layers.
True or False: The perceptron training algorithm can be used for networks with hidden layers.
False.
What does training a neural network involve?
Finding optimal values for the weight matrices to approximate a desired function.
Fill in the blank: Cybenko’s work is often associated with _______.
delaying deep learning by twenty years.
What can a function represented by a neural network achieve?
It can represent a decision boundary or perform regression.
What is an activation function?
A function that determines the output of a neuron based on its input.
What is the sigmoid activation function?
A smooth function that transitions from almost 0 to almost 1.
What type of neuron did Cybenko use in his proof?
A nonlinear neuron based on the sigmoid activation function.
What is the significance of having multiple weight matrices in a network?
Having more than one weight matrix means the network contains hidden layers, which is what makes it a deep neural network.
What does the term ‘AI winter’ refer to?
A period of reduced funding and interest in artificial intelligence research.
What is a hidden layer in a neural network?
A layer of neurons between the input and output layers; its outputs are not directly observed from outside the network.
What is the output of a bipolar neuron?
+1 or -1.
What activation function is used in Cybenko’s neurons?
The sigmoid activation function, σ(z) = 1 / (1 + e^(-z))
The sigmoid function smoothly transitions from almost 0 to almost 1.
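A minimal sketch of the function, evaluated at a few points to show the smooth 0-to-1 transition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # sigma(z) = 1 / (1 + e^(-z))

for z in (-10, -1, 0, 1, 10):
    print(z, round(float(sigmoid(z)), 4))   # ~0.0, 0.2689, 0.5, 0.7311, ~1.0
```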
What does the equation z = wx + b represent in the context of a neuron?
The weighted sum of the inputs plus the bias term
Here, z is calculated using the weight vector w and bias b.
How can the shape and position of the sigmoid function be controlled?
By changing the values of w (weights) and b (bias)
This affects the steepness and midpoint of the sigmoid curve.
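A minimal one-input sketch with illustrative values of w and b: the weight sets the steepness and the bias shifts the midpoint, which sits at x = -b/w.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    return sigmoid(w * x + b)          # y = sigma(wx + b)

x = np.linspace(-2, 2, 5)
print(neuron(x, w=1.0, b=0.0))         # gentle slope, midpoint at x = 0
print(neuron(x, w=20.0, b=0.0))        # much steeper, nearly a step at x = 0
print(neuron(x, w=20.0, b=-20.0))      # same steepness, midpoint shifted to x = 1
```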
What is the output equation for the neuron?
y = σ(z)
σ(z) is the sigmoid activation function applied to z = wx + b.
What is the significance of the output neuron in a hidden layer network?
It performs a linear combination of the outputs of the hidden neurons
This involves multiplying each output by a weight and summing them.
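A minimal sketch for a one-input network; the hidden weights, biases, and output weights below are illustrative, not values from Cybenko's paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer_net(x, w_hidden, b_hidden, w_out):
    h = sigmoid(w_hidden * x + b_hidden)   # outputs of the hidden neurons
    return np.dot(w_out, h)                # linear combination at the output neuron

w_hidden = np.array([5.0, 5.0])    # one weight per hidden neuron (single input)
b_hidden = np.array([0.0, -5.0])   # one bias per hidden neuron
w_out = np.array([1.0, -1.0])      # output weights may be positive or negative
print(hidden_layer_net(0.5, w_hidden, b_hidden, w_out))
```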
What is the main goal of Cybenko’s analysis on neural networks?
To prove that a weighted sum of hidden neurons' outputs can approximate any desired function f(x)
This holds provided the network has enough hidden neurons.
What does the term ‘one-dimensional version’ refer to in Cybenko’s analysis?
Both input and output vectors have only one element each
This simplifies the understanding of the network’s operations.
How does the output neuron generate an approximately rectangular output?
By performing a linear combination of the outputs of two hidden neurons
The weights used can be positive or negative, affecting the final shape.
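A minimal sketch of the construction, assuming two steep sigmoids combined with output weights +1 and -1; the interval [0.25, 0.75] and the steepness are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bump(x, left=0.25, right=0.75, steepness=50.0):
    # one hidden neuron steps up near x = left, the other near x = right;
    # the output neuron combines them with weights +1 and -1
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))

for x in np.linspace(0, 1, 11):
    # roughly 1 well inside the interval, roughly 0 well outside it
    print(round(float(x), 1), round(float(bump(x)), 3))
```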
What happens when the number of sigmoidal neurons is increased?
The approximation of the desired function improves
More neurons allow for better representation of complex functions.
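A minimal sketch of that effect: tile [0, 1] with rectangular bumps (each bump uses two steep sigmoidal neurons) whose heights sample an assumed target function, sin(2*pi*x), and watch the worst-case error fall as neurons are added.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def approximate(f, x, n_bumps, steepness=200.0):
    edges = np.linspace(0, 1, n_bumps + 1)
    y = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        height = f((lo + hi) / 2)                        # sample f at the bump centre
        y += height * (sigmoid(steepness * (x - lo))
                       - sigmoid(steepness * (x - hi)))  # one bump = two hidden neurons
    return y

f = lambda t: np.sin(2 * np.pi * t)
x = np.linspace(0, 1, 1000)
for n in (5, 20, 100):
    err = np.max(np.abs(approximate(f, x, n) - f(x)))
    print(f"{2 * n} hidden neurons -> max error {err:.3f}")
```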
What is the relationship between weights, biases, and the outputs of hidden neurons?
Weights and biases determine the shape and position of the hidden neuron outputs
This affects the final output through linear combinations.
What did Cybenko prove about neural networks in 1988?
That a network with two hidden layers can approximate any function
He believed it should also be possible with just one hidden layer.
What is the significance of considering functions as vectors?
It allows us to analyze functions in higher-dimensional spaces
Functions can be represented as points in infinite-dimensional spaces.
What does the term ‘vector space’ refer to?
A collection of objects (vectors, matrices, or functions, for example) that can be added together and multiplied by scalars while remaining in the collection
These objects can be manipulated mathematically.
What was Cybenko’s proof method?
Proof by contradiction
He assumed that a neural network could not approximate all functions and showed this assumption led to a contradiction.
What misconception did Cybenko’s proof create among researchers?
That one hidden layer was sufficient, so there was little reason to explore deeper networks
This encouraged a focus on shallow networks and, some argue, delayed the deep learning revolution.
What factors contributed to the revolution in deep learning around 2010?
Increased number of hidden layers, massive training data, and computing power
These elements were not widely available in the 1990s.
Fill in the blank: Cybenko believed a neural network with one hidden layer could approximate _______.
any function
This belief was significant for the development of neural network theory.
What significant development in deep learning occurred in 2010?
Researchers began to increase the number of hidden layers in neural networks beyond one.
This marked a pivotal shift in the capabilities of deep learning.
What was Cybenko’s contribution to neural networks?
Cybenko proved the approximating properties of neural networks in 1989.
His work laid the groundwork for future advancements in deep learning.
What two key ingredients were necessary for the deep learning revolution to take off in the 2010s?
- Massive amounts of training data
- Computing power
These factors were not available in the 1990s.
What did Cybenko speculate about the number of neurons required for approximation?
He speculated that astronomical numbers of terms would be required for most approximation problems.
This was influenced by the curse of dimensionality in multidimensional approximation theory.
What is the curse of dimensionality?
The curse of dimensionality refers to the phenomenon where the feature space becomes increasingly sparse as dimensionality increases, making it difficult for models to generalize.
This concept poses challenges in statistics and approximation theory.
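A minimal sketch of the effect: covering the unit hypercube with a grid of spacing 0.1 takes exponentially many cells as the dimension grows, so any fixed dataset becomes vanishingly sparse.

```python
# number of grid cells of side 0.1 needed to cover the unit hypercube
for d in (1, 2, 3, 10, 100):
    print(f"{d:>3} dimensions -> {10 ** d:.0e} cells")
```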
How do deep neural networks challenge traditional machine learning theories?
They are not as susceptible to the curse of dimensionality and do not overfit the data as expected, despite having enormous numbers of parameters.
This challenges existing beliefs about model complexity and data fitting.
What algorithm allowed researchers to start training deep neural networks?
Backpropagation.
This algorithm is essential for optimizing the weights in neural networks during training.