9 - The Man Who Set Back Machine Learning (Not Really) Flashcards

1
Q

What is deep learning?

A

The process of training neural networks that have three or more layers (one input layer, one output layer, and one or more hidden layers).

2
Q

Who is George Cybenko?

A

A professor of engineering at Dartmouth College known for his work on neural networks.

3
Q

What significant event occurred in 2017 related to deep learning?

A

A summer school on deep learning in Bilbao, Spain, attended by nearly thirteen hundred people.

4
Q

What is the universal approximation theorem?

A

A theorem showing that a neural network with just one hidden layer, given enough neurons, can approximate any continuous function to arbitrary accuracy.

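As a hedged sketch (notation is ours, not the card's), the one-hidden-layer approximator behind the theorem can be written as:

```latex
% Cybenko (1989): for any continuous f on [0,1]^n and any tolerance \varepsilon > 0,
% there exist N, \alpha_i, w_i, b_i such that the finite sum G stays within \varepsilon of f:
G(x) = \sum_{i=1}^{N} \alpha_i \, \sigma\!\left( w_i^{\top} x + b_i \right),
\qquad \sup_{x \in [0,1]^n} \bigl| G(x) - f(x) \bigr| < \varepsilon .
```
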
5
Q

What did Cybenko’s landmark paper demonstrate?

A

It proved that a neural network with one hidden layer can approximate any continuous function.

6
Q

What did Minsky and Papert’s book ‘Perceptrons’ conclude?

A

Single-layer perceptrons have fundamental limitations (they cannot solve certain problems, such as XOR), and the authors speculated that multi-layer networks would be similarly limited.

7
Q

What is backpropagation?

A

An algorithm for training neural networks that propagates output errors backward through the network to update its weights, making it possible to train networks with hidden layers.

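A minimal sketch of backpropagation for a one-hidden-layer network (our own Python/NumPy illustration; the architecture, learning rate, and XOR task are assumptions, not from the card):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two weight matrices: input -> hidden (W1) and hidden -> output (W2).
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for step in range(20000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)      # hidden-layer activations
    out = sigmoid(h @ W2 + b2)    # network output

    # Backward pass: propagate the squared-error gradient from output to hidden layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent weight updates.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```
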
8
Q

Who were the authors of the seminal 1986 backpropagation paper?

A

David Rumelhart, Geoffrey Hinton, and Ronald Williams.

9
Q

What is the role of a single-layer perceptron?

A

It takes input values, computes a weighted sum of them (plus a bias), and produces an output by applying a thresholding function.

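A minimal sketch of that forward pass (our own Python illustration; the AND example is an assumption, not from the card):

```python
def perceptron(x, w, b):
    """Weighted sum of the inputs plus a bias, thresholded at zero."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

# Weights and bias chosen by hand so the perceptron computes logical AND.
print(perceptron([1, 1], w=[1.0, 1.0], b=-1.5))  # -> 1
print(perceptron([1, 0], w=[1.0, 1.0], b=-1.5))  # -> 0
```
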
10
Q

What is the perceptron training algorithm used for?

A

To train a single-layer neural network by finding weights and a bias that correctly classify the training data (guaranteed to succeed only if the data are linearly separable).

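A hedged sketch of the classic perceptron learning rule (our own Python illustration; the OR data, learning rate, and epoch count are assumptions): whenever an example is misclassified, nudge the weights and bias toward the correct answer.

```python
def train_perceptron(samples, lr=0.1, epochs=50):
    """samples: list of (inputs, label) pairs with labels 0 or 1."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, label in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = label - pred  # -1, 0, or +1
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Example: learn logical OR (linearly separable, so the rule converges).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
```
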
11
Q

What differentiates a deep neural network from a single-layer network?

A

A deep neural network has multiple weight matrices due to the presence of hidden layers.

12
Q

True or False: The perceptron training algorithm can be used for networks with hidden layers.

A

False.

13
Q

What does training a neural network involve?

A

Finding optimal values for the weight matrices to approximate a desired function.

14
Q

Fill in the blank: Cybenko’s work is often associated with _______.

A

delaying deep learning by twenty years.

15
Q

What can a function represented by a neural network achieve?

A

It can represent a decision boundary or perform regression.

16
Q

What is an activation function?

A

A function that determines the output of a neuron based on its input.

17
Q

What is the sigmoid activation function?

A

A smooth, S-shaped function whose output transitions from almost 0 to almost 1 as its input increases.

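Its standard form is σ(z) = 1 / (1 + e^(−z)); a small Python sketch (ours, not the card's):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(-6), sigmoid(0), sigmoid(6))  # ~0.0025, 0.5, ~0.9975
```
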
18
Q

What type of neuron did Cybenko use in his proof?

A

A nonlinear neuron based on the sigmoid activation function.

19
Q

What is the significance of having multiple weight matrices in a network?

A

It allows the network to be classified as a deep neural network.

20
Q

What does the term ‘AI winter’ refer to?

A

A period of reduced funding and interest in artificial intelligence research.

21
Q

What is a hidden layer in a neural network?

A

A layer of neurons that sits between the input and output layers, so its outputs are not directly observed.

22
Q

What is the output of a bipolar neuron?

A

Either -1 or +1 (rather than 0 or 1), depending on whether its weighted-sum input falls below or above the threshold.

23
Q

What activation function is used in Cybenko’s neurons?

A

The sigmoid activation function, a(z) = σ(z)

The sigmoid function smoothly transitions from almost 0 to almost 1.

24
Q

What does the equation z = wx + b represent in the context of a neuron?

A

The weighted sum of the inputs plus the bias term

Here, z is calculated using the weight vector w and bias b.

25
Q

How can the shape and position of the sigmoid function be controlled?

A

By changing the values of w (weights) and b (bias)

This affects the steepness and midpoint of the sigmoid curve.
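
A small sketch of that control (our own Python illustration): |w| sets the steepness, and the midpoint, where the output is 0.5, sits at x = -b / w.

```python
import math

def neuron(x, w, b):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

print(neuron(2.0, w=1, b=-2))    # 0.5: gentle slope, midpoint at x = 2
print(neuron(2.0, w=20, b=-40))  # 0.5: same midpoint, much steeper (nearly a step)
print(neuron(2.1, w=20, b=-40))  # ~0.88: just past the midpoint, already near saturation
```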

26
Q

What is the output equation for the neuron?

A

y = σ(z)

σ(z) is the sigmoid activation function applied to z.

27
Q

What is the significance of the output neuron in a hidden layer network?

A

It performs a linear combination of the outputs of the hidden neurons

This involves multiplying each output by a weight and summing them.

28
Q

What is the main goal of Cybenko’s analysis of neural networks?

A

To prove that a weighted sum of hidden-neuron outputs can approximate any desired function f(x)

This is dependent on having enough hidden neurons.

29
Q

What does the term ‘one-dimensional version’ refer to in Cybenko’s analysis?

A

Both input and output vectors have only one element each

This simplifies the understanding of the network’s operations.

30
Q

How does the output neuron generate an approximately rectangular output?

A

By performing a linear combination of the outputs of two hidden neurons

The weights used can be positive or negative, affecting the final shape.
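
A sketch of that construction (our own Python illustration; the midpoints, steepness, and +1/-1 weights are assumptions): two steep sigmoidal hidden neurons are combined by the output neuron with weights +1 and -1, giving roughly 1 between their midpoints and roughly 0 elsewhere.

```python
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, steep=50.0):
    h1 = sig(steep * (x - 1.0))  # hidden neuron with midpoint at x = 1
    h2 = sig(steep * (x - 2.0))  # hidden neuron with midpoint at x = 2
    return 1.0 * h1 - 1.0 * h2   # output neuron: linear combination with weights +1, -1

for x in (0.5, 1.5, 2.5):
    print(x, round(bump(x), 3))  # ~0, ~1, ~0: an approximately rectangular pulse
```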

31
Q

What happens when the number of sigmoidal neurons is increased?

A

The approximation of the desired function improves

More neurons allow for better representation of complex functions.
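
One way to see this (our own Python sketch; the sine target, input range, and bump widths are assumptions): tile the input range with sigmoid "bumps" weighted by the target's value, and the fit tightens as more hidden-neuron pairs are used.

```python
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def approximate(f, x, n_bumps, lo=0.0, hi=math.pi, steep=200.0):
    """Sum of n_bumps rectangular-ish bumps, each weighted by f at its centre."""
    width = (hi - lo) / n_bumps
    total = 0.0
    for i in range(n_bumps):
        left, right = lo + i * width, lo + (i + 1) * width
        bump = sig(steep * (x - left)) - sig(steep * (x - right))
        total += f(left + width / 2) * bump
    return total

x = 1.0
print(math.sin(x))                   # target value, ~0.841
print(approximate(math.sin, x, 5))   # coarse approximation
print(approximate(math.sin, x, 50))  # more neurons, closer to sin(x)
```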

32
Q

What is the relationship between weights, biases, and the outputs of hidden neurons?

A

Weights and biases determine the shape and position of the hidden neuron outputs

This affects the final output through linear combinations.

33
Q

What did Cybenko prove about neural networks in 1988?

A

That a network with two hidden layers can approximate any function

He believed it should also be possible with just one hidden layer.

34
Q

What is the significance of considering functions as vectors?

A

It lets functions be treated with the same mathematical machinery as ordinary vectors

Functions can be represented as points in infinite-dimensional spaces.

35
Q

What does the term ‘vector space’ refer to?

A

A collection of objects (such as vectors, matrices, or functions) that can be added together and multiplied by scalars while satisfying certain axioms

These objects can be manipulated mathematically.

36
Q

What was Cybenko’s proof method?

A

Proof by contradiction

He assumed that a neural network could not approximate all functions and showed this assumption led to a contradiction.

37
Q

What misconception did Cybenko’s proof create among researchers?

A

That a single hidden layer was all a neural network would ever need

This led to a focus on shallow networks, delaying the deep learning revolution.

38
Q

What factors contributed to the revolution in deep learning around 2010?

A

Increased number of hidden layers, massive training data, and computing power

These elements were not widely available in the 1990s.

39
Q

Fill in the blank: Cybenko believed a neural network with one hidden layer could approximate _______.

A

any function

This belief was significant for the development of neural network theory.

40
Q

What significant development in deep learning occurred in 2010?

A

Researchers began to increase the number of hidden layers in neural networks beyond one.

This marked a pivotal shift in the capabilities of deep learning.

41
Q

What was Cybenko’s contribution to neural networks?

A

Cybenko proved the universal approximation property of single-hidden-layer neural networks in 1989.

His work laid the groundwork for future advancements in deep learning.

42
Q

What two key ingredients were necessary for the deep learning revolution to take off in the 2010s?

A
  • Massive amounts of training data
  • Computing power

These factors were not available in the 1990s.

43
Q

What did Cybenko speculate about the number of neurons required for approximation?

A

He speculated that astronomical numbers of terms would be required for most approximation problems.

This was influenced by the curse of dimensionality in multidimensional approximation theory.

44
Q

What is the curse of dimensionality?

A

The curse of dimensionality refers to the phenomenon where the feature space becomes increasingly sparse as dimensionality increases, making it difficult for models to generalize.

This concept poses challenges in statistics and approximation theory.
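
A standard illustration (ours, not the card's): covering the interval [0, 1] with points spaced 0.1 apart takes about 10 points, but covering the cube [0, 1]^d at the same spacing takes about 10^d points, so the data needed to sample a space densely grows exponentially with its dimension.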

45
Q

How do deep neural networks challenge traditional machine learning theories?

A

They are not as susceptible to the curse of dimensionality and, despite having enormous numbers of parameters, do not overfit the data in the way traditional theory predicts.

This challenges existing beliefs about model complexity and data fitting.

46
Q

What algorithm allowed researchers to start training deep neural networks?

A

Backpropagation.

This algorithm is essential for optimizing the weights in neural networks during training.