LESSON 7 - Supervised learning 2 Flashcards

1
Q

What is the objective of the delta rule in supervised learning?

A

The objective of the delta rule is to reduce the network's error, which it quantifies as the discrepancy between the actual output (Y) and the desired response (T) for each output neuron and each training sample; the weights are then adjusted in proportion to this error.
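
As a minimal sketch (illustrative Python; the function and variable names are my own, not from the lesson), one delta-rule update for a single linear output neuron looks like this:

import numpy as np

def delta_rule_step(w, x, t, eta=0.1):
    """One delta-rule update for a single linear output neuron.

    w   : weight vector
    x   : input vector for this training sample
    t   : desired response T for this sample
    eta : learning rate
    """
    y = np.dot(w, x)            # actual output Y of the neuron
    error = t - y               # error signal: desired minus actual
    return w + eta * error * x  # weight change proportional to the error

# Example: one update on a single training sample
w = np.zeros(3)
w = delta_rule_step(w, np.array([1.0, 0.5, -1.0]), t=1.0)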

2
Q

What is the significance of the continuous output function in the delta rule?

A

The continuous output function is crucial for quantifying error: it allows a graded error signal to be computed as the difference between the actual output and the desired response, so that weight changes can be made proportional to how far off the output is, not just whether it is right or wrong.

3
Q

What is the mathematical equivalence between the delta rule and the Rescorla-Wagner rule in psychology?

A

The delta rule in neural networks is mathematically equivalent to the Rescorla-Wagner rule in psychology, establishing a connection between neural networks and animal/human learning processes.
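
Written side by side in standard textbook notation (not quoted verbatim from the lesson), the two rules have the same form:

\Delta w = \eta \,(T - Y)\, x \qquad \text{(delta rule)}
\Delta V_X = \alpha_X \beta \,(\lambda - \textstyle\sum V) \qquad \text{(Rescorla-Wagner)}

In both cases the change is a learning rate times the discrepancy between what was desired or expected (T, or the asymptote \lambda) and what was actually produced or predicted (Y, or the summed associative strength \sum V); this shared form is the sense in which the rules are equivalent.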

4
Q

Why are hidden layers necessary in neural networks, particularly in solving problems like XOR?

A

Hidden layers are essential in neural networks, specifically for problems like XOR, because they enable the network to learn patterns that are not linearly separable, something a network with only input and output neurons cannot do.

5
Q

What is the role of non-linear activation functions in hidden neurons?

A

Non-linear activation functions in hidden neurons are crucial because they are what make networks with hidden layers more powerful: a stack of purely linear layers is itself just one linear mapping, so without non-linear activation functions a hidden layer would have much the same effect as no hidden layer at all.
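
A quick numerical illustration (my own example, using identity activations to stand in for "no non-linearity"): with linear hidden units, two stacked weight layers collapse into a single weight matrix.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # input -> hidden weights
W2 = rng.normal(size=(1, 3))   # hidden -> output weights
x = rng.normal(size=(2,))      # one input pattern

hidden = W1 @ x                # "activation" is the identity: f(a) = a
deep_out = W2 @ hidden         # two-layer network with linear hidden units
shallow_out = (W2 @ W1) @ x    # single layer with the combined matrix

print(np.allclose(deep_out, shallow_out))   # True: the hidden layer adds nothing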

6
Q

What is the architecture of a neural network for solving the XOR problem with a minimal setup?

A

For solving the XOR problem, a minimal neural network architecture consists of two inputs, one output neuron, and one extra hidden neuron; this configuration is sufficient to learn XOR.
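
A hand-wired sketch of such a network (my own illustration, assuming the classic minimal variant in which the two inputs also connect directly to the output neuron and all units use a simple threshold activation):

def step(a):
    return 1 if a > 0 else 0

def xor_net(x1, x2):
    # The hidden neuron detects the case "both inputs on" (an AND unit).
    h = step(x1 + x2 - 1.5)
    # The output combines the direct input connections with a strong
    # inhibitory connection from the hidden neuron.
    return step(x1 + x2 - 2 * h - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # prints 0, 1, 1, 0 for the four patterns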

7
Q

What is error backpropagation, and why is it significant in training networks with hidden layers?

A

Error backpropagation is an algorithm used to train networks with hidden layers. It is significant because it solves the problem of computing an error signal for hidden neurons, something the delta rule cannot do because hidden neurons have no target state.

8
Q

How are weights changed in the learning process, and what role does the learning rate play?

A

Weights are changed in the learning process through gradient descent, and the learning rate determines the size of these weight changes. Small learning rates give precise but slow adjustments, while large rates can make learning unstable and imprecise.
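
In schematic form (an assumed illustration; grad_E stands for a gradient that the surrounding training code would have to supply):

def gradient_descent_step(w, grad_E, eta):
    """Move the weights against the gradient of the error E.

    grad_E : gradient of the error with respect to w (same shape as w)
    eta    : learning rate controlling the step size
    """
    return w - eta * grad_E

# A small eta takes small, precise steps; a large eta can overshoot the
# minimum and make the error oscillate instead of decreasing.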

9
Q

What is the difference between batch learning and online learning in the context of gradient descent?

A

In batch learning, gradient descent is performed on the global error function computed over all training examples. In online learning, the weights are updated after every single pattern using stochastic gradient descent, which computes the error function with respect to that one example.
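
A schematic comparison, assuming a hypothetical helper grad(w, x, t) that returns the error gradient for one pattern (these names are mine, not the lesson's):

def batch_epoch(w, data, grad, eta):
    # Batch learning: accumulate the gradient over ALL patterns,
    # then take one step against the global error function.
    total = sum(grad(w, x, t) for x, t in data)
    return w - eta * total

def online_epoch(w, data, grad, eta):
    # Online (stochastic) learning: update the weights after EACH
    # pattern, using the gradient of that single example's error.
    for x, t in data:
        w = w - eta * grad(w, x, t)
    return w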

10
Q

How does momentum contribute to efficient learning, especially in escaping local minima?

A

Momentum is a learning parameter that adds a fraction of the previous weight update to the current update. It aids in escaping local minima by continuing the change in the previous direction, even when the gradient is 0 or very shallow due to a local minimum.
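
As a sketch of the standard formulation (symbols chosen here, not quoted from the lesson):

def momentum_step(w, grad_E, prev_delta, eta=0.1, alpha=0.9):
    """Weight update with momentum.

    The new change is the usual gradient step plus a fraction alpha of
    the previous change, so movement continues in the previous direction
    even where the gradient is (near) zero.
    """
    delta = -eta * grad_E + alpha * prev_delta
    return w + delta, delta   # return delta so it can be reused next step

# prev_delta is typically initialised to zero and carried from step to step.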

11
Q

Why is the use of momentum particularly helpful in the learning process?

A

Momentum is particularly helpful in the learning process as it allows the weights to keep changing when the gradient is 0 or very shallow, overcoming issues related to local minima and enabling smoother learning.

12
Q

What is the purpose of adjusting the learning rate constantly during the learning process?

A

Adjusting the learning rate over the course of learning keeps the process efficient: a larger rate at the start allows big weight changes while the error is still high, and gradually lowering it afterwards allows ever more precise adjustments as the network approaches a minimum.
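
One common way to realise this, shown as an assumed example (the lesson does not prescribe a particular schedule):

def decayed_learning_rate(eta0, epoch, decay=0.01):
    # Start with a large eta0 so early weight changes are big, then let
    # the rate shrink so later adjustments become more precise.
    return eta0 / (1.0 + decay * epoch)

# e.g. eta0 = 0.5: epoch 0 -> 0.50, epoch 100 -> 0.25, epoch 900 -> 0.05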

13
Q

What does the term “universal approximators” refer to in the context of neural networks?

A

Neural networks with non-linear activation functions in hidden layers are often referred to as “universal approximators.” This is because, with the right configuration of hidden neurons, they have the capacity to learn and approximate almost any problem.

14
Q

How does error backpropagation address the challenge of computing error signals for hidden neurons?

A

Error backpropagation computes error signals for hidden neurons by propagating the errors obtained at the outputs backward through the network: each hidden neuron's error is the weighted sum of the errors of the neurons it projects to, scaled by the derivative of its own activation function.
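
In standard textbook notation (not quoted from the lesson), the error of a hidden neuron j is obtained from the errors \delta_k of the neurons it feeds:

\delta_j = f'(\mathrm{net}_j) \sum_k w_{jk}\, \delta_k

where f' is the derivative of the activation function, \mathrm{net}_j is the neuron's summed input, and w_{jk} is the weight from hidden neuron j to output neuron k; the weight update is then \Delta w_{ij} = \eta\, \delta_j\, x_i, just as in the delta rule.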

15
Q

What is the significance of using non-linear activation functions in hidden neurons?

A

Non-linear activation functions in hidden neurons are significant because they enhance the overall power of neural networks. Without these functions, hidden layers would not contribute to the network’s ability to learn complex, non-linear patterns.

16
Q

Why is stochastic gradient descent preferred in online or mini-batch learning?

A

Stochastic gradient descent is preferred in online or mini-batch learning because it computes the error function with respect to a single pattern, making it less prone to getting stuck in local minima. It allows for more flexibility and adaptability during the learning process.

17
Q

How does momentum contribute to escaping local minima in the context of neural network training?

A

Momentum contributes to escaping local minima by continuing the change in the previous direction even when the gradient is 0 or very shallow. It helps weights keep changing, allowing the network to climb out of local minima and reach more optimal solutions.

18
Q

What is the key advantage of using networks with hidden layers, especially in solving complex problems?

A

The key advantage of using networks with hidden layers is their ability to learn and solve complex problems that are not linearly separable, providing a more sophisticated and powerful learning capability.

19
Q

How does the concept of “local minimum” relate to the challenges faced in standard gradient descent during neural network training?

A

The concept of a “local minimum” poses a challenge for standard gradient descent because the algorithm can get stuck in suboptimal solutions: at a local minimum of the error function the gradient is zero, so standard gradient descent stops making progress and may never reach the global minimum.

20
Q

In the context of learning rate adjustment, why is a variable learning rate considered more effective than a fixed one?

A

A variable learning rate is considered more effective than a fixed one because it adapts to changing conditions during the learning process. Starting with a higher learning rate allows for larger changes initially, and as learning progresses, it decreases, ensuring more precise and stable adjustments.

21
Q

How does error backpropagation overcome the limitation of the delta rule in training networks with hidden layers?

A

Error backpropagation overcomes the limitation of the delta rule in training networks with hidden layers by propagating errors computed for the outputs backward to compute new errors for hidden neurons. This addresses the challenge of lacking an error signal for hidden neurons in the delta rule.