LESSON 6 - Supervised learning 1 Flashcards

Question 1

Q

Why is it crucial for a model in supervised learning to be able to generalize to new data?

Answer

A

A model’s success is judged not only by its performance on the training set but also by its ability to generalize to a new test set. A model that merely memorizes the training examples can fail on any input it has not seen; generalization means it can also correctly describe new, unseen data.

Question 2

Q

What is the significance of allowing some outliers in a model, particularly in the context of EEG or fMRI data?

Answer

A

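It is better to accept a few outliers that the model gets wrong than to memorize the training data and lose the ability to generalize. In brain data, such outliers can be treated as acceptable errors because they often come from artifacts, such as sweating in EEG recordings or movement in fMRI scans.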

Question 3

Q

How is overfitting monitored in supervised learning, and what role does the parameter ‘patience’ play?

Answer

A

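Overfitting is monitored by tracking the error on held-out data (a validation or test set) during training: the training error may keep decreasing while the held-out error starts to increase, which signals overfitting. The ‘patience’ parameter sets how many epochs training may continue without an improvement in the held-out error before it is stopped (early stopping). The aim is a balance between a low training error and a small generalization gap, the difference between training and held-out error.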
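
A minimal sketch of early stopping with a patience counter, assuming the validation error is measured once per epoch. The function name `early_stopping_epoch` and the error values are hypothetical, made up for illustration; many libraries provide a ready-made version of this logic.

```python
def early_stopping_epoch(val_errors, patience):
    """Return the index of the epoch whose weights should be kept.

    Training stops once the validation error has failed to improve
    for `patience` consecutive epochs; the best epoch seen so far is kept.
    """
    best_error = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            # New best validation error: remember it and reset the counter.
            best_error = error
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training: no improvement for `patience` epochs
    return best_epoch


# Hypothetical validation errors: they fall, then rise as the model overfits.
val_errors = [0.50, 0.40, 0.33, 0.30, 0.31, 0.32, 0.35, 0.29]
print(early_stopping_epoch(val_errors, patience=3))  # → 3
print(early_stopping_epoch(val_errors, patience=5))  # → 7
```

With `patience=3` the run stops at epoch 6 and keeps the weights from epoch 3; with `patience=5` it also sees the lower error at epoch 7. A larger patience tolerates noisy validation curves at the cost of extra training.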

Question 4

Q

What are the two major problems in supervised learning, and how do they relate to categorizing?

Answer

A

The two major problems in supervised learning are categorizing and the difference between training and test performance. Categorizing means learning from examples given as input and desired output, with the aim of generalizing this mapping to new instances. During training, the task is to check that learning is going well (the training error goes down); assessing test performance is harder, because it requires separate validation data to show that the model generalizes.
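
A minimal sketch of holding out data, assuming labelled examples are stored as `(input, label)` pairs. `split_dataset`, the 20% fractions, and the toy data are hypothetical choices: the validation part is used during training (for example, for early stopping), while the test part is used only once at the end to estimate generalization.

```python
import random


def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle labelled examples and split them into train/validation/test."""
    shuffled = list(examples)               # copy; the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical labelled data: (input, label) pairs.
examples = [(x, int(x >= 5)) for x in range(10)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))  # → 6 2 2
```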

Question 5

Q

How does supervised learning handle binary output, and what example is given for classification problems?

Answer

A

In supervised learning with binary output, each input is associated with one of two labels, for example cats as 0 and dogs as 1. A worked example uses height and weight as the two features, so each animal becomes a point in a two-dimensional plot.
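
A sketch of how the cat/dog example could be stored, assuming each animal is described by a `(height, weight)` pair; the numbers are made up for illustration and do not come from the lecture. Each feature vector is a point in the 2D plot, and its label says which class it belongs to.

```python
# Hypothetical toy data set: each example pairs a feature vector
# (height in cm, weight in kg) with a binary class label.
CAT, DOG = 0, 1

examples = [
    ((25.0, 4.0), CAT),
    ((23.0, 3.5), CAT),
    ((60.0, 30.0), DOG),
    ((55.0, 25.0), DOG),
]

# Separate the inputs (points to plot) from the desired outputs (labels).
inputs = [features for features, _ in examples]
labels = [label for _, label in examples]
print(inputs[0], labels[0])  # → (25.0, 4.0) 0
```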

Question 6

Q

What is the purpose of linear classifiers in supervised learning, and what potential problems do they face?

Answer

A

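Linear classifiers, such as the perceptron, separate two classes with a line (a plane or hyperplane when there are more features). One problem is that many examples are needed to find a reliable separating line, since with few examples many different lines fit equally well. Another is that when the classes are not linearly separable, no line can separate them, and more powerful algorithms are required.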

Question 7

Q

What are the three types of learning mentioned in the context of associative learning?

Answer

A

The three types of learning mentioned in associative learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning involves providing correct outputs during training, unsupervised learning focuses on building representations without associated outputs, and reinforcement learning maximizes rewards or minimizes punishments.

Question 8

Q

What is the significance of the delta rule in the context of learning algorithms for artificial neural networks?

Answer

A

The delta rule describes learning in a neural network as gradient descent on the error function: each weight is changed in the direction that reduces the error. It is fundamental to minimizing the error function, and the same idea underlies many later algorithms for training neural networks.

Question 9

Q

What is the purpose of the perceptron model, and how does it handle output in terms of classification?

Answer

A

The perceptron model is a linear classifier. It uses a bipolar output (-1 or +1): it computes the weighted sum of its inputs and outputs +1 if this sum reaches the threshold and -1 otherwise. The two classes, such as cat and dog (coded 0 and 1 in the binary example), are therefore coded as -1 and +1.
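
A minimal sketch of the perceptron’s bipolar output, assuming the unit outputs +1 when the weighted sum is greater than or equal to the threshold (some texts use a strict inequality instead). The weights, the threshold, and the coding -1 = cat, +1 = dog are hypothetical.

```python
def perceptron_output(weights, inputs, threshold):
    """Bipolar output: +1 if the weighted input sum reaches the threshold, else -1."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else -1


# Hypothetical coding: -1 = cat, +1 = dog; features are (height, weight).
weights, threshold = (1.0, 1.0), 50.0
print(perceptron_output(weights, (25.0, 4.0), threshold))   # → -1 (read as "cat")
print(perceptron_output(weights, (60.0, 30.0), threshold))  # → 1 (read as "dog")
```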

Question 10

Q

How does the perceptron model address the issue of adjusting the threshold, and what is the role of the bias weight?

Answer

A

The perceptron replaces the explicit threshold with a bias weight, W0. The bias is an additional connection weight whose input is fixed at a constant value of 1, so W0 acts like the threshold with the opposite sign (W0 = -θ). Because W0 is learned by the same rule as the other weights, the threshold no longer has to be adjusted separately.
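
A small check, under hypothetical weights, that a threshold θ and a bias weight w0 = -θ on a constant input of 1 give identical outputs. The function names are illustrative only.

```python
def output_with_threshold(weights, inputs, threshold):
    # Explicit threshold: fire (+1) when the weighted sum reaches theta.
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else -1


def output_with_bias(weights_with_bias, inputs):
    # x0 = 1 is a constant extra input; its weight w0 = -theta,
    # so the comparison is always against 0.
    extended = [1.0] + list(inputs)
    net = sum(w * x for w, x in zip(weights_with_bias, extended))
    return 1 if net >= 0 else -1


weights, threshold = [2.0, -1.0], 1.0
w0 = -threshold
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print([output_with_threshold(weights, p, threshold) for p in points])  # → [-1, -1, 1, 1]
print([output_with_bias([w0] + weights, p) for p in points])           # → [-1, -1, 1, 1]
```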

Question 11

Q

What is the delta rule, and how is it applied in the context of the perceptron model?

Answer

A

The delta rule is the learning rule the perceptron uses to adjust its weights during training. Each weight change is the difference between the target output and the actual output, multiplied by the presynaptic activity (the input value on that connection) and scaled by a learning rate η: Δw_i = η(t - y)x_i. When the output is already correct, the weights do not change.
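
One delta-rule step as a minimal sketch, assuming bipolar outputs, a learning rate η (`learning_rate`), and the bias handled as input x0 = 1. The function name and values are hypothetical.

```python
def delta_rule_update(weights, inputs, target, output, learning_rate=0.1):
    """One delta-rule step: w_i <- w_i + eta * (target - output) * x_i.

    The change is proportional to the error (target - output) and to the
    presynaptic activity x_i, so weights on inactive inputs (x_i = 0) stay put.
    """
    error = target - output
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]


# Inputs include the constant bias input x0 = 1 in position 0.
weights = [0.0, 0.0, 0.0]
print(delta_rule_update(weights, [1.0, 1.0, 0.0], target=1, output=-1, learning_rate=0.5))
# → [1.0, 1.0, 0.0]
print(delta_rule_update(weights, [1.0, 1.0, 0.0], target=1, output=1))
# → [0.0, 0.0, 0.0]  (output already correct, so no change)
```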

Question 12

Q

What is the mathematical foundation behind the guarantee that the perceptron model will work, and what kind of problems can it effectively solve?

Answer

A

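The perceptron convergence theorem guarantees that, if the problem has a linear solution (the classes are linearly separable), the learning rule finds a separating set of weights after a finite number of updates. The perceptron is therefore effective for linearly separable problems, but it cannot solve classification problems that need a non-linear boundary.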
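
A minimal sketch of the perceptron learning loop, assuming bipolar targets, a bias weight on a constant input of 1, zero initial weights, and a learning rate of 0.5 (all illustrative choices). It stops as soon as one full pass over the data produces no errors; on the linearly separable OR problem, coded with targets -1/+1, that happens after a few passes.

```python
def train_perceptron(data, learning_rate=0.5, max_epochs=100):
    """Perceptron learning with bipolar targets and a bias weight w0 (input fixed at 1).

    Returns (weights, converged). `converged` is True once a full pass over
    the data produces no errors.
    """
    n_inputs = len(data[0][0])
    weights = [0.0] * (n_inputs + 1)            # weights[0] is the bias w0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in data:
            x = [1.0] + list(inputs)            # prepend the constant bias input
            net = sum(w * xi for w, xi in zip(weights, x))
            output = 1 if net >= 0 else -1
            if output != target:
                errors += 1
                # Delta rule: change proportional to error and input.
                weights = [w + learning_rate * (target - output) * xi
                           for w, xi in zip(weights, x)]
        if errors == 0:
            return weights, True
    return weights, False


# OR problem with bipolar targets: -1 for (0, 0), +1 otherwise.
or_data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_perceptron(or_data))  # → ([-1.0, 1.0, 1.0], True)
```

Run on XOR data (targets -1, +1, +1, -1), the same function never completes an error-free pass, because no weight vector classifies all four XOR patterns correctly, so it returns `converged=False` after `max_epochs`.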

Question 13

Q

What are the potential problems associated with the perceptron model, especially in dealing with non-linearly separable classes?

Answer

A

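Even when two classes are linearly separable, many different lines can separate the training examples, and the perceptron stops at the first one it finds. That line may pass very close to some examples, so new, noisy, or outlying data can be misclassified, which makes generalization less reliable. When the classes are not linearly separable, no line separates them and the learning rule never settles on a solution.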

Question 14

Q

How does the OR problem help illustrate the limitations of the perceptron model, and what is its relation to linear separability?

Answer

A

The OR problem has two binary inputs and a binary output that is 1 whenever at least one input is 1. It is linearly separable: a single line separates (0,0) from the other three input patterns, so a perceptron can learn it. OR therefore marks what the perceptron can do; the closely related XOR problem, which is not linearly separable, is what exposes its limitation.
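
The OR truth table can be checked directly against a separating line. The weights below are hypothetical picks: w0 + x1 + x2 > 0 with w0 = -0.5 works, and so does w0 = -0.25, which also illustrates that many different lines can separate the same training data.

```python
def classify(x1, x2, w0, w1=1.0, w2=1.0):
    """Return 1 if (x1, x2) lies on the positive side of w0 + w1*x1 + w2*x2 = 0, else 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0


or_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

# Two different lines; both reproduce the whole OR truth table.
for w0 in (-0.5, -0.25):
    ok = all(classify(x1, x2, w0) == y for (x1, x2), y in or_table.items())
    print(w0, ok)
# → -0.5 True
# → -0.25 True
```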

Question 15

Q

What is the key difference between the XOR problem and the OR problem, and how does it highlight the perceptron model’s limitations?

Answer

A

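The XOR problem differs from the OR problem only for the input (1,1): XOR outputs 0 when both inputs are active, whereas OR outputs 1. This single change makes the problem non-linearly separable: no line can put (0,1) and (1,0) on one side and (0,0) and (1,1) on the other, so a single perceptron cannot solve it.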
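
A short argument that no single line solves XOR, assuming a perceptron that outputs 1 when w0 + w1 x1 + w2 x2 > 0 and 0 otherwise (w0 is the bias weight). Each row of the truth table imposes one condition:

```latex
\begin{aligned}
(0,0)\mapsto 0 &:\quad w_0 \le 0\\
(1,0)\mapsto 1 &:\quad w_0 + w_1 > 0\\
(0,1)\mapsto 1 &:\quad w_0 + w_2 > 0\\
(1,1)\mapsto 0 &:\quad w_0 + w_1 + w_2 \le 0\\[4pt]
(w_0 + w_1) + (w_0 + w_2) > 0 &\;\Rightarrow\; w_0 + w_1 + w_2 > -w_0 \ge 0
\end{aligned}
```

The last line contradicts the (1,1) condition, so the four conditions cannot hold together. For OR, the (1,1) condition becomes w0 + w1 + w2 > 0, which is consistent with the others (e.g. w0 = -0.5, w1 = w2 = 1).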

Question 16

Q

How does the delta rule contribute to learning in a neural network, and what does it seek to minimize?

Answer

A

The delta rule performs gradient descent on the error function. For each output neuron it measures the squared error between the output and the target value; summed over all output neurons (and training examples), these give the global error of the network, which the rule seeks to reduce.
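
A worked derivation, assuming linear output units y_k = Σ_i w_ki x_i and the common convention of a factor ½ in the error (it cancels the 2 produced by differentiation); η is the learning rate:

```latex
\begin{aligned}
E &= \tfrac{1}{2}\sum_k (t_k - y_k)^2, \qquad y_k = \sum_i w_{ki}\,x_i\\
\frac{\partial E}{\partial w_{ki}} &= -(t_k - y_k)\,x_i\\
\Delta w_{ki} &= -\eta\,\frac{\partial E}{\partial w_{ki}} = \eta\,(t_k - y_k)\,x_i
\end{aligned}
```

The last line is the delta rule: the error of output neuron k times the presynaptic activity x_i, scaled by η.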

Question 17

Q

What does the graphical representation of gradient descent illustrate, and how does it help in finding an optimal solution?

Answer

A

The graph plots the error as a function of the weights. For a linear unit with squared error, this function is a parabola (a bowl-shaped paraboloid when there are two or more weights). Starting from some weight combination, the algorithm repeatedly takes steps opposite to the gradient, like walking down a mountain, until it reaches the weight combination with the lowest error.
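
A minimal numeric sketch, assuming a single linear unit y = w1*x1 + w2*x2 trained on three made-up examples whose exact solution is w = (2, -1). The squared error is a bowl-shaped function of (w1, w2), and repeated steps against the gradient slide down to its minimum; the learning rate 0.1 and the 100 steps are illustrative choices.

```python
# Hypothetical training data for a single linear unit y = w1*x1 + w2*x2.
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]


def error(w):
    """Sum-of-squares error E(w) = 1/2 * sum (t - y)^2, a paraboloid over (w1, w2)."""
    return 0.5 * sum((t - (w[0] * x[0] + w[1] * x[1])) ** 2 for x, t in data)


def gradient(w):
    """dE/dw_i = -sum (t - y) * x_i, summed over the training examples."""
    g = [0.0, 0.0]
    for x, t in data:
        residual = t - (w[0] * x[0] + w[1] * x[1])
        g[0] -= residual * x[0]
        g[1] -= residual * x[1]
    return g


def gradient_descent(w, learning_rate=0.1, steps=100):
    """Step against the gradient; record the error after every step."""
    history = [error(w)]
    for _ in range(steps):
        g = gradient(w)
        w = [w[0] - learning_rate * g[0], w[1] - learning_rate * g[1]]
        history.append(error(w))
    return w, history


w, history = gradient_descent([0.0, 0.0])
print([round(v, 3) for v in w])  # → [2.0, -1.0]
```

`history` holds the error after each step; on this quadratic surface with a small enough learning rate it decreases at every step, from 3.0 at the start toward 0.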

Question 18

Q

How does the gradient of the error function guide the weight adjustment process in the context of gradient descent?

Answer

A

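The gradient is the slope of the error function at the current weights, i.e. the slope of the tangent at that point. The weights are moved in the direction opposite to the gradient, which is why the update has a minus sign (Δw = -η ∂E/∂w): if the error increases as a weight increases, that weight is decreased, and vice versa. For the bowl-shaped error of the delta rule, these steps lead to the global minimum.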
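
A one-weight illustration, assuming a hypothetical error function E(w) = (w - 3)^2 with its minimum at w = 3. The derivative is the slope of the tangent; subtracting η times the slope moves the weight toward the minimum from either side.

```python
def error(w):
    """Hypothetical one-weight error function with its minimum at w = 3."""
    return (w - 3.0) ** 2


def slope(w):
    """Derivative dE/dw: the slope of the tangent to E at w."""
    return 2.0 * (w - 3.0)


def step(w, learning_rate=0.25):
    """One gradient-descent step: move against the slope."""
    return w - learning_rate * slope(w)


print(step(5.0), step(1.0))  # → 4.0 2.0
```

From w = 5 the slope is positive (+4), so the step decreases w; from w = 1 it is negative (-4), so the step increases w. Both moves lower the error, from 4.0 to 1.0.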

Question 19

Q

What is the role of the ‘patience’ parameter in monitoring overfitting during training, and why is it necessary?

Answer

A

The ‘patience’ parameter sets how many epochs training may continue without an improvement in the validation (held-out) error before it is stopped. It is necessary because the training error can keep falling while the held-out error rises; stopping at the right moment balances a low training error against a small generalization gap, so that the model still generalizes well to new data.

Question 20

Q

In supervised learning, what is the distinction between training and test performance, and why is this differentiation necessary?

Answer

A

Training performance shows whether the model is learning the provided examples, i.e. whether the training error decreases. Test performance is harder to assess: it requires separate validation and test data to measure how well the model generalizes to new, unseen examples. The distinction is necessary because a low training error alone does not show that the model works beyond the training set.

Question 21

Q

What are the three types of learning mentioned in the context of associative learning?

Answer

A

In the context of associative learning, three types of learning are discussed: supervised, unsupervised, and reinforcement learning. In supervised learning, the network receives pairs of inputs and desired outputs; unsupervised learning builds representations of the inputs without associated outputs; and reinforcement learning is driven by external reward and punishment signals, with the goal of maximizing reward.