LESSON 6 - Supervised learning 1 Flashcards
Why is it crucial for a model in supervised learning to be able to generalize to new data?
The model’s success is determined not only by its performance on the training set but also by its ability to generalize to a new test set. Generalization ensures the model can accurately describe new, unseen data.
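A minimal sketch of how this is measured in practice (hypothetical random data, plain NumPy): a portion of the data is held out as a test set that the model never trains on, and performance on that split estimates generalization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 100 examples with 2 features and a binary label.
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out the last 20% as a test set the model never sees during training;
# accuracy on this split estimates how well the model generalizes.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
print(len(X_train), len(X_test))  # 80 20
```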
What is the significance of allowing some outliers in a model, particularly in the context of EEG or fMRI data?
It is preferable for a model to tolerate some outliers than to memorize the data without the ability to generalize. Outliers can be treated as acceptable errors caused by artifacts such as sweating in EEG recordings or head movement in fMRI recordings.
How is overfitting monitored in supervised learning, and what role does the parameter ‘patience’ play?
Overfitting is monitored by observing the test error, which may start to worsen even while the training error is still decreasing. The ‘patience’ parameter determines how many epochs without improvement to tolerate before stopping training, aiming for a low error together with a small generalization gap.
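A minimal sketch of this monitoring loop (the names `train_step` and `eval_test_error` are placeholders for whatever training and evaluation routines are in use):

```python
def train_with_early_stopping(train_step, eval_test_error,
                              patience=5, max_epochs=100):
    """Stop once the test error has not improved for `patience` epochs."""
    best_error = float("inf")
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        train_step()                    # one pass over the training set
        test_error = eval_test_error()  # monitor test error, not training error
        if test_error < best_error:
            best_error = test_error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                   # test error stopped improving: overfitting
    return best_error

# Simulated test-error curve: it improves, then starts to worsen.
errors = iter([0.9, 0.5, 0.4, 0.45, 0.46, 0.47, 0.48])
best = train_with_early_stopping(lambda: None, lambda: next(errors), patience=3)
print(best)  # 0.4
```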
What are the two major problems in supervised learning, and how do they relate to categorizing?
The two major problems in supervised learning are categorization and the gap between training and test performance. Categorization involves providing inputs together with desired outputs and generalizing that mapping to new instances. Performance on the training set is straightforward to check, while test performance is harder to assess and requires validation to confirm that the model generalizes.
How does supervised learning handle binary output, and what example is given for classification problems?
In supervised learning, binary output involves associating input data with labels, such as classifying cats as 0 and dogs as 1. An example using height and weight as features is provided for visualization.
What is the purpose of linear classifiers in supervised learning, and what potential problems do they face?
Linear classifiers, exemplified by perceptrons, aim to separate classes using a line. Potential problems include the need for multiple examples to find a reliable solution and the challenge when classes are not linearly separable, requiring more powerful algorithms.
What are the three types of learning mentioned in the context of associative learning?
The three types of learning mentioned in associative learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning involves providing correct outputs during training, unsupervised learning focuses on building representations without associated outputs, and reinforcement learning maximizes rewards or minimizes punishments.
What is the significance of the delta rule in the context of learning algorithms for artificial neural networks?
The delta rule describes learning in a neural network in terms of gradient descent over the error function. It plays a fundamental role in minimizing the error function, and the idea is used in various algorithms for training neural networks.
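In symbols, the standard formulation of the rule (η is the learning rate, t the target output, y the actual output, and x_i the i-th presynaptic input):

```latex
\Delta w_i = \eta \, (t - y) \, x_i
```

When the output already matches the target, t − y = 0 and the weight is left unchanged; otherwise the weight moves in the direction that reduces the error.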
What is the purpose of the perceptron model, and how does it handle output in terms of classification?
The perceptron model is designed to function as a linear classifier. It produces a bipolar output (−1 or +1) by thresholding its weighted input, allowing it to decide between two classes, e.g. cat (−1) and dog (+1).
How does the perceptron model address the issue of adjusting the threshold, and what is the role of the bias weight?
The perceptron substitutes the threshold with a bias weight, represented by W0. The bias weight is an additional connection weight attached to an extra input held constant at 1, so the threshold can be learned like any other weight.
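A small sketch of this bias trick (hypothetical numbers): prepending a constant input of 1 turns the threshold into an ordinary learnable weight w0.

```python
import numpy as np

x = np.array([3.0, -1.0])           # two input features (hypothetical values)
x_aug = np.concatenate(([1.0], x))  # prepend the constant bias input 1

w = np.array([0.5, 1.0, 2.0])       # w[0] is the bias weight w0
activation = w @ x_aug              # w0*1 + w1*x1 + w2*x2
print(activation)                   # 1.5, i.e. 0.5 + 3.0 - 2.0
```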
What is the delta rule, and how is it applied in the context of the perceptron model?
The delta rule is a learning rule applied in the perceptron model to adjust weights during training. It computes weight changes based on the difference between the target output and the actual output, multiplied by the presynaptic activity (input value).
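A minimal sketch of one such update (hypothetical numbers; the first input component is the constant bias input 1):

```python
import numpy as np

eta = 0.1                        # learning rate (assumed value)
x = np.array([1.0, 30.0, 5.0])   # bias input 1, then e.g. height and weight
w = np.zeros(3)                  # weights start at zero
t = -1                           # target output: cat (bipolar labels)

y = 1 if w @ x >= 0 else -1      # actual output: +1, so the example is wrong
w += eta * (t - y) * x           # delta rule: Delta w = eta * (t - y) * x
# w is now approximately [-0.2, -6.0, -1.0]: the weights move so that
# this input produces a more negative (more cat-like) activation.
```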
What is the mathematical foundation behind the guarantee that the perceptron model will work, and what kind of problems can it effectively solve?
Mathematically, the perceptron convergence theorem guarantees that if a linear solution to the problem exists, the perceptron will find one in a finite number of updates. It is therefore effective for linearly separable problems, but it cannot solve classification problems that require a non-linear separation.
What are the potential problems associated with the perceptron model, especially in dealing with non-linearly separable classes?
Even when classes are linearly separable, many possible lines separate them, and the perceptron stops at the first one it finds, which may generalize poorly. It can also misclassify examples when outliers or noisy data are present, making it less reliable for generalization. When classes are not linearly separable at all, no line exists and the perceptron cannot solve the problem.
How does the OR problem relate to linear separability, and what does it show about the perceptron model?
The OR problem involves binary input variables and a binary output. Its two classes are linearly separable, so the perceptron can solve it; it serves as the baseline case that, contrasted with XOR, shows exactly where the perceptron’s guarantees end.
What is the key difference between the XOR problem and the OR problem, and how does it highlight the perceptron model’s limitations?
The XOR problem differs from the OR problem in that it outputs 0 when both inputs are active (it outputs 1 only when exactly one input is active). This makes the two classes non-linearly separable, exposing the perceptron model’s inability to handle such patterns.
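A sketch contrasting the two problems (plain NumPy perceptron using the delta rule and the bias trick; the 0 outputs are mapped to −1 to match the bipolar convention):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, max_epochs=100):
    """Perceptron with bipolar output trained by the delta rule;
    returns (weights, number of errors in the final epoch)."""
    X = np.hstack([np.ones((len(X), 1)), X])   # bias trick: constant input 1
    w = np.zeros(X.shape[1])
    errors = 0
    for _ in range(max_epochs):
        errors = 0
        for x_i, t_i in zip(X, t):
            y_i = 1 if w @ x_i >= 0 else -1    # bipolar output
            if y_i != t_i:
                w += eta * (t_i - y_i) * x_i   # delta rule update
                errors += 1
        if errors == 0:                        # all examples classified
            break
    return w, errors

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t_or  = np.array([-1, 1, 1, 1])   # OR  (0 mapped to -1): linearly separable
t_xor = np.array([-1, 1, 1, -1])  # XOR: not linearly separable

_, or_errors  = train_perceptron(X, t_or)
_, xor_errors = train_perceptron(X, t_xor)
print(or_errors, xor_errors)      # OR converges to 0 errors; XOR never does
```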