Error-Based Learning Flashcards
What defines a parameterized model?
A parameterized model is defined by a set of parameters; the values those parameters take determine the model's behavior, and some specific combination of values yields the best-performing model.
What is the difference between parameters and hyper-parameters in a model?
Hyper-parameters are set before training begins (for example, a learning rate); parameters are learned during training (for example, the weights of a linear model).
How are model parameters adapted or fit?
Model parameters are fitted to a training set: they are adjusted so that the model captures the relationship between the descriptive features and the target values.
What guides the model fitting process?
Model fitting is guided by an error function that measures how far the model's predictions deviate from the ground truth in the training set.
What indicates model convergence in the context of an error function?
Model convergence is indicated by the error settling at a minimum and no longer decreasing between iterations; ideally this minimum is close to zero.
Why does the L2 function (sum of squared errors) use squared errors rather than just the sum of errors?
To ensure that positive and negative errors don’t cancel each other out.
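A minimal numeric sketch of why this matters (the targets and predictions here are made up for illustration): a +2 error and a -2 error cancel to zero under a plain sum, but not under a sum of squares.

```python
import numpy as np

# Hypothetical targets and predictions for illustration.
targets = np.array([3.0, 5.0, 7.0])
predictions = np.array([5.0, 3.0, 7.0])  # errors: -2, +2, 0

errors = targets - predictions
print(errors.sum())         # 0.0 -- raw errors cancel, hiding the misfit
print((errors ** 2).sum())  # 8.0 -- squared errors expose it
```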
What does each combination of weights w[0] and w[1] correspond to on an error surface?
Each combination corresponds to a sum of squared errors value, defining a point on the error surface above the x-y plane in weight space.
What is weight space in the context of modelling error surfaces?
Weight space is the x-y plane defined by possible combinations of weights w[0] and w[1].
What does the error surface represent in a model’s weight space?
The error surface represents the sum of squared errors for each combination of weights, with height indicating the error value.
Where is the model that best fits the training data located on the error surface?
It’s located at the lowest point on the error surface, which corresponds to the minimum sum of squared errors.
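A small sketch of this idea, assuming a one-feature linear model prediction = w[0] + w[1] * x and a tiny made-up training set: evaluate the sum of squared errors over a grid of (w[0], w[1]) pairs in weight space and read off the lowest point of the resulting surface.

```python
import numpy as np

# Tiny made-up training set for a model: prediction = w0 + w1 * x
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.1, 4.9, 7.2])

# Grid of candidate weights (the x-y plane of weight space).
w0_grid, w1_grid = np.meshgrid(np.linspace(0, 2, 201), np.linspace(1, 3, 201))

# Height of the error surface: sum of squared errors at each (w0, w1).
preds = w0_grid[..., None] + w1_grid[..., None] * x
sse = ((preds - y) ** 2).sum(axis=-1)

# The best-fitting weights sit at the lowest point of the surface.
i = np.unravel_index(sse.argmin(), sse.shape)
print(w0_grid[i], w1_grid[i], sse[i])
```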
What two key properties of error surfaces help in finding the optimal combination of weights?
Error surfaces are convex (bowl-shaped) and have a single global minimum, which makes the optimal weights straightforward to locate.
Why are error surfaces for linear models typically convex with a global minimum?
Because the model is linear in its weights, the sum of squared errors is a quadratic function of the weights; the resulting surface is a convex bowl with a single global minimum regardless of the particular training data.
What is the method called for finding the best set of weights by minimizing the error surface?
Least squares optimization.
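For a linear model, least squares optimization can be solved directly. A sketch using numpy's built-in least-squares solver on the same kind of made-up data (all values illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.1, 4.9, 7.2])

# Design matrix with a column of ones so w[0] acts as the intercept.
X = np.column_stack([np.ones_like(x), x])

# Solve min_w ||X @ w - y||^2 directly.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # [w0, w1] minimizing the sum of squared errors
```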
What is gradient descent?
It is an algorithm that uses a guided search from a random starting position to iteratively move toward the global minimum of the error surface.
How does gradient descent work?
Gradient descent uses the slope of the error surface to take small steps in the direction that reduces error, moving closer to the global minimum with each step.
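A minimal gradient descent sketch for the same two-weight linear model; the starting weights, learning rate, and iteration count here are arbitrary illustrative choices, not prescribed values.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.1, 4.9, 7.2])

w = np.zeros(2)  # starting position in weight space
alpha = 0.02     # learning rate (a hyper-parameter)

for _ in range(2000):
    preds = w[0] + w[1] * x
    errors = preds - y
    # Gradient of the sum of squared errors with respect to w0 and w1.
    grad = np.array([2 * errors.sum(), 2 * (errors * x).sum()])
    w -= alpha * grad  # step downhill along the slope of the error surface

print(w)  # converges toward the least squares weights
```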