Week 6 Flashcards
Overfitting
When fitting the observed data well no longer indicates a small out-of-sample error.
Deterministic noise
The part of the target function that lies outside the best approximation the hypothesis set can make to it.
Stochastic noise
Random noise that cannot be modeled.
State two differences between deterministic and stochastic noise
1) If we generate the same data again, the deterministic noise would be the same but the stochastic noise would be different.
2) Different models capture different parts of the target function -> deterministic noise depends on the learning model you use.
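Both differences can be checked numerically. The sketch below is only an illustration under stated assumptions: the target x^3, the input grid XS, the value of SIGMA, and all helper names are hypothetical, and the "best approximation" is taken as the least-squares fit on the grid rather than over a full input distribution.

```python
import random

def target(x):
    # Hypothetical target function f (an assumption for this sketch).
    return x ** 3

XS = [i / 10 for i in range(-10, 11)]  # fixed inputs, assumed for illustration
SIGMA = 0.5                            # assumed standard deviation of the stochastic noise

def best_constant(xs):
    # Best constant hypothesis in the squared-error sense: the mean of f over xs.
    mean_y = sum(target(x) for x in xs) / len(xs)
    return lambda x: mean_y

def best_line(xs):
    # Best line in the squared-error sense (ordinary least squares on noiseless f).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(target(x) for x in xs) / n
    slope = (sum((x - mean_x) * (target(x) - mean_y) for x in xs)
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def deterministic_noise(best_fit, xs):
    # The part of f that the best hypothesis of the model cannot capture.
    h_star = best_fit(xs)
    return [target(x) - h_star(x) for x in xs]

def stochastic_noise(seed, xs):
    # Fresh random noise; a new seed stands for generating the data again.
    rng = random.Random(seed)
    return [rng.gauss(0.0, SIGMA) for _ in xs]

# 1) Regenerating the data changes the stochastic noise but not the deterministic noise.
same_deterministic = deterministic_noise(best_constant, XS) == deterministic_noise(best_constant, XS)
same_stochastic = stochastic_noise(1, XS) == stochastic_noise(2, XS)

# 2) A different model leaves a different part of f unexplained.
same_across_models = deterministic_noise(best_constant, XS) == deterministic_noise(best_line, XS)
```

Here same_deterministic comes out true, while the other two comparisons come out false.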
The variance of the stochastic noise is captured by the variable…
sigma squared (σ²)
What is the cause of overfitting?
Noise
Name two cures for overfitting:
1) Regularization
2) Validation
Regularization
Attempts to minimize Eout by constraining the overfit penalty term in the equation
Eout(h) = Ein(h) + overfit penalty
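One common way to act on this equation is weight decay, which these cards do not name, so treat it as an assumed example. The sketch fits a single weight w in h(x) = w*x by minimizing the sum of squared errors plus lam * w^2. The data and the function names are hypothetical.

```python
def fit_weight_decay(xs, ys, lam):
    # Closed-form minimizer of sum((w*x - y)^2) + lam * w^2 over the single weight w.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def in_sample_error(w, xs, ys):
    # Mean squared error of h(x) = w*x on the training points (Ein).
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

XS = [1.0, 2.0, 3.0, 4.0]  # made-up inputs
YS = [1.2, 1.9, 3.2, 3.8]  # made-up noisy outputs

w_plain = fit_weight_decay(XS, YS, lam=0.0)  # ordinary least squares
w_reg = fit_weight_decay(XS, YS, lam=5.0)    # penalized fit, weight pulled toward zero

ein_plain = in_sample_error(w_plain, XS, YS)
ein_reg = in_sample_error(w_reg, XS, YS)
```

The penalized fit accepts a somewhat larger Ein in exchange for a more constrained hypothesis. Whether Eout actually improves depends on the noise and on the choice of lam.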
Validation
Estimates the out-of-sample error directly
Validation set
A subset of the data that is not used in training.
When is a set no longer a test set?
When it affects the learning process in any way.
How is the validation set created?
The data set D is divided into a training set of size N-K and a validation set of size K. The algorithm learns a final hypothesis using only the training set. The validation error is then computed on the validation set.
What is the rule of thumb for determining K in validation?
K = N/5
Use 80% for training and 20% for validation.
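The split and the rule of thumb can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the constant-valued learner, the synthetic data, and all names are assumptions chosen to keep the example short.

```python
import random

def split_train_validation(data, seed=0):
    # Shuffle a copy of the data and hold out K = N // 5 points for validation.
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    k = len(shuffled) // 5
    validation, training = shuffled[:k], shuffled[k:]
    return training, validation

def learn_constant(training):
    # Stand-in learning algorithm (assumed): predict the mean training output.
    mean_y = sum(y for _, y in training) / len(training)
    return lambda x: mean_y

def squared_error(h, points):
    # Mean squared error of hypothesis h on a set of (x, y) points.
    return sum((h(x) - y) ** 2 for x, y in points) / len(points)

# Synthetic data set D with N = 100 points (made up for the example).
DATA = [(x / 10, (x / 10) ** 2) for x in range(100)]

TRAIN, VAL = split_train_validation(DATA)  # N - K = 80 and K = 20
G_MINUS = learn_constant(TRAIN)            # hypothesis learned from the training set only
E_VAL = squared_error(G_MINUS, VAL)        # validation error
```

Because the validation points play no part in learning G_MINUS, E_VAL serves as an estimate of the out-of-sample error.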
Cross validation estimate
The average, over all n, of the error made by gn (the hypothesis trained without data point n) on its validation set, the left-out point n.
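For the leave-one-out case, the estimate can be computed as in the sketch below. The constant-valued learner and the three data points are assumptions made for illustration. Any learning algorithm could take the place of learn_constant.

```python
def learn_constant(training):
    # Stand-in learning algorithm (assumed): predict the mean training output.
    mean_y = sum(y for _, y in training) / len(training)
    return lambda x: mean_y

def cross_validation_estimate(data):
    # Leave-one-out: train on all points except n, then test on point n alone.
    errors = []
    for n, (x_n, y_n) in enumerate(data):
        training = data[:n] + data[n + 1:]
        g_n = learn_constant(training)
        errors.append((g_n(x_n) - y_n) ** 2)
    # The cross validation estimate is the average of the N single-point errors.
    return sum(errors) / len(errors)

POINTS = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]  # made-up data
E_CV = cross_validation_estimate(POINTS)
```

For these three points the single-point errors are 2.25, 0 and 2.25, so E_CV is 1.5.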
What does H.theta denote?
The polynomials of degree d (~ above it)