Complexity/Selection/Regularization Flashcards
What is the bias-variance decomposition?
E[(Y - f̂_D(x))^2] = Bias^2 + Variance + σ^2 (irreducible noise)
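The decomposition can be checked numerically. A minimal sketch (the target function sin(x), the test point x0, and the noise level are all assumed for illustration): fit a linear model on many independent training sets and split its expected squared error at x0 into bias^2, variance, and noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(x)               # assumed ground-truth function

x0, sigma = 1.0, 0.3               # assumed test point and noise std
n, n_datasets = 30, 2000
preds = np.empty(n_datasets)

for d in range(n_datasets):
    # Draw a fresh training set D and fit a linear model (high bias for sin)
    x = rng.uniform(0, 2 * np.pi, n)
    y = true_f(x) + rng.normal(0, sigma, n)
    coef = np.polyfit(x, y, deg=1)
    preds[d] = np.polyval(coef, x0)

bias_sq = (preds.mean() - true_f(x0)) ** 2   # (E[f_hat] - f)^2
variance = preds.var()                        # Var over training sets
# Expected test error at x0 ≈ bias_sq + variance + sigma**2
print(f"bias^2={bias_sq:.4f}, variance={variance:.4f}, noise={sigma**2:.4f}")
```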
What does high bias in a model indicate?
A bias towards a particular kind of solution (e.g., linear model), also known as inductive bias.
What does high variance in a model indicate?
The estimated model changes significantly when trained on different data sets, indicating overfitting.
What is the VC dimension?
The maximum number of points that can be shattered by the classifier set: for every possible labeling of those points, at least one member of the set classifies them all correctly.
What is the VC dimension of a linear classifier on R^p?
VC = p + 1
How are Degrees of Freedom (DF) defined for an estimate ŷ = f̂(X)?
df(ŷ) = (1/σ^2) Σ cov(ŷ_i, y_i) = (1/σ^2) tr(cov(ŷ, y))
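For a linear smoother ŷ = Hy this definition reduces to df = tr(H). A small sketch (dimensions chosen arbitrarily): for OLS with p features plus an intercept, the trace of the hat matrix is exactly p + 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
# Design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
# Hat matrix of OLS: y_hat = H y
H = X @ np.linalg.inv(X.T @ X) @ X.T
df = np.trace(H)
print(df)   # = p + 1 = 4
```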
What is an intuition/interpretation of degrees of freedom as a measure of model complexity?
Degrees of freedom in model complexity represent the number of independent parameters that can be adjusted to fit the data, influencing the model’s flexibility.
Relationship between complexity, bias, variance and total error
As model complexity increases, bias decreases while variance increases; the total error follows a convex (U-shaped) curve with its minimum at an intermediate complexity.
What does a good bias require?
Domain knowledge.
What is the relationship between degrees of freedom, number of samples, number of features, and λ in ridge regression?
In ridge regression, the degrees of freedom are df(λ) = tr(X(XᵀX + λI)⁻¹Xᵀ) = Σ_j d_j²/(d_j² + λ), where d_j are the singular values of X. At λ = 0 this equals the number of features p; as λ increases, the degrees of freedom decrease monotonically, reducing model complexity and helping to prevent overfitting, especially when the number of samples is small relative to the number of features.
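The ridge degrees of freedom can be computed directly from the singular values of the design matrix. A short sketch (matrix dimensions are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 5
X = rng.normal(size=(n, p))

def ridge_df(X, lam):
    # df(lambda) = sum_j d_j^2 / (d_j^2 + lambda), d_j singular values of X
    d = np.linalg.svd(X, compute_uv=False)
    return np.sum(d**2 / (d**2 + lam))

df0 = ridge_df(X, 0.0)     # = p (no shrinkage: OLS)
df10 = ridge_df(X, 10.0)   # < p: the penalty shrinks effective complexity
print(df0, df10)
```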
What is the PRESS statistic?
Predicted Residual Error Sum of Squares: PRESS = Σ(y_i - ŷ_-i)^2, where ŷ_-i is the prediction for the i-th sample when the model is estimated on all but the i-th sample.
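For linear models, PRESS does not require refitting n times: the leave-one-out residual equals e_i/(1 - h_ii), with e_i the ordinary residual and h_ii the i-th diagonal of the hat matrix. A sketch verifying the shortcut against explicit refitting (data-generating setup is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

# Explicit PRESS: refit with each sample left out
press = 0.0
for i in range(n):
    mask = np.arange(n) != i
    beta = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    press += (y[i] - X[i] @ beta) ** 2

# Closed-form shortcut: LOO residual = e_i / (1 - h_ii)
H = X @ np.linalg.inv(X.T @ X) @ X.T
e = y - H @ y
press_fast = np.sum((e / (1 - np.diag(H))) ** 2)
print(press, press_fast)   # the two agree
```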
Higher model complexity effect on performance? What to do?
Higher complexity always yields a better fit on the training data but not necessarily on test data; therefore, select the model by estimating its performance on held-out validation/test data, not on the training data alone.
What is a method for validation?
Cross validation: estimate generalization error on different train/test splits.
What is Leave-one-out Cross-Validation (LOO-CV)?
A method where the model is trained on all but one sample and tested on the left-out sample, repeated for all samples. The average prediction error is reported.
What is k-fold Cross-Validation?
A method where data is split into k subsets, the model is trained on k-1 subsets and tested on the remaining one, repeated k times. Often k=5 or k=10 is used in practice.
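A minimal k-fold CV sketch using NumPy only (the OLS model and the data-generating setup are assumed for illustration): shuffle indices, split into k folds, train on k-1 folds, and average the held-out mean squared errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)   # noise std = 1

idx = rng.permutation(n)                 # shuffle before splitting
folds = np.array_split(idx, k)
errors = []
for fold in folds:
    train = np.setdiff1d(idx, fold)      # train on the other k-1 folds
    beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
    errors.append(np.mean((y[fold] - X[fold] @ beta) ** 2))

cv_mse = np.mean(errors)                 # CV estimate of generalization MSE
print(cv_mse)
```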