Regularization Methods Flashcards
Why do we need Regularizers?
To reduce overfitting: penalizing large coefficients trades a little bias for lower variance, so the model generalizes better to unseen data.
What is Ridge Regression?
Ridge regression is a regularization technique for least-squares methods, in which a penalty is added to the least-squares loss to introduce some bias. The penalty is a multiple of the sum of the squared coefficients.
Why do we use ridge regression?
To reduce overfitting, especially when there is multicollinearity in the data.
What is the equation for ridge regression?
LS + lambda * sum(squared coefficients), where LS is the ordinary least-squares loss (the residual sum of squares).
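As a sketch, the penalized loss above can be computed directly (the data and coefficient values here are made up for illustration):

```python
import numpy as np

def ridge_loss(X, y, beta, lam):
    """Least-squares loss plus lambda times the sum of squared coefficients."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + lam * np.sum(beta ** 2)

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, 0.5])

print(ridge_loss(X, y, beta, lam=0.0))  # plain least squares, no penalty
print(ridge_loss(X, y, beta, lam=1.0))  # same fit, penalty of 1.0 * (0.25 + 0.25) added
```

With lambda = 0 this reduces to ordinary least squares, matching the flashcard on lambda's effect.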
What is an added advantage of Ridge regression?
Ridge regression makes it possible to estimate the coefficients from fewer data points.
Ordinarily, a model with n parameters needs at least n+1 data points to determine them, but ridge regression can produce estimates even when there are fewer data points than parameters.
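A minimal sketch of why this works, using the standard ridge closed form (X^T X + lambda*I)^-1 X^T y: with fewer rows than coefficients, X^T X alone is singular, but adding lambda*I makes it invertible. The data here is random, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3))  # 2 data points, 3 coefficients: under-determined
y = rng.normal(size=2)

lam = 1.0
# X.T @ X has rank at most 2, so plain least squares has no unique solution,
# but X.T @ X + lam * I is full rank and can be solved directly.
beta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(beta)
```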
What is the effect of Lambda in ridge regression?
If lambda is 0, then there is no penalty added.
If lambda is higher, more penalty is added, shrinking the coefficients, reducing the model complexity and overfitting.
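The shrinkage effect described above can be sketched with the closed-form ridge solution on synthetic data (true coefficients chosen arbitrarily here):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=20)

def ridge_coefs(lam):
    # Closed-form ridge solution; lam = 0 recovers ordinary least squares.
    return np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# The coefficient vector gets smaller as lambda grows.
for lam in (0.0, 1.0, 100.0):
    print(lam, np.linalg.norm(ridge_coefs(lam)))
```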
How do we find lambda?
Using cross validation
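A sketch of lambda selection by cross-validation, assuming scikit-learn is available (which calls lambda `alpha`); the candidate grid and data are illustrative:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(scale=0.5, size=50)

# Try a grid of lambda values and keep the one with the best
# 5-fold cross-validation score.
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5).fit(X, y)
print(model.alpha_)  # the selected lambda
```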
What is the range of lambda?
0 to +infinity
What is Lasso Regression?
It is the same as ridge regression, but instead of the L2 penalty it adds an L1 penalty, i.e. lambda times the sum of the absolute values of the coefficients. Unlike ridge, it can shrink coefficients all the way to 0.
Difference between ridge and lasso other than terms?
Ridge regression can shrink the coefficients towards 0 but not to 0.
Lasso can shrink the coefficients to 0 as well.
What is the advantage of Lasso over Ridge?
Lasso allows us to discard parameters that have no significance to the model, since their coefficients are driven to exactly 0.
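A sketch of this sparsity effect, assuming scikit-learn is available; the data is synthetic, with three deliberately useless features:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
# Only the first two features matter; the other three have true coefficient 0.
y = X @ np.array([2.0, -3.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

print(lasso.coef_)  # the useless features are driven to exactly 0
print(ridge.coef_)  # shrunk toward 0, but not exactly 0
```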
When should we use Lasso and Ridge?
When there are a lot of parameters but only a few are useful, lasso can eliminate the useless ones.
If most of the variables are useful, then it is better to use ridge regression.
What is Elastic Net regression?
It is a combination of the L1 and L2 penalty terms: LS + lambda1 * sum(|coefficients|) + lambda2 * sum(squared coefficients).
What is the advantage of Elastic Net?
Lasso tends to pick one variable from a group of correlated variables and eliminate the others.
Ridge tends to shrink correlated parameters together.
Elastic Net:
1) selects groups of correlated variables together
2) is particularly useful for datasets with high dimensionality and multicollinearity, where both feature selection and coefficient shrinkage are desired
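A sketch of elastic net on correlated features, assuming scikit-learn is available (its `l1_ratio` parameter mixes the penalties: 1.0 is pure lasso, 0.0 is pure ridge); the near-duplicate features are constructed for illustration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
z = rng.normal(size=100)
# Two almost identical (highly correlated) features plus one independent one.
X = np.column_stack([z, z + rng.normal(scale=0.01, size=100),
                     rng.normal(size=100)])
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=100)

# The L2 part of the penalty encourages the correlated pair to share weight,
# rather than one feature absorbing everything as pure lasso tends to do.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)
```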