Linear Model Selection & Regularization Flashcards
Feature Selection for Linear Regression
- Best Subset Selection - fit all possible combinations of features (usually not computationally feasible; a theoretical optimum unless p is small or you have a very long time).
- Forward Stepwise Selection - start with no variables and keep adding, one at a time, the variable that most improves the fit. Then select the best of these models using CV error (see the regsubsets sketch after this list).
- Backward Stepwise Selection - start with all variables and remove the least useful one at a time; again pick among the resulting models by CV error.
- Regularization Techniques (Ridge, Lasso)
- Feature Selection - Lasso, even though it is a regularization technique, also performs feature selection.
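A minimal sketch of forward and backward stepwise selection with the regsubsets() function from the leaps library (the data frame df and response y are illustrative placeholders):

```r
library(leaps)

# df is a hypothetical data frame with a response column y and candidate predictors
fwd <- regsubsets(y ~ ., data = df, nvmax = 15, method = "forward")
bwd <- regsubsets(y ~ ., data = df, nvmax = 15, method = "backward")

# summary() shows which variables enter at each model size, plus RSS,
# adjusted R^2, Cp, and BIC for comparing the candidate models
summary(fwd)$which
summary(fwd)$adjr2
```

Note that regsubsets() only reports in-sample criteria; choosing among the candidate model sizes by CV error means refitting the candidates on the training folds yourself.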
Ways of selecting a model when you have tried a bunch of them
- Use CV (Preferred)
- Look at metrics such as AIC, BIC, or Adjusted R-squared, which are goodness-of-fit metrics that pay a price for unnecessary variables. *These were used when computing power was an issue; nowadays use CV.
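For example, these criteria can be read straight off the regsubsets() summary from the stepwise sketch above and compared across model sizes:

```r
fit_summary <- summary(fwd)  # fwd from the hypothetical stepwise sketch above

# each criterion charges a price for extra variables; pick the model size
# that maximizes adjusted R^2 or minimizes Cp / BIC
which.max(fit_summary$adjr2)
which.min(fit_summary$cp)
which.min(fit_summary$bic)
```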
Regularization Techniques for Linear Regression
- Ridge - adds a penalty on large coefficients controlled by a tuning parameter, so the tuning parameter must be selected carefully (typically via CV). Variables must also be centered/scaled prior to regularization.
- Lasso - shrinks some coefficients to exactly zero, so it also performs feature selection (often preferred). The tuning parameter must likewise be chosen carefully and variables scaled/centered.
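A minimal sketch of both fits with the glmnet package (df, x, and the lambda value 0.1 are illustrative); glmnet standardizes predictors by default, which covers the centering/scaling concern:

```r
library(glmnet)

# glmnet expects a numeric predictor matrix and a response vector;
# predictors are centered/scaled internally (standardize = TRUE by default)
x <- model.matrix(y ~ ., data = df)[, -1]  # drop the intercept column
y <- df$y

ridge_fit <- glmnet(x, y, alpha = 0)  # alpha = 0 -> ridge penalty
lasso_fit <- glmnet(x, y, alpha = 1)  # alpha = 1 -> lasso penalty

# coefficients at one arbitrary value of the tuning parameter lambda;
# for the lasso, some of these are exactly zero
coef(lasso_fit, s = 0.1)
```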
Disadvantage of Ridge Regression
Ridge keeps all p variables in the model, although it shrinks their coefficients. This creates problems for model interpretation. Lasso, by contrast, sets some coefficients exactly to zero.
Ridge vs. Lasso
Lasso tends to do better when many of the variables truly have no relationship (zero coefficients) with the outcome variable.
Ridge will outperform when many of the variables are in fact related to the outcome variable.
How to select the tuning parameter
Plot the k-fold CV error against different values of the tuning parameter and pick the value that minimizes it (see the sketch below).
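cv.glmnet() automates exactly this: it computes k-fold CV error over a grid of tuning-parameter values and plot() draws the CV error curve (a sketch reusing the hypothetical x and y from the glmnet example above):

```r
set.seed(1)
cv_lasso <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

plot(cv_lasso)                       # CV error vs. log(lambda)
best_lambda <- cv_lasso$lambda.min   # lambda with the lowest CV error
coef(cv_lasso, s = "lambda.min")
```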
Dimension Reduction vs. Regularization
Regularization involves controlling the coefficients (with lasso, even setting some to zero, which has the effect of reducing dimensions). Dimension reduction transforms the predictors into a smaller set of variables.
Dimension Reduction
Transform the predictors into a smaller set of linear combinations that substantially explain the variability of all the original variables.
Uses of PCA
Can be used for dimension reduction or unsupervised learning. Creates linear combinations of predictors that are uncorrelated with each other.
Things to consider when doing PCA
- Center/scale prior to using PCA.
- Select the number of principal components by plotting CV error vs. the number of components.
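A sketch of principal components regression with the pls package (df and the formula are illustrative); scale = TRUE handles centering/scaling and validation = "CV" produces the CV error curve for choosing the number of components:

```r
library(pls)

set.seed(1)
pcr_fit <- pcr(y ~ ., data = df, scale = TRUE, validation = "CV")

summary(pcr_fit)                            # CV error by number of components
validationplot(pcr_fit, val.type = "MSEP")  # plot CV error vs. number of components
```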
PCA vs Partial Least Squares
PCA creates new variables that substantially explain the variability in the original variables.
PLS creates new variables that explain variability but are ALSO related to the response, like a supervised version of PCA.
Like PCA, with PLS you must choose the number of components and center/scale. PLS is not clearly superior to PCA; in practice they are often a wash.
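The PLS fit looks nearly identical (same hypothetical data as the PCR sketch); only the fitting function changes, and the number of components is again read off the CV error plot:

```r
set.seed(1)
pls_fit <- plsr(y ~ ., data = df, scale = TRUE, validation = "CV")

validationplot(pls_fit, val.type = "MSEP")  # choose the number of components from the CV curve
predict(pls_fit, newdata = df, ncomp = 5)   # e.g. predict using 5 components
```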
What is high dimensionality
When p >= n. Linear and logistic regression do not work well in this scenario.
Interpreting models in high dimensionality
In this scenario, you cannot be confident that you have selected the "best" model, only that you have a "good" model. Also, traditional measures of model fit do not apply (R-squared, p-values, etc.).
What gives you the optimal subset of variables for a linear model
Best subset selection, available via the regsubsets() function in the leaps library.
When do you want to use best subset selection
Best subset selection is computationally expensive; however, when p < 10 it is usually fine to run best subset.
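A minimal best subset sketch with regsubsets() (hypothetical df with fewer than 10 predictors); the default method is an exhaustive search over all subsets:

```r
library(leaps)

# exhaustive search over all subsets of up to 9 predictors (feasible when p is small)
best <- regsubsets(y ~ ., data = df, nvmax = 9)
best_summary <- summary(best)

best_summary$which           # variables included in the best model of each size
which.min(best_summary$bic)  # one way to pick the final model size (or use CV)
```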