Linear Model Selection & Regularization Flashcards

1
Q

Feature Selection for Linear Regression

A
  1. Best subset selection: try all 2^p combinations of features. Usually not feasible (a theoretical optimum), unless p is small or you have a really long time.
  2. Forward stepwise selection: start with no variables and keep adding, one at a time, the variable that most improves the fit; then select the best of these models using CV error (see the sketch after this list).
  3. Backward stepwise selection: start with all variables and keep removing them one by one; again select using CV error.
  4. Regularization techniques (ridge, lasso).
  5. Feature selection via the lasso: even though it is a regularization technique, it also performs feature selection.
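A minimal sketch of forward stepwise selection with the leaps package; the data frame df and response column y are hypothetical names, not from the card:

    # Forward stepwise selection on a hypothetical data frame `df`
    # whose response column is `y`.
    library(leaps)

    fit_fwd <- regsubsets(y ~ ., data = df, nvmax = 10, method = "forward")

    # Coefficients of, e.g., the best 3-variable model along the path;
    # in practice, choose the model size by cross-validated error.
    coef(fit_fwd, id = 3)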
2
Q

Ways of selecting a model when you have tried a bunch of them

A
  1. Use CV (preferred).
  2. Look at metrics such as AIC, BIC, or adjusted R-squared: goodness-of-fit measures that pay a price for unnecessary variables (see the sketch after this list). *These were used when computing power was an issue; nowadays use CV.
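A sketch of the metric-based route in base R, assuming two hypothetical candidate fits on a data frame df:

    # Compare candidate linear models by AIC/BIC (lower is better);
    # df, y, x1, x2 are hypothetical names.
    fit_small <- lm(y ~ x1, data = df)
    fit_big   <- lm(y ~ x1 + x2, data = df)

    AIC(fit_small); AIC(fit_big)    # AIC's penalty for extra terms
    BIC(fit_small); BIC(fit_big)    # BIC penalizes extra variables more heavily
    summary(fit_big)$adj.r.squared  # adjusted R-squared also charges for them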
3
Q

Regularization Techniques for Linear Regression

A
  1. Ridge: supply a regularization parameter that penalizes large coefficients. The parameter must be selected carefully, and the variables must be centered/scaled prior to regularization.
  2. Lasso: regularizes some coefficients to exactly zero, so it also performs feature selection (often preferred). The regularization parameter must likewise be selected carefully and the variables centered/scaled. A sketch of both follows this list.
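A minimal sketch with the glmnet package, assuming a numeric predictor matrix x and response vector y (hypothetical data); note that glmnet standardizes the predictors by default:

    library(glmnet)

    ridge <- glmnet(x, y, alpha = 0)  # alpha = 0 gives the ridge penalty
    lasso <- glmnet(x, y, alpha = 1)  # alpha = 1 gives the lasso penalty

    # Each fit is computed over a whole path of lambda values; the
    # tuning parameter is then chosen by cross-validation (card 6).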
4
Q

Disadvantage of Ridge Regression

A

Ridge will include all p variables in the model, albeit with shrunken coefficients. This creates problems for model interpretation. The lasso, alternatively, will set some coefficients exactly to zero.
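A quick way to see the difference, continuing the hypothetical glmnet setup above:

    library(glmnet)

    cv_r <- cv.glmnet(x, y, alpha = 0)       # ridge
    cv_l <- cv.glmnet(x, y, alpha = 1)       # lasso

    sum(coef(cv_r, s = "lambda.min") != 0)   # all p predictors retained (+ intercept)
    sum(coef(cv_l, s = "lambda.min") != 0)   # typically far fewer: a sparse model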

5
Q

Ridge vs. Lasso

A

Lasso is better when many of the variables truly have no relationship (zero coefficients) with the outcome variable.
Ridge will outperform if many of the variables are in fact related to the outcome variable.

6
Q

How to select tuning parameter

A

Plot the K-fold CV error against different values of the tuning parameter and choose the value that minimizes it.
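A sketch with cv.glmnet, again on hypothetical x and y; plot() on a cv.glmnet object draws exactly this CV-error-vs-tuning-parameter curve:

    library(glmnet)

    cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)  # 10-fold CV over the lambda path
    plot(cv_fit)       # CV error (with error bars) vs. log(lambda)
    cv_fit$lambda.min  # lambda minimizing the CV error
    cv_fit$lambda.1se  # largest lambda within 1 SE of that minimum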

7
Q

Dimension Reduction vs. Regularization

A

Regularization involves controlling the coefficients (with the lasso, even setting some to zero, which has the effect of reducing dimensions). Dimension reduction transforms the predictors into a smaller set of new variables.

8
Q

Dimension Reduction

A

Transform the predictors into a smaller set of linear combinations that substantially explain the variability of all the original variables.
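A minimal sketch with base R's prcomp(), on a hypothetical numeric predictor matrix X:

    pca <- prcomp(X, center = TRUE, scale. = TRUE)

    # Proportion of total variance explained by each component; keep the
    # first few components that capture most of it.
    summary(pca)$importance["Proportion of Variance", ]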

9
Q

Uses of PCA

A

Can be used for dimension reduction or for unsupervised learning. PCA creates linear combinations of the predictors that are uncorrelated with each other.
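Continuing the hypothetical prcomp() fit above, the derived variables (scores) really are uncorrelated:

    Z <- pca$x                # scores: the new derived variables
    round(cor(Z[, 1:3]), 2)   # off-diagonal correlations are ~0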

10
Q

Things to consider when doing PCA

A

Center/scale the variables prior to using PCA.

Select the number of principal components by plotting CV error vs. the number of components.
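A sketch with the pls package's pcr(), which handles the scaling and the CV curve in one place; df and y are hypothetical names:

    library(pls)

    pcr_fit <- pcr(y ~ ., data = df, scale = TRUE, validation = "CV")
    validationplot(pcr_fit, val.type = "MSEP")  # CV error vs. # of components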

11
Q

PCA vs Partial Least Squares

A

PCA creates new variables that substantially explain the variability in the original variables.
PLS creates new variables that explain that variability but that are ALSO related to the response, like a supervised PCA.
Like PCA, PLS requires choosing the number of components and centering/scaling. PLS IS NOT SUPERIOR TO PCA; the two are roughly a wash.
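PLS uses the same interface in the pls package, so the sketch mirrors the PCR one (hypothetical df and y again):

    library(pls)

    pls_fit <- plsr(y ~ ., data = df, scale = TRUE, validation = "CV")
    validationplot(pls_fit, val.type = "MSEP")  # again pick # of components by CV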

12
Q

What is high dimensionality

A

When p >= n. Linear and logistic regression do not work well in this scenario.
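A small simulated illustration of why least squares breaks down when p >= n; the data here are pure noise:

    set.seed(1)
    n <- 20; p <- 25
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)              # unrelated to X by construction

    fit <- lm(y ~ X)
    sum(is.na(coef(fit)))      # some coefficients cannot even be estimated
    summary(fit)$r.squared     # R-squared of 1 despite zero real signal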

13
Q

Interpreting models in high dimensionality

A

In this scenario, you cannot be confident that you have selected the “best” model, only that you have a “good” model. Also, traditional measures of model fit (R-squared, p-values, etc.) do not apply.

14
Q

What gives you the optimal subset of variables for a linear model

A

Best subset selection, available via the regsubsets() function in the leaps library.
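A sketch of the call, with hypothetical df and y; method = "exhaustive" (best subset) is the regsubsets() default:

    library(leaps)

    best <- regsubsets(y ~ ., data = df, nvmax = 8)
    summary(best)$bic                              # BIC of the best model of each size
    coef(best, id = which.min(summary(best)$bic))  # best overall model by BIC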

15
Q

When do you want to use best subset selection

A

Best subset selection is computationally expensive, since it fits all 2^p candidate models; however, when p < 10 (roughly a thousand models), best subset is usually fine.
