Using a Holdout Set and LASSO Flashcards
Define a holdout set.
A subset of the original data that is set aside and not used for estimation.
Why do we use a holdout set?
To assess external validity, that is, how well the model will perform on live data, rather than just finding the model that best fits the original data.
What are the steps for evaluating a prediction using a holdout set?
- Split the original data into a larger work set and a smaller holdout set
- Further split the work set into k folds for cross-validation
- Build models and select the best model using k-fold cross-validation
- Re-estimate the best model using all observations in the work set
- Take the estimated best model and apply it to the holdout set
- Evaluate the prediction using the holdout set (see the sketch below)
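A minimal sketch of this workflow, assuming scikit-learn and synthetic data; the dataset and candidate models are illustrative only, not from the cards:

```python
# Minimal sketch of the holdout workflow (synthetic data, illustrative models).
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# 1. Split into a larger work set and a smaller holdout set
X_work, X_hold, y_work, y_hold = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)

# 2-3. k-fold cross-validation on the work set to select the best model
candidates = {"ols": LinearRegression(), "lasso": Lasso(alpha=1.0)}
cv_mse = {name: -cross_val_score(m, X_work, y_work, cv=5,
                                 scoring="neg_mean_squared_error").mean()
          for name, m in candidates.items()}
best_name = min(cv_mse, key=cv_mse.get)

# 4. Re-estimate the best model on all observations in the work set
best_model = candidates[best_name].fit(X_work, y_work)

# 5-6. Apply it to the holdout set and evaluate the prediction
print(best_name, mean_squared_error(y_hold, best_model.predict(X_hold)))
```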
T/F: The model selection problem can be solved by trying out every potential combination of variables.
False: we cannot try out every single combination; there would be far too many.
What are the two methods to build models?
- By hand - specifying variables and model
- Using smart algorithms
What are the pros and cons of using LASSO?
Pros: no need to use outside information to specify the model
Cons: may be sensitive to overfitting and hard to interpret
What is the LASSO method?
LASSO - least absolute shrinkage and selection operator
- A method to select variables to include in a linear regression to produce good predictions and avoid overfitting
What are the two accomplishments that LASSO does at the same time?
- It selects a subset of the right-hand-side variables, dropping the other variables.
- It shrinks coefficients for some variables that it keeps in the regression
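A minimal sketch of both effects at once, using scikit-learn on synthetic data (the alpha value is illustrative):

```python
# Sketch: one LASSO fit both drops variables (coefficients set exactly
# to zero) and shrinks the coefficients it keeps, relative to OLS.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

kept = lasso.coef_ != 0
print("variables dropped:", int((~kept).sum()))        # selection
print("mean |coef| on kept variables, OLS vs LASSO:",  # shrinkage
      np.abs(ols.coef_[kept]).mean(), np.abs(lasso.coef_[kept]).mean())
```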
Define the Tuning Parameter.
The weight placed on the penalty term relative to the OLS fit; a larger weight enforces stronger variable selection.
Note: A lambda of 0 means the regression is OLS.
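For reference, one standard way to write the LASSO objective (the notation here is assumed, not from the cards): the OLS sum of squared residuals plus a penalty weighted by lambda,

$$\hat{\beta}^{\text{LASSO}} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

Setting lambda = 0 removes the penalty, which is why the estimates then coincide with OLS.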
What happens when we have an aggressive threshold? A lenient one?
Aggressive: higher threshold, so fewer variables are left in the regression
Lenient: lower threshold, so more variables are left in the regression
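A quick sketch of this on synthetic data (alpha values are illustrative; alpha is scikit-learn's name for the penalty weight):

```python
# Sketch: raising the penalty weight from lenient to aggressive
# leaves fewer variables in the regression.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

for alpha in [0.1, 1.0, 10.0]:  # lenient -> aggressive
    kept = (Lasso(alpha=alpha).fit(X, y).coef_ != 0).sum()
    print(f"alpha={alpha}: {kept} variables kept")
```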
T/F: LASSO modifies the way regression coefficients are estimated by adding a penalty term on the absolute size of the coefficients
True
T/F: Big data leads to larger estimation error
False: bigger data leads to smaller estimation error.
T/F: Lasso creates biased estimates.
True. This is because it shrinks the coefficients, which creates slight bias in the estimates.
Why is it okay for LASSO to create bias in its estimates? Does this bias mean LASSO is inferior to OLS?
Remember that lambda is chosen to minimize total loss, so it balances bias and variance. Although OLS does not produce biased estimates, its higher variance may increase the total loss. So this bias does not make LASSO inferior to OLS.
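A small simulation sketch of this point (synthetic data; the alpha value and sample sizes are illustrative assumptions):

```python
# Sketch: with many irrelevant regressors and few observations, OLS is
# unbiased but high-variance, while LASSO's shrinkage bias buys lower
# variance and a smaller holdout error.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=120, n_features=80, n_informative=5,
                       noise=20.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

for name, model in [("OLS", LinearRegression()),
                    ("LASSO", Lasso(alpha=2.0, max_iter=10_000))]:
    model.fit(X_tr, y_tr)
    print(name, round(mean_squared_error(y_te, model.predict(X_te)), 1))
```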
What steps of model building does LASSO substitute? What do we decide?
We decide the initial right-hand-side variables.
LASSO decides the lambda and which variables to include in the final regression.
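A sketch of this division of labor using scikit-learn's LassoCV (the data are synthetic and illustrative): we supply the candidate right-hand-side variables; the algorithm picks lambda by cross-validation and zeroes out the variables it drops.

```python
# Sketch: we choose the initial right-hand-side variables (the columns
# of X); LassoCV then picks lambda (called alpha in scikit-learn) by
# cross-validation and decides which variables survive.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=300, n_features=25, n_informative=6,
                       noise=10.0, random_state=0)

model = LassoCV(cv=5, random_state=0).fit(X, y)
print("chosen lambda (alpha):", model.alpha_)
print("variables kept in the final regression:", np.flatnonzero(model.coef_))
```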