Module 12 Flashcards

1
Q

example of overfitting

A

A nonlinear model has been fit too closely to the training dataset, so its performance on the test dataset is worse

2
Q

Some ways to overfit a linear regression model

A

Irrelevant explanatory variables
Collinear explanatory variables

3
Q

Some ways to underfit a linear regression model

A

Leaving out an important explanatory variable

4
Q

A parsimonious model

A

Aims to strike a balance between overfitting and underfitting the model to the training dataset
It does this by:
Having a low enough number of explanatory variables to avoid overfitting, while
Having a high enough model fit to avoid underfitting

5
Q

adjusted R^2

A

Used to measure model parsimoniousness
Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n = number of observations, p = number of explanatory variables
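A minimal Python sketch of this formula, with made-up numbers for illustration:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n = number of observations and p = number of explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical example: a model with R^2 = 0.80, n = 50 observations, p = 6 slopes.
print(adjusted_r2(0.80, n=50, p=6))  # ~0.772: the extra-slope penalty pulls the score below 0.80
```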

6
Q

Interpreting the adjusted R^2

A

The higher the adjusted R^2 of a model, the more parsimonious we say that the model is, and therefore the less likely the model is to be overfit to the training dataset

7
Q

Number of possible models

A

2^p possible models
p = number of candidate explanatory variables (e.g., p = 3 gives 2^3 = 8 possible models)

8
Q

Heuristic techniques

A

Backwards Elimination Algorithm
Forward Selection Algorithm
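A minimal forward-selection sketch, assuming a pandas DataFrame X of candidate explanatory variables, a response y, and adjusted R^2 as the selection criterion (a different criterion such as AIC or BIC could be used instead):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_selection(X, y):
    """Greedily add the candidate variable that most improves adjusted R^2;
    stop when no remaining candidate improves it."""
    remaining, selected = list(X.columns), []
    best_adj_r2 = -np.inf
    while remaining:
        adj_r2, var = max(
            (sm.OLS(y, sm.add_constant(X[selected + [v]])).fit().rsquared_adj, v)
            for v in remaining
        )
        if adj_r2 <= best_adj_r2:
            break
        best_adj_r2 = adj_r2
        selected.append(var)
        remaining.remove(var)
    return selected, best_adj_r2

# Hypothetical usage with synthetic data: x3 and x4 are irrelevant and should not be selected.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)), columns=["x1", "x2", "x3", "x4"])
y = 2 * X["x1"] - 3 * X["x2"] + rng.normal(size=100)
print(forward_selection(X, y))
```

Backward elimination works in reverse: start with all candidate variables and repeatedly drop the one whose removal most improves the criterion.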

9
Q

Regularized linear regression model

A

Take the objective function from our basic linear regression model (the sum of squared errors) and add a penalty term to it

10
Q

Penalty term

A

Penalizes models that include explanatory variables that don’t bring enough predictive power to the model

11
Q

Goal of the penalty term

A

Goal 1: give a clear interpretation of which explanatory variables can be left out of the model
Goal 2: leave out variables that lead to overfitting

12
Q

LASSO Regression L1

A

Stands for least absolute shrinkage and selection operator
The L1 penalty term is the sum of the absolute values of all slope coefficients

13
Q

Clearer slope interpretation with LASSO regression

A

If a slope is set to 0, the LASSO regression model is suggesting that the corresponding explanatory variable can be left out of the model.
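A minimal scikit-learn sketch of this zeroing-out behaviour on synthetic data (note that scikit-learn's alpha argument here is the overall penalty strength, not the L1/L2 mixing parameter discussed in the elastic net cards below):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)  # columns 2-4 are irrelevant

X_scaled = StandardScaler().fit_transform(X)  # scale features so the penalty treats the slopes comparably
lasso = Lasso(alpha=0.1).fit(X_scaled, y)
print(lasso.coef_)  # slopes on the irrelevant columns are typically shrunk to exactly 0
```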

14
Q

Ridge Regression (L2 Penalty Term)

A

The L2 penalty term is the sum of the squared values of all slope coefficients (the squared L2 norm)

15
Q

Less clear slope interpretation with ridge regression

A

The resulting slopes found with ridge regression provide much less of a clear indication as to which explanatory variables should be left out of the model, since ridge shrinks slopes toward 0 but rarely sets them exactly to 0

16
Q

Benefits of ridge regression

A

In the presence of multicollinearity, ridge regression slopes can be more trusted than those that would have been returned by a nonregularized linear regression model:
the predicted impact of collinear variables is more likely to be distributed evenly across their slopes
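A minimal scikit-learn sketch of this behaviour, assuming synthetic data in which two explanatory variables are nearly identical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # x2 is nearly collinear with x1
X = np.column_stack([x1, x2])
y = 2 * x1 + 2 * x2 + rng.normal(size=200)

print(LinearRegression().fit(X, y).coef_)  # collinearity can leave these two slopes unstable and offsetting
print(Ridge(alpha=1.0).fit(X, y).coef_)    # ridge tends to spread the shared effect more evenly across x1 and x2
```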

17
Q

Elastic net regression (L1 and L2 Penalty Term)

A

Combines the strengths of Lasso and Ridge by balancing between feature selection and coefficient shrinkage.

18
Q

The impact of the alpha parameter

A

When alpha is set high, the L1 term is weighted more heavily, and the resulting slopes will tend to look more like those from LASSO regression.

When alpha is set low, the L2 term is weighted more heavily, and the resulting slopes will tend to look more like those from ridge regression, with more focus on balancing the weight across collinear slopes.
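A minimal scikit-learn sketch; note that scikit-learn names the L1/L2 mixing parameter l1_ratio (its alpha argument is the overall penalty strength), so l1_ratio plays the role of the alpha parameter described in this card:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)  # columns 2-3 are irrelevant

# l1_ratio near 1 -> mostly L1 penalty, slopes behave more like LASSO (some set to exactly 0);
# l1_ratio near 0 -> mostly L2 penalty, slopes behave more like ridge (shrunk and more balanced).
for l1_ratio in (0.9, 0.1):
    model = ElasticNet(alpha=0.5, l1_ratio=l1_ratio).fit(X, y)
    print(l1_ratio, model.coef_)
```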

19
Q

Cross validation techniques

A

Techniques that involve creating multiple pairs of training and test datasets from the full dataset:
Leave-one-out cross-validation
K-fold cross-validation

20
Q

Leave one out cross validation

A

Every observation appears in a test dataset exactly once (as the single held-out observation)
Every observation appears in a training dataset n-1 times

21
Q

Benefits of LOOCV

A

Accurate test data performance: each training dataset contains n-1 observations, so each fitted model closely resembles the model trained on the full dataset, and the test predictions reflect the accuracy that full model might have achieved
No randomness in how the splits are formed
Low model variability across the n fitted models

22
Q

Drawbacks of LOOCV

A

Computationally expensive
More variable test data predictions
Inflation of model performance

23
Q

K-fold cross-validation

A

Every observation appears in a test dataset exactly once
Every observation appears in a training dataset k-1 times
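A minimal scikit-learn sketch of both techniques on synthetic data: LeaveOneOut fits n models (one per held-out observation), while KFold with k = 5 fits only 5:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=0)
model = LinearRegression()

# LOOCV: n = 100 fits, each test dataset is a single observation.
loo_mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()

# 5-fold CV: 5 fits, each observation appears in a test fold exactly once.
kfold_mse = -cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0),
                             scoring="neg_mean_squared_error").mean()

print(loo_mse, kfold_mse)
```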

24
Q

Benefits of k-fold cross-validation

A

Less computationally complex than LOOCV
More accurate test data performance
Less inflation of model performance than LOOCV

25
Q

Drawbacks of k-fold cross-validation

A

More computationally complex than train-test-split method
Less accurate test data performance than LOOCV
Randomness

26
Q

AIC

A

Akaike information criterion (AIC)
AIC = -2 * LLF + 2 * k
k = number of slopes in the model
LLF = the optimal log-likelihood function value of the model

27
Q

AIC Interpretation

A

the lower the AIC score of a model is, the more parsimonious the model is considered to be

28
Q

BIC

A

Bayesian information criterion (BIC)
BIC = -2 * LLF + ln(n) * k, where n is the number of observations

29
Q

BIC Interpretation

A

the lower the BIC score of a model is, the more parsimonious the model is considered to be

30
Q

AIC vs BIC

A

The only difference is the multiplier on k in the penalty term: 2 for AIC versus ln(n) for BIC
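A minimal statsmodels sketch checking both formulas against a fitted OLS model (statsmodels counts the intercept in k, so k = df_model + 1 below):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 3)))  # intercept column plus 3 explanatory variables
y = 2 * X[:, 1] + rng.normal(size=100)

results = sm.OLS(y, X).fit()
llf, k, n = results.llf, results.df_model + 1, results.nobs

print(results.aic, -2 * llf + 2 * k)          # AIC = -2 * LLF + 2 * k
print(results.bic, -2 * llf + np.log(n) * k)  # BIC = -2 * LLF + ln(n) * k
```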

31
Q

Downsides of BIC

A

Encourages a smaller number of slopes, which may come at the expense of training dataset fit, causing you to select a model with a worse fit to the training data

32
Q

Downsides of AIC

A

AIC doesn’t penalize a high number of slopes as much as the BIC score does, so the AIC score can be unhelpful for the purpose of model selection.