Lecture 4 Flashcards
Methods that help improve the performance of our model. These methods/approaches are divided into three main types:
1- Subset selection (select the attributes that we want to keep in the model)
2- Shrinkage (shrink the coefficient estimates of the predictors toward zero in order to reduce variance)
3- Dimension reduction (reduce the dimension of our data through some kind of transformation of the predictors)
Subset selection
Main goal: reduce RSS (residual sum of squares)
There is always a trade-off between bias and variance
Validation set approach
Randomly divide the data set into a training set and a validation (test) set
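A minimal sketch of the validation set approach, assuming a simple one-predictor linear model and made-up toy data (the helper names fit_line, mse and validation_set_mse are hypothetical, not from the lecture). It shows that the estimated test error depends on which random split we happen to draw.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def mse(b0, b1, xs, ys):
    """Mean squared prediction error of the fitted line on (xs, ys)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def validation_set_mse(xs, ys, seed):
    """Split the observations in half at random, fit on one half, score on the other."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, valid = idx[:half], idx[half:]
    b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return mse(b0, b1, [xs[i] for i in valid], [ys[i] for i in valid])

# Hypothetical toy data: y = 2 + 3x + noise
data_rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0, 1) for x in xs]

# A different seed gives a different split, hence a different error estimate
for seed in (1, 2, 3):
    print(seed, round(validation_set_mse(xs, ys, seed), 3))
```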
LOOCV: Leave-one-out cross-validation
Leaves a single observation out for validation and uses the remaining n-1 observations as the training set; this is repeated for every observation
LOOCV has less bias because:
1- It is fit on n-1 observations (training set), whereas the validation set approach trains on only about half of the observations
2- The validation set approach can yield different results due to randomness in the split; LOOCV always gives the same result
3- LOOCV doesn't overestimate the test error as much as the validation set approach
4- However, it may become computationally expensive as n increases
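The LOOCV procedure above can be sketched as follows, again assuming a one-predictor linear model and hypothetical toy data (fit_line and loocv_mse are illustrative names). The loop refits the model n times, which is where the computational cost comes from.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def loocv_mse(xs, ys):
    """Average squared error when each observation is held out exactly once."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Train on the n-1 observations other than i
        x_train = xs[:i] + xs[i + 1:]
        y_train = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(x_train, y_train)
        # Validate on the single held-out observation
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total / n

# Hypothetical toy data: y = 2 + 3x + noise
data_rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0, 1) for x in xs]

# No random split is involved, so repeated calls give the same estimate
print(round(loocv_mse(xs, ys), 3))
```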
K-fold cross-validation:
Instead of holding out one observation for validation, we hold out a fold (multiple observations) for validation
TRUE
LOOCV is a special case of k-fold cross-validation where k = n
TRUE
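A small sketch of k-fold cross-validation under the same assumptions (one-predictor linear model, hypothetical toy data, illustrative helper names). Setting k = n makes every fold a single observation, so the result no longer depends on how the data were shuffled, which is the LOOCV case.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def kfold_mse(xs, ys, k, seed=0):
    """Average of the k per-fold validation MSEs."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    # Deal the shuffled indices into k folds of (nearly) equal size
    folds = [idx[j::k] for j in range(k)]
    fold_mses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k

# Hypothetical toy data: y = 2 + 3x + noise
data_rng = random.Random(0)
xs = [i / 10 for i in range(20)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0, 1) for x in xs]
n = len(xs)

print("5-fold:", round(kfold_mse(xs, ys, 5), 3))
print("k = n (LOOCV):", round(kfold_mse(xs, ys, n), 3))
```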
Bootstrap
Represents a way to resample your data over and over, with replacement
In the bootstrap, it is as if we treat the sample as the population and sample from it over and over
TRUE
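As an illustration of resampling with replacement, here is a minimal sketch that uses the bootstrap to estimate the standard error of a sample mean. The data, the number of resamples (n_boot) and the function name bootstrap_se are assumptions made for the example.

```python
import random
import statistics

def bootstrap_se(sample, stat, n_boot=2000, seed=0):
    """Bootstrap estimate of the standard error of stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        # Treat the sample as the population: draw n values with replacement
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(resample))
    # Spread of the statistic across resamples approximates its standard error
    return statistics.stdev(estimates)

# Hypothetical sample of 50 observations
data_rng = random.Random(42)
sample = [data_rng.gauss(10, 2) for _ in range(50)]

se_boot = bootstrap_se(sample, statistics.mean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5  # textbook s / sqrt(n)
print(round(se_boot, 3), round(se_formula, 3))
```

For the mean, the bootstrap estimate should land close to the textbook formula; the bootstrap is more useful for statistics that have no simple formula.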
Comments on the linear model
1- We are assuming that the relationship between the response and the predictors is linear
Many of the variables used in a multiple regression model may not be associated with the response; this will add unnecessary complexity
TRUE
Our target: we want the minimum number of predictors that gives the maximum explanation of the response
TRUE
I want to do feature selection in order to have maximum interpretability of the model
TRUE, thus we do subset selection
Subset selection
Keep only a subset of the variables in the model
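One way to sketch subset selection is an exhaustive (best subset) search: for each subset size, keep the set of predictors with the lowest RSS. The code below is an illustration under assumptions, not the lecture's procedure: the data are simulated so that only predictors 0 and 2 matter, and solve, rss and best_subsets are hypothetical helper names.

```python
from itertools import combinations
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(rows, ys, cols):
    """RSS of the least squares fit of ys on an intercept plus the chosen columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, y in zip(design, ys))

def best_subsets(rows, ys):
    """For each subset size, return the predictor indices with the lowest RSS."""
    n_pred = len(rows[0])
    return {
        size: min(combinations(range(n_pred), size),
                  key=lambda cols: rss(rows, ys, cols))
        for size in range(1, n_pred + 1)
    }

# Simulated data: 4 candidate predictors, only columns 0 and 2 affect the response
data_rng = random.Random(0)
rows = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
ys = [1.0 + 2.0 * r[0] - 3.0 * r[2] + data_rng.gauss(0, 0.5) for r in rows]

best = best_subsets(rows, ys)
for size, cols in best.items():
    print(size, cols, round(rss(rows, ys, cols), 2))
```

Note that training RSS never increases when a predictor is added, so RSS alone can only rank models of the same size; choosing between sizes needs an estimate of test error such as cross-validation.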
Methods that allow us to improve the performance in our model:
There are 3 main types:
1- Subset Selection
2- Shrinkage
3- Dimension Reduction
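To make the shrinkage idea concrete, here is a minimal sketch using ridge regression with a single predictor and an unpenalized intercept. Ridge is assumed here as one example of a shrinkage method, and the toy data, the penalty values and the name ridge_slope are illustrative. For this case the penalized slope has the closed form sxy / (sxx + lam), so a larger penalty lam pulls the coefficient toward zero.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - b0 - b1*x)^2) + lam * b1^2 for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # lam = 0 gives ordinary least squares; lam > 0 shrinks the slope
    return sxy / (sxx + lam)

# Hypothetical toy data lying exactly on y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

for lam in (0, 1, 10, 100):
    print(lam, round(ridge_slope(xs, ys, lam), 3))
```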