Topics 5-9 Flashcards
What are popular selection methods? And when do you use them?
- Best subset selection
- Stepwise selection
Use them when n is not much larger than p, since ordinary least squares then has high variance and poor test error (see the sketch below)
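For a concrete picture, here is a minimal sketch of forward stepwise selection using scikit-learn's SequentialFeatureSelector; the synthetic dataset and the choice of 5 features are assumptions made purely for illustration.

```python
# Minimal sketch of forward stepwise selection (illustrative synthetic data).
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# 20 candidate predictors, of which only 5 actually drive the response.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5, random_state=0)

# Forward stepwise: start from the null model and greedily add one predictor
# at a time, keeping the addition that most improves the cross-validated fit.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="forward", cv=5
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected predictors
```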
When can forward stepwise selection be used but backward stepwise cannot?
When n < p: backward stepwise starts from the full model containing all p predictors, and a least squares model with more predictors than observations cannot be fit. Forward stepwise starts from the null model and adds one predictor at a time, so it never has to fit a model larger than the data allows.
What are the 2 approaches for estimating the prediction accuracy (test error) of a candidate subset?
1: Indirectly estimate the test error by adjusting the training error to account for the bias due to overfitting (AIC, BIC, Cp, adjusted R^2)
2: Directly estimate the test error using cross-validation or the validation set approach (see the sketch below)
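A minimal sketch of approach 2, directly estimating the test error with 5-fold cross-validation; the synthetic data and the plain linear model are assumptions for illustration only.

```python
# Minimal sketch of estimating test error directly with 5-fold cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# Each fold is held out once as a test set; averaging the per-fold MSE
# gives a direct estimate of the test error.
mse_per_fold = -cross_val_score(LinearRegression(), X, y,
                                scoring="neg_mean_squared_error", cv=5)
print(mse_per_fold.mean())
```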
Explain what AIC, BIC, Cp and adjusted R^2 do, how they are related, and their special connections
For AIC, BIC and Cp, a smaller value indicates a lower estimated test error
A higher adjusted R^2 indicates a lower estimated test error
BIC places a heavier penalty on the number of predictors than AIC (since log(n) > 2 once n > 7)
Adjusted R^2 is not as well justified theoretically as the other 3
Cp and AIC are proportional (effectively the same) for least squares linear regression; the standard formulas are below
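The standard formulas (following the ISLR conventions, for a least squares fit with d predictors and estimated error variance $\hat\sigma^2$) are:

$$C_p = \frac{1}{n}\left(\mathrm{RSS} + 2d\hat\sigma^2\right), \qquad \mathrm{BIC} = \frac{1}{n}\left(\mathrm{RSS} + \log(n)\,d\hat\sigma^2\right)$$

$$\text{Adjusted } R^2 = 1 - \frac{\mathrm{RSS}/(n-d-1)}{\mathrm{TSS}/(n-1)}$$

AIC is proportional to Cp for least squares, and because log(n) > 2 for n > 7, BIC penalizes additional predictors more heavily than AIC/Cp.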
What is the advantage of finding the test error using CV over indirectly estimating it?
It makes fewer assumptions about the true underlying model. AIC, BIC, Cp and adjusted R^2 were mainly attractive when computing power was limited; now that computation is cheap, cross-validation is the more attractive option
What are the differences between Shrinkage methods and Subset Selection?
In subset selection you remove predictors from the model entirely; in shrinkage you keep all p predictors but shrink their coefficient estimates towards 0, with the coefficients of predictors that contribute little to the response shrunk the most.
What are the 2 popular shrinkage methods?
- Ridge regression
- Lasso
Explain to yourself how ridge regression works
It adds a penalty on the size of the coefficients to the least squares criterion, which constrains the values the parameters can take and shrinks them towards 0 (but never exactly to 0); see the objective below.
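In ISLR's notation, the ridge coefficients minimize the residual sum of squares plus a penalty proportional to the sum of squared coefficients, with the tuning parameter λ ≥ 0 controlling the amount of shrinkage:

$$\hat\beta^{\text{ridge}} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2\right\}$$

The lasso uses the same idea but replaces the penalty $\lambda\sum_j \beta_j^2$ with $\lambda\sum_j |\beta_j|$, which is what allows it to set some coefficients exactly to 0.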
What is a downside of ridge regression?
It never sets any coefficient exactly to 0, so all p predictors stay in the final model, which makes the results harder to interpret (inference)
When is either Ridge or Lasso regression preferable?
When many of the predictors are not useful (the true model is sparse), the lasso tends to perform better; when most or all of the predictors are useful, ridge tends to perform better
What are the main methods to improve OLS fitting?
- Subset selection
- Shrinkage methods
- Dimension reduction
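As an illustration of the dimension reduction approach, here is a minimal principal components regression (PCR) sketch with scikit-learn; the synthetic dataset and the choice of 5 components are assumptions for illustration only.

```python
# Minimal sketch of principal components regression (PCR):
# reduce the predictors to a few principal components, then regress on them.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=20, random_state=0)

# Standardize, project onto the first 5 principal components, then fit OLS.
pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))  # training R^2 of the reduced-dimension fit
```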
Effect of the Tuning Parameter (λ) in Ridge and Lasso:
Ridge Regression:
As λ increases, coefficients shrink towards zero but are never exactly zero.
The shrinkage helps control multicollinearity and improves prediction accuracy when p is large.
Lasso Regression:
As λ increases, some coefficients shrink exactly to zero, promoting sparsity and variable selection.
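A minimal sketch of this contrast with scikit-learn, where alpha plays the role of λ; the synthetic data and the grid of alpha values are assumptions for illustration.

```python
# Minimal sketch: as alpha (lambda) grows, ridge only shrinks coefficients,
# while lasso sets more and more of them exactly to zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for alpha in [0.1, 1.0, 10.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    lasso = Lasso(alpha=alpha).fit(X, y)
    # Count coefficients that are exactly zero in each fit.
    print(f"alpha={alpha:>6}: "
          f"ridge zeros={np.sum(ridge.coef_ == 0)}, "
          f"lasso zeros={np.sum(lasso.coef_ == 0)}")
```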
What is an internal node, and what is a terminal node?
Internal Node: Represents a split in the data based on a predictor variable and a threshold. It partitions the predictor space into two regions.
Terminal Node (Leaf): Represents the end of a branch in the tree, where predictions are made. In a regression tree it contains the mean response of the training observations that fall into that region (in a classification tree, the most common class).
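A minimal sketch of these two node types in a fitted tree, using scikit-learn's export_text to print the structure; the synthetic dataset and max_depth=2 are assumptions for illustration.

```python
# Minimal sketch: internal nodes print as "feature <= threshold" splits,
# terminal nodes (leaves) print as "value: ..." lines holding the mean response.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = make_regression(n_samples=100, n_features=3, random_state=0)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))
```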
What are the pros and cons of tree methods?
Pros:
Easy to explain and interpret.
Handles non-linear relationships well.
Works with qualitative predictors without creating dummy variables.
Cons:
Prone to overfitting.
High variance: small changes in data can lead to different trees.
Generally less accurate than other advanced methods.
What is the criterion used in each splitting (classification trees)?
Gini Index: Measures node purity. Smaller values indicate a purer node.
Cross-Entropy: Measures the uncertainty in node class probabilities.
Classification Error Rate: Proportion of misclassified observations in the node (less commonly used for splitting because it is less sensitive to node purity).
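Written out, with $\hat p_{mk}$ the proportion of training observations in region m that belong to class k, the three criteria are:

$$G_m = \sum_{k=1}^{K}\hat p_{mk}\,(1-\hat p_{mk}), \qquad D_m = -\sum_{k=1}^{K}\hat p_{mk}\log\hat p_{mk}, \qquad E_m = 1 - \max_{k}\hat p_{mk}$$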