Week 6: Regression and Classification Flashcards
Reversed
Advantage: have the potential to accurately fit a wider range of possible shapes for f
Disadvantage: a very large number of observations is required to obtain an accurate estimate for f
What are the advantages and disadvantages of non-parametric methods?
Reversed
predict(model)
How do you predict the outcome of the model in R?
Reversed
A funnel shape in the residual plot (residuals plotted against fitted values) indicates non-constant variance. A solution is to transform the response, e.g. to log(Y) or sqrt(Y)
How can you detect non-constant variance of error terms? What is a solution?
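A minimal R sketch, assuming a hypothetical data frame dat with response y and predictor x:
fit <- lm(y ~ x, data = dat)
plot(fitted(fit), resid(fit))          # a funnel shape suggests non-constant variance
fit_log <- lm(log(y) ~ x, data = dat)  # variance-stabilising transformation of Y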
Reversed
Use the least squares approach: choose the coefficients that minimise the RSS (residual sum of squares)
How do you find the coefficients of a simple linear regression model?
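In R, lm() computes the least squares fit; a minimal sketch with hypothetical dat, x and y:
fit <- lm(y ~ x, data = dat)   # minimises the RSS over the intercept and slope
coef(fit)                      # the least squares estimates b0 and b1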
Reversed
LOOCV has higher variance than k-fold CV because its n fitted models are trained on nearly identical data, so their outputs are highly correlated, and the mean of highly correlated quantities has higher variance than the mean of less correlated ones
Why does k-fold CV give more accurate estimates of MSE than LOOCV?
Reversed
One less than the number of levels, because there is a baseline level with no dummy variable
How many dummy variables will there be when there is a predictor with more than 2 levels?
Reversed
- Forward selection: begin with the null model, fit p simple linear regressions, and add to the null model the variable that results in the lowest RSS. Continue adding variables until some stopping rule is satisfied (sketched in R after this card)
- Backward selection: start with all variables, remove the variable with the largest p-value, and continue removing variables until a stopping rule is reached
- Mixed selection: a combination of forward and backward. Start with no variables in the model and add the variable that provides the best fit. Continue to add variables one by one. If at any point the p-value for one of the variables in the model rises above a certain threshold, remove that variable from the model. Continue until all the variables in the model have a sufficiently low p-value and all variables outside the model would have a large p-value if added to the model
What are the three approaches for deciding which variables to include in a model? How do they work?
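A minimal R sketch of forward selection by RSS, assuming a hypothetical data frame dat whose response column is named y:
predictors <- setdiff(names(dat), "y")
selected <- character(0)
for (step in seq_along(predictors)) {
  candidates <- setdiff(predictors, selected)
  rss <- sapply(candidates, function(p) {
    f <- reformulate(c(selected, p), response = "y")
    deviance(lm(f, data = dat))   # deviance() of an lm fit is its RSS
  })
  selected <- c(selected, candidates[which.min(rss)])
  # a real stopping rule (e.g. AIC or adjusted R^2) would go here
}
selected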
Reversed
They do not make explicit assumptions about the functional form of f
What are non-parametric methods?
Reversed
The proportion of actual negatives (Falses) that are correctly identified: TN/(TN+FP)
What is specificity?
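A toy R example with hypothetical logical vectors of predictions and true labels:
predicted <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
actual    <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
TN <- sum(!predicted & !actual)   # true negatives
FP <- sum(predicted & !actual)    # false positives
TN / (TN + FP)                    # specificity: 1/(1+1) = 0.5 here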
Reversed
Randomly divide the set of observations into k groups (or folds) of approximately equal size. The first fold is treated as a validation set and the method is fit on the remaining k-1 folds. Repeat k times to get k estimates of the MSE, then average them
What is k-fold cross validation?
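A minimal base-R sketch of k-fold CV for a linear model, assuming a hypothetical data frame dat with response y:
k <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))   # random fold assignment
mse <- numeric(k)
for (i in 1:k) {
  fit <- lm(y ~ ., data = dat[folds != i, ])         # fit on the other k-1 folds
  pred <- predict(fit, newdata = dat[folds == i, ])  # predict the held-out fold
  mse[i] <- mean((dat$y[folds == i] - pred)^2)
}
mean(mse)   # cross-validation estimate of the test MSE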
Reversed
irreducible and reducible error
What does the accuracy of Ŷ as a prediction for Y depend on?
Reversed
By default it outputs the log-odds (type = "link"). To get predicted probabilities instead:
predict(lr_mod, type = "response")
What is the default when using predict with logistic? How do you change it?
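A minimal sketch, assuming a hypothetical data frame dat with a binary outcome y and predictor x:
lr_mod <- glm(y ~ x, data = dat, family = binomial)   # logistic regression
head(predict(lr_mod))                     # log-odds (the default, type = "link")
head(predict(lr_mod, type = "response"))  # predicted probabilities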
Reversed
A plot of predicted probability against the observed proportion of positives (typically within bins of predicted probability); for a well-calibrated model it should be a straight line with slope 1
What is a calibration plot?
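A minimal base-R sketch, assuming a hypothetical fitted logistic model lr_mod and a 0/1 outcome dat$y:
p_hat <- predict(lr_mod, type = "response")
bins <- cut(p_hat, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
pred_mean <- tapply(p_hat, bins, mean)   # mean predicted probability per bin
obs_prop  <- tapply(dat$y, bins, mean)   # observed proportion per bin
plot(pred_mean, obs_prop)
abline(0, 1)   # well-calibrated points lie on this slope-1 line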
Reversed
Randomly split the rows: sample indices for the training set and use the remaining rows as the test set (one common approach is sketched below)
What is a code for splitting data into test and train in R?
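One common base-R approach (an assumption, since the card does not give the original code; dat is a hypothetical data frame):
set.seed(1)                                              # reproducible split
train_idx <- sample(nrow(dat), size = 0.7 * nrow(dat))   # 70% of rows for training
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]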
Reversed
Trees built by recursive partitioning: find the split that makes the observations within each resulting group as similar as possible on the outcome, then repeat within each group, stopping when a stopping parameter is reached
What are classification trees?
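A minimal sketch with the rpart package, assuming a hypothetical data frame dat with a categorical outcome class:
library(rpart)
tree <- rpart(class ~ ., data = dat, method = "class")   # recursive partitioning
plot(tree); text(tree)                                   # draw the splits
predict(tree, newdata = dat, type = "class")             # predicted classes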
Reversed
predict(model, newdata = data)
How do you predict the outcome of the model in R based on new data?
Reversed
Tend to overfit
Use them as a basic building block for ensembles
What is the problem with regression trees? What can they be used for?
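For example, a random forest is an ensemble of many trees; a minimal sketch with the randomForest package (dat and y are hypothetical):
library(randomForest)
rf <- randomForest(y ~ ., data = dat)   # many trees grown on bootstrap samples
predict(rf, newdata = dat)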
Reversed
Compute the standard errors of the estimates of B0 and B1, which are used for confidence intervals and hypothesis tests
How do you assess the accuracy of the coefficient estimates?
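In R, summary() of an lm fit reports these standard errors; a minimal sketch with hypothetical data:
fit <- lm(y ~ x, data = dat)
summary(fit)$coefficients   # estimates, standard errors, t-values, p-values
confint(fit)                # confidence intervals built from the standard errors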
Reversed
- Additivity assumption: that the association between a predictor X and the response Y does not depend on the values of the other predictors
- the error terms e1, e2, … are uncorrelated
- the error terms have a constant variance, Var(ei) = sigma squared
What are the assumptions of the linear model? (3)
Reversed
Use dummy variables: 0 for one level, 1 for the other (or -1 and 1)
How do you put a categorical variable in a linear regression model?
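In R, storing the variable as a factor makes lm() create the dummy variable automatically; a minimal sketch with a hypothetical two-level variable group:
dat$group <- factor(dat$group)    # two levels, e.g. "A" and "B"
fit <- lm(y ~ group, data = dat)  # lm() creates the 0/1 dummy automatically
head(model.matrix(fit))           # shows the dummy coding used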
Reversed
As p increases (more dimensions), the observations become sparse in the predictor space, so a given observation has no nearby neighbours
What is the curse of dimensionality?
Reversed
bias initially decreases faster than variance increases, so the MSE declines. But at some point increasing flexibility has more impact on the variance, so the MSE increases.
What happens to the MSE as you increase flexibility?
Reversed
Given a value for K and a prediction point x0, KNN regression first identifies the K training observations that are closest to x0, denoted N0. It then estimates f(x0) as the average of the training responses in N0
How does k nearest neighbours regression work?
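A minimal base-R sketch for a single predictor (hypothetical toy data):
knn_reg <- function(x, y, x0, k) {
  nn <- order(abs(x - x0))[1:k]   # indices of the K nearest training points
  mean(y[nn])                     # average their responses to estimate f(x0)
}
knn_reg(x = c(1, 2, 4, 7), y = c(1.2, 1.9, 4.1, 6.8), x0 = 3, k = 2)   # averages y at x = 2 and 4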
Reversed
If we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant
What is the hierarchical principle?
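In R, the * formula operator respects this principle; a minimal sketch with hypothetical variables:
# y ~ x1 * x2 expands to the main effects plus the interaction: x1 + x2 + x1:x2
fit <- lm(y ~ x1 * x2, data = dat)
coef(fit)   # includes the intercept, x1, x2 and the x1:x2 term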