Week 6: Regression and Classification Flashcards
Advantage: have the potential to accurately fit a wider range of possible shapes for f
Disadvantage: a very large number of observations is required to obtain an accurate estimate for f
What are the advantage and disadvantage of non-parametric methods?
predict(model)
How do you predict the outcome of the model in R?
Presence of a funnel shape in the residual plot. Transform Y to log(Y) or sqrt(Y)
How can you detect non-constant variance of error terms? What is a solution?
Use the least squares approach: choose the coefficients that minimise the RSS (residual sum of squares)
How do you find the coefficients of a simple linear regression model?
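As a minimal sketch of the least squares idea, the closed-form estimates for one predictor can be compared with what lm() returns. The simulated data and the names x, y, b0 and b1 are assumptions for illustration, not part of the flashcards.

```r
# Simulated data (assumption: illustration only)
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)

# Closed-form least squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() minimises the same RSS, so its coefficients should agree
fit <- lm(y ~ x)
coef(fit)
```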
LOOCV has higher variance than k-fold CV: its n models are fitted on almost identical training sets, so their outputs are highly correlated with each other, and the mean of highly correlated quantities has higher variance
Why does k-fold CV give more accurate estimates of MSE than LOOCV?
One less than the number of levels, because there is a baseline level with no dummy variable
How many dummy variables will there be when there is a predictor with more than 2 levels?
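One way to see the "levels minus one" rule is to look at the design matrix R builds for a factor. This is a small sketch; the factor region and its values are hypothetical.

```r
# Hypothetical three-level predictor
region <- factor(c("East", "West", "South", "East", "South"))

# model.matrix() creates an intercept plus one dummy per non-baseline level
X <- model.matrix(~ region)
colnames(X)  # "(Intercept)" "regionSouth" "regionWest"; "East" is the baseline
```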
- Forward selection: begin with null model. Then fit p simple linear regressions and add to the null model the variable that results in the lowest RSS. Continue adding variables until some stopping rule is satisfied
- Backward selection: start with all variables, remove the variable with the largest p-value, continue removing variables until a stopping rule is reached
- Mixed selection: combination of forward and backward. Start with no variables in the model. Add the variable that provides the best fit. Continue to add variables one by one. If at any point the p-value for one of the variables in the model rises above a certain threshold, then remove that variable from the model. Continue until all the variables in the model have a sufficiently low p-value and all variables outside the model would have a large p-value if added to the model
What are the three approaches for deciding which variables to include in a model? How do they work?
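Forward selection as described above could be sketched roughly as follows. The simulated data, the variable names and the stopping rule (stop after two variables) are all assumptions made for illustration; real stopping rules vary.

```r
# Simulated data (assumption): y depends on x1 and x2 but not x3
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

candidates <- c("x1", "x2", "x3")
selected <- character(0)  # start from the null model
max_vars <- 2             # hypothetical stopping rule

while (length(selected) < max_vars) {
  remaining <- setdiff(candidates, selected)
  # RSS of the current model plus each remaining candidate
  rss <- sapply(remaining, function(v) {
    f <- reformulate(c(selected, v), response = "y")
    sum(resid(lm(f, data = dat))^2)
  })
  # Add the variable that gives the lowest RSS
  selected <- c(selected, names(which.min(rss)))
}
selected
```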
Do not make explicit assumptions about the functional form of f
What are non-parametric methods?
the percentage of Falses that are identified correctly = TN/(TN+FP)
What is specificity?
Randomly divide the set of observations into k groups (or folds) of approximately equal size. The first fold is treated as a validation set and the method is fit on the remaining k-1 folds. Repeat k times, holding out a different fold each time, to get k estimates of the MSE. Find the average
What is k-fold cross validation?
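A rough base-R sketch of the procedure, assuming a linear model and simulated data (both are illustrative choices, as is k = 5):

```r
# Simulated data (assumption)
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

k <- 5
# Randomly assign each row to one of k folds of roughly equal size
folds <- sample(rep(1:k, length.out = n))

# For each fold: fit on the other k-1 folds, compute MSE on the held-out fold
fold_mse <- sapply(1:k, function(i) {
  train <- dat[folds != i, ]
  valid <- dat[folds == i, ]
  fit <- lm(y ~ x, data = train)
  mean((valid$y - predict(fit, newdata = valid))^2)
})

# The CV estimate is the average of the k fold MSEs
cv_mse <- mean(fold_mse)
cv_mse
```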
irreducible and reducible error
What does the accuracy of Y* as a prediction for Y depend on?
Automatically outputs the log odds. To change it:
predict(lr_mod, type = "response")
What is the default when using predict with logistic? How do you change it?
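To illustrate the difference between the two scales, here is a small sketch on simulated data. The name lr_mod follows the card above; the data are an assumption.

```r
# Simulated binary outcome (assumption)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.5 * x))
lr_mod <- glm(y ~ x, family = binomial)

log_odds <- predict(lr_mod)                     # default is type = "link" (log odds)
prob     <- predict(lr_mod, type = "response")  # predicted probabilities

# Applying the logistic function to the log odds recovers the probabilities
head(cbind(log_odds, prob, plogis(log_odds)))
```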
A plot of predicted probability against observed proportion; for a well-calibrated model the points should lie on a straight line with slope 1
What is a calibration plot?
What is a code for splitting data into test and train in R?
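One common way to do this in base R is to sample row indices. The sketch below uses the built-in mtcars data; the 80/20 proportion and the names train_idx, train and test are assumptions.

```r
set.seed(1)                                    # for a reproducible split
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.8 * n))  # 80% of rows (assumed proportion)

train <- mtcars[train_idx, ]   # rows used to fit the model
test  <- mtcars[-train_idx, ]  # remaining rows held out for testing
```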
Recursive partitioning. Find the split that makes observations as similar as possible on the outcome within each resulting group. Do that again within each resulting group. Stop when a stopping criterion is met
What are classification trees?
predict(model, newdata = data)
How do you predict the outcome of the model in R based on new data?
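For example, with a linear model fitted to the built-in mtcars data (the model and the new weights are hypothetical), the new data frame must contain columns named like the predictors:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new observations; column name must match the predictor
new_cars <- data.frame(wt = c(2.5, 3.5))

pred <- predict(fit, newdata = new_cars)
pred
```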
Tend to overfit
Use them as a basic building block for ensembles
What is the problem with regression trees? What can they be used for?
Compute the standard error of B0 and B1
How do you assess the accuracy of the coefficient estimates?
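In R the standard errors appear in the coefficient table of summary(). A short sketch using the built-in mtcars data (the choice of model is an assumption):

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- summary(fit)$coefficients
coef_table[, "Std. Error"]

# Standard errors also underlie the confidence intervals
confint(fit, level = 0.95)
```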
- Additivity assumption: that the association between a predictor X and the response Y does not depend on the values of the other predictors
- the error terms e1, e2, … are uncorrelated
- the error terms have a constant variance, Var(ei) = sigma squared
What are the assumptions of the linear model? (3)
Use dummy variables: code one level as 0 and the other as 1 (or as -1 and 1)
How do you put a categorical variable in a linear regression model?
as p increases (more dimensions), a given observation has no nearby neighbours
What is the curse of dimensionality?
bias initially decreases faster than variance increases, so the MSE declines. But at some point increasing flexibility has more impact on the variance, so the MSE increases.
What happens to the MSE as you increase flexibility?
Given a value for K and a prediction point x0, KNN regression first identifies the K training observations that are closest to x0 represented by N0. Then it estimates f(x0) using the average of all the training responses in N0
How does k nearest neighbours regression work?
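The two steps (find the K nearest training points, then average their responses) can be sketched for a single predictor. The function knn_reg and the toy data are hypothetical, written only to mirror the description above.

```r
# Minimal KNN regression for one predictor (hypothetical helper)
knn_reg <- function(x_train, y_train, x0, K) {
  d  <- abs(x_train - x0)  # distance from each training point to x0
  n0 <- order(d)[1:K]      # indices of the K closest points (the set N0)
  mean(y_train[n0])        # average of their responses
}

x_train <- c(1, 2, 3, 4, 5)
y_train <- c(1, 2, 3, 4, 5)

# Nearest three points to 2.1 are x = 2, 3, 1, so the estimate is 2
knn_reg(x_train, y_train, x0 = 2.1, K = 3)
```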
if we include an interaction in a model we should also include the main effects, even if the p-values associated with their coefficients are not significant
What is the hierarchical principle?
Advantage: reduce the problem of estimating f down to one of estimating a set of parameters
Disadvantage: will usually not match the true unknown form of f
What is the advantage and disadvantage of parametric methods?
Y = B0 + B1X1 + B2X2 + B3X1X2 + e
Combination of predictors
What are interaction terms?
p(X) = e^(B0 + B1X) / (1 + e^(B0 + B1X))
What is the logistic function?
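Written as an R function (the name logistic is an assumption), this is the same transformation that the built-in plogis() applies to the linear predictor:

```r
# p(X) = e^(B0 + B1X) / (1 + e^(B0 + B1X))
logistic <- function(x, b0, b1) {
  exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))
}

logistic(0, b0 = 0, b1 = 1)  # 0.5: log odds of zero means probability one half

# Agrees with base R's plogis() on the linear predictor
all.equal(logistic(2, b0 = -1, b1 = 0.5), plogis(-1 + 0.5 * 2))
```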
the percentage of Trues that are identified correctly = TP/(TP+FN)
What is sensitivity/recall?
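Both sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP), can be computed from the four confusion-matrix counts. The vectors truth and pred below are made-up labels for illustration.

```r
# Made-up true and predicted classes
truth <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
pred  <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

TP <- sum(pred & truth)    # true positives
FN <- sum(!pred & truth)   # false negatives
TN <- sum(!pred & !truth)  # true negatives
FP <- sum(pred & !truth)   # false positives

sensitivity <- TP / (TP + FN)  # 3 / 4
specificity <- TN / (TN + FP)  # 5 / 6
```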
small training MSE but large test MSE
What happens to MSE when model is overfitted?
predicting class one if Pr(Y = 1 | X = x0) > 0.5
What does Bayes classifier correspond to in a two-response value setting?
How do you fit a logistic regression model using R?
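A typical approach is glm() with family = binomial. The sketch below uses simulated data; the data frame dat and the column names default and balance are assumptions.

```r
# Simulated data (assumption): probability of default rises with balance
set.seed(1)
dat <- data.frame(balance = rnorm(200, mean = 1000, sd = 400))
dat$default <- rbinom(200, 1, plogis(-6 + 0.005 * dat$balance))

# family = binomial gives logistic regression (logit link by default)
lr_mod <- glm(default ~ balance, data = dat, family = binomial)

# Coefficients are on the log odds scale
summary(lr_mod)$coefficients
```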
increasing X by one unit changes the log odds by B1
How do you interpret B1 in a logistic regression model?
mean squared error
MSE = (1/n) * SUM((y_i - predicted y_i)^2)
What is the most commonly used measure of the quality of fit? What is the formula?
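In R the formula is a one-liner. The values of y and y_hat below are made up for illustration.

```r
y     <- c(3, 5, 7)  # observed values (made up)
y_hat <- c(2, 5, 9)  # predicted values (made up)

# Mean of the squared differences: (1 + 0 + 4) / 3
mse <- mean((y - y_hat)^2)
mse
```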
A model is perfectly calibrated if for any probability value p, a prediction of a class with confidence p is correct 100*p percent of the time
What is calibration?
What is used for hypothesis testing in logistic regression?
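pred_lr <- factor(pred_prob > 0.5, levels = c(FALSE, TRUE), labels = c("No", "Yes"))
table(true = observed, predicted = pred_lr)  # observed: vector of true classes (placeholder name; the original argument is missing)
How do you produce a table of observed vs predicted classes when the predictions are probabilities?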
e: cannot be predicted using X, therefore the error introduced by e cannot be reduced
What is irreducible error?
the percentage of predicted Trues that are actually True = TP/(TP+FP)
What is the positive predictive value (PPV) / precision?
When a regression with a single predictor gives a very different result from a regression that also includes other relevant predictors
What is confounding?
ifelse(student == "Yes", 1, 0)
How would you turn a vector of "Yes" and "No" values into a vector of 1s and 0s?
the percentage of all observations that are classified correctly = (TP+TN)/(TP+TN+FP+FN)
What is accuracy?
training MSE will decrease, but test MSE may not
What happens to MSE as model flexibility increases?
There is a dataset, a set of people trying to find a prediction rule, and a referee.
The referee runs each submitted prediction rule against a testing dataset, which is sequestered behind a Chinese wall.
The referee objectively and automatically reports the score achieved by the submitted rule.
This results in a declining error rate.
What is the common task framework / benchmarking?
B0 can be interpreted as the average Y among non-students. B0 + B1 as the average Y among students. B1 as the average difference in Y between students and non students
How do you interpret B0 and B1 when the dummy variable is 1 if someone is a student and 0 if they are not?
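This interpretation can be checked numerically: with a single 0/1 dummy, the fitted intercept equals the mean of the baseline group and the slope equals the difference in group means. The simulated data below are an assumption.

```r
# Simulated data (assumption): 20 non-students (0) and 20 students (1)
set.seed(1)
student <- rep(c(0, 1), each = 20)
y <- 500 + 100 * student + rnorm(40, sd = 30)

fit <- lm(y ~ student)
b <- unname(coef(fit))  # b[1] is B0, b[2] is B1

b[1]         # average y among non-students
b[1] + b[2]  # average y among students
b[2]         # average difference between students and non-students
```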
Assume that X = (X1, …, Xp) is drawn from a multivariate normal distribution with a class-specific mean vector and a common covariance matrix
What are the assumptions for linear discriminant analysis when p>1?