Week 6: Regression and Classification Flashcards
RevERSED
Advantage: have the potential to accurately fit a wider range of possible shapes for f
Disadvantage: a very large number of observations is required to obtain an accurate estimate for f
What is the advantage and disadvantage of non-parametric methods
RevERSED
predict(model)
How do you predict the outcome of the model in R?
RevERSED
Presence of a funnel shape in the residual plot. Transform Y to log(Y) or sqrt(Y)
How can you detect non-constant variance of error terms? What is a solution?
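A quick way to check this in R (a minimal sketch; fit, df, x and y are assumed names):
fit <- lm(y ~ x, data = df)
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")  # look for a funnel shape
fit_log <- lm(log(y) ~ x, data = df)  # refit after a variance-stabilising transformation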
RevERSED
Use the least squares approach to minimise the RSS (residual sum of squares); see the sketch after this card
How do you find the coefficients of a simple linear regression model?
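A minimal R sketch (df, y and x are assumed names); lm() computes the least squares estimates:
fit <- lm(y ~ x, data = df)  # least squares fit of Y = B0 + B1*X + e
coef(fit)                    # estimated intercept B0 and slope B1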
RevERSED
LOOCV has higher variance than k-fold CV: its n fitted models are trained on nearly identical data, so their outputs are highly correlated, and the mean of highly correlated quantities has higher variance than the mean of less correlated ones
Why does k-fold CV give more accurate estimates of MSE than LOOCV?
RevERSED
One less than the number of levels, because there is a baseline level with no dummy variable
How many dummy variables will there be when there is a predictor with more than 2 levels?
RevERSED
- Forward selection: begin with the null model. Then fit p simple linear regressions and add to the null model the variable that results in the lowest RSS. Continue adding variables until some stopping rule is satisfied (a rough R sketch follows this card)
- Backward selection: start with all variables, remove the variable with the largest p-value, and continue removing variables until a stopping rule is reached
- Mixed selection: a combination of forward and backward. Start with no variables in the model. Add the variable that provides the best fit. Continue to add variables one by one. If at any point the p-value for one of the variables in the model rises above a certain threshold, remove that variable from the model. Continue until all the variables in the model have a sufficiently low p-value and all variables outside the model would have a large p-value if added to the model
What are the three approaches for deciding which variables to include in a model? How do they work?
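A rough base-R sketch of forward selection by RSS (assuming a data frame df whose response is y and whose other columns are candidate predictors; a real stopping rule is omitted):
candidates <- setdiff(names(df), "y")
selected <- character(0)
for (i in seq_along(candidates)) {
  rss <- sapply(candidates, function(p)
    deviance(lm(reformulate(c(selected, p), response = "y"), data = df)))  # RSS of each candidate model
  best <- names(which.min(rss))            # variable giving the lowest RSS
  selected <- c(selected, best)
  candidates <- setdiff(candidates, best)
  # in practice, stop here once the chosen stopping rule is satisfied
}
selected  # variables in the order they were added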
RevERSED
They do not make explicit assumptions about the functional form of f
What are non-parametric methods?
RevERSED
The proportion of actual negatives that are correctly identified = TN/(TN+FP)
What is specificity?
RevERSED
Randomly divide the set of observations into k groups (or folds) of approximately equal size. The first fold is treated as a validation set and the method is fit on the remaining k-1 folds. Repeat k times to get k estimates of the MSE, then take their average
What is k-fold cross validation?
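A minimal base-R sketch of 10-fold CV for a simple linear model (df, y and x are assumed names):
k <- 10
folds <- sample(rep(1:k, length.out = nrow(df)))  # random fold assignment
cv_mse <- sapply(1:k, function(i) {
  fit <- lm(y ~ x, data = df[folds != i, ])       # fit on the other k-1 folds
  mean((df$y[folds == i] - predict(fit, newdata = df[folds == i, ]))^2)  # MSE on the held-out fold
})
mean(cv_mse)  # k-fold CV estimate of the test MSE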
RevERSED
irreducible and reducible error
What does the accuracy of Y* as a prediction for Y depend on?
RevERSED
Automatically outputs the log odds. To change it:
predict(lr_mod, type = "response")
What is the default when using predict with logistic? How do you change it?
RevERSED
Predicted probability plotted against observed proportion; for a well-calibrated model the points should lie on a straight line with slope 1
What is a calibration plot?
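A rough base-R sketch (pred_prob is a vector of predicted probabilities and y the observed 0/1 outcome; both names are assumed):
bins <- cut(pred_prob, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
plot(tapply(pred_prob, bins, mean), tapply(y, bins, mean),
     xlab = "Mean predicted probability", ylab = "Observed proportion")
abline(0, 1)  # perfect calibration lies on this line (intercept 0, slope 1)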
RevERSED
Randomly sample a subset of row indices for the training set and use the remaining rows as the test set (see the sketch after this card)
What is a code for splitting data into test and train in R?
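One common base-R approach (a sketch; the 80/20 split and the names df, train and test are arbitrary):
set.seed(1)                                                  # for reproducibility
train_idx <- sample(nrow(df), size = floor(0.8 * nrow(df)))  # 80% of rows for training
train <- df[train_idx, ]
test  <- df[-train_idx, ]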
RevERSED
Recursive partitioning: find the split that makes observations within each resulting group as similar as possible on the outcome, then repeat within each group, stopping when a stopping parameter is reached (see the sketch after this card)
What are classification trees?
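A minimal sketch using the rpart package (assuming a factor outcome class in df):
library(rpart)
tree_mod <- rpart(class ~ ., data = df, method = "class")  # recursive partitioning
plot(tree_mod); text(tree_mod)                             # draw the fitted tree
predict(tree_mod, newdata = df, type = "class")            # predicted classes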
RevERSED
predict(model, newdata = data)
How do you predict the outcome of the model in R based on new data?
RevERSED
Tend to overfit
Use them as a basic building block for ensembles
What is the problem with regression trees? What can they be used for?
RevERSED
Compute the standard error of B0 and B1
How do you assess the accuracy of the coefficient estimates?
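In R the standard errors are reported by summary() of the fitted model (sketch; df, y and x are assumed names):
fit <- lm(y ~ x, data = df)
summary(fit)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)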
RevERSED
- Additivity assumption: that the association between a predictor X and the response Y does not depend on the values of the other predictors
- the error terms e1, e2, … are uncorrelated
- the error terms have a constant variance, Var(ei) = sigma squared
What are the assumptions of the linear model? (3)
RevERSED
Use dummy variables: 0 for one level and 1 for the other (or -1 and 1); see the sketch after this card
How do you put a categorical variable in a linear regression model?
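In R this is handled automatically once the variable is a factor; a sketch (the names df, balance and student are assumed, as in the ISLR examples):
df$student <- factor(df$student, levels = c("No", "Yes"))
fit <- lm(balance ~ student, data = df)  # R creates the dummy studentYes; "No" is the baseline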
RevERSED
as p increases (more dimensions), a given observation has no nearby neighbours
What is the curse of dimensionality?
RevERSED
bias initially decreases faster than variance increases, so the MSE declines. But at some point increasing flexibility has more impact on the variance, so the MSE increases.
What happens to the MSE as you increase flexibility?
RevERSED
Given a value for K and a prediction point x0, KNN regression first identifies the K training observations that are closest to x0 represented by N0. Then it estimates f(x0) using the average of all the training responses in N0
How does k nearest neighbours regression work?
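A tiny base-R illustration of the idea for a single predictor (the function and data names are hypothetical; packages such as FNN or caret provide full KNN regression):
knn_reg_predict <- function(x_train, y_train, x0, k) {
  nn <- order(abs(x_train - x0))[1:k]  # indices of the K training points closest to x0 (the set N0)
  mean(y_train[nn])                    # estimate f(x0) as the average of their responses
}
knn_reg_predict(df$x, df$y, x0 = 2.5, k = 5)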
RevERSED
if we include an interaction in a model we should also include the main effects, even if the p-values associated with their coefficients are not significant
What is the hierarchical principle?
RevERSED
Advantage: reduce the problem of estimating f down to one of estimating a set of parameters
Disadvantage: will usually not match the true unknown form of f
What is the advantage and disadvantage of parametric methods?
RevERSED
Y = B0 + B1X1 + B2X2 + B3X1X2 + e
Terms formed as the product of predictors (X1X2 above), allowing the effect of one predictor on Y to depend on the value of another
What are interaction terms?
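In an R formula an interaction is written with : (the product term alone) or * (product plus main effects, which also respects the hierarchical principle); a sketch with assumed names:
fit <- lm(y ~ x1 * x2, data = df)  # expands to B0 + B1*x1 + B2*x2 + B3*x1:x2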
RevERSED
p(X) = e^(B0 + B1X) / (1 + e^(B0 + B1X))
What is the logistic function?
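In R, plogis() evaluates this function, so a fitted probability can be computed as (sketch; b0, b1 and x are assumed names):
p <- plogis(b0 + b1 * x)  # equals exp(b0 + b1*x) / (1 + exp(b0 + b1*x))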
RevERSED
The proportion of actual positives that are correctly identified = TP/(TP+FN)
What is sensitivity/recall?
RevERSED
small training MSE but large test MSE
What happens to MSE when model is overfitted?
RevERSED
Predicting class 1 if Pr(Y = 1 | X = x0) > 0.5
What does Bayes classifier correspond to in a two-response value setting?
RevERSED
lr_mod <- glm(y ~ x1 + x2, family = binomial, data = data)  # the formula and data names are placeholders
How do you fit a logistic regression model using R?
RevERSED
increasing X by one unit changes the log odds by B1
How do you interpret B1 in a logistic regression model?
RevERSED
mean squared error
MSE = (1/n) * SUM((y - predicted y)^2)
What is the most commonly used measure for measuring the quality of fit? what is the formula?
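In R (sketch, with observed values y and predictions y_hat):
mse <- mean((y - y_hat)^2)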
RevERSED
A model is perfectly calibrated if for any probability value p, a prediction of a class with confidence p is correct 100*p percent of the time
What is calibration?
RevERSED
Z-statistic
What is used for hypothesis testing in logistic regression?
RevERSED
pred_lr <- factor(pred_prob > 0.5, labels = c("No", "Yes"))  # threshold the predicted probabilities at 0.5
table(true = observed, predicted = pred_lr)  # observed stands for the vector of true labels
How do you produce table of observed vs predicted results when classified as probability?
RevERSED
e: cannot be predicted using X, therefore the error introduced by e cannot be reduced
What is irreducible error?
RevERSED
TP/(TP+FP)
What is the positive predictive value (PPV) / precision?
RevERSED
When a regression on a single predictor gives a very different result from a regression that also includes other relevant predictors
What is confounding?
RevERSED
ifelse(student == "Yes", 1, 0)
How would you turn a vector of "Yes" and "No" values into a vector of 1s and 0s?
RevERSED
(TP+TN)/(TP+TN+FP+FN)
What is accuracy?
RevERSED
training MSE will decrease, but test MSE may not
What happens to MSE as model flexibility increases?
RevERSED
There is a dataset, a set of competitors trying to find a prediction rule, and a referee.
The referee runs each submitted prediction rule against a sequestered (held-out) test dataset.
The referee objectively and automatically reports the score achieved by the submitted rule.
Over repeated rounds this results in a declining error rate
What is the common task framework /benchmarking?
RevERSED
B0 can be interpreted as the average Y among non-students, B0 + B1 as the average Y among students, and B1 as the average difference in Y between students and non-students
How do you interpret B0 and B1 when there is a dummy variable that is 1 when someone is a student and 0 when they are not?
RevERSED
Assume that X = (X1, ..., Xp) is drawn from a multivariate normal distribution with a class-specific mean vector and a common covariance matrix
What are the assumptions for linear discriminant analysis when p>1?
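A minimal sketch of fitting LDA using MASS::lda (assuming a factor outcome class in df):
library(MASS)
lda_mod <- lda(class ~ x1 + x2, data = df)
predict(lda_mod, newdata = df)$class  # predicted classes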