Week 6: Regression and Classification Flashcards
How do you make a prediction vector using knn in R?
knn(train, test, cl, k)
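train is the training set containing only the predictor variables, test is the test set containing only the predictor variables, cl is the vector of outcomes (class labels) for the training set, and k is the number of nearest neighbours to use. knn() comes from the class package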
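As a rough illustration of the call above, the sketch below runs knn() on R's built-in iris data. The data set, the 100/50 split, and the names idx, train_x, test_x and train_y are assumptions made for the example, not part of the original card.

```r
library(class)  # provides knn(); normally installed alongside base R

set.seed(1)
# Illustrative split of the built-in iris data (an assumption for this sketch)
idx     <- sample(nrow(iris), 100)
train_x <- iris[idx, 1:4]       # training predictors only
test_x  <- iris[-idx, 1:4]      # test predictors only
train_y <- iris$Species[idx]    # training outcomes, passed as cl

# One predicted class per row of test_x
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)
head(pred)
```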
What are interaction terms?
Products of predictors that let the effect of one predictor on Y depend on the value of another:
Y = B0 + B1X1 + B2X2 + B3X1X2 + e
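One way to fit such a model in R is with the * operator in an lm() formula. This is a minimal sketch on simulated data; the true coefficients (1, 2, -1, 3) and the variable names are made up for illustration.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# Simulated response with a true interaction coefficient of 3 (an assumption)
y  <- 1 + 2 * x1 - 1 * x2 + 3 * x1 * x2 + rnorm(n, sd = 0.5)

# y ~ x1 * x2 expands to y ~ x1 + x2 + x1:x2
fit <- lm(y ~ x1 * x2)
coef(fit)   # the "x1:x2" entry is the estimate of B3
```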
What is overfitting?
Fitting the training data so closely that the model follows the errors or noise rather than the true f
What is variance?
the amount by which f* would change if we estimated it using a different training data set. In general, more flexible methods have higher variance
What is a generative model for classification?
Model the distribution of the predictors X separately in each of the response classes, then use Bayes' theorem to flip these around into estimates of Pr(Y=k|X=x)
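Why does k-fold CV often give more accurate estimates of the test MSE than LOOCV?
LOOCV has higher variance than k-fold CV. Its n models are each trained on almost identical data, so their outputs are highly correlated with each other, and the mean of highly correlated quantities has higher variance than the mean of less correlated ones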
What is the residual sum of squares (RSS)?
The sum of the squared residuals:
RSS = e1^2 + e2^2 + … + en^2
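A small sketch of computing the RSS in R, assuming a simple linear model on the built-in mtcars data (the choice of model is only for illustration):

```r
fit <- lm(mpg ~ wt, data = mtcars)

e   <- residuals(fit)   # e_i = observed y_i minus fitted y_i
rss <- sum(e^2)         # e1^2 + e2^2 + ... + en^2
rss

# For an lm fit, deviance() returns the same quantity
all.equal(rss, deviance(fit))
```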
What is confounding?
When a regression on a single predictor gives a very different result from a multiple regression that also includes other relevant predictors, because the single predictor is correlated with them
How do you find f with a parametric method?
First make an assumption about the functional form or shape of f, then use a procedure that uses the training data to fit or train the model
How do you predict the outcome of the model in R?
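predict(model) returns the fitted values for the training data; predict(model, newdata) returns predictions for new observations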
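A minimal sketch of both uses of predict(), assuming an lm fit on the built-in mtcars data. The data frame new_cars and its values are hypothetical.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Without newdata: fitted values for the training observations
fitted_vals <- predict(fit)

# With newdata: predictions for new observations. The column names must
# match the predictors used in the model formula.
new_cars <- data.frame(wt = c(2.5, 3.5))   # hypothetical new cars
new_pred <- predict(fit, newdata = new_cars)
new_pred
```

For a logistic regression fitted with glm(), adding type = "response" returns predicted probabilities instead of log odds.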
How do you interpret B0 and B1 when the predictor is a dummy variable equal to 1 when someone is a student and 0 when they are not?
B0 can be interpreted as the average Y among non-students, B0 + B1 as the average Y among students, and B1 as the average difference in Y between students and non-students
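This interpretation can be checked numerically. The sketch below uses simulated data (the group sizes, means and noise level are assumptions) and compares the fitted coefficients with the two group means.

```r
set.seed(1)
student <- rep(c(0, 1), each = 50)                   # dummy: 1 = student
y <- 500 + 100 * student + rnorm(100, sd = 30)       # simulated outcome

fit <- lm(y ~ student)
b   <- coef(fit)

# B0 equals the mean of Y among non-students
c(b0 = b[["(Intercept)"]], mean_non_students = mean(y[student == 0]))

# B0 + B1 equals the mean of Y among students
c(b0_plus_b1 = sum(b), mean_students = mean(y[student == 1]))
```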
What is the aim of linear discriminant analysis?
Estimate fk(x), the density of X within class k, and plug it into Bayes' theorem to estimate pk(x) = Pr(Y=k|X=x), thereby approximating the Bayes classifier
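As a hedged sketch, LDA can be fitted in R with lda() from the MASS package, which assumes each fk(x) is Gaussian with a covariance matrix shared across classes. The iris data is used purely as an illustrative assumption.

```r
library(MASS)   # provides lda(); normally installed alongside base R

fit <- lda(Species ~ ., data = iris)

out  <- predict(fit)      # predictions for the training data
post <- out$posterior     # estimated pk(x), one column per class
pred <- out$class         # class with the largest estimated pk(x)

head(round(post, 3))
```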
What is sensitivity/recall?
The proportion of actual positives that are correctly identified = TP/(TP+FN)
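A small worked example of the formula, using made-up truth and prediction vectors (both are assumptions for illustration):

```r
# Hypothetical true labels and predictions for 10 observations
truth <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
pred  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

TP <- sum(truth & pred)     # actual positives predicted positive: 3
FN <- sum(truth & !pred)    # actual positives predicted negative: 1

sensitivity <- TP / (TP + FN)
sensitivity                 # 0.75
```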
What is the difference between prediction and inference?
Prediction: predict Y using Y* = f*(X); the accuracy of the prediction matters more than the exact form of f*
Inference: understand the association between Y and X, so the form of f* matters
What is the advantage and disadvantage of parametric methods?
Advantage: reduce the problem of estimating f down to one of estimating a set of parameters
Disadvantage: the assumed form will usually not match the true unknown form of f
What do you need to check the significance of multiple coefficients together, e.g. B1 = B2 = 0?
The F-statistic. A large F-statistic (small p-value) is evidence to reject the null hypothesis that all of the tested coefficients are zero
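In R, the overall F-statistic of a linear model is stored in the model summary. The sketch below assumes a two-predictor lm fit on the built-in mtcars data and recomputes the p-value with pf().

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Named vector with elements value, numdf (p) and dendf (n - p - 1)
fstat <- summary(fit)$fstatistic
fstat

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
              lower.tail = FALSE)
p_value
```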
What is k-fold cross validation?
Randomly divide the set of observations into k groups (or folds) of approximately equal size. The first fold is treated as a validation set and the method is fit on the remaining k-1 folds. Repeat k times, each time using a different fold as the validation set, to get k estimates of the MSE, then average them
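The procedure above can be sketched by hand in a few lines of R. The model (mpg ~ wt on mtcars) and the choice k = 5 are assumptions for illustration only.

```r
set.seed(1)
k <- 5
n <- nrow(mtcars)

# Randomly assign each observation to one of k folds of near-equal size
folds <- sample(rep(1:k, length.out = n))

fold_mse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]    # fit on the other k-1 folds
  valid <- mtcars[folds == i, ]    # hold out fold i for validation
  fit   <- lm(mpg ~ wt, data = train)
  pred  <- predict(fit, newdata = valid)
  fold_mse[i] <- mean((valid$mpg - pred)^2)
}

cv_mse <- mean(fold_mse)   # average of the k MSE estimates
cv_mse
```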
How do you quantify the test error on classification problems?
Use the proportion of misclassified observations (the error rate) rather than the MSE
How can you detect non-linearity of data with a linear model?
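Plot the residuals ei versus the predictor xi (or versus the fitted values when there are several predictors). A clear pattern, such as a U-shape, suggests non-linearity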
What is the error rate?
1 - Accuracy
What happens to MSE as model flexibility increases?
training MSE will decrease, but test MSE may not
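A rough simulation of this behaviour, assuming a sine-shaped true f and polynomial fits of increasing degree (all of these choices are illustrative assumptions). The training MSE cannot increase as the degree grows because the models are nested, while the test MSE has no such guarantee.

```r
set.seed(1)
n <- 100
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.3)   # simulated data, true f = sin(2x)

train <- data.frame(x = x[1:50],   y = y[1:50])
test  <- data.frame(x = x[51:100], y = y[51:100])

degrees <- 1:8
mse <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)   # more flexible as d grows
  c(train = mean((train$y - predict(fit))^2),
    test  = mean((test$y  - predict(fit, newdata = test))^2))
}))

cbind(degree = degrees, mse)
```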
How do you interpret the coefficients of multinomial logistic regression, with stroke, overdose and epilepsy as the 3 classes?
If epilepsy is set as the baseline, B(stroke)0 is interpreted as the log odds of stroke versus epilepsy given that x1 = … = xp = 0. A one-unit increase in Xj is associated with a B(stroke)j increase in the log odds of stroke over epilepsy
What is the logistic function?
p(X) = e^(B0 + B1X) / (1 + e^(B0 + B1X))
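The formula translates directly into R. In this sketch the helper name logistic and the coefficient values b0 = 0, b1 = 1 are hypothetical; the result is compared with R's built-in plogis().

```r
# Logistic function: maps any real value of b0 + b1 * x into (0, 1)
logistic <- function(x, b0, b1) {
  exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))
}

x <- c(-10, 0, 10)
p <- logistic(x, b0 = 0, b1 = 1)   # hypothetical coefficients
p

# plogis() is the built-in equivalent applied to the linear predictor
all.equal(p, plogis(0 + 1 * x))
```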
How do you produce table of observed vs predicted results when classified discretely?
table(true=, predicted=)
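A minimal sketch of filling in the two arguments, using made-up label vectors (truth and pred are hypothetical names and values). The same table also gives the accuracy and the error rate.

```r
# Hypothetical true and predicted classes for 6 observations
truth <- factor(c("Yes", "No", "No", "Yes", "No", "Yes"), levels = c("No", "Yes"))
pred  <- factor(c("Yes", "No", "Yes", "Yes", "No", "No"), levels = c("No", "Yes"))

# Rows are the true classes, columns are the predicted classes
conf_mat <- table(true = truth, predicted = pred)
conf_mat

accuracy   <- sum(diag(conf_mat)) / sum(conf_mat)   # correct / total
error_rate <- 1 - accuracy
error_rate
```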