Week 6: Regression and Classification Flashcards
How do you make a prediction vector using knn in R?
knn(train, test, cl, k)
train is the training set with only the predictor variables, test is the test set with only the predictor variables, cl is the outcomes from the training set, k is the desired k
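A minimal sketch of the call above, assuming the `class` package (which provides `knn()`) and the built-in `iris` data. The 100-row split and `k = 5` are arbitrary choices for illustration.

```r
library(class)  # provides knn()

set.seed(1)                        # reproducible split
idx   <- sample(nrow(iris), 100)   # 100 training rows (arbitrary choice)
train <- iris[idx, 1:4]            # training predictors only
test  <- iris[-idx, 1:4]           # test predictors only
cl    <- iris$Species[idx]         # outcomes for the training rows

pred <- knn(train, test, cl, k = 5)   # factor of predicted classes, one per test row
table(true = iris$Species[-idx], predicted = pred)
```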
What are interaction terms?
Y = B0 + B1X1 + B2X2 + B3X1X2 + e
A product of predictors (here X1X2), which lets the effect of one predictor on Y depend on the value of the other
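As a rough illustration, the model above can be fitted with `lm()` on simulated data. The true coefficients (1, 2, -1, 3) and the noise level are made up for this sketch.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + 3 * x1 * x2 + rnorm(n, sd = 0.5)  # simulated, B3 = 3

fit <- lm(y ~ x1 * x2)   # x1 * x2 expands to x1 + x2 + x1:x2
coef(fit)                # the x1:x2 coefficient is the estimate of B3
```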
What is overfitting?
The model follows the errors, or noise, in the training data too closely
What is variance?
the amount by which f* would change if we estimated it using a different training data set. In general, more flexible methods have higher variance
What is a generative model for classification?
Model the distribution of the predictors X separately in each of the response classes. Then use Bayes' theorem to flip these around into estimates for Pr(Y=k|X=x)
Why does k-fold CV give more accurate estimates of MSE than LOOCV?
LOOCV has higher variance than k-fold CV: its n models are fitted on almost identical training sets, so their outputs are highly correlated with each other, and the mean of highly correlated quantities has higher variance
What is the residual sum of squares (RSS)?
Sum of all the squared residuals:
RSS = e1^2 + e2^2 + … + en^2
What is confounding?
When a regression on a single predictor gives a very different result from a regression that also includes other relevant predictors
How do you find f with a parametric method?
First make an assumption about the functional form or shape of f, then use a procedure that uses the training data to fit or train the model
How do you predict the outcome of the model in R?
predict(model)
How do you interpret B0 and B1 when the predictor is a dummy variable (1 when someone is a student, 0 when they are not)?
B0 can be interpreted as the average Y among non-students. B0 + B1 as the average Y among students. B1 as the average difference in Y between students and non students
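A small check of this interpretation on simulated data. The group sizes and the values 500 and 300 are invented for the sketch.

```r
set.seed(1)
student <- rep(c(0, 1), each = 50)                    # 0 = non-student, 1 = student
y       <- 500 + 300 * student + rnorm(100, sd = 50)  # simulated response

fit <- lm(y ~ student)
b   <- unname(coef(fit))

c(b0 = b[1], mean_non_students = mean(y[student == 0]))      # the two values match
c(b0_plus_b1 = b[1] + b[2], mean_students = mean(y[student == 1]))
```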
What is the aim of linear discriminant analysis?
Estimate fk(x), plug the estimate into Bayes' theorem to estimate pk(x), and so approximate the Bayes classifier
What is sensitivity/recall?
the proportion of actual positives that are identified correctly = TP/(TP+FN)
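A worked example of the formula on ten made-up observations (the `truth` and `pred` vectors are invented):

```r
truth <- c(TRUE, TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
pred  <- c(TRUE, TRUE, TRUE, FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE)

tp <- sum(pred & truth)    # true positives: 3
fn <- sum(!pred & truth)   # false negatives: 1
sensitivity <- tp / (tp + fn)
sensitivity                # 0.75
```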
What is the difference between prediction and inference?
Prediction: predict Y using Y* = f*(X)
Inference: understanding the association between Y and X
What is the advantage and disadvantage of parametric methods?
Advantage: reduce the problem of estimating f down to one of estimating a set of parameters
Disadvantage: will usually not match the true unknown form of f
What do you need to check the significance of multiple coefficients together e.g. B1 = B2 = … = Bp = 0?
F-statistic. A large F-statistic (well above 1) is evidence to reject the null hypothesis
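A minimal sketch of reading the F-statistic from a fitted model, using the built-in `mtcars` data and an arbitrary choice of predictors. The p-value is computed from the F distribution with `pf()`.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
f   <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
p   <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

f   # F-statistic and its degrees of freedom
p   # small p-value: reject the hypothesis that both slopes are 0
```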
What is k-fold cross validation?
Randomly divide the set of observations into k groups (or folds) of approximately equal size. The first fold is treated as a validation set and the method is fit on the remaining k-1 folds. Repeat k times, holding out a different fold each time, to get k estimates of the MSE. Find the average
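The procedure can be sketched by hand as follows. The built-in `cars` data, the simple linear model and `k = 5` are illustrative assumptions, not part of the card.

```r
set.seed(1)
k     <- 5
n     <- nrow(cars)
folds <- sample(rep(1:k, length.out = n))   # random fold label for each row

mse <- numeric(k)
for (i in 1:k) {
  held_out <- folds == i
  fit      <- lm(dist ~ speed, data = cars[!held_out, ])  # fit on the other k-1 folds
  pred     <- predict(fit, newdata = cars[held_out, ])    # predict the held-out fold
  mse[i]   <- mean((cars$dist[held_out] - pred)^2)
}
cv_mse <- mean(mse)   # k-fold CV estimate of the test MSE
cv_mse
```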
How do you quantify the test error on classification problems?
use the proportion of misclassified observations (the error rate) rather than the MSE
How can you detect non-linearity of data with a linear model?
Plot the residuals ei versus the predictor xi
What is the error rate?
1 - Accuracy
What happens to MSE as model flexibility increases?
training MSE will decrease, but test MSE may not
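The training half of this statement can be seen with nested polynomial fits. This sketch assumes the built-in `cars` data and treats polynomial degree as the measure of flexibility. It says nothing about test MSE, which needs held-out data.

```r
degrees   <- 1:5
train_mse <- sapply(degrees, function(d) {
  fit <- lm(dist ~ poly(speed, d), data = cars)  # more flexible as d grows
  mean(residuals(fit)^2)                         # MSE on the training data
})
round(train_mse, 2)   # never increases as the degree goes up
```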
How do you interpret the coefficients of multinomial logistic regression, with stroke, overdose and epilepsy as the 3 classes?
If epilepsy is set as the baseline, B(stroke)0 is interpreted as the log odds of stroke versus epilepsy given that x1 = … = xp = 0. A one-unit increase in Xj is associated with a B(stroke)j increase in the log odds of stroke over epilepsy
What is the logistic function?
p(X) = e^(B0 + B1X) / (1 + e^(B0 + B1X))
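A short sketch of the function in R. The helper name `logistic` and the values `b0 = 0`, `b1 = 1` are illustrative. `plogis()` is base R's built-in equivalent.

```r
logistic <- function(x, b0, b1) exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))

x <- c(-10, 0, 10)
p <- logistic(x, b0 = 0, b1 = 1)   # arbitrary illustrative coefficients
p                                  # always strictly between 0 and 1
all.equal(p, plogis(x))            # matches base R's logistic function
log(p / (1 - p))                   # the log odds recover b0 + b1 * x
```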
How do you produce table of observed vs predicted results when classified discretely?
table(true=, predicted=)
When selecting a level of smoothness for non-parametric methods, what is the trade-off?
Low levels of smoothness can lead to overfitting, while high levels of smoothness can miss real structure in the data (underfitting)
What is reducible error?
f* will not be a perfect estimate for f and will cause some error. This error is reducible because we can potentially improve the accuracy of f*
What is the standard multiple linear regression formula?
Y = B0 + B1X1 + B2X2 + … + BpXp + e
What is the R squared statistic?
Measures the proportion of variability in Y that can be explained using X
R^2 = (TSS-RSS)/TSS
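The formula can be checked against the value reported by `summary()`. The built-in `cars` data is used here only as a convenient example.

```r
fit <- lm(dist ~ speed, data = cars)
y   <- cars$dist

tss <- sum((y - mean(y))^2)     # total sum of squares
rss <- sum(residuals(fit)^2)    # residual sum of squares
r2  <- (tss - rss) / tss

all.equal(r2, summary(fit)$r.squared)   # TRUE
```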
What is the positive predictive value (PPV) / precision?
TP/(TP+FP)
What is irreducible error?
e: cannot be predicted using X, therefore the error introduced by e cannot be reduced
How do you assess the accuracy of the coefficient estimates?
Compute the standard error of B0 and B1
What does fk(x) represent in linear discriminant analysis?
Pr(X = x | Y = k), the density of X for an observation that comes from class k
How can you assess collinearity between 2 variables and between multiple variables?
2 variables: correlation matrix
multiple variables: variance inflation factor (VIF). value exceeding 5 or 10 is problematic
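Both checks can be sketched on simulated predictors. The helper `vif` below is a hand-rolled illustration, not a library function. It uses the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on all the others.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1: strong collinearity
x3 <- rnorm(n)                  # unrelated to the other two

cor(cbind(x1, x2, x3))          # pairwise check: x1 and x2 are highly correlated

# Hypothetical helper: VIF of `target` given a matrix of the other predictors
vif <- function(target, others) {
  r2 <- summary(lm(target ~ others))$r.squared
  1 / (1 - r2)
}
vif(x1, cbind(x2, x3))   # far above 10: problematic
vif(x3, cbind(x1, x2))   # close to 1: fine
```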
How do you alter the sensitivity and specificity of a classifier?
Change the threshold at which an observation is assigned to a class - default is 0.5
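A toy illustration of the effect, with made-up labels and predicted probabilities. The helper `rates` is hypothetical.

```r
truth <- c(1, 1, 1, 0, 1, 0, 0, 0)                     # made-up true classes
prob  <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1)    # made-up predicted probabilities

rates <- function(cutoff) {
  pred <- prob > cutoff                                 # classify as positive above the cutoff
  c(sensitivity = sum(pred & truth == 1) / sum(truth == 1),
    specificity = sum(!pred & truth == 0) / sum(truth == 0))
}
rates(0.5)    # sensitivity 1.00, specificity 0.75
rates(0.65)   # sensitivity 0.75, specificity 1.00
```

Raising the threshold trades sensitivity for specificity in this example.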
How does the k-nearest-neighbours classifier work?
Given a positive integer K and a test observation x0, the KNN classifier first identifies the K points in the training data that are closest to x0, represented by N0. It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j. KNN classifies the test observation x0 to the class with the largest probability.
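The steps above can be sketched by hand for a single test point. `knn_one` is a hypothetical helper written for this card. It assumes Euclidean distance and breaks ties by taking the first class, whereas `class::knn()` breaks ties at random.

```r
knn_one <- function(train, cl, x0, k) {
  d     <- sqrt(rowSums(sweep(as.matrix(train), 2, x0)^2))  # Euclidean distance to x0
  n0    <- order(d)[1:k]                # indices of the K nearest training points (N0)
  probs <- table(cl[n0]) / k            # fraction of N0 falling in each class
  names(probs)[which.max(probs)]        # class with the largest estimated probability
}

x0 <- c(5.0, 3.4, 1.5, 0.2)             # a test point in iris predictor space
knn_one(iris[, 1:4], iris$Species, x0, k = 5)   # "setosa"
```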
What is the problem with regression trees? What can they be used for?
Tend to overfit
Use them as a basic building block for ensembles
What is the ROC curve?
A plot of the true positive rate against the false positive rate across all possible thresholds. The area under the curve (AUC) summarises the overall performance of a classifier
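A hand-rolled sketch of the curve and its area, using made-up labels and scores. The AUC is computed with the trapezoidal rule, which is one common choice.

```r
truth <- c(1, 1, 1, 0, 1, 0, 0, 0)                     # made-up true classes
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1)    # made-up predicted probabilities

# One point on the curve per threshold, from "nothing positive" to "everything positive"
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) sum(score >= t & truth == 1) / sum(truth == 1))
fpr <- sapply(thresholds, function(t) sum(score >= t & truth == 0) / sum(truth == 0))

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc   # 0.9375
```

Calling `plot(fpr, tpr, type = "l")` afterwards draws the curve itself.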
How do you test for significance of the coefficients?
Hypothesis test. Compute the t-statistic by dividing the coefficient estimate by its standard error. The p-value is the probability of observing a value at least as large as |t| if the coefficient were truly 0. A small p-value indicates that such a substantial association between the predictor and the response is unlikely to arise by chance.
If the p-value is small, reject the hypothesis that the coefficient is 0
What type of model is polynomial regression?
It is still a linear model, because it is linear in the coefficients (X^2, X^3, … are just extra predictors)
What will happen if you increase the cutoff value?
fewer true positives and fewer false positives (i.e. fewer predicted positives overall)
What does Bayes classifier do?
Assigns an observation X=x to the class for which pk(x) is the largest (assigns each observation to the most likely class given its predictor values). It produces the lowest possible test error rate, called the Bayes error rate
How do you assess the accuracy of the linear regression model?
residual standard error:
RSE = sqrt[RSS / (n - 2)]
It is the average amount that the response will deviate from the true regression line.
It is an absolute measure of the lack of fit of the model
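The RSE formula for simple linear regression can be checked against `sigma()`, which returns the residual standard error of a fitted model. The `cars` data is just an example.

```r
fit <- lm(dist ~ speed, data = cars)
n   <- nrow(cars)

rss <- sum(residuals(fit)^2)
rse <- sqrt(rss / (n - 2))      # n - 2 because two coefficients (B0, B1) are estimated

all.equal(rse, sigma(fit))      # TRUE
```

The result is in the units of the response, which is why it is an absolute measure.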
What happens to the linear model if the error terms are correlated?
the estimated standard errors will be too low -> unwarranted sense of confidence in the model
How do you produce table of observed vs predicted results when classified as probability?
pred_lr <- factor(pred_prob > 0.5, labels = c("No", "Yes"))
table(true=, predicted = pred_lr)
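An end-to-end sketch of the same idea, assuming a logistic regression on the built-in `mtcars` data (predicting `am` from `wt`, an arbitrary choice). The factor levels are set explicitly so that both classes appear in the table even if one is never predicted.

```r
fit       <- glm(am ~ wt, data = mtcars, family = binomial)
pred_prob <- predict(fit, type = "response")   # fitted probabilities Pr(am = 1)

# Turn probabilities into classes at the 0.5 threshold
pred_lr <- factor(pred_prob > 0.5, levels = c(FALSE, TRUE), labels = c("No", "Yes"))
truth   <- factor(mtcars$am, levels = c(0, 1), labels = c("No", "Yes"))

tab <- table(true = truth, predicted = pred_lr)
tab
mean(pred_lr != truth)   # error rate = 1 - accuracy
```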
What is the curse of dimensionality?
As p increases (more dimensions), a given observation tends to have no nearby neighbours, so methods that rely on local neighbourhoods, such as KNN, perform worse