Chapter 4 - Classification Flashcards
LDA – Linear discriminant analysis
Produces a linear decision boundary for classification problems. Uses Bayes' theorem to turn class-conditional probability density functions of the features into posterior class probabilities: each class gets its own density for the feature, conditioned on that class. The key assumption is that the width (variance) of these densities is the same for every class, and it is this shared-variance assumption that makes the decision boundary linear.
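A sketch of the one-feature Gaussian case (class means μ_k, shared variance σ², priors π_k), showing where the linearity comes from; this follows the usual textbook form of the discriminant:

```latex
% Class-k discriminant for a single Gaussian feature with shared variance \sigma^2:
\delta_k(x) = x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log \pi_k
% The x^2/(2\sigma^2) term is identical for every class, so it cancels when classes
% are compared; \delta_k(x) is linear in x, hence the decision boundary is linear.
```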
QDA – Quadratic discriminant analysis
Produces a quadratic decision boundary for classification problems. Same setup as LDA, but the variance is allowed to differ between classes, which is what makes the boundary quadratic.
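The same one-feature sketch for QDA, now with a class-specific variance σ_k²; because σ_k differs across classes, the x² term no longer cancels:

```latex
% Class-k discriminant for a single Gaussian feature with its own variance \sigma_k^2:
\delta_k(x) = -\frac{x^2}{2\sigma_k^2} + x\,\frac{\mu_k}{\sigma_k^2}
              - \frac{\mu_k^2}{2\sigma_k^2} - \tfrac{1}{2}\log \sigma_k^2 + \log \pi_k
% The coefficient of x^2 depends on k, so the boundary \delta_1(x) = \delta_2(x)
% is a quadratic in x.
```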
If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?
QDA is expected to perform better on the training set because it is the more flexible model and can fit the training data more closely. But since the true boundary is linear, any non-linearity QDA captures is likely caused by noise, so QDA risks overfitting. LDA is therefore expected to perform better on the test set. (A simulation sketch covering both this card and the next follows the next card.)
If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?
QDA is expected to perform better on both the training set and the test set, since the true boundary is non-linear and QDA is flexible enough to capture it. The size of the gap depends on the degree of non-linearity: the more non-linear the Bayes boundary, the larger QDA's advantage over LDA.
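A minimal simulation sketch of the two cards above, using scikit-learn's LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis; the class means, covariances, and sample sizes are made-up illustrative values, not anything prescribed by the text:

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)

def simulate(cov1, n_train=200, n_test=2000):
    """Two Gaussian classes; equal covariances give a linear Bayes boundary,
    unequal covariances give a quadratic one."""
    cov0 = np.eye(2)

    def draw(n):
        X = np.vstack([rng.multivariate_normal([0, 0], cov0, n),
                       rng.multivariate_normal([1, 1], cov1, n)])
        return X, np.repeat([0, 1], n)

    X_train, y_train = draw(n_train)
    X_test, y_test = draw(n_test)
    for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                      ("QDA", QuadraticDiscriminantAnalysis())]:
        clf.fit(X_train, y_train)
        print(f"{name}: train={clf.score(X_train, y_train):.3f} "
              f"test={clf.score(X_test, y_test):.3f}")

print("Linear Bayes boundary (equal covariances):")
simulate(cov1=np.eye(2))
print("Non-linear Bayes boundary (unequal covariances):")
simulate(cov1=np.array([[2.0, 0.6], [0.6, 0.5]]))
```

Typically the first scenario shows QDA edging out LDA on training accuracy but trailing on test accuracy, while the second shows QDA ahead on both.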
In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?
The test prediction accuracy of QDA relative to LDA is expected to improve as n grows. QDA estimates more parameters and therefore needs more data; with a larger sample its extra variance, and hence its risk of overfitting, shrinks while its greater flexibility is retained.
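A hedged sketch of the sample-size effect, reusing the same kind of made-up two-Gaussian setup with unequal covariances: as the training size grows, QDA's test accuracy closes the gap to LDA and, because the true class variances differ here, eventually moves ahead.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(1)
cov0 = np.eye(2)
cov1 = np.array([[2.0, 0.6], [0.6, 0.5]])  # unequal covariances (illustrative values)

def draw(n):
    """Draw n points per class from two Gaussians with different covariances."""
    X = np.vstack([rng.multivariate_normal([0, 0], cov0, n),
                   rng.multivariate_normal([1, 1], cov1, n)])
    return X, np.repeat([0, 1], n)

X_test, y_test = draw(5000)
for n_per_class in [20, 50, 200, 1000]:
    X_train, y_train = draw(n_per_class)
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    qda = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
    print(f"n={2 * n_per_class:5d}  LDA test={lda.score(X_test, y_test):.3f}  "
          f"QDA test={qda.score(X_test, y_test):.3f}")
```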