Midterm Practice Problems Flashcards
Describe what the “Bayes classifier” is
The Bayes classifier is a classification procedure that reaches the true minimum misclassification rate. It can be thought of as the underlying model that generates the true categories of the observations.
Will the Bayes Classifier result in 0 misclassifications?
Whilethere may be some cases where data is extremely well-separated and thus the Bayes classifier results in 0misclassifications, in general this is not expected
Under what assumptions is Linear Discriminant Analysis the Bayes classifier?
If each group (or subpopulation) is assumed to be mul-tivariate normally distributed, and all groups have a common covariance matrix
Under what assumptions is Quadratic Discriminant Analysis the Bayes classifier?
If each group is assumed to be multivariate normally distributed with uniquecovariance matrices
What is clustering?
Clustering is attempting to separate observations into groups according to the predictors (X) — there is noknown response (Y) that we are actively modelling, it is an exploratory procedure.
What is classification
Classification is the process of fitting a model using predictors (X) to predict a categorical response variable(Y)
Whats the difference between clustering and classification?
r
Example of Clustering
r
Example of Classification
f
What is a p-value
The p-value for a hypothesis test is the probability of observing a test statistic as extreme, or more extreme(in the direction of the alternative hypothesis), than that which we observed assuming the null hypothesisis true.
Suggest a way of finding the ‘best’ number of groups (k) for a data set usingk-means.
Runk-means for all reasonable number of groups we might wish to consider, and record the total within-group sum of squares. View those values graphically, and determine at whichkincreasing the numberof groups further shows little improvement.
what is Linear Discriminant Analysis
a method used to find a linear combination of features that characterizes or separates two or more classes of objects or events
what is Quadratic Discriminant Analysis
a method used to determine which variables discriminate between two or more naturally occurring groups, it may have a descriptive or a predictive objective.
What are LDA and QDA used for?
statistical learning methods used for classifying observations to a class or category
what is the response variable to LDA and QDA used for?
categorical