Midterm Practice Problems Flashcards

1
Q

Describe what the “Bayes classifier” is

A

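The Bayes classifier assigns each observation to the class with the highest posterior probability given its predictors, which achieves the true minimum misclassification rate (the Bayes error rate). It depends on the underlying model that generates the true categories of the observations, which is usually unknown, so in practice it serves as a theoretical benchmark.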
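
As a compact restatement of this definition, the rule and its error rate can be sketched as follows. The notation (K classes, predictors X, response Y) is assumed here and does not come from the cards.

```latex
\hat{y}(x) = \arg\max_{k \in \{1,\dots,K\}} P(Y = k \mid X = x),
\qquad
\text{Bayes error rate} = 1 - E_X\!\left[ \max_{k} P(Y = k \mid X) \right]
```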

2
Q

Will the Bayes Classifier result in 0 misclassifications?

A

While there may be some cases where the data are extremely well separated and the Bayes classifier therefore results in 0 misclassifications, in general this is not expected, because the classes usually overlap.
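
A minimal sketch of why the error is rarely zero, under a simplifying assumption that is not in the cards: two equally likely classes, each univariate normal with the same standard deviation. In that setting the Bayes boundary is the midpoint of the two means and the Bayes error has a closed form. The function names are illustrative only.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_error_two_normals(mu0: float, mu1: float, sigma: float) -> float:
    """Bayes error for equally likely classes N(mu0, sigma^2) and N(mu1, sigma^2).

    The optimal boundary is the midpoint of the means, so each class is
    misclassified with probability Phi(-|mu1 - mu0| / (2 * sigma)).
    """
    return normal_cdf(-abs(mu1 - mu0) / (2.0 * sigma))


# The error shrinks as the class means move apart, but it stays above zero.
for gap in (0.5, 2.0, 8.0):
    err = bayes_error_two_normals(0.0, gap, 1.0)
    print(f"gap between means = {gap}: Bayes error = {err:.6f}")
```

Even with a gap of eight standard deviations the error is tiny but still positive, which matches the answer above: zero misclassifications requires classes that do not overlap at all.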

3
Q

Under what assumptions is Linear Discriminant Analysis the Bayes classifier?

A

LDA is the Bayes classifier when each group (or subpopulation) is multivariate normally distributed and all groups share a common covariance matrix.
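
Under these assumptions the Bayes rule reduces to comparing discriminant scores that are linear in x, which is where the name comes from. A sketch of that score, with notation assumed here (class means \mu_k, common covariance \Sigma, prior probabilities \pi_k):

```latex
\begin{aligned}
X \mid Y = k &\sim N_p(\mu_k, \Sigma), \qquad P(Y = k) = \pi_k \\
\delta_k(x) &= x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k \\
\hat{y}(x) &= \arg\max_k \, \delta_k(x)
\end{aligned}
```

The quadratic term in x is the same for every class when the covariance matrix is shared, so it cancels from the comparison and only linear terms remain.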

4
Q

Under what assumptions is Quadratic Discriminant Analysis the Bayes classifier?

A

QDA is the Bayes classifier when each group is multivariate normally distributed with its own covariance matrix.
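
A one-dimensional sketch of what class-specific variances do to the decision rule. The two classes ("narrow" and "wide"), their parameters, and the function names are hypothetical and chosen only for illustration. Each score is the log of prior times normal density, with the shared constant dropped.

```python
import math


def qda_score(x: float, mu: float, sigma: float, prior: float) -> float:
    """Log(prior * normal density) at x, dropping the constant -0.5 * log(2 * pi)."""
    return -math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2 + math.log(prior)


def classify(x: float, params: dict) -> str:
    """Assign x to the class with the largest discriminant score."""
    return max(params, key=lambda k: qda_score(x, *params[k]))


# Hypothetical classes: same mean, different spreads, equal priors.
# Each entry is (mean, standard deviation, prior).
params = {"narrow": (0.0, 1.0, 0.5), "wide": (0.0, 3.0, 0.5)}

labels = [classify(x, params) for x in (-5.0, -1.0, 0.0, 1.0, 5.0)]
print(labels)
```

The "narrow" class wins only on an interval around zero, so the decision region has two boundary points. A single linear threshold cannot produce that, which is why unequal covariances call for a quadratic rule.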

5
Q

What is clustering?

A

Clustering attempts to separate observations into groups according to the predictors (X). There is no known response (Y) that we are actively modelling; it is an exploratory procedure.

6
Q

What is classification?

A

Classification is the process of fitting a model using predictors (X) to predict a categorical response variable (Y).

7
Q

What's the difference between clustering and classification?

A

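Classification uses a known categorical response (Y) to fit a model that predicts the category of observations; clustering has no known response and groups observations using the predictors (X) alone.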

8
Q

Example of Clustering

A

r

9
Q

Example of Classification

A

f

10
Q

What is a p-value?

A

The p-value for a hypothesis test is the probability of observing a test statistic as extreme as, or more extreme than (in the direction of the alternative hypothesis), the one we observed, assuming the null hypothesis is true.
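
A small worked sketch of this definition, using an exact one-sided binomial test. The scenario (8 heads in 10 coin flips, null hypothesis p = 0.5, alternative p > 0.5) and the function name are illustrative assumptions, not part of the cards.

```python
from math import comb


def binomial_p_value_upper(successes: int, n: int, p0: float) -> float:
    """One-sided p-value P(X >= successes) for X ~ Binomial(n, p0).

    "As extreme or more extreme" here means at least as many successes as
    observed, because the alternative hypothesis is p > p0.
    """
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical data: 8 heads in 10 flips of a coin assumed fair under the null.
p_value = binomial_p_value_upper(8, 10, 0.5)
print(f"p-value = {p_value:.4f}")
```

The sum covers the outcomes 8, 9 and 10 heads, which gives (45 + 10 + 1) / 1024, roughly 0.055.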

11
Q

Suggest a way of finding the ‘best’ number of groups (k) for a data set using k-means.

A

Run k-means for every reasonable number of groups we might wish to consider, and record the total within-group sum of squares for each. View those values graphically, and determine the k beyond which increasing the number of groups further shows little improvement.
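
The procedure above can be sketched in plain Python. Several things here are assumptions made for the illustration: the toy data set with three tight groups, the helper names, and the deterministic farthest-point initialisation, which stands in for the random starts that k-means usually uses so that the run is reproducible. The values are printed rather than plotted.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Farthest-point initialisation (an assumption, chosen for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_wss(points, k, n_iter=100):
    """Run Lloyd's algorithm and return the total within-group sum of squares."""
    centroids = init_centroids(points, k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(sq_dist(p, centroids[j]) for j, c in enumerate(clusters) for p in c)


# Hypothetical data: three tight groups of four points each.
points = [(x + dx, y + dy)
          for x, y in ((0, 0), (10, 10), (0, 10))
          for dx in (0, 1) for dy in (0, 1)]

wss = {k: kmeans_wss(points, k) for k in range(1, 6)}
for k, value in wss.items():
    print(f"k = {k}: total within-group SS = {value:.2f}")
```

On this toy data the within-group sum of squares falls sharply up to k = 3 and barely changes afterwards, which is the "elbow" the answer describes. On real data the bend is often less clear, so the choice remains a judgment call.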

12
Q

What is Linear Discriminant Analysis?

A

A method used to find a linear combination of features that characterizes or separates two or more classes of objects or events.

13
Q

What is Quadratic Discriminant Analysis?

A

A method used to discriminate between two or more naturally occurring groups by modelling each group as multivariate normal with its own covariance matrix, which gives quadratic decision boundaries. It may have a descriptive or a predictive objective.

14
Q

What are LDA and QDA used for?

A

They are statistical learning methods used for classifying observations into a class or category.

15
Q

What type of response variable is used for LDA and QDA?

A

Categorical

16
Q

Suppose we knew that the Bayes classifier was a boundary that was extremely non-linear. If we were using K-nearest neighbours as a classifier, would you expect a larger or smaller value of K to provide a better approximation of the boundary?

A

As k increases, k-nearest neighbours produces increasingly simple (linear-looking) boundaries. Therefore, if the Bayes boundary is complicated (extremely non-linear), we would expect a relatively small value of k to perform better.
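
A rough illustration of this effect, using a deliberately irregular one-dimensional problem. The data (class labels that alternate in bands of width 5), the noise-free setup, and the function names are assumptions made for the sketch. With noisy labels a very small k can overfit, so this shows only the boundary-flexibility side of the trade-off.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def true_label(x):
    """Hypothetical irregular truth: the class alternates every 5 units."""
    return int(x // 5) % 2


# Training points at the integers 0..29; test points slightly offset from them.
train = [(float(x), true_label(x)) for x in range(30)]
test = [x + 0.25 for x in range(30)]

accuracy = {}
for k in (1, 15):
    correct = sum(knn_predict(train, x, k) == true_label(x) for x in test)
    accuracy[k] = correct / len(test)
    print(f"k = {k}: accuracy = {accuracy[k]:.2f}")
```

With k = 1 every test point takes the label of its immediate neighbour and the bands are recovered. With k = 15 each vote spans roughly three bands, so the narrow regions are averaged away and accuracy drops.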