Probability, Prediction, and Classification Flashcards

1
Q

What is the difference between interpreting a probability prediction and a classification prediction?

A

Probability: predicting the probability of y = 1 for each observation
Classification: predicting whether yhat = 0 or yhat = 1 for each observation

2
Q

What are two errors of the classification process?

A

False positives, false negatives

3
Q

What is the confusion table?

A

It shows the number of observations by their predicted class and actual class. The quadrants are:

                | Actual N     | Actual P     |
   Classified N | TN           | FN           | Total classified N
   Classified P | FP           | TP           | Total classified P
   Total        | Total true N | Total true P | All observations
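
The table above can be built by counting observations in each predicted/actual combination. A minimal Python sketch (the function name and data are illustrative, not from the cards):

```python
# Build a 2x2 confusion table by counting each
# (actual, predicted) combination of the two classes.
def confusion_table(y_true, y_pred):
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return {"TN": tn, "FP": fp, "FN": fn, "TP": tp}

# Made-up actual classes and classifications:
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
print(confusion_table(y_true, y_pred))  # {'TN': 2, 'FP': 1, 'FN': 1, 'TP': 2}
```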
4
Q

What are the three measures of classification?

A

1) Accuracy = (TP + TN) / N : The proportion of correctly classified observations
2) Sensitivity = TP / (TP + FN) : The proportion of true positives among all actual positives
3) Specificity = TN / (TN + FP) : The proportion of true negatives among all actual negatives
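
The three formulas map directly to code. A short sketch using hypothetical confusion-table counts:

```python
# The three measures of classification, computed from
# the four confusion-table counts.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    return tp / (tp + fn)      # true positives among actual positives

def specificity(tn, fp):
    return tn / (tn + fp)      # true negatives among actual negatives

# Illustrative counts: 40 TP, 50 TN, 5 FP, 5 FN (N = 100)
print(accuracy(40, 50, 5, 5))   # 0.9
print(sensitivity(40, 5))       # ~0.889
print(specificity(50, 5))       # ~0.909
```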

5
Q

T/F: There is a trade-off between making false positive and false negative errors.

A

True; the trade-off can be expressed in terms of sensitivity and specificity

6
Q

What does the ROC curve show?

A

The proportion of false positives among all y = 0 observations (1 - specificity) plotted against the proportion of true positives among all y = 1 observations (sensitivity), across classification thresholds
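
One way to see this is to sweep the threshold and record the (1 - specificity, sensitivity) pair at each one. A sketch with made-up probabilities:

```python
# Trace ROC curve points: for each threshold, classify by
# p_hat >= threshold and record (FP rate, TP rate).
def roc_points(y_true, p_hat, thresholds):
    points = []
    for thr in thresholds:
        y_pred = [1 if p >= thr else 0 for p in p_hat]
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fpr = fp / (fp + tn)   # 1 - specificity (x-axis)
        tpr = tp / (tp + fn)   # sensitivity (y-axis)
        points.append((fpr, tpr))
    return points

y_true = [0, 0, 1, 1]
p_hat = [0.1, 0.4, 0.35, 0.8]
print(roc_points(y_true, p_hat, [0.0, 0.5, 1.1]))
# [(1.0, 1.0), (0.0, 0.5), (0.0, 0.0)]
```

At threshold 0 everything is classified positive (top-right corner of the curve); at a threshold above 1 nothing is (origin).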

7
Q

T/F: The ROC curve of a completely random probability prediction is the 45 degree line.

A

True

8
Q

T/F: The lower the area under the ROC curve, the better our predictions are.

A

False; we want an area higher than that of a random prediction (0.5)

9
Q

What are the two ways of finding the optimal classification threshold?

A

1) Use the formula loss(FP) / (loss(FP) + loss(FN))
2) Use a search algorithm that selects the probability model and the optimal classification threshold together.
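
The first way is a one-line computation once the two losses are specified. A sketch with made-up loss values:

```python
# Way 1: optimal threshold from the relative losses of the
# two error types. Loss values here are illustrative.
def optimal_threshold(loss_fp, loss_fn):
    return loss_fp / (loss_fp + loss_fn)

print(optimal_threshold(1, 1))   # 0.5 (equal losses)
print(optimal_threshold(1, 9))   # 0.1 (a FN costs 9x a FP, so classify positive more easily)
```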

10
Q

Fill in the blanks.

Having a higher threshold leads to _____ and vice versa

A

Fewer predicted exits, fewer FP, but more FN.

11
Q

Define class imbalance

A

The event being studied is very rare or very frequent

Ex. Fraud or sport injury

12
Q

What are the consequences of having class imbalance?

A

Cross-validation can be less effective at avoiding overfitting.
The usual measures of fit can be less good at differentiating models.
In short: poorer model performance, and a model fitting and selection setup that is not ideal.

13
Q

What do you do when you have a class imbalance?

A

1) Recognize when it is happening
2) May need to rebalance the sample (downsampling or over-sampling)
3) Or use algorithms designed to handle imbalance

14
Q

What is the difference between downsampling and over-sampling?

A

Downsampling: randomly dropping observations from the frequent class
Over-sampling: adding more observations of the rare class (e.g., by re-drawing them with replacement)
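
A minimal sketch of the two approaches on illustrative data (one common form of over-sampling, re-drawing rare-class rows with replacement, is assumed here):

```python
import random

random.seed(0)

def downsample(frequent, rare):
    # Randomly drop frequent-class rows until the classes match.
    return random.sample(frequent, len(rare)), rare

def oversample(frequent, rare):
    # Re-draw rare-class rows with replacement until the classes match.
    return frequent, [random.choice(rare) for _ in range(len(frequent))]

frequent = list(range(100))   # 100 "no event" rows (made up)
rare = [900, 901, 902]        # 3 "event" rows (made up)

f_down, r_down = downsample(frequent, rare)
print(len(f_down), len(r_down))   # 3 3
f_over, r_over = oversample(frequent, rare)
print(len(f_over), len(r_over))   # 100 100
```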

15
Q

When you consider higher and higher thresholds of predicted probabilities for classification, the number of false positives and false negatives changes. How and why?

A

Higher thresholds mean it is harder to be classified "positive", which increases the number of FN in the data. Vice versa, lower thresholds mean it is easier to be classified "positive", so the number of FP increases. This is why finding the optimal threshold is important: we want the threshold that produces the least costly mix of false reports.
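
The trade-off can be demonstrated directly by counting FP and FN at a low and a high threshold on made-up predicted probabilities:

```python
# Count false positives and false negatives at a given threshold.
def fp_fn(y_true, p_hat, thr):
    fp = sum(1 for t, p in zip(y_true, p_hat) if t == 0 and p >= thr)
    fn = sum(1 for t, p in zip(y_true, p_hat) if t == 1 and p < thr)
    return fp, fn

# Illustrative data: three y = 0 and three y = 1 observations.
y_true = [0, 0, 0, 1, 1, 1]
p_hat = [0.2, 0.45, 0.6, 0.3, 0.55, 0.9]

print(fp_fn(y_true, p_hat, 0.25))  # (2, 0): low threshold, more FP, fewer FN
print(fp_fn(y_true, p_hat, 0.75))  # (0, 2): high threshold, fewer FP, more FN
```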
