Chapter 8 - Visualizing Model Performance Flashcards

1
Q

Why would you use rankings instead of classifications?

A
  1. The model gives a score that ranks cases by their likelihood of belonging to the class of interest, but the score is not a true probability.
  2. Accurate probability estimates cannot be obtained from the classifier.
  3. Costs and benefits cannot be specified precisely, but we still want to take action.
2
Q

How do we choose a proper threshold?

A
  1. Set the threshold at the point where the expected value (EV) is above the desired level (usually 0).
    Assumption: we have accurate probability estimates and a well-specified cost-benefit matrix.

2.

3
Q

What effect does a threshold have on a ranking confusion matrix?

A

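Whenever the threshold changes, the confusion matrix may change as well, because instances move between "predicted positive" and "predicted negative". Lowering the threshold can raise the counts of True Positives and False Positives and lower the counts of False Negatives and True Negatives.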
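
The effect can be seen on a tiny example. This is a minimal sketch, assuming 0/1 labels and the convention that a score at or above the threshold counts as a positive prediction; the function name, scores, and labels are made up for illustration.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical model scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

high = confusion_at_threshold(scores, labels, 0.7)  # strict threshold
low = confusion_at_threshold(scores, labels, 0.3)   # lenient threshold
print(high)
print(low)
```

With these made-up numbers, lowering the threshold from 0.7 to 0.3 turns two False Negatives into True Positives while the False Positive count happens to stay the same.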

4
Q

What is a Profit Curve?

A

A plot of the expected profit (EV) against the percentage of the ranked list predicted as positive. Each point on the curve corresponds to a ranking threshold: as the threshold is lowered, more instances are predicted as positive.

5
Q

What is a Profit Curve?

A

A plot of the expected profit (EV) against the percentage of the ranked list predicted as positive. Each point on the curve corresponds to a ranking threshold: as the threshold is lowered, more instances are predicted as positive. Multiple classifiers can be compared on the same graph.
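
A rough sketch of how the points of such a curve could be computed. It assumes 0/1 labels, no tied scores, and a simplified cost-benefit matrix in which only targeted instances matter (a benefit for each True Positive, a cost for each False Positive); the function name and all numbers are hypothetical.

```python
def profit_curve(scores, labels, benefit_tp, cost_fp):
    """Return (fraction_targeted, total_profit) for every cutoff in the ranking."""
    # Rank instances from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    points = [(0.0, 0.0)]  # targeting nobody yields zero profit
    profit = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        # Targeting one more instance adds a benefit or a cost.
        profit += benefit_tp if label == 1 else -cost_fp
        points.append((k / n, profit))
    return points


# Hypothetical scores, labels, and cost-benefit values.
curve = profit_curve(
    scores=[0.9, 0.8, 0.6, 0.4, 0.2],
    labels=[1, 0, 1, 1, 0],
    benefit_tp=10.0,
    cost_fp=4.0,
)
best_point = max(curve, key=lambda point: point[1])
print(curve)
print(best_point)
```

Running the same function on the scores of a second classifier would give a second curve that can be plotted on the same axes for comparison.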

6
Q

How does a budgetary constraint affect your ranking strategy?

A

It can change the operating point and the choice of classifier.
Steps:
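1. Calculate how many individuals/instances the budget covers (budget divided by the cost per individual).
2. Calculate what percentage of the total customers that number represents; this is the share of the ranked list you can target.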
(P. 213)
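
The two steps amount to simple arithmetic. The sketch below uses illustrative figures (a budget of 40,000, a cost of 5 per offer, and 100,000 customers); the function name and the numbers are assumptions for the example, not values to rely on.

```python
def targetable_fraction(budget, cost_per_individual, population_size):
    """Return (number of individuals affordable, fraction of the population)."""
    # Step 1: how many individuals the budget covers.
    affordable = int(budget // cost_per_individual)
    # You cannot target more individuals than exist.
    affordable = min(affordable, population_size)
    # Step 2: the share of the ranked list that can be targeted.
    return affordable, affordable / population_size


count, fraction = targetable_fraction(
    budget=40_000, cost_per_individual=5, population_size=100_000
)
print(count, fraction)
```

The resulting fraction fixes the operating point on the x-axis of the profit curve; the classifier with the highest profit at that point is the one to prefer under the budget, even if another classifier peaks higher elsewhere.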

7
Q

When do you use Profit Curves?

A

When you know the conditions under which a classifier will be used and the profit calculation conditions are expected to be stable.

8
Q

What are the two critical conditions of using profit calculations?

A
  1. Class priors (aka base rate): the percentage of positive/negative instances in the target population.
  2. Costs and benefits: expected profit is sensitive to the relative levels of costs and benefits.
9
Q

When do you use the ROC graph?

A
When the conditions behind the profit calculation (class priors, costs and benefits) are uncertain or unstable.
ROC graphs can be used for classification, class probability estimation, and scoring models.
10
Q

What is a ROC graph?

A

A two-dimensional plot of a classifier with the False Positive rate on the x-axis and the True Positive rate on the y-axis. It shows the trade-off between benefits (True Positives) and costs (False Positives).
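
For a ranking classifier, each threshold gives one point on this plot. Below is a minimal sketch that walks down the ranked list and records (False Positive rate, True Positive rate) after each instance. It assumes 0/1 labels, at least one instance of each class, and no tied scores; the function name and data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) points, one per cutoff in the ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores and true labels (1 = positive).
roc = roc_points([0.9, 0.8, 0.6, 0.4, 0.2], [1, 0, 1, 1, 0])
print(roc)
```

The curve always starts at (0, 0) and ends at (1, 1), the point where every instance is predicted positive.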

11
Q

What is a discrete classifier?

A

A classifier that outputs only a class label instead of a ranking. Each such classifier produces a single confusion matrix, which corresponds to one point on the ROC graph.

12
Q

What do the points of classifiers on the ROC graph tell you?

A

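Northwest: a point to the northwest of another is superior (higher True Positive rate, lower False Positive rate).
Left-hand side: conservative; few False Positives, but often low True Positive rates as well.
Upper right-hand side: permissive; high True Positive rates, but often high False Positive rates.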

13
Q

What is an advantage of ROC graphs?

A

They decouple classifier performance from the conditions under which the classifiers will be used.

14
Q

What is the AUC?

A

AUC stands for Area Under the ROC Curve; its value ranges from zero to one. It summarizes the performance of a classifier in a single number. A value of 0.5 corresponds to random guessing.
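
One simple way to compute the area is the trapezoidal rule over the ROC points. This is an illustrative sketch, assuming the points are given as (False Positive rate, True Positive rate) pairs ordered from (0, 0) to (1, 1); the function name is hypothetical.

```python
def auc_from_points(points):
    """Trapezoidal area under a ROC curve given ordered (fpr, tpr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Width of the step times the average height of its two ends.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


diagonal = [(0.0, 0.0), (1.0, 1.0)]             # random guessing
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # all positives ranked first

auc_random = auc_from_points(diagonal)
auc_perfect = auc_from_points(perfect)
print(auc_random, auc_perfect)
```

The diagonal gives 0.5 and a perfect ranking gives 1.0, matching the interpretation above.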

15
Q

What can you use to summarize the predictiveness of a classifier?

A

The AUC, which is equivalent to the Mann-Whitney-Wilcoxon measure.

16
Q

When do you use a Cumulative Response Curve?

A

When you want a more intuitive visualization to show to stakeholders.

17
Q

What are Cumulative Response Curves?

A

They are closely related to ROC curves, but more intuitive. They plot the hit rate (True Positive rate: the percentage of positives correctly classified) on the y-axis against the percentage of the population that is targeted on the x-axis.
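
A minimal sketch of how the points could be computed from a ranked list, assuming 0/1 labels, at least one positive, and no tied scores; the function name and data are made up for illustration.

```python
def cumulative_response(scores, labels):
    """Return (fraction_targeted, hit_rate) for every cutoff in the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_positives = sum(labels)
    hits = 0
    points = [(0.0, 0.0)]
    for k, (_, label) in enumerate(ranked, start=1):
        hits += label  # label is 1 for a positive, 0 otherwise
        points.append((k / n, hits / total_positives))
    return points


# Hypothetical scores and true labels (1 = positive).
response = cumulative_response([0.9, 0.8, 0.6, 0.4, 0.2], [1, 0, 1, 1, 0])
print(response)
```

In this toy example, targeting the top 80% of the list already captures all positives; random targeting would capture about 80% of them.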

18
Q

What is a lift curve?

A

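A curve that shows how much better a model performs than random targeting. The lift (the value of the cumulative response curve divided by the value of the random baseline at the same point) is plotted on the y-axis and the percentage of the population targeted is plotted on the x-axis.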
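
As a sketch, lift at a given cutoff can be computed as the hit rate divided by the fraction of the population targeted, since random targeting of x% of the list is expected to capture x% of the positives. The code assumes 0/1 labels, at least one positive, and no tied scores; the function name and data are hypothetical.

```python
def lift_curve(scores, labels):
    """Return (fraction_targeted, lift) for every non-empty cutoff in the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_positives = sum(labels)
    hits = 0
    points = []
    for k, (_, label) in enumerate(ranked, start=1):
        hits += label
        fraction_targeted = k / n
        hit_rate = hits / total_positives
        # Random targeting has an expected hit rate equal to fraction_targeted.
        points.append((fraction_targeted, hit_rate / fraction_targeted))
    return points


# Hypothetical scores and true labels (1 = positive).
lift = lift_curve([0.9, 0.8, 0.6, 0.4, 0.2], [1, 0, 1, 1, 0])
print(lift)
```

A lift of 1 means no advantage over random targeting, which is always the value once 100% of the population is targeted.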

19
Q

What are the drawbacks of the lift curve and the Cumulative Response Curve?

A

They assume that the test set has exactly the same target class priors as the population to which the model will be applied.

If the exact proportion of positives in the population is unknown, or is not represented accurately in the test data, the curves can be misleading.

20
Q

What does a large std. dev tell you about the dataset?

A

That the results do not show a steady pattern. This could be due to a dataset that is too small, or to a model that is a poor match for part of the problem.