Chapter 8 - Visualizing Model Performance Flashcards

Question 1

Q

Why would you use rankings instead of classifications?

Answer

A

The model gives a score that ranks cases by their likelihood of belonging to the class of interest, but the score is not a true probability.
Accurate probability estimates cannot be obtained from the classifier.
Costs and benefits cannot be specified precisely, but we still want to take action.

Question 2

Q

How do you choose a proper threshold?

Answer

A

The threshold is set where the expected value (EV) rises above the desired level (usually 0).
Assumption: we have accurate probability estimates and a well-specified cost-benefit matrix.
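
The break-even rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the numbers are hypothetical, a correct positive earns `benefit_tp`, a false positive costs `cost_fp`, and not acting is worth 0.

```python
def expected_value(p, benefit_tp, cost_fp):
    """Expected value of acting on a case whose positive-class probability is p."""
    return p * benefit_tp - (1.0 - p) * cost_fp


def break_even_threshold(benefit_tp, cost_fp):
    """Probability at which the expected value of acting is exactly zero."""
    return cost_fp / (benefit_tp + cost_fp)


# Hypothetical numbers: a hit earns 99, a miss costs 1.
threshold = break_even_threshold(benefit_tp=99.0, cost_fp=1.0)

# Act only on cases whose estimated probability exceeds the threshold.
probabilities = [0.005, 0.02, 0.30]
targeted = [p for p in probabilities if expected_value(p, 99.0, 1.0) > 0]
```

With these made-up numbers the threshold is low because a hit is worth far more than a miss costs, so even unlikely cases are worth acting on.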

Question 3

Q

What effect does a threshold have on a ranking confusion matrix?

Answer

A

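Whenever the threshold changes, the confusion matrix may change as well, because instances move between the predicted-positive and predicted-negative groups and the counts of true positives, false positives, false negatives, and true negatives shift accordingly.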
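
A small sketch of this effect, using made-up scores and labels (1 = positive, 0 = negative). The helper name `confusion_at_threshold` is hypothetical and not taken from any library.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP, FP, FN, TN when every score >= threshold is predicted positive."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            counts["TP"] += 1
        elif predicted_positive and label == 0:
            counts["FP"] += 1
        elif label == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Illustrative data only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

high = confusion_at_threshold(scores, labels, threshold=0.7)
low = confusion_at_threshold(scores, labels, threshold=0.3)
```

Lowering the threshold from 0.7 to 0.3 turns two false negatives into true positives in this toy data; on other data it could just as easily add false positives.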

Question 4

Q

What is a Profit Curve?

Answer

A

A plot of the expected value (profit) against the percentage of the ranked list predicted as positive. Each point on the curve corresponds to a ranking threshold; lowering the threshold classifies more of the list as positive.

Question 5

Q

What is a Profit Curve?

Answer

A

A plot of the expected value (profit) against the percentage of the ranked list predicted as positive. Each point on the curve corresponds to a ranking threshold; lowering the threshold classifies more of the list as positive. Multiple classifiers can be compared on the same graph.
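
One way to compute the points of such a curve is sketched here. It assumes a simple two-value payoff (a hypothetical `benefit_tp` for each targeted positive and `cost_fp` for each targeted negative) and reports total profit on the scored list; the data are invented for illustration.

```python
def profit_curve(scores, labels, benefit_tp, cost_fp):
    """Return (fraction_targeted, cumulative_profit) points, highest scores first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # targeting nobody yields zero profit
    profit = 0.0
    for i, (_, label) in enumerate(ranked, start=1):
        profit += benefit_tp if label == 1 else -cost_fp
        points.append((i / len(ranked), profit))
    return points


# Illustrative data only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

points = profit_curve(scores, labels, benefit_tp=10.0, cost_fp=4.0)
best_fraction, best_profit = max(points, key=lambda point: point[1])
```

To compare classifiers, the same function can be run on each classifier's scores and the resulting point lists drawn on one set of axes.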

Question 6

Q

How does a budgetary constraint affect your ranking strategy?

Answer

A

It can change the operating point and the choice of classifier.
Steps:
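1. Calculate how many individuals/instances the budget covers (budget divided by the cost per individual/instance).
2. Calculate what percentage of the total customers that number represents; this is the share of the ranked list you can target.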
(P. 213)
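
The two steps can be sketched as follows. The function name and the figures (budget, cost per offer, population size) are illustrative assumptions, not values asserted by the card.

```python
def targetable_fraction(budget, cost_per_instance, population_size):
    """Return how many instances the budget covers and their share of the population."""
    n_affordable = int(budget // cost_per_instance)
    n_targeted = min(n_affordable, population_size)  # cannot target more than exist
    return n_targeted, n_targeted / population_size


# Hypothetical campaign: 40,000 budget, 5 per offer, 100,000 customers.
n_targeted, fraction = targetable_fraction(
    budget=40_000, cost_per_instance=5, population_size=100_000
)
```

The resulting fraction is the x-position on the profit curve at which the classifiers should be compared, which is why the budget can change both the operating point and the preferred classifier.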

Question 7

Q

When do you use Profit Curves?

Answer

A

When you know the conditions under which a classifier will be used and the profit calculation conditions are expected to be stable.

Question 8

Q

What are the two critical conditions of using profit calculations?

Answer

A

Class priors (aka base rate): the percentage of positive/negative instances in the target population.
Costs and benefits: expected profit is sensitive to the relative levels of costs and benefits.

Question 9

Q

When do you use the ROC graph?

Answer

A

When there is uncertainty in the profit calculations.
They are used for classification, class probability estimation, and scoring.

Question 10

Q

What is a ROC graph?

Answer

A

A two-dimensional plot of a classifier with the false positive rate on the x-axis and the true positive rate on the y-axis. It shows the trade-off between benefits (true positives) and costs (false positives).
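
A minimal sketch of how ROC points can be generated from a scored list, assuming there are no tied scores and that both classes are present. The function name and data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) points, one per threshold."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, label in ranked if label == 1)
    n_neg = len(ranked) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Illustrative data only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

points = roc_points(scores, labels)
```

Each step down the ranked list moves the point up (a positive was captured) or to the right (a negative was captured), ending at (1, 1) when everything is predicted positive.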

Question 11

Q

What is a discrete classifier?

Answer

A

A classifier that outputs only a class label instead of a ranking. Each such classifier produces a single confusion matrix.

Question 12

Q

What do the points of classifiers on the ROC graph tell you?

Answer

A

Northwest: superior to the others (higher true positive rate, lower false positive rate).
Left-hand side (conservative): often low true positive and false positive rates.
Upper right-hand side (permissive): often high false positive rates.

Question 13

Q

What is an advantage of ROC graphs?

Answer

A

They decouple classifier performance from the conditions under which the classifiers will be used.

Question 14

Q

What is the AUC?

Answer

A

Stands for Area Under the ROC Curve and ranges from zero to one. It summarizes the performance of a classifier in a single number. A value of 0.5 corresponds to randomness.
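
The AUC can also be read as the fraction of (positive, negative) pairs in which the positive receives the higher score, with ties counted as half. The sketch below uses that pairwise view; the function name and data are hypothetical, and the quadratic loop is meant for clarity rather than speed.

```python
def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative data only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

value = auc(scores, labels)
uninformative = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

A classifier that gives every case the same score wins no pair outright and ties them all, which is why it lands at 0.5.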

Question 15

Q

What can you use to summarize the predictiveness of a classifier?

Answer

A

The AUC (equivalent to the Mann-Whitney-Wilcoxon measure).

Question 16

Q

When do you use a Cumulative Response Curve?

Answer

A

When you want a more intuitive visualization to show stakeholders.

Question 17

Q

What are Cumulative Response Curves?

Answer

A

They are closely related to ROC curves, but more intuitive. They plot the true positive rate (hit rate) on the y-axis against the percentage of the population that is targeted on the x-axis.
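
A short sketch of how the points of a cumulative response curve can be computed from a scored list. The function name and data are hypothetical, and labels are assumed to be coded 1 (positive) and 0 (negative).

```python
def cumulative_response(scores, labels):
    """Return (fraction_targeted, hit_rate) points, highest scores first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(label for _, label in ranked)
    points = [(0.0, 0.0)]
    hits = 0
    for i, (_, label) in enumerate(ranked, start=1):
        hits += label  # label is 1 for a positive, 0 otherwise
        points.append((i / len(ranked), hits / n_pos))
    return points


# Illustrative data only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

points = cumulative_response(scores, labels)
```

In this toy data, targeting the top 80% of the list already captures every positive, whereas random targeting would be expected to capture about 80% of them.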

Question 18

Q

What is a lift curve?

Answer

A

A curve that shows the advantage a model provides over random targeting. The numeric lift is plotted on the y-axis and the percentage of the population targeted is plotted on the x-axis.
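
Lift at a given point can be computed as the hit rate among the targeted share divided by that share, so random targeting has a lift of 1. The sketch below assumes that definition; the function name and data are hypothetical, and labels are coded 1/0.

```python
def lift_curve(scores, labels):
    """Return (fraction_targeted, lift) points, highest scores first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_total = len(ranked)
    n_pos = sum(label for _, label in ranked)
    points = []
    hits = 0
    for i, (_, label) in enumerate(ranked, start=1):
        hits += label
        fraction_targeted = i / n_total
        hit_rate = hits / n_pos
        points.append((fraction_targeted, hit_rate / fraction_targeted))
    return points


# Illustrative data only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

points = lift_curve(scores, labels)
```

The curve always ends at a lift of 1 when the whole population is targeted, since at that point the model and random selection capture the same positives.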

Question 19

Q

What are the drawbacks of the lift curve and the Cumulative Response Curve?

Answer

A

They must be used with care if the exact proportion of positives in the population is unknown or is not accurately represented in the test data.

They assume that the test set has exactly the same target class priors as the population to which it will be applied.

Question 20

Q

What does a large standard deviation tell you about the dataset?

Answer

A

That the results do not show a steady pattern. This could be due to a dataset that is too small or to a model that is mismatched to a portion of the problem.

(20 cards)