Chapter 5 Flashcards
Prediction
Average Error, MAPE (Mean Absolute Percentage Error), RMSE (Root Mean Squared Error), Validation Data
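A minimal sketch of how these three measures are computed on validation data, using made-up actual/predicted values (NumPy assumed available):

```python
import numpy as np

# Hypothetical actual and predicted values from the validation data
actual = np.array([120.0, 95.0, 250.0, 80.0, 310.0])
predicted = np.array([130.0, 90.0, 240.0, 100.0, 290.0])

errors = actual - predicted

average_error = errors.mean()                  # signed; near 0 suggests unbiased predictions
mape = np.mean(np.abs(errors / actual)) * 100  # percentage scale, unit-free
rmse = np.sqrt(np.mean(errors ** 2))           # same units as the target variable

print(f"Average Error: {average_error:.2f}")
print(f"MAPE: {mape:.2f}%")
print(f"RMSE: {rmse:.2f}")
```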
Classification
Classification matrix, specificity, sensitivity
ROC (Receiver Operating Characteristic)
to assess performance at different cutoff values
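A short sketch of how the ROC curve traces sensitivity against 1 − specificity across cutoff values, using hypothetical classes and probabilities with scikit-learn (assumed available):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical actual classes (1 = important class) and predicted probabilities
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.4, 0.7, 0.6, 0.3, 0.55, 0.8, 0.2])

# Each point on the ROC curve corresponds to one cutoff value
fpr, tpr, cutoffs = roc_curve(y_true, y_prob)
print("cutoffs:          ", cutoffs)
print("sensitivity (TPR):", tpr)
print("1 - specificity:  ", fpr)
print("area under curve: ", roc_auc_score(y_true, y_prob))
```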
Detect overfitting
compare validation to training data:
some differences are expected; extreme differences may indicate overfitting
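A minimal sketch of the comparison, assuming scikit-learn; a deliberately unpruned tree on made-up data shows the training/validation gap that signals overfitting:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Made-up data; an unpruned tree is prone to overfitting
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(size=200)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=1)
model = DecisionTreeRegressor().fit(X_train, y_train)

rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
rmse_valid = np.sqrt(mean_squared_error(y_valid, model.predict(X_valid)))

# Training RMSE near 0 with much larger validation RMSE indicates overfitting
print(f"training RMSE: {rmse_train:.2f}, validation RMSE: {rmse_valid:.2f}")
```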
Naïve rule
classify all records as belonging to the most prevalent class
benchmark: we hope to do better than that; using external predictor info should outperform the naïve rule
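A sketch of computing the naïve-rule benchmark on hypothetical validation classes:

```python
import numpy as np

# Hypothetical validation classes; the naive rule assigns everyone the most prevalent class
y_valid = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1])

values, counts = np.unique(y_valid, return_counts=True)
most_prevalent = values[counts.argmax()]

naive_accuracy = (y_valid == most_prevalent).mean()
print(f"naive rule predicts {most_prevalent}; accuracy = {naive_accuracy:.0%}")  # 70% here
```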
Exception to Naïve rule
when the goal is to identify high-value but rare outcomes, we may do well by doing worse than the naïve rule on overall error (see “lift” – later)
There are various performance measures that compare a model to the naïve rule.
For example, multiple R-squared measures how much the model's fit improves on the fit of the naïve rule.
For prediction, the equivalent of the naïve rule is using ȳ (the sample mean) as the prediction for every record.
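A sketch of R-squared as improvement over the ȳ benchmark, with made-up numbers:

```python
import numpy as np

# Hypothetical actual values and model predictions
y = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
y_pred = np.array([10.5, 11.0, 9.5, 14.0, 12.0])

sse_model = np.sum((y - y_pred) ** 2)    # model's squared error
sse_naive = np.sum((y - y.mean()) ** 2)  # squared error of predicting y-bar for every record

# Fraction of the naive benchmark's error that the model eliminates
r_squared = 1 - sse_model / sse_naive
print(f"R-squared: {r_squared:.3f}")
```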
Lift Chart for Predictive Error
Y axis is cumulative value of numeric target variable (e.g., revenue), instead of cumulative count of “responses”
X axis is cumulative number of cases, sorted left to right in order of predicted value
Benchmark is the average numeric value per record, i.e., not using a model (the naïve rule)
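A sketch of building such a chart from hypothetical revenue data (matplotlib assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical revenue per record, with the model's predicted values
actual = np.array([50.0, 200.0, 10.0, 120.0, 80.0, 300.0, 30.0, 150.0])
predicted = np.array([60.0, 180.0, 20.0, 100.0, 90.0, 280.0, 25.0, 140.0])

# Sort records by predicted value (highest first), then accumulate actual values
order = np.argsort(-predicted)
cumulative = np.cumsum(actual[order])
n_cases = np.arange(1, len(actual) + 1)

plt.plot(n_cases, cumulative, label="model")
# Benchmark: average value per record, i.e., not using the model
plt.plot(n_cases, n_cases * actual.mean(), linestyle="--", label="naive (average)")
plt.xlabel("cumulative number of cases")
plt.ylabel("cumulative revenue")
plt.legend()
plt.show()
```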
Misclassification error
Error = classifying a record as belonging to one class when it belongs to another class.
Error rate = percent of misclassified records out of the total records in the validation data
“High separation of records”
means that using predictor variables attains low error
“Low separation of records”
means that using predictor variables does not improve much on naïve rule
Confusion Matrix
rows = actual class, columns = predicted class:
             predicted C1      predicted C0
actual C1    true positive     false negative
actual C0    false positive    true negative
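A sketch of building the matrix (and the error rate / accuracy from the next cards) with scikit-learn, on hypothetical classes:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted classes (1 = C1, 0 = C0)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# labels=[1, 0] puts C1 first, matching the layout above
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)  # rows: actual C1, actual C0; columns: predicted C1, predicted C0

# Off-diagonal cells are the misclassified records
error_rate = (cm[0, 1] + cm[1, 0]) / cm.sum()
print(f"error rate: {error_rate:.1%}, accuracy: {1 - error_rate:.1%}")
```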
Accuracy
1 – err
Cutoff Table
the cutoff is 0.50, so every record with predicted probability above it should be classified as 1 and every record below as 0; any record whose actual class differs is counted as a misclassification
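A sketch of applying different cutoffs to hypothetical predicted probabilities:

```python
import numpy as np

# Hypothetical predicted probabilities of class 1, with the actual classes
prob = np.array([0.90, 0.65, 0.55, 0.45, 0.30, 0.80, 0.20, 0.51])
actual = np.array([1, 1, 0, 1, 0, 1, 0, 0])

for cutoff in (0.25, 0.50, 0.75):
    predicted = (prob >= cutoff).astype(int)  # at or above the cutoff -> class 1
    error_rate = (predicted != actual).mean()
    print(f"cutoff {cutoff:.2f}: error rate {error_rate:.1%}")
```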
When One Class is More Important
we are willing to tolerate greater overall error, in return for better identifying the important class for further attention
Sensitivity
The ability to detect important class (C1) members correctly, i.e., the % of C1 members classified correctly
Specificity
ability to rule out C0 members correctly, i.e., the % of C0 members classified correctly
False positive
% of predicted “C1’s” that were not “C1’s”
a “false alarm”: indicates a given condition exists when it does not
False negative
% of predicted “C0’s” that were not “C0’s”
indicates a given condition does not exist, when it really does
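A sketch computing all four measures from hypothetical classes, using the definitions on these cards (false positive/negative rates taken among predicted C1's and C0's):

```python
import numpy as np

# Hypothetical actual and predicted classes (1 = important class C1)
actual = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tp = np.sum((actual == 1) & (predicted == 1))
tn = np.sum((actual == 0) & (predicted == 0))
fp = np.sum((actual == 0) & (predicted == 1))
fn = np.sum((actual == 1) & (predicted == 0))

sensitivity = tp / (tp + fn)        # % of actual C1's detected
specificity = tn / (tn + fp)        # % of actual C0's ruled out
false_positive = fp / (tp + fp)     # % of predicted C1's that were not C1's
false_negative = fn / (tn + fn)     # % of predicted C0's that were not C0's

print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print(f"false positive {false_positive:.2f}, false negative {false_negative:.2f}")
```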
Lift and Decile Charts: Goal
Useful for assessing performance in terms of identifying the most important class.
The goal is to obtain a rank ordering of the records according to their estimated probabilities of class membership.
Compare performance of the DM model to “no model, pick randomly”.
Decile Chart
In the “most probable” (top) decile, a bar at 2 means the model is twice as likely to identify the important class as the average prevalence.
Lift vs. Decile Charts
Decile chart does this in decile chunks of data; Y axis shows the ratio of the decile mean to the overall mean.
Lift chart shows continuous cumulative results; Y axis shows the number of important class records identified.
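A sketch of the decile computation on hypothetical records already ranked by predicted probability:

```python
import numpy as np

# Hypothetical records sorted by predicted probability (descending); 1 = important class
actual = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 0,
                   1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

overall_mean = actual.mean()          # average prevalence of the important class
deciles = np.array_split(actual, 10)  # decile chunks of the ranked data

for i, chunk in enumerate(deciles, start=1):
    lift = chunk.mean() / overall_mean  # ratio of decile mean to overall mean
    print(f"decile {i}: lift {lift:.1f}")
```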