Data Mining - Chapter 5 (Performance Measures) Flashcards
Why do we need to evaluate our models?
- Allows you to convince others that your work is meaningful
- Without strong evaluation, your idea is likely to be rejected, or your code would not be deployed
- Empirical evaluation helps guide meaningful research and development directions
What is a benefit of having a large training data set?
The larger the training data set, the better the classifier tends to be.
What is a benefit of having a large test data set?
The larger the test data set, the more accurate the error estimate.
What do errors based on the training set tell us?
They give us information about the fit of the model
What do errors based on the validation/testing set tell us?
They measure the model’s ability to predict new data
What three types of outcomes exist in prediction through supervised learning?
- Predicted numerical value
- Predicted class membership
- Propensity - probability of class membership
What do we focus on when we are evaluating predictive performance (with numerical variables)?
We measure accuracy by using the prediction errors on the validation/test set.
All the measures are based on the prediction error. For a single record this is computed by subtracting the predicted outcome value from the actual outcome value:
e_i = y_i - ŷ_i
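For example, if the actual value is y_i = 10 and the model predicts ŷ_i = 12, the error is e_i = 10 - 12 = -2 (an over-prediction).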
Which five accuracy measures are there for models that predict numerical values?
- Mean absolute error (MAE)
- Mean error
- Mean percentage error (MPE)
- Mean absolute percentage error (MAPE)
- Root mean squared error (RMSE)
–> Check the slides for the formulas.
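A minimal sketch (not from the slides) of how these five measures could be computed on a validation set, assuming NumPy and hypothetical actual/predicted arrays:
```python
import numpy as np

# Hypothetical actual outcomes and model predictions on the validation set
actual = np.array([120.0, 95.0, 150.0, 80.0])
predicted = np.array([110.0, 100.0, 140.0, 90.0])

errors = actual - predicted                      # e_i = y_i - ŷ_i

mean_error = errors.mean()                       # ME: positive and negative errors can cancel
mae = np.abs(errors).mean()                      # MAE
mpe = 100 * (errors / actual).mean()             # MPE (in %)
mape = 100 * np.abs(errors / actual).mean()      # MAPE (in %)
rmse = np.sqrt((errors ** 2).mean())             # RMSE

print(mean_error, mae, mpe, mape, rmse)
```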
What is the benefit of the mean percentage error(MPE) ?
It takes the direction (sign) of the errors into account, so it shows whether the model systematically over- or under-predicts.
What do you need to take into account when using any of these mean-based measures?
The measures are affected by outliers.
What is a Lift chart?
A graphical way to assess predictive performance. You use this when your goal is to search for a subset of records that gives the highest cumulative predicted values (ranking).
-> The predictive performance is compared against a baseline model without predictors (average).
What is called the ‘lift’?
The ratio of model gains to naive benchmark gains.
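As a rough illustration (not part of the slides), the cumulative gains and lift could be computed like this, assuming hypothetical actual outcomes and predicted scores:
```python
import numpy as np

# Hypothetical actual outcomes and predicted values on the validation set
actual = np.array([200.0, 50.0, 120.0, 10.0, 90.0, 30.0])
predicted = np.array([180.0, 60.0, 130.0, 20.0, 70.0, 40.0])

# Rank records from highest to lowest predicted value
order = np.argsort(predicted)[::-1]
cumulative_gains = np.cumsum(actual[order])

# Naive benchmark without predictors: every record contributes the average
naive_gains = np.arange(1, len(actual) + 1) * actual.mean()

# Lift at each cut-off = model gains / naive benchmark gains
lift = cumulative_gains / naive_gains
print(lift)   # values above 1 mean the ranking beats the naive benchmark
```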
What do we do when we are evaluating the performance of predicted class membership (classifiers)?
We look at how well our model is doing, or compare multiple models, based on their accuracy in classifying records into classes.
We calculate the accuracy by subtracting the misclassification error from 1.
This is mainly done by using a confusion/classification matrix.
What is the confusion/classification matrix?
It is a matrix in which the predicted classes are compared to the actual classes. The actual classes are shown in the rows (y-axis) and the predicted classes in the columns (x-axis).
The matrix will contain numbers for:
- True positive
- True negative
- False positive
- False negative
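A minimal sketch, assuming hypothetical binary labels (1 = positive class), of how the four matrix cells lead to the accuracy (1 minus the misclassification error):
```python
# Hypothetical actual and predicted class labels
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))  # true positives
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))  # true negatives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false positives (Type I errors)
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # false negatives

misclassification_error = (fp + fn) / len(actual)
accuracy = 1 - misclassification_error          # equals (tp + tn) / len(actual)

print([[tp, fn],   # row: actual positive
       [fp, tn]])  # row: actual negative
print(accuracy)
```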
What is a Type I error?
A false positive.