Data Mining - Lecture Performance Measures Flashcards
What do errors based on the training set tell us?
About the model’s fit.
What do errors based on the validation set tell us?
The ability to predict new data.
These errors are called prediction errors.
Which three types of outcome do we deal with (and evaluate) in this course?
- A predicted numerical value
- A predicted class membership
- The probability of class membership
How do we measure prediction accuracy for numerical prediction?
We use the error for each record and compute one of the following measures:
- Mean absolute error (MAE)
- Mean error
- Mean percentage error (MPE)
- Mean absolute percentage error (MAPE)
- Root mean squared error
- Lift Chart
How do you compute the mean absolute error?
For each record you compute the error and take its absolute value (disregard the sign). You sum up all absolute errors.
You multiply the sum by (1/n).
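A minimal sketch in Python/NumPy (the arrays y_true and y_pred are made-up example values, not from the lecture):

```python
import numpy as np

# made-up actual and predicted values for n = 4 records
y_true = np.array([10.0, 12.0, 9.0, 15.0])
y_pred = np.array([11.0, 10.0, 9.5, 13.0])

errors = y_true - y_pred              # per-record errors
mae = np.mean(np.abs(errors))         # (1/n) * sum of |error|
print(mae)                            # 1.375
```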
How do you compute the mean error?
You sum up all errors (keeping their signs). You multiply the sum of errors by (1/n).
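The same sketch for the mean error, using the same made-up arrays; note that signed errors can cancel each other out:

```python
import numpy as np

# same made-up records; signed errors can cancel out
y_true = np.array([10.0, 12.0, 9.0, 15.0])
y_pred = np.array([11.0, 10.0, 9.5, 13.0])

mean_error = np.mean(y_true - y_pred)   # (1/n) * sum of signed errors
print(mean_error)                       # 0.625, small despite sizeable errors
```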
How do you compute the mean percentage error?
You divide each record’s error by the record’s actual value (Yi). You do this for all records and sum the results.
You multiply the sum by (1/n) and then by 100%.
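A sketch with the same made-up arrays (the actual values Yi must be non-zero):

```python
import numpy as np

y_true = np.array([10.0, 12.0, 9.0, 15.0])   # actual values Yi
y_pred = np.array([11.0, 10.0, 9.5, 13.0])

mpe = 100 * np.mean((y_true - y_pred) / y_true)   # (1/n) * sum(error_i / Yi) * 100
print(mpe)                                        # about 3.6 (%)
```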
How do you compute the mean absolute percentage error?
Same as the mean percentage error, except you take the absolute value of each error/Yi term (the sign is disregarded).
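The absolute-value variant, again with the same made-up arrays:

```python
import numpy as np

y_true = np.array([10.0, 12.0, 9.0, 15.0])
y_pred = np.array([11.0, 10.0, 9.5, 13.0])

mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))   # absolute values this time
print(mape)                                                # about 11.4 (%)
```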
How do you compute the Root mean squared error?
You square each individual error and sum the squares. You multiply the sum by (1/n) and then take the square root of the result.
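A sketch of the same computation (same made-up arrays as above):

```python
import numpy as np

y_true = np.array([10.0, 12.0, 9.0, 15.0])
y_pred = np.array([11.0, 10.0, 9.5, 13.0])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # sqrt of (1/n) * sum of squared errors
print(rmse)                                       # about 1.52
```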
What is a lift chart?
It is a chart used to compare a model against a baseline with no model (e.g., selecting records at random), to see which subset of records gives the highest cumulative predicted values.
On the x-axis you put the percentage of records, sorted from highest to lowest predicted value. On the y-axis you put the cumulative value (or positive responses) captured so far.
Then you can see how your model performs: it can be that for a small percentage of the sample you already get a higher-than-average prediction/response rate.
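A rough sketch of the numbers behind a cumulative lift (gains) chart; the records, scores, and variable names below are made up for illustration:

```python
import numpy as np

# made-up actual outcomes and model scores for 10 records
actual = np.array([20, 5, 80, 0, 60, 10, 0, 40, 0, 15])
scores = np.array([0.3, 0.1, 0.9, 0.05, 0.8, 0.2, 0.02, 0.7, 0.01, 0.25])

order = np.argsort(-scores)                           # sort records by predicted value, best first
cum_model = np.cumsum(actual[order])                  # cumulative value captured by the model
cum_baseline = np.cumsum(np.full(10, actual.mean()))  # baseline: average value per record

for i, (m, b) in enumerate(zip(cum_model, cum_baseline), start=1):
    print(f"top {10 * i:3d}% of records: model {m:6.1f} vs baseline {b:6.1f}")
```

Plotting cum_model and cum_baseline against the percentage of records gives the two curves of the lift chart.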
How can we evaluate classifier performance?
- We make a confusion matrix.
- Based on that confusion matrix we can compute accuracy, precision, recall, F1
- We can also make a ROC Curve based on the confusion matrix.
What is a misclassification?
If your model puts a record in the wrong class:
a False Negative or a False Positive
What is a confusion matrix?
A matrix with the predicted classes on the x-axis (columns) and the actual classes on the y-axis (rows). It lists the TP, FP, TN and FN counts.
It reads like:
True Negative | False Positive
False Negative | True Positive
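A small sketch that builds the 2x2 matrix from made-up labels (1 = positive, 0 = negative):

```python
import numpy as np

# made-up actual and predicted class labels
actual    = np.array([1, 0, 1, 1, 0, 0, 1, 0])
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((actual == 1) & (predicted == 1))
tn = np.sum((actual == 0) & (predicted == 0))
fp = np.sum((actual == 0) & (predicted == 1))
fn = np.sum((actual == 1) & (predicted == 0))

# rows = actual classes, columns = predicted classes, matching the layout above
confusion = np.array([[tn, fp],
                      [fn, tp]])
print(confusion)    # [[3 1]
                    #  [1 3]]
```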
What is a type I error?
False Positive
What is a type II error?
False Negative
How do you compute accuracy to evaluate classifier performance?
(True Positives + True Negatives) / n
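For example, with the hypothetical counts from the confusion-matrix sketch above:

```python
tp, tn, fp, fn = 3, 3, 1, 1          # hypothetical confusion-matrix counts
n = tp + tn + fp + fn
accuracy = (tp + tn) / n
print(accuracy)                      # 0.75
```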
What is a ROC Curve?
A diagram in which you can plot multiple models to evaluate their performance.
On the x-axis you have False Positive Rate
On the y-axis you have True Positive Rate
What happens if the point (TPR, FPR) in the ROC curve is (0,0)?
The model declares everything as a negative class
What happens if the point (TPR, FPR) in the ROC curve is (1,1)?
It declares everything as a positive class
What happens if the point (TPR, FPR) in the ROC curve is (1,0)?
Ideal situation
What is the area under the curve in ROC in the random guessing situation and what is it in an ideal situation?
Random: 0.5
Ideal: 1.0
So your model should definitely be in between those two.
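A sketch (made-up classes and scores, my own variable names) that sweeps a threshold over predicted probabilities, collects (FPR, TPR) points, and approximates the AUC with the trapezoid rule:

```python
import numpy as np

actual = np.array([1, 1, 0, 1, 0, 0, 1, 0])                     # made-up true classes
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1])    # made-up predicted probabilities

thresholds = [1.1] + sorted(scores, reverse=True) + [-0.1]      # from "predict nothing" to "predict everything"
fpr, tpr = [], []
for t in thresholds:
    pred = (scores >= t).astype(int)
    tp = np.sum((actual == 1) & (pred == 1))
    fp = np.sum((actual == 0) & (pred == 1))
    tpr.append(tp / np.sum(actual == 1))    # true positive rate (y-axis)
    fpr.append(fp / np.sum(actual == 0))    # false positive rate (x-axis)

# trapezoid-rule area under the (fpr, tpr) curve
auc = sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2 for i in range(len(fpr) - 1))
print(auc)    # 0.75 for this toy data: between random (0.5) and ideal (1.0)
```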
What is a limitation of accuracy?
If you have many records in one class and very few in the other class, the accuracy can still be high, even if the model never assigns anything to the minority class.
How can you avoid this?
Refers to accuracy and its flaw.
By using a cost matrix as well:
-> You compute the accuracy
-> You also compute the total cost: (TP * cost TP) + (FP * cost FP) + (TN * cost TN) + (FN * cost FN)
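A tiny numeric sketch with hypothetical counts and costs, showing that high accuracy can still come with a high total cost:

```python
# hypothetical confusion-matrix counts and a hypothetical cost per outcome type
tp, fp, tn, fn = 30, 10, 950, 10
cost_tp, cost_fp, cost_tn, cost_fn = 0, 5, 0, 50     # e.g. missing a positive is expensive

accuracy = (tp + tn) / (tp + fp + tn + fn)
total_cost = tp * cost_tp + fp * cost_fp + tn * cost_tn + fn * cost_fn
print(accuracy, total_cost)    # 0.98 and 550
```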
What is the Kappa statistic in multiclass prediction?
You can use it if you have an actual predictor confusion matrix and a random predictor confusion matrix.
-> It measures the improvement compared to the random predictor
(success rate actual predictor - success rate random predictor) / (1 - success rate random predictor)
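A one-line numeric sketch with hypothetical success rates:

```python
success_actual = 0.80    # hypothetical success rate of the actual predictor
success_random = 0.60    # hypothetical success rate of the random predictor

kappa = (success_actual - success_random) / (1 - success_random)
print(kappa)    # 0.5: the model closes half the gap between random and perfect
```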
What is Recall?
The ability of the model to find all of the items of the class.
(True Positive) / (True Positive + False Negative)
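With hypothetical counts:

```python
tp, fn = 40, 10              # hypothetical counts: class items found vs. missed
recall = tp / (tp + fn)
print(recall)                # 0.8: the model finds 80% of the actual class members
```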
What is Precision?
The ability of the model to make correct positive predictions, i.e., how many of the detected class items actually belong to the class.
(True Positives) / (True Positives + False Positives)
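With hypothetical counts:

```python
tp, fp = 40, 20              # hypothetical counts: correct vs. incorrect positive predictions
precision = tp / (tp + fp)
print(precision)             # about 0.67: two thirds of the predicted positives are correct
```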
What is the F-Measure?
Takes into account both Recall and Precision.
F = 2 / ((1/R) + (1/P))
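Using the hypothetical recall and precision values from the two cards above:

```python
recall, precision = 0.8, 0.67
f1 = 2 / ((1 / recall) + (1 / precision))   # harmonic mean of recall and precision
print(f1)                                   # about 0.73
```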