Lecture 8 - Performance measures Flashcards
Classification
Classify e-mails to spam vs inbox
- Given (labeled) examples of both document types
- Train a classifier to discriminate between these two
- During operation, use the classifier to select the destination folder for each new e-mail: Inbox or Spam
Steps in classification
Step 1: Split data into train and test sets
Step 2: Build a model on a training set
Step 3: Evaluate on test set
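A minimal sketch of these three steps in Python, assuming scikit-learn; the synthetic dataset and the choice of logistic regression are illustrative assumptions, not part of the lecture.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Illustrative labeled data (stand-in for e-mails labeled spam / inbox)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Step 1: split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Step 2: build a model on the training set
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 3: evaluate on the test set
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```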
Note on parameter tuning in classification steps
- It is important that the test data is not used in any way to create the classifier
- Some training schemes operate in two stages:
- Stage 1: build the basic structure
- Stage 2: optimise parameter setting
- The test data cannot be used for parameter tuning
- Proper procedure uses three sets: training data, validation data, and test data
- Validation data is used to optimise parameters
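A sketch of the three-set procedure, again assuming scikit-learn; the 60/20/20 split and the small grid of C values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)

# Split into training (60%), validation (20%) and test (20%) sets
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Stage 1: build the basic structure on the training data
# Stage 2: optimise the parameter setting against the validation data only
best_score, best_C = -1.0, None
for C in (0.01, 0.1, 1.0, 10.0):
    candidate = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = candidate.score(X_val, y_val)
    if score > best_score:
        best_score, best_C = score, C

# The test data is touched once, only for the final error estimate
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", final_model.score(X_test, y_test))
```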
Making the most of the data
- Once evaluation is completed, all the data can be used to build the final classifier
- Generally, the larger the training data the better the classifier (but returns diminish)
- The larger the test data the more accurate the error estimate
Types of outcomes
- Building models using training data is called Supervised Learning
- Interested in predicting the outcome variable for new records
- Three main types of outcomes:
- a. Predicted numerical value, e.g., house price
- b. Predicted class membership, e.g., cancer or not
- c. Probability of class membership (for categorical outcome variable), e.g., Naive Bayes
Evaluating Predictive Performance - Generating numeric predictions
- Interested in models that have high predictive accuracy when applied to new records
- Models are trained on the training data
- Then applied to the validation data
- Measures of accuracy are computed from the prediction errors on that validation set
Prediction Accuracy measures
Prediction Accuracy Measures pt2
Prediction Accuracy Measures pt3
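The bodies of the measure cards are not reproduced above; the sketch below assumes the usual error measures on a validation set (MAE, RMSE, MAPE) and shows how they follow from the prediction errors. The numbers are illustrative.

```python
import numpy as np

y_val = np.array([200.0, 150.0, 320.0, 275.0])    # actual values (illustrative)
y_pred = np.array([210.0, 140.0, 300.0, 290.0])   # model predictions on the validation set

errors = y_pred - y_val
mae = np.mean(np.abs(errors))                      # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))               # root mean squared error
mape = np.mean(np.abs(errors / y_val)) * 100       # mean absolute percentage error
print(mae, rmse, mape)
```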
Lift Chart pt1
- Graphical way to assess predictive performance
- In some applications, we are not interested in predicting the outcome value of each new record
- But the goal is to search for a subset of records that gives the highest cumulative predicted values
- Compares the model’s predictive performance to a baseline model that has no predictors
Lift Chart pt2
- In practice, costs are rarely known
- Decisions are made by comparing possible scenarios
- Example: promotional mailout to 1,000,000 households
- Mail to all; the response rate is 0.1% (1,000 responses)
- Consider a data mining tool that can identify a subset of 100,000 most promising households with response rate 0.4% (400)
- These capture 40% of the responses at only 10% of the mailing cost
- → It might pay off to restrict the mailout to these 100,000 households
- The increase in response rate is called lift factor
- A lift chart allows a visual comparison
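A sketch of the cumulative-response calculation behind a lift chart; the toy model scores and the 0.1% response rate mirror the mailout example above and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
actual = rng.binomial(1, 0.001, size=1_000_000)        # ~0.1% of households respond
# Toy model scores: responders tend to score higher than non-responders
scores = np.where(actual == 1,
                  rng.uniform(0.3, 1.0, actual.size),
                  rng.uniform(0.0, 0.7, actual.size))

order = np.argsort(-scores)                            # most promising households first
cum_responses = np.cumsum(actual[order])               # y-values of a cumulative gains chart

top = 100_000
top_rate = cum_responses[top - 1] / top                # response rate in the top 100,000
lift = top_rate / actual.mean()                        # lift factor vs mailing at random
print(f"Lift factor in top 100,000: {lift:.2f}")
```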
Judging Classifier Performance - i.e., categorical variables
- Misclassification is when a record belongs to one class but the model classifies it as a member of a different class
- A natural criterion for judging the performance of a classifier is the probability of making a misclassification error
A perfect classifier makes no errors, but real-world data contain "noise", so a perfect classifier cannot be constructed in practice
Confusion / Classification matrix
- A matrix that summarises the correct and incorrect classifications that a classifier produced for a given dataset
- Rows and columns correspond to the predicted and true (actual) classes
- In practice, most accuracy measures are derived from this matrix
- Correct classifications: True Positive and True Negative
- Incorrect classifications: False Positive (outcome incorrectly predicted as yes / positive) and False Negative (outcome incorrectly predicted as no / negative)
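A minimal sketch using scikit-learn's confusion_matrix; the labels are hypothetical. Note that scikit-learn returns rows as actual classes and columns as predicted classes.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = positive class, 0 = negative class
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[0, 1], ravel() returns the four cells as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
```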
Type I and II errors
Type I Error → False Positive
- Predicted positive but that is incorrect
- Example: predicted that a man is pregnant, but actually he is not pregnant
Type II Error → False Negative
- Predicted negative but that is incorrect
- Example: predicted a woman is not pregnant when actually she is pregnant
Overall success rate / Accuracy
Number of correct classifications divided by the total number of classifications
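Continuing the illustrative counts from the previous sketch, accuracy is simply correct classifications over all classifications:

```python
# Counts taken from the confusion-matrix sketch above (illustrative numbers)
tp, tn, fp, fn = 3, 3, 1, 1
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 6 / 8 = 0.75
print(accuracy)
```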
ROC curves
- Developed in the 1950s for signal detection theory to analyse noisy signals
- Characterise the trade-off between positive hits and false alarms
- The ROC curve plots the true positive rate (y-axis) against the false positive rate (x-axis)
- Each point corresponds to a different possible threshold of a diagnostic test
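A sketch of an ROC curve with scikit-learn and matplotlib; the true labels and predicted scores are hypothetical.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.7]   # predicted P(positive)

fpr, tpr, thresholds = roc_curve(y_true, y_score)       # one point per threshold
print("AUC:", auc(fpr, tpr))

plt.plot(fpr, tpr, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")                 # chance diagonal
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.show()
```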
Example ROC curve
Using ROC for Model Comparison
Limitation of Accuracy and Cost Matrix
Consider a 2-class problem: number of class 0 examples = 9990 and number of class 1 examples = 10
If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%. This accuracy is misleading because the model does not detect a single class 1 example
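The same 9990-vs-10 example in code: the all-class-0 model scores 99.9% accuracy yet has zero recall on class 1.

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 9990 + [1] * 10
y_pred = [0] * 10000                       # model predicts everything as class 0

print(accuracy_score(y_true, y_pred))      # 0.999
print(recall_score(y_true, y_pred))        # 0.0 - no class 1 example detected
```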
Computing Cost of Classification
Multiclass prediction
Multiclass prediction - Kappa statistic
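The Kappa card's body is not shown above; as a sketch, scikit-learn's cohen_kappa_score measures agreement between predicted and actual classes corrected for the agreement expected by chance.

```python
from sklearn.metrics import cohen_kappa_score

y_true = ["a", "b", "c", "a", "b", "c", "a", "b"]
y_pred = ["a", "b", "c", "a", "c", "c", "a", "b"]

# 1.0 = perfect agreement, 0.0 = agreement no better than chance
print(cohen_kappa_score(y_true, y_pred))
```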
Precision and Recall
Precision and Recall formulas
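Assuming the standard confusion-matrix definitions, the formulas written out, reusing the illustrative counts from the sketches above:

```python
def precision(tp, fp):
    # Of the records predicted positive, how many really are positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of the truly positive records, how many did the model find?
    return tp / (tp + fn)

print(precision(tp=3, fp=1), recall(tp=3, fn=1))   # 0.75, 0.75
```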
Determining Recall is difficult
- The total number of items / records that belong to a particular class is sometimes not available
Solutions:
- Sample across the dataset and perform relevance judgment on these items
- Apply different models to the same dataset and then use the aggregate of relevant items as the total relevant set
F-measure (F1-measure)
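Assuming the standard definition, the F1-measure is the harmonic mean of precision and recall:

```python
def f1(precision, recall):
    # Harmonic mean: high only when both precision and recall are high
    return 2 * precision * recall / (precision + recall)

print(f1(0.75, 0.75))   # 0.75
```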
Python code
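The original card's code is not included above; the following is one possible end-to-end sketch that ties the measures together using scikit-learn's classification_report (the dataset and model are illustrative assumptions).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

# Imbalanced synthetic data (roughly 90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))   # precision, recall, F1 per class
```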