Lecture 8 - Performance measures Flashcards
Classification
Classify e-mails as spam vs. inbox
- Given (labeled) examples of both document types
- Train a classifier to discriminate between these two
- During operation, use the classifier to route each new email to its destination folder: Inbox or Spam
Steps in classification
Step 1: Split data into train and test sets
Step 2: Build a model on a training set
Step 3: Evaluate on test set
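The three steps can be sketched end-to-end. This is a minimal illustration on toy data with a simple midpoint-threshold classifier; the dataset and model are assumptions for demonstration, not from the lecture:

```python
import random

# Toy labeled data: label 1 ("spam") when the feature exceeds 5.
data = [(x, 1 if x > 5 else 0) for x in range(10)]

# Step 1: split data into train and test sets (70/30)
random.seed(0)
random.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

# Step 2: build a model on the training set only -- here, a threshold
# placed midway between the two class means.
pos = [x for x, y in train if y == 1]
neg = [x for x, y in train if y == 0]
threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

# Step 3: evaluate on the held-out test set
correct = sum((x > threshold) == (y == 1) for x, y in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Note that the threshold is computed from the training portion only; the test set is touched just once, in Step 3.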
Note on parameter tuning in classification steps
- It is important that the test data is not used in any way to create the classifier
- Some training schemes operate in two stages:
- Stage 1: build the basic structure
- Stage 2: optimise parameter setting
- The test data cannot be used for parameter tuning
- Proper procedure uses three sets: training data, validation data, and test data
- Validation data is used to optimise parameters
Making the most of the data
- Once evaluation is completed, all the data can be used to build the final classifier
- Generally, the larger the training data the better the classifier (but returns diminish)
- The larger the test data the more accurate the error estimate
Types of outcomes
- Building models using training data is called Supervised Learning
- Interested in predicting the outcome variable for new records
- Three main types of outcomes:
- a. Predicted numerical value, e.g., house price
- b. Predicted class membership, e.g., cancer or not
- c. Probability of class membership (for categorical outcome variable), e.g., Naive Bayes
Evaluating Predictive Performance - Generating numeric predictions
- Interested in models that have high predictive accuracy when applied to new records
- Models are trained on the training data
- Applied to the validation data and
- Measures of accuracy then use the prediction errors on that validation set
Prediction Accuracy Measures pt1
Prediction Accuracy Measures pt2
Prediction Accuracy Measures pt3
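The three cards above presumably list standard error measures for numeric predictions. As a sketch, here are three common ones (MAE, RMSE, MAPE) computed on made-up values; the choice of these specific measures is an assumption, not taken from the slides:

```python
import math

actual    = [10.0, 12.0, 8.0, 11.0]   # true outcome values (toy data)
predicted = [ 9.0, 14.0, 8.5, 10.0]   # model predictions on the validation set

errors = [a - p for a, p in zip(actual, predicted)]
n = len(errors)

mae  = sum(abs(e) for e in errors) / n             # Mean Absolute Error
rmse = math.sqrt(sum(e * e for e in errors) / n)   # Root Mean Squared Error
mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # Mean Absolute Percentage Error

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  MAPE={mape:.2f}%")
```

RMSE penalises large errors more heavily than MAE, while MAPE expresses error relative to the actual value.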
Lift Chart pt1
- Graphical way to assess predictive performance
- In some applications we are not interested in predicting the outcome value of every new record
- Instead, the goal is to find a subset of records that gives the highest cumulative predicted values
- Compares the model’s predictive performance to a baseline model that has no predictors
Lift Chart pt2
- In practice, costs are rarely known
- Decisions are made by comparing possible scenarios
- Example: promotional mailout to 1,000,000 households
- Mail to all: response rate 0.1% (1,000 responses)
- Consider a data mining tool that can identify a subset of the 100,000 most promising households, with a response rate of 0.4% (400 responses)
- That captures 40% of the responses at only 10% of the mailing cost
- → It might pay off to restrict the mailout to these 100,000
- The increase in response rate is called lift factor
- A lift chart allows a visual comparison
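The mailout example works out numerically as follows; the figures come from the card above, while the variable names are illustrative:

```python
total_households = 1_000_000
overall_rate     = 0.001      # 0.1% respond if everyone is mailed
targeted_size    = 100_000    # model-selected "most promising" subset
targeted_rate    = 0.004      # 0.4% respond within that subset

baseline_responses = total_households * overall_rate   # 1,000
targeted_responses = targeted_size * targeted_rate     # 400

lift_factor        = targeted_rate / overall_rate      # increase in response rate
share_of_responses = targeted_responses / baseline_responses
share_of_cost      = targeted_size / total_households

print(f"lift factor: {lift_factor:.0f}x")
print(f"captures {share_of_responses:.0%} of responses "
      f"at {share_of_cost:.0%} of the mailing cost")
```

A lift chart generalises this single comparison: it plots cumulative responses captured against the fraction of records targeted, for every possible cutoff.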
Judging Classifier Performance - i.e., categorical variables
- Misclassification occurs when a record belongs to one class but the model classifies it as a member of a different class
- A natural criterion for judging the performance of a classifier is the probability of making a misclassification error
A perfect classifier makes no errors, but real-world data contain "noise", so a perfect classifier cannot be constructed
Confusion / Classification matrix
- A matrix that summarises the correct and incorrect classifications that a classifier produced for a given dataset
- Rows and columns correspond to the predicted and true (actual) classes
- In practice, most accuracy measures are derived from this matrix
- Correct classifications: True Positive and True Negative
- Incorrect classifications:
- False Positive: outcome incorrectly predicted as a yes / positive
- False Negative: outcome incorrectly predicted as a no / negative
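Tallying the four cells of the confusion matrix is a simple counting exercise. A sketch on toy labels (1 = positive, 0 = negative; the data are made up):

```python
actual    = [1, 1, 0, 0, 1, 0, 0, 1]   # true classes
predicted = [1, 0, 0, 1, 1, 0, 0, 0]   # classifier output

tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))  # true positives
tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))  # true negatives
fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))  # false positives (Type I)
fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))  # false negatives (Type II)

print("          actual=1  actual=0")
print(f"pred=1    TP={tp}      FP={fp}")
print(f"pred=0    FN={fn}      TN={tn}")
```

The four counts always sum to the number of classified records, and most accuracy measures are ratios built from them.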
Type I and II errors
Type I Error → False Positive
- Predicted positive but that is incorrect
- Example: predicted that a man is pregnant, but actually he is not pregnant
Type II Error → False Negative
- Predicted negative but that is incorrect
- Example: predicted a woman is not pregnant when actually she is pregnant
Overall success rate / Accuracy
Number of correct classifications divided by the total number of classifications
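In terms of the confusion-matrix counts, this definition reads as follows (the example counts are made up):

```python
# Example confusion-matrix counts (hypothetical values)
tp, tn, fp, fn = 40, 45, 5, 10

# Accuracy = correct classifications / total classifications
accuracy   = (tp + tn) / (tp + tn + fp + fn)
error_rate = 1 - accuracy   # overall misclassification rate

print(f"accuracy={accuracy:.2f}  error rate={error_rate:.2f}")
```

With these counts, 85 of 100 records are classified correctly, giving an accuracy of 0.85 and an error rate of 0.15.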