Chapter 8: Evaluation of Classifiers Flashcards
1
Q
The CRISP Data Mining Process
A
2
Q
Precision of Classifier Performance
A
- Holdout procedure - split data into two sets
- E.g. two thirds of the data for training, one third for testing
- The more training instances, the better (after a certain size the performance might stagnate)
- The more test data, the better the estimation of the error rate
- Use k-fold cross validation
- Split into k sets
- Train k models, each on k-1 of the sets (leaving one set out)
- Test each model on the set that was left out
- Use leave-one-out (jackknife)
- Generate n models on (n-1) instances
- Apply each model to the single omitted instance
3
Q
Holdout Procedures
A
- Holdout procedure:
- Reserve some data for testing (usually ~1/3, selected at random; sketched in the code below)
- Use remaining data for training
- Plentiful data - no problem!
- Common case: data set is limited
- Problems:
- Want both sets as large as possible
- Want both sets to be representative
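A minimal sketch of the holdout procedure, assuming scikit-learn; the synthetic data set, the decision tree model, and the split ratio are illustrative choices, not part of the original card:

```python
# Minimal holdout sketch: reserve ~1/3 of the data for testing (random split).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)  # hypothetical data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("holdout error rate:", 1 - model.score(X_test, y_test))
```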
4
Q
“Smart” Holdout
A
- Simple check: Are the proportions of classes about the same in each data set?
- Stratified holdout
- Guarantee that classes are (approximately) proportionally represented in the test and training set
- Repeated holdout (in addition): repeat with different random splits to get more independent test data
- Randomly select the holdout set several times and average the error rate estimates (sketched below)
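A sketch of stratified and repeated holdout under the same assumptions (scikit-learn, synthetic data); the number of repetitions and the class weights are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

errors = []
for seed in range(10):                       # repeated holdout: 10 random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1/3, stratify=y,     # stratified: keep class proportions
        random_state=seed)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    errors.append(1 - model.score(X_te, y_te))

print("mean error:", np.mean(errors), "+/-", np.std(errors))
```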
5
Q
Cross Validation
A
6
Q
Holdout w/ Cross-Validation
A
- Split the data into a fixed number k of partitions (folds)
- In turn, each partition is used for testing and the remaining instances for training
- Finally each instance is used for testing once
- May use stratification
- Standard practice: stratified tenfold cross-validation (sketched below)
- The error rate is estimated as the average of the per-fold error rates
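A sketch of stratified tenfold cross-validation, assuming scikit-learn; the classifier and data are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)   # hypothetical data

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

# Error rate = average of the ten per-fold error rates.
print("estimated error rate:", (1 - acc).mean())
```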
7
Q
Leave-One-Out Holdout
A
- n-Fold Cross-Validation
- n instances are in the data set
- Use all but one instance for training
- Each iteration is evaluated by predicting the omitted instance
- Advantages / Disadvantages
- Maximum use of the data for training
- Deterministic (no random sampling of test sets)
- High computational cost
- Non-stratified test sets (each contains only a single instance)
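A leave-one-out sketch under the same assumptions (scikit-learn, small synthetic data set); note that it fits n models:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, random_state=0)   # small hypothetical data set

# Each of the n iterations trains on n-1 instances and predicts the omitted one.
acc = cross_val_score(KNeighborsClassifier(), X, y, cv=LeaveOneOut())
print("leave-one-out error rate:", 1 - acc.mean())
```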
8
Q
Comparing Mining Algorithms
A
- Suppose we have two algorithms
- Obtain two different models
- Estimate the error rates for the two models
- Compare estimates
- e(1) < e(2)?
- Select the better one
- Problem?
- Are there “significant” differences in the error rates?
9
Q
Comparing Random Estimates
A
- Estimated error rate is just an estimate (random)
- Student’s paired t-test tells us whether the means of two samples are significantly different
- Construct a paired t-test statistic (sketched below)
- Need variance as well as point estimates
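A sketch of a paired t-test on per-fold error rates, assuming scikit-learn and SciPy; the two algorithms and the data are placeholders, and in practice fold-wise t-tests can be somewhat optimistic because the training sets of different folds overlap:

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)    # hypothetical data
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Same folds for both algorithms, so the per-fold error rates are paired samples.
err1 = 1 - cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
err2 = 1 - cross_val_score(GaussianNB(), X, y, cv=cv)

t, p = ttest_rel(err1, err2)
print(f"t = {t:.3f}, p = {p:.3f}")   # small p: the difference is unlikely to be chance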
10
Q
Counting the Costs of Classification Errors
A
- In practice, different types of classification errors often incur different costs (see the cost-matrix sketch after the examples)
- Examples:
- Predicting when customers leave a company for competitors (attrition/churn prognosis)
- Much more costly to lose a valuable customer
- Much less costly to act on a customer who would not have left anyway (false alarm)
- Loan decisions
- Oil-slick detection
- Fault diagnosis
- Promotional mailing
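A sketch of counting costs with a cost matrix, assuming scikit-learn; the labels and cost values are made up for illustration (missing a churner is taken to be ten times as costly as a needless retention offer):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = customer actually churns
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # model predictions

# Rows = actual class, columns = predicted class (same layout as confusion_matrix).
cost = np.array([[0, 1],     # actual 0: true negative costs 0, false positive costs 1
                 [10, 0]])   # actual 1: false negative costs 10, true positive costs 0

cm = confusion_matrix(y_true, y_pred)          # counts per (actual, predicted) cell
print("total cost:", (cm * cost).sum())        # 11 for this toy example
```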
11
Q
Direct Marketing Paradigm
A
- Find most likely prospects to contact
- Not everybody needs to be contacted
- Number of targets is usually much smaller than number of prospects
- Typical applications (you are only looking for a small part of the population)
- retailers, catalogues, direct mail (and e-mail)
- customer acquisition, cross-sell, attrition prediction
- …
12
Q
Direct Marketing Evaluation
A
- Accuracy on the entire dataset is not the right measure (e.g. a decision tree leaf node provides a class probability that can be used as a score)
- Approach
- develop a target model
- score all prospects and rank them by decreasing score
- select top P% of prospects for action (sketched below)
- How to decide what is the best selection?
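A sketch of the approach (score, rank, select the top P%), assuming scikit-learn; the data, model, and the value of P are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # the "target model"
scores = model.predict_proba(X_te)[:, 1]                    # predicted probability of responding

P = 0.10                                  # act on the top 10% of prospects
order = np.argsort(-scores)               # rank by decreasing score
top = order[: int(P * len(scores))]

print("responders reached in top 10%:", y_te[top].sum(), "of", y_te.sum())
```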
13
Q
Generating a Gain Curve
A
- Instances are sorted in decreasing order of their predicted probability of being positive
- In a gain curve (sketched below):
- x axis is the sample size (number of instances selected)
- y axis is the cumulative number of positives found
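A sketch of the gain-curve points for a model ranking versus a random ordering, using only NumPy; the labels and scores are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, size=1000)                  # 1 = positive (e.g. responder)
scores = y * 0.3 + rng.normal(0.2, 0.15, size=1000)  # fake, mildly informative model scores

order = np.argsort(-scores)                          # sort by decreasing predicted probability
gain_model = np.cumsum(y[order])                     # y axis: cumulative positives found
gain_random = np.cumsum(y[rng.permutation(len(y))])  # baseline: random contact order

for frac in (0.1, 0.2, 0.5, 1.0):                    # x axis: sample size (fraction of list)
    i = int(frac * len(y)) - 1
    print(f"{frac:>4.0%} of list: model {gain_model[i]} vs. random {gain_random[i]}")
```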
14
Q
Gain Curve: Random List vs. Model-ranked List
A
15
Q
Lift Curve
A