W07 Validation and Evaluation Flashcards
Forecast Evaluation
Two possibilities
- compare to actual observations (beware of self-fulfilling prophecies)
- compare to a naive forecast
Error Measures
Absolute
Percentage
Scaled
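A minimal sketch of the three measure families, assuming NumPy arrays of actuals and forecasts (function names are illustrative, not from the slides); the scaled measure also illustrates the comparison against a naive one-step forecast.

```python
import numpy as np

def mae(actual, forecast):
    """Absolute measure: mean absolute error, in the units of the series."""
    return np.mean(np.abs(actual - forecast))

def mape(actual, forecast):
    """Percentage measure: mean absolute percentage error (scale-free, undefined at 0)."""
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def mase(actual, forecast):
    """Scaled measure: MAE scaled by the MAE of a naive one-step forecast."""
    naive_mae = np.mean(np.abs(actual[1:] - actual[:-1]))
    return mae(actual, forecast) / naive_mae

actual   = np.array([100.0, 110.0, 105.0, 120.0, 115.0])
forecast = np.array([ 98.0, 112.0, 100.0, 118.0, 119.0])
print(mae(actual, forecast), mape(actual, forecast), mase(actual, forecast))
```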
Self-fulfilling forecast - negative consequence
buy-down
Classification Performance
Recall (Sensitivity)
correctly assigned / actually in class
Precision
actually in class and assigned / assigned to class
Specificity
correctly not assigned / actually not in class (share of not-selected instances that are truly negative)
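A small sketch computing the three rates from confusion-matrix counts (the counts used here are hypothetical).

```python
def classification_metrics(tp, fp, fn, tn):
    """Recall, precision, and specificity from confusion-matrix counts."""
    recall      = tp / (tp + fn)   # correctly assigned / actually in class
    precision   = tp / (tp + fp)   # actually in class and assigned / assigned to class
    specificity = tn / (tn + fp)   # correctly not assigned / actually not in class
    return recall, precision, specificity

# Hypothetical counts for a single class
print(classification_metrics(tp=40, fp=10, fn=5, tn=45))
```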
Error rate across categories
average or weighted average or importance-weighted
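A brief sketch of the three ways to combine per-class error rates (the helper name and the example weights are illustrative assumptions).

```python
def combined_error_rate(error_rates, class_sizes=None, importance=None):
    """Combine per-class error rates: plain average, size-weighted, or importance-weighted."""
    if importance is not None:
        weights = importance
    elif class_sizes is not None:
        weights = class_sizes
    else:
        weights = [1] * len(error_rates)
    return sum(e * w for e, w in zip(error_rates, weights)) / sum(weights)

rates = [0.05, 0.20, 0.10]                                       # per-class error rates
print(combined_error_rate(rates))                                # simple average
print(combined_error_rate(rates, class_sizes=[800, 150, 50]))    # weighted by class size
print(combined_error_rate(rates, importance=[1, 5, 2]))          # weighted by importance
```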
Comparing error rates
training vs validation vs test set
expected error vs observed error vs benchmark approach
Possible Benchmarks
statistically expected error rate
naive rules
expert assignment
Benchmark Factors beyond accuracy
effort
reliability
acceptance
Data Set split
Training Set: build tree
Validation Set: prune tree
Test Set: evaluate tree’s predictions
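A minimal sketch of a random three-way split, assuming a NumPy array and illustrative 60/20/20 proportions (the function name and ratios are assumptions, not from the slides).

```python
import numpy as np

def three_way_split(data, train=0.6, validation=0.2, seed=0):
    """Shuffle and split into training (build tree), validation (prune tree), test (evaluate)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(train * len(data))
    n_val   = int(validation * len(data))
    return data[idx[:n_train]], data[idx[n_train:n_train + n_val]], data[idx[n_train + n_val:]]

data = np.arange(100)
train_set, val_set, test_set = three_way_split(data)
print(len(train_set), len(val_set), len(test_set))   # 60 20 20
```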
Testing: Hold out 1
k-fold cross validation
1 split the data into k partitions of equal size
2 use k-1 partitions for training
3 use the remaining partition for evaluation
4 repeat k times, so every partition serves once as the evaluation set
5 average the results
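A sketch of the five steps above, assuming NumPy arrays and a caller-supplied train-and-score function (the majority-class "model" in the usage is a deliberately trivial stand-in).

```python
import numpy as np

def k_fold_cv(data, labels, k, train_and_score):
    """k-fold cross-validation: each partition serves once as the evaluation set."""
    idx = np.random.permutation(len(data))
    folds = np.array_split(idx, k)                  # 1. split into k partitions of (almost) equal size
    scores = []
    for i in range(k):                              # 4. repeat k times
        test_idx  = folds[i]                        # 3. the i-th partition is used for evaluation
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])  # 2. k-1 for training
        scores.append(train_and_score(data[train_idx], labels[train_idx],
                                      data[test_idx],  labels[test_idx]))
    return np.mean(scores)                          # 5. average the results

# Hypothetical usage with a trivial majority-class "model"
data, labels = np.arange(100).reshape(-1, 1), np.array([0, 1] * 50)
def majority_score(Xtr, ytr, Xte, yte):
    majority = np.bincount(ytr).argmax()
    return np.mean(yte == majority)                 # accuracy of always predicting the majority class
print(k_fold_cv(data, labels, k=5, train_and_score=majority_score))
```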
Testing: Hold out 2
Bootstrap
an alternative to cross-validation, suited to small data sets
n is the original data set size
draw n instances with replacement (the same instance can be drawn multiple times)
These form the training set.
Instances that were never drawn form the test set.
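A small sketch of the sampling step (the helper name is an assumption); with sampling with replacement, roughly 1/e ≈ 36.8% of instances are never drawn and end up in the test set.

```python
import numpy as np

def bootstrap_split(n, seed=0):
    """Bootstrap sampling: draw n indices with replacement for training;
    indices never drawn form the test set."""
    rng = np.random.default_rng(seed)
    train_idx = rng.integers(0, n, size=n)              # same instance can be drawn multiple times
    test_idx  = np.setdiff1d(np.arange(n), train_idx)   # never-drawn instances
    return train_idx, test_idx

train_idx, test_idx = bootstrap_split(1000)
print(len(test_idx) / 1000)   # roughly 0.368 of instances land in the test set
```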
Lift Factor
What increase in accuracy does my prediction promise?
Gives a ratio, not absolute values; helpful for cost-benefit analysis.
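A minimal illustration of the ratio, using a hypothetical direct-mailing example (the numbers are invented for the sketch).

```python
def lift_factor(response_rate_targeted, response_rate_overall):
    """Ratio of the success rate in the model-selected subset to the overall base rate."""
    return response_rate_targeted / response_rate_overall

# Hypothetical mailing campaign: 10% respond in the model-selected group vs 2% overall
print(lift_factor(0.10, 0.02))   # lift of 5: targeting beats random selection by a factor of 5
```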
Lift Chart
used when classification is probabilistic
compute the lift factor for increasing sample sizes, possibly comparing it to the increase in cost caused by the larger sample
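A sketch of the chart's data points, assuming predicted class probabilities and true labels as NumPy arrays (the helper name and the toy scores are assumptions); each point pairs a sample fraction with the lift achieved when taking the most confident predictions first, which can then be set against the cost of the larger sample.

```python
import numpy as np

def lift_chart_points(probabilities, labels, steps=10):
    """Cumulative lift at increasing sample sizes, ranking instances by predicted probability."""
    order = np.argsort(probabilities)[::-1]          # most confident predictions first
    labels = np.asarray(labels)[order]
    base_rate = labels.mean()
    points = []
    for frac in np.linspace(0.1, 1.0, steps):
        n = max(1, int(frac * len(labels)))
        lift = labels[:n].mean() / base_rate         # lift factor at this sample size
        points.append((frac, lift))
    return points

# Hypothetical scores and true classes
probs = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2, 0.15, 0.05])
truth = np.array([1,   1,   0,    1,   0,    0,   1,   0,   0,    0  ])
print(lift_chart_points(probs, truth, steps=5))
```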