Evaluation Flashcards
What is the formal definition of overfitting?
A predictor F is overfit if we can find another predictor F’ where:
- Etrain(F’) > Etrain(F)
- Egen(F’) < Egen(F)
What is the formal definition of underfitting?
A predictor F is underfit if we can find another predictor F’ with smaller Etrain and smaller Egen
How is Etrain (training error) computed?
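The answer for this card is missing in the source. A common reading, assumed here, is that Etrain is the average per-instance error of the predictor on the training set. A minimal sketch under that assumption, using 0/1 loss (the helper `empirical_error` and the toy predictor are illustrative names, not from the source):

```python
def empirical_error(predict, instances, labels):
    """Fraction of instances the predictor gets wrong (0/1 loss, an assumption)."""
    mistakes = sum(1 for x, y in zip(instances, labels) if predict(x) != y)
    return mistakes / len(labels)


# Hypothetical toy predictor: label is 1 when the input is at least 0.5.
def threshold_predictor(x):
    return 1 if x >= 0.5 else 0


train_x = [0.1, 0.4, 0.6, 0.9]
train_y = [0, 1, 1, 1]
e_train = empirical_error(threshold_predictor, train_x, train_y)  # 1 mistake out of 4
```

The same function applied to a held-out set would give a test error instead of a training error.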

How is Egen (generalization error) calculated?

How can we estimate Egen?
Set aside a test set and compute Etest (the same way as Etrain)
lim Etest = Egen as the size of the test set -> infinity
How can you compute the confidence interval for Egen from Etest?
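The source leaves this card unanswered. One standard approach, assumed here rather than taken from the source, treats each test prediction as an independent right/wrong trial and uses the normal approximation: Etest ± z * sqrt(Etest * (1 - Etest) / n), with z = 1.96 for roughly 95% confidence. The function name below is illustrative:

```python
import math


def error_confidence_interval(e_test, n_test, z=1.96):
    """Normal-approximation interval for Egen given Etest on n_test instances.

    z = 1.96 corresponds to roughly 95% confidence (assumed default).
    The approximation is poor for very small n_test or Etest near 0 or 1.
    """
    std_err = math.sqrt(e_test * (1.0 - e_test) / n_test)
    lower = max(0.0, e_test - z * std_err)
    upper = min(1.0, e_test + z * std_err)
    return lower, upper


low, high = error_confidence_interval(e_test=0.2, n_test=100)
```

With 100 test instances the interval is about 0.12 to 0.28; the width shrinks with the square root of the test set size.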

What do we use the training/validation/testing sets for?
- Training set: construct classifier
- Validation set: pick algorithm + tune hyperparameters
- Testing set: estimate future error rate
How does cross-validation work?
- Randomly split the data into k sets (folds)
- For each fold: test on that fold, train on the other k-1
- Average the error over all k folds
- Final classifier is trained on all the data
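The steps above can be sketched as follows. This is a minimal illustration assuming 0/1 error; `k_fold_error`, `train_fn` and the toy majority learner are hypothetical names standing in for whatever algorithm is being evaluated:

```python
import random


def k_fold_error(instances, labels, k, train_fn, seed=0):
    """Average 0/1 test error over k folds.

    train_fn(train_x, train_y) must return a predict(x) function.
    """
    indices = list(range(len(instances)))
    random.Random(seed).shuffle(indices)        # random split
    folds = [indices[i::k] for i in range(k)]   # k roughly equal parts

    fold_errors = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_idx = [i for i in indices if i not in held_out_set]
        predict = train_fn([instances[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        wrong = sum(predict(instances[i]) != labels[i] for i in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k


def train_majority(train_x, train_y):
    """Toy learner: always predict the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority


xs = list(range(20))
ys = [0] * 15 + [1] * 5
cv_error = k_fold_error(xs, ys, k=5, train_fn=train_majority)
final_classifier = train_majority(xs, ys)  # final model uses all the data
```

The cross-validated error is only an estimate of how the final classifier will do; the final classifier itself is never tested in this procedure.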
What is leave-one-out?
Cross-validation where k = # of training instances
What is the problem with leave-one-out validation?
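Holding out one instance unbalances the classes in the training set
With n instances split evenly between A and B: testing { 1 of A, 0 of B } vs training { n/2 of B, n/2 - 1 of A }
A majority-class predictor would always predict B (most frequent in training), so it would always be wrong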
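A small sketch of this failure case, assuming a perfectly balanced two-class dataset and a predictor that always outputs the most frequent training label (the function name is illustrative):

```python
def loo_error_majority(labels):
    """Leave-one-out error of a majority-class predictor."""
    wrong = 0
    for i, held_out_label in enumerate(labels):
        train = labels[:i] + labels[i + 1:]          # everything except instance i
        prediction = max(sorted(set(train)), key=train.count)
        wrong += prediction != held_out_label
    return wrong / len(labels)


balanced = ["A"] * 5 + ["B"] * 5
loo_error = loo_error_majority(balanced)
```

Every held-out instance belongs to the class that has just become the minority in training, so the estimated error is 100% even though the same predictor would be right half the time on balanced data.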
What does stratification do?
Keeps class labels balanced across training/testing sets
How do you do stratification?
- Split instances by class
- Split each class into k parts
- Assemble the ith fold by combining one part from each class
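The three steps above can be sketched as follows; `stratified_folds` is an illustrative helper, and the shuffle inside each class is an added assumption so that parts are chosen at random:

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of instance indices with each class spread across folds."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):        # 1. split instances by class
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for part in range(k):                     # 2. split each class into k parts
            folds[part].extend(members[part::k])  # 3. fold i takes part i of each class
    return folds


labels = ["A"] * 6 + ["B"] * 3
folds = stratified_folds(labels, k=3)
```

With 6 instances of A and 3 of B, each of the 3 folds ends up with 2 of A and 1 of B, matching the 2:1 ratio of the full dataset.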
What is a true positive?
Classifier predicts positive, and it is actually positive
What is a true negative?
Classifier predicts negative, and it is actually negative
What is a false positive?
Classifier predicts positive, but it is actually negative
What is a false negative?
Classifier predicts negative, but it is actually positive
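The four outcomes above can be counted from paired lists of predictions and true labels. A minimal sketch, assuming binary labels with 1 as the positive class (the function name and data are illustrative):

```python
def confusion_counts(predicted, actual, positive=1):
    """Count TP, TN, FP, FN for binary labels."""
    tp = tn = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            tp += 1          # predicted positive, is positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, is negative
        elif p == positive:
            fp += 1          # predicted positive, is negative
        else:
            fn += 1          # predicted negative, is positive
    return tp, tn, fp, fn


predicted = [1, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(predicted, actual)
```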
What is the definition of classification error?
(FP + FN) / (TP + TN + FP + FN)

What is the definition of accuracy?
(TP + TN) / (TP + TN + FP + FN) = 1 - classification error

What is the problem with classification error / accuracy?
Misleading when classes are unbalanced
- Predict earthquake: unlikely, so always predict no
- Decide if webpage is relevant: 99.999% are not, so retrieve nothing
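A quick illustration of the webpage case, using made-up data with 1 relevant page in 100,000 and a classifier that retrieves nothing:

```python
# Hypothetical data: 1 relevant page (label 1) among 100,000.
actual = [1] + [0] * 99_999
predicted = [0] * len(actual)   # "retrieve nothing" classifier

correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = correct / len(actual)

true_positives = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
relevant_found = true_positives / sum(actual)   # fraction of relevant pages retrieved
```

Accuracy comes out at 99.999% while the fraction of relevant pages found is zero, which is why accuracy alone says little here.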
What is the definition of False Alarm Rate?
FP / (FP + TN)
What is the definition of Miss rate?
FN / (TP + FN)
What is the definition of Recall?
TP / (TP + FN)
What is the definition of Precision?
TP / (TP + FP)
What is the problem with False alarm rate / miss rate / recall / precision?
Trivial to get 100% or 0% individually, must report them in pairs
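A sketch of the four rates from the cards above, applied to a made-up classifier that always answers "positive" (the function name and counts are illustrative; the code does not guard against zero denominators):

```python
def rates(tp, tn, fp, fn):
    """False alarm rate, miss rate, recall and precision from confusion counts."""
    return {
        "false_alarm_rate": fp / (fp + tn),
        "miss_rate": fn / (tp + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Hypothetical data: 10 positives, 90 negatives, classifier always says "positive".
always_positive = rates(tp=10, tn=0, fp=90, fn=0)
```

Recall is 100% and miss rate is 0%, which looks perfect in isolation; the paired false alarm rate of 100% and precision of 10% show the real picture.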
What evaluation measure would we use for event detection?
Cost = C_FP * FP + C_FN * FN
e.g. C_FP = cost of evacuating with no earthquake vs C_FN = cost of staying during an earthquake
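A small sketch of the cost measure. The cost values and error counts are invented purely for illustration:

```python
def detection_cost(fp, fn, cost_fp, cost_fn):
    """Total cost: each false positive costs cost_fp, each false negative cost_fn."""
    return cost_fp * fp + cost_fn * fn


# Hypothetical costs: a missed event is 100x worse than a false alarm.
cautious = detection_cost(fp=20, fn=1, cost_fp=1.0, cost_fn=100.0)
relaxed = detection_cost(fp=2, fn=5, cost_fp=1.0, cost_fn=100.0)
```

The cautious detector makes 21 errors against the relaxed detector's 7, yet its cost is lower (120 vs 502) because misses are weighted so heavily.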
What is the definition of F-measure?
2 / (1 / Recall + 1 / Precision)
Similar to accuracy but without TN
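A sketch of the formula above from confusion counts. The counts are made up, and the code assumes TP > 0 so that neither recall nor precision is zero:

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of recall and precision; TN never appears."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return 2 / (1 / recall + 1 / precision)


f = f_measure(tp=8, fp=2, fn=8)   # recall 0.5, precision 0.8
```

Algebraically this equals 2*TP / (2*TP + FP + FN), which makes the absence of TN explicit.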
What is a ROC curve?
Plot of true positive rate vs false positive rate as the classification threshold varies
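A minimal sketch of how the points on such a curve could be computed, assuming the classifier outputs a score and predicts positive when the score is at or above the threshold (scores and labels below are invented):

```python
def roc_points(scores, actual):
    """(false positive rate, true positive rate) for each threshold.

    An instance is predicted positive when its score >= threshold.
    """
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]                      # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and a == 1 for s, a in zip(scores, actual))
        fp = sum(s >= threshold and a == 0 for s, a in zip(scores, actual))
        points.append((fp / negatives, tp / positives))
    return points


scores = [0.9, 0.8, 0.6, 0.3]   # hypothetical classifier scores
actual = [1, 0, 1, 0]           # 1 = positive, 0 = negative
curve = roc_points(scores, actual)
```

Lowering the threshold can only add predicted positives, so both rates move from (0, 0) towards (1, 1).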
What do a perfect and a random classifier look like on an ROC curve?
- Perfect: the curve runs up the left edge and along the top, passing through the top-left corner (all positives detected with no false alarms)
- Random: the diagonal from the bottom-left corner to the top-right corner

What are some problems with mean squared error?
- Very sensitive to outliers (because of the squaring)
- Sensitive to mean / scale (always predicting the mean value might give a lower MSE than a model that captures the pattern but whose mean is off)
What is the mean absolute error (MAE)?
mean { |f(xi) - yi| } = (1/n) * sum over i of |f(xi) - yi|

What is the median absolute deviation (MAD)?
med { |f(xi) - yi| }
What are the pros/cons of median absolute deviation (MAD)?
Pro: completely ignores outliers. Con: can't take its derivative
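A small comparison of the three regression measures on invented data, with and without a single outlier. MAE is taken here to be the mean of the absolute residuals, which is an assumption since its card has no answer in the source:

```python
import statistics


def regression_errors(predicted, actual):
    """MSE, MAE and MAD computed from absolute residuals |f(xi) - yi|."""
    residuals = [abs(p - a) for p, a in zip(predicted, actual)]
    return {
        "mse": statistics.mean([r ** 2 for r in residuals]),
        "mae": statistics.mean(residuals),
        "mad": statistics.median(residuals),
    }


predicted = [2.0, 3.0, 4.0, 5.0, 6.0]
clean = regression_errors(predicted, [1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = regression_errors(predicted, [1.0, 2.0, 3.0, 4.0, 105.0])
```

One outlier moves MSE from 1 to 1961 and MAE from 1 to 20.6, while MAD stays at 1.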
What is the definition of the correlation coefficient?

What does the correlation coefficient capture?
Relative ordering - useful for ranking tasks
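The definition card above has no answer in the source. Assuming Pearson's correlation coefficient is the one meant (covariance of predictions and targets divided by the product of their standard deviations), a sketch looks like this; the function name and data are illustrative:

```python
import math


def pearson_correlation(predicted, actual):
    """Pearson's r between predictions and true values."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


actual = [1.0, 2.0, 3.0, 4.0]
# Predictions with the wrong mean and scale but the right ordering.
shifted_and_scaled = [10.0 + 5.0 * a for a in actual]
r = pearson_correlation(shifted_and_scaled, actual)
```

Here r is 1 even though every prediction is far from its target, so the measure rewards getting the ordering right and ignores errors in mean and scale.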