Model Selection and Cross-Validation Flashcards
What is model selection?
1) Choice of algorithm, feature extraction, feature selection, and normalization.
2) Hyperparameter tuning.
What is model evaluation/assessment?
After selecting your model, estimating how it generalizes to new unseen data.
Holdout method of validation
Using separate test and training data.
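A minimal holdout sketch, assuming scikit-learn is available; the synthetic data and the logistic regression model are just placeholders.

```python
# Holdout validation sketch: train on one part of the data, evaluate on the rest.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Hold out 25% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```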
Why do we need separate test data?
To get an unbiased estimate of how well the model trained on the training data works on new data.
Variance in error
How large the difference between the estimated and the actual error tends to be on average; ideally close to zero.
Note: a method with zero average error can still have a large variance, because errors in the positive and negative directions cancel each other out.
Usually, the more data, the smaller the variance.
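A small simulation sketch (not from the cards; scikit-learn assumed) showing that each random holdout split gives a slightly different error estimate, and that larger test sets give a tighter spread:

```python
# Variance of the holdout error estimate: repeat the split with different seeds
# and compare the spread of the estimates for different test-set sizes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=0)

for test_size in (50, 500, 2000):
    errors = []
    for seed in range(30):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        errors.append(1.0 - model.score(X_te, y_te))
    print(f"test_size={test_size}: mean error={np.mean(errors):.3f}, "
          f"std={np.std(errors):.3f}")
```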
How to combine model selection and final evaluation?
Split the data into training, validation, and test sets: use the validation set to compare and tune models, and estimate the final model's performance on the untouched test set.
What is cross-validation?
Used when there is too little data to split off a separate test set. With a data set of n instances, partition it into k folds; train on k-1 folds and test on the held-out fold, repeating so that each fold is used for testing once, and average the errors.
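A minimal k-fold cross-validation sketch, assuming scikit-learn; the 5-fold setting and the synthetic data are arbitrary choices:

```python
# k-fold cross-validation: each instance is used for testing exactly once
# and for training k-1 times; the per-fold scores are then averaged.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```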
When would you use CV?
When there is not enough data to set aside a separate test set.
Leave-one-out cross-validation.
With n instances, leave one record out for testing, train on the remaining n-1, and predict the held-out record; iterate so that every record is used as the test case once, and average the errors. The error estimate is not optimistically biased by testing on training data, because the held-out record is never part of the training set. Allows using (nearly) all the data for both training and testing. Does not always work perfectly.
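A leave-one-out sketch under the same scikit-learn assumption; the iris data set is just a convenient example:

```python
# Leave-one-out cross-validation: n folds of size 1, one model fit per instance.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# Each score is 1 (correct) or 0 (error) for the single held-out record.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print("LOO error rate:", 1.0 - scores.mean())
```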
How to combine model selection and final evaluation when using cross-validation?
Use cross-validation within the training set for parameter tuning, then test the final model on independent test data. If the same cross-validation scores are used both to select the model and to report its performance, there is a slight optimistic bias; to combat this, use nested cross-validation.
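A nested cross-validation sketch, assuming scikit-learn; the SVM and its C grid are placeholder choices:

```python
# Nested cross-validation: the inner loop (GridSearchCV) tunes the hyperparameter,
# the outer loop estimates the performance of the whole tuning-plus-training procedure.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)
outer_scores = cross_val_score(inner, X, y, cv=5)   # outer loop
print("nested CV accuracy:", outer_scores.mean())
```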
What performance measures are out there for regression and classification?
Mean squared error (or root mean squared error) for regression: 0 for a perfect prediction, and the penalty grows quadratically with the size of the error, so outliers have a large effect. Mean absolute error is an alternative that is not as sensitive to outliers.
For classification: misclassification rate (1 for an error, 0 for a correct prediction). Baseline: the majority voter (always predicting the most common class).
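A short sketch of these measures, assuming scikit-learn; the toy arrays are made up for illustration:

```python
# Regression and classification metrics from the card above.
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

# Regression: squared error punishes the outlier (8 vs 5) much harder than MAE.
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 8.0, 2.0])
print("MSE :", mean_squared_error(y_true, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
print("MAE :", mean_absolute_error(y_true, y_pred))

# Classification: misclassification rate = 1 - accuracy.
c_true = np.array([1, 0, 1, 1, 0])
c_pred = np.array([1, 1, 1, 1, 0])
print("misclassification rate:", 1.0 - accuracy_score(c_true, c_pred))

# Majority-vote baseline: always predict the most common class.
majority = np.bincount(c_true).argmax()
print("baseline error:", 1.0 - accuracy_score(c_true, np.full_like(c_true, majority)))
```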
What limitations does misclassification rate / classification accuracy have as a performance measure?
A low misclassification rate does not necessarily mean good performance: on very unbalanced data, always predicting the majority class already gives a low misclassification rate.
What are cost and confusion matrices?
If the costs of different misclassifications differ, use a cost matrix (e.g., calling a bad product good is worse than rejecting a good one).
Confusion matrix: classification results are shown in a table where rows represent the true classes and columns the predicted classes.
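A confusion-matrix and cost-matrix sketch, assuming scikit-learn; the labels and the cost values are invented for illustration:

```python
# Confusion matrix (rows = true classes, columns = predicted classes)
# combined with a cost matrix that penalizes the two error types differently.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = good product, 0 = bad product
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)

# Calling a bad product good (row 0, col 1) costs more than rejecting a good one (row 1, col 0).
cost = np.array([[0, 10],
                 [1,  0]])
print("total cost:", (cm * cost).sum())
```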
What are true positives, false positives, etc., and how can one calibrate a classifier to achieve different tradeoffs between them?
True positive: predicted positive, really positive. False positive: predicted positive, really negative. True negative: predicted negative, really negative. False negative: predicted negative, really positive; often the most dangerous kind of error.
The tradeoff between false positives and false negatives can be adjusted by moving the classifier's decision threshold (or by assigning different misclassification costs).
Precision = TP/(TP+FP), Recall = TP/(TP+FN), F-score = 2 x (Precision x Recall)/(Precision + Recall).
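A sketch of precision, recall, and F-score under the scikit-learn assumption, showing how moving the decision threshold trades false positives against false negatives:

```python
# Lowering the threshold predicts more positives: recall rises, precision falls.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

for threshold in (0.3, 0.5, 0.7):
    pred = (proba >= threshold).astype(int)
    print(threshold,
          "precision:", round(precision_score(y_te, pred), 2),
          "recall:", round(recall_score(y_te, pred), 2),
          "F1:", round(f1_score(y_te, pred), 2))
```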
What is a ROC curve, how to interpret area under ROC curve?
TPR = TP/(TP+FN) - ability to find positives (recall, sensitivity)
FPR = FP/(FP+TN) - rate of negatives identified as positive (1 - specificity)
ROC curve: the true positive rate plotted against the false positive rate as the decision threshold is varied.
The best possible curve goes through the top-left corner (TPR = 1 at FPR = 0), while the diagonal corresponds to random guessing. The area under the ROC curve (AUC) summarizes this: 1.0 is a perfect classifier, 0.5 is no better than random guessing.
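A ROC/AUC sketch, again assuming scikit-learn; the data and model are placeholders:

```python
# Sweep the threshold over the predicted probabilities to get the ROC curve,
# and compare the area under it to the 0.5 baseline of random guessing.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, proba)   # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_te, proba))        # 1.0 = perfect, 0.5 = random guessing
```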