Lec3 - Evaluating Hypotheses Flashcards
What is the ultimate goal of machine learning?
The ultimate goal in machine learning is to create models/algorithms that can generalise to unseen data.
What is the correct approach to perform hyper-parameter tuning?
- Split the dataset into train/val/test sets (50%-25%-25% when data is plentiful, otherwise 60%-20%-20% is a good choice).
- Try different hyper-parameter values on the training set, using accuracy on the validation set as the selection metric, and finally evaluate the chosen model on the test set (see the sketch below).
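A minimal sketch of this workflow in Python, assuming scikit-learn and a k-NN classifier as the model being tuned (both are illustrative choices, not prescribed by the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Carve off the test set first, then split the rest into train/validation
# (roughly 60% / 20% / 20% overall).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Try different hyper-parameter values; select by validation accuracy.
best_k, best_acc = None, -1.0
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Evaluate the selected model once on the held-out test set.
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("best k:", best_k, "test accuracy:", final.score(X_test, y_test))
```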
What is the Holdout method?
With the Holdout method we keep the classifier that achieves the maximum performance on the validation set. We are essentially selecting the set of hyper-parameters that yields the best classifier.
When would you use a train/val/test split and when cross-validation?
- When we have many examples, the division into training/validation/test sets is sufficient.
- When the sample size is small, a good alternative is cross-validation.
Explain Cross Validation.
- Divide the dataset into k folds (usually k = 10); in each run use k-1 folds for training (+ validation) and the remaining fold as the test set.
- In each run, calculate the error on the left-out test fold.
- After all k runs, calculate the average of the k errors.
We can introduce an additional validation set to tune hyper-parameters (see the sketch below).
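A sketch of k-fold cross-validation with scikit-learn (the dataset and classifier are illustrative placeholders):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=0)

errors = []
for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, evaluate on the left-out fold.
    model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    errors.append(1 - model.score(X[test_idx], y[test_idx]))

# Average the k per-fold errors.
print("mean error:", np.mean(errors), "std:", np.std(errors))
```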
What is a Confusion Matrix?
A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known.
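For example, scikit-learn's confusion_matrix builds this table from true and predicted labels (the labels below are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes; for binary
# labels {0, 1} the layout is [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
```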
Give the formulas and interpretations for: Classification Rate, Recall, Precision, F1 Score.
Classification Rate: number of correctly classified examples divided by the total number of examples.
(TP + TN) / (TP + TN + FP + FN)
Recall: number of correctly classified positive examples divided by the total number of positive examples.
TP / (TP + FN)
Precision: number of correctly classified positive examples divided by the total number of predicted positive examples.
TP / (TP + FP)
F1 Score: the harmonic mean of Precision and Recall.
2 * Precision * Recall / (Precision + Recall)
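The formulas written out as a small helper, just to make the definitions concrete (the counts in the example call are invented):

```python
def metrics(tp, tn, fp, fn):
    """Compute the four measures from binary confusion-matrix counts."""
    classification_rate = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)        # fraction of actual positives recovered
    precision = tp / (tp + fp)     # fraction of predicted positives that are correct
    f1 = 2 * precision * recall / (precision + recall)
    return classification_rate, recall, precision, f1

# e.g. 40 TP, 50 TN, 5 FP, 10 FN:
print(metrics(tp=40, tn=50, fp=5, fn=10))
```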
How do you interpret:
- High Recall and Low Precision
- Low Recall and High Precision
High Recall and Low Precision: most of the positive examples are correctly recognised (low FN), but there are a lot of false positives (high FP).
Low Recall and High Precision: we miss a lot of positive examples (high FN), but those we predict as positive are indeed positive (low FP).
How do you calculate the values of the Confusion Matrix for multiple classes?
- We can define one class as positive and the others as negative.
- We can compute the performance measures in exactly the same way.
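A sketch of this one-vs-rest computation on a multi-class confusion matrix (the 3x3 counts below are invented for illustration):

```python
import numpy as np

# Rows: true class, columns: predicted class (3 classes, made-up counts).
cm = np.array([[50,  3,  2],
               [ 4, 40,  6],
               [ 1,  5, 44]])

for c in range(cm.shape[0]):
    tp = cm[c, c]
    fn = cm[c, :].sum() - tp   # class-c examples predicted as something else
    fp = cm[:, c].sum() - tp   # other classes predicted as c
    print(f"class {c}: recall = {tp / (tp + fn):.2f}, precision = {tp / (tp + fp):.2f}")
```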
What is the impact of an imbalanced class distribution in the test set?
- The Classification Rate goes down; it is dominated by the majority class.
- Precision (and F1) for class 2 are significantly affected: 30% of class-1 examples are misclassified as class 2, which, due to the imbalance, leads to a high number of FP relative to TP for class 2.
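A hypothetical numeric illustration (the counts are invented, not the lecture's own example): with 1000 class-1 and 100 class-2 examples, misclassifying 30% of class 1 as class 2 produces 300 FP for class 2 against at most 100 TP, so class-2 precision is at most 100/400 = 0.25 even if every class-2 example is classified correctly.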
What are some solutions to class imbalance?
For the confusion matrix: divide by the total number of examples per class to normalise.
General Solutions:
- Upsample Minority Class
- Downsample Majority Class
- Repeat the resampling several times, training a classifier each time on a different training set, and report the mean and st. dev. of the selected performance measure (see the sketch below).
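A sketch of upsampling with scikit-learn's resample utility (the dataset, sizes, and class ratio are placeholders):

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.randn(1100, 5)
y = np.array([0] * 1000 + [1] * 100)   # imbalanced: class 1 is the minority

# Upsample the minority class (with replacement) to match the majority size.
X_min = X[y == 1]
X_up, y_up = resample(X_min, y[y == 1], replace=True, n_samples=1000, random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))   # both classes now have 1000 examples
```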
When can overfitting occur? How can we remedy this?
Overfitting can occur when:
- Learning is performed for too long (e.g., in Neural Networks).
- The examples in the training set are not representative of all possible situations.
- The model we use is too complex.
How to fight overfitting:
- Stopping the training earlier (use the validation set to know when).
- Getting more data.
- Using the right level of complexity (again use the validation set).
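For the first remedy, a minimal sketch of patience-based early stopping (the validation losses are simulated; the lecture only says to use the validation set to decide when to stop):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch to roll back to: training stops once the
    validation loss has not improved for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Simulated per-epoch validation losses: improve, then start overfitting.
losses = [0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.6, 0.65]
print("best epoch:", early_stopping_epoch(losses))  # -> 3
```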
What is a confidence interval (C.I.)? What is the formula to calculate a C.I.?
An N% confidence interval for some parameter p
is an interval that is expected with probability N% to contain p.
e.g. a 95% confidence interval [0.2,0.4] means that with probability 95% p lies between 0.2 and 0.4.
Formula:
https://drive.google.com/open?id=1P8Q38B6b_j84LSLGXpdLkuThCbqerdjJ
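The linked formula is not reproduced here; assuming the lecture follows the standard treatment of evaluating hypotheses (e.g., Mitchell's Machine Learning, ch. 5), it is likely the binomial approximation for the error of a hypothesis measured on n test examples:

```latex
\mathrm{error}_S(h) \pm z_N \sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}
```

where z_N is the constant for the chosen confidence level (e.g., z_95 = 1.96), valid for reasonably large n (a common rule of thumb is n >= 30).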
Exercise in Confidence Intervals:
https://drive.google.com/open?id=1owMIXDK5Ei4bqBEoWkmO4TFFrDGzvjMv
Solution:
https://drive.google.com/open?id=1Ag5ceI7N7MKCdHt-gVxBC3wlruQpNQdU
Name three statistical tests for comparing two algorithms.
- T-test
- Wilcoxon rank-sum
- Randomisation
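A sketch of two of these tests with SciPy, applied to the per-fold accuracies of two algorithms (the numbers are invented):

```python
from scipy import stats

# Accuracies of algorithms A and B on the same 10 cross-validation folds.
acc_a = [0.81, 0.79, 0.84, 0.80, 0.78, 0.83, 0.82, 0.80, 0.79, 0.85]
acc_b = [0.78, 0.75, 0.80, 0.77, 0.76, 0.79, 0.78, 0.77, 0.74, 0.80]

# Paired t-test: the two samples come from the same folds, so they are paired.
print(stats.ttest_rel(acc_a, acc_b))

# Wilcoxon rank-sum test: a non-parametric test treating the samples as independent.
print(stats.ranksums(acc_a, acc_b))
```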