Measuring Performance Flashcards
What is Classification Accuracy?
number of correctly classified samples
/
total samples
What is Classification Error Rate?
number of wrongly classified samples
/
total samples
What is Recall?
TP
/
TP + FN
What is Precision?
TP
/
TP + FP
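The four classification measures above can be sketched in Python; the confusion-matrix counts here are hypothetical example values:

```python
# Hypothetical confusion-matrix counts (example values, not from any dataset)
TP, FP, FN, TN = 40, 10, 5, 45

total = TP + FP + FN + TN
accuracy = (TP + TN) / total      # correctly classified / total samples
error_rate = (FP + FN) / total    # wrongly classified / total samples
recall = TP / (TP + FN)           # fraction of actual positives found
precision = TP / (TP + FP)        # fraction of predicted positives that are correct
```

Note that accuracy and error rate always sum to 1, since every sample is either correctly or wrongly classified.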
What are some measures we can use to measure performance in regression?
- Root Mean Squared Error
- Mean Absolute Error
- Mean Absolute Percentage Error
- Coefficient of Determination
What is Root Mean Squared Error?
Take the average of the squared differences, then take the square root
What is Mean Absolute Error?
Take the average of the absolute differences
What is the Mean Absolute Percentage Error?
Take the average of (the absolute differences divided by the true values)
When is Mean Absolute Percentage Error useful?
When different output variables might have drastically different ranges of values.
E.g. one output might be a temperature and another might be in kilocalories, so we should normalise the error for each before summing them up
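The three error measures can be sketched directly from their definitions; y_true and y_pred are hypothetical values chosen for illustration:

```python
import math

# Hypothetical true values and predictions
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
n = len(y_true)

# RMSE: average the squared differences, then take the square root
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
# MAE: average the absolute differences
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
# MAPE: average the absolute differences divided by the true values
mape = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n
```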
What is the Bias Issue?
The idea that the accuracy of the training samples can be a poor estimator of the accuracy on unseen samples
What is the Variance Issue?
The idea that the accuracy on a new set of test samples can still vary from the true accuracy, depending on the makeup of the test samples
A smaller set of test samples can result in a higher variance
What is the Holdout Method?
Splitting the dataset into a training set and a testing set
What is Random Sub-Sampling?
- For each of K experiments, randomly select a fixed number of samples for training, and use the remaining samples for testing
- Train the classifier from scratch using the training data, then test it to compute an error
- Repeat this for all K experiments
- The final error estimate is the average error over the K experiments
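A minimal sketch of the procedure, where train_and_test is a hypothetical placeholder that trains a classifier from scratch on its first argument and returns the error measured on its second:

```python
import random

def random_subsampling(data, n_train, K, train_and_test):
    """Repeat K times: random train/test split, return the average error."""
    errors = []
    for _ in range(K):
        shuffled = random.sample(data, len(data))   # random permutation
        train, test = shuffled[:n_train], shuffled[n_train:]
        errors.append(train_and_test(train, test))
    return sum(errors) / K                          # average over K experiments
```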
What is K-Fold Cross Validation?
- Divide the dataset into K partitions
- For each of the K experiments, use K-1 partitions for training, and estimate the error using the remaining partition
- The final error estimate is the average error of all K experiments
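The steps above can be sketched as follows; train_and_test is again a hypothetical placeholder returning an error:

```python
def k_fold_cv(data, K, train_and_test):
    """Each of the K partitions serves exactly once as the test set."""
    folds = [data[i::K] for i in range(K)]          # simple interleaved partition
    errors = []
    for i in range(K):
        test = folds[i]
        train = [x for j in range(K) if j != i for x in folds[j]]
        errors.append(train_and_test(train, test))
    return sum(errors) / K                          # average over K experiments
```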
What is an advantage of using K-Fold Cross Validation?
All the examples in the dataset are eventually used for both training and testing
What is Leave One Out Cross Validation?
This is an extreme case of K-Fold Cross Validation: with a dataset of N samples, we perform N-fold cross validation, training on N-1 samples and testing on the single remaining sample in each experiment.
What is Bootstrap?
For K experiments, select the training samples using sampling with replacement
And use the data that was never selected as testing data
The final error is the average error of the K experiments
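A sketch of the bootstrap estimate, with train_and_test as a hypothetical placeholder as before:

```python
import random

def bootstrap_error(data, K, train_and_test):
    errors = []
    for _ in range(K):
        train = random.choices(data, k=len(data))   # sampling WITH replacement
        test = [x for x in data if x not in train]  # never-selected samples
        errors.append(train_and_test(train, test))
    return sum(errors) / K                          # average over K experiments
```

With a bootstrap sample the same size as the dataset, roughly a third of the samples are left out of training on average and form the test set.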
How can we compare the errors of two models using a Z-Test?
Treat each model's error rate as an estimated proportion. Under the normal approximation, compute z = (e1 - e2) / sqrt( e1(1 - e1)/n1 + e2(1 - e2)/n2 ), where n1 and n2 are the test-set sizes. If |z| exceeds the critical value for the chosen confidence level (e.g. 1.96 for 95%), the difference between the two errors is statistically significant.
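A sketch of the standard two-proportion z-test for comparing classifier error rates (an assumption here, since the card's original answer was left blank); the error rates and test-set sizes are hypothetical example values:

```python
import math

def z_score(e1, n1, e2, n2):
    """e1, e2: observed error rates on independent test sets of size n1, n2."""
    var = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2   # variance of the difference
    return (e1 - e2) / math.sqrt(var)

z = z_score(0.15, 200, 0.25, 200)
# |z| > 1.96 would indicate a significant difference at the 95% confidence level
```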
What is F-1 Score?
2 * P * R
/
P + R
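This formula is the harmonic mean of precision and recall, sketched as:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision (P) and recall (R): 2PR / (P + R)
    return 2 * precision * recall / (precision + recall)
```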
What is the Variance Error?
This tells you how much the predictions change under different realisations of the same model (e.g. the model trained on different training sets)
What is the Bias Error?
This is a measure of how close the average prediction is to the true value, under different realisations of the same model.
If I have high variance error, and low bias error, will I suffer from over-fitting or under-fitting?
Over-fitting
If I have low-variance error, and high bias error, will I suffer from over-fitting or under-fitting?
Under-fitting