Measuring Performance Flashcards
What is Classification Accuracy?
(number of correctly classified samples) / (total number of samples)
What is Classification Error Rate?
(number of wrongly classified samples) / (total number of samples)
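The two ratios above can be sketched in a few lines of plain Python. The function names and the example labels are made up for illustration; this is a minimal sketch, not a reference implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of misclassified samples; by definition 1 - accuracy."""
    return 1.0 - accuracy(y_true, y_pred)


# Hypothetical labels, invented for this example.
y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]

print(accuracy(y_true, y_pred))    # 4 of the 5 predictions are correct
print(error_rate(y_true, y_pred))  # 1 of the 5 predictions is wrong
```

Accuracy and error rate always sum to 1, so reporting either one is enough.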
What is Recall?
TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives
What is Precision?
TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives
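As a rough illustration, recall and precision can be computed by first counting true positives, false positives, and false negatives for a binary problem. The helper names, the choice of `1` as the positive label, and the convention of returning 0.0 when a denominator is zero are all assumptions made for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 if there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)     # 3 2 1 2
print(recall(tp, fn))     # 0.75
print(precision(tp, fp))  # 0.6
```

Recall asks how many of the actual positives were found; precision asks how many of the predicted positives were right.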
What are some measures we can use to measure performance in regression?
- Root Mean Squared Error
- Mean Absolute Error
- Mean Absolute Percentage Error
- Coefficient of Determination
What is Root Mean Squared Error?
Take the average of the squared differences between the predicted and true values, then take the square root
What is Mean Absolute Error?
Take the average of the absolute differences
What is the Mean Absolute Percentage Error?
Take the average of the absolute differences, each divided by its true value
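The three error measures described above can be sketched as follows. The function names and the sample values are illustrative assumptions. Note that `mape` here returns a fraction (multiply by 100 for a percentage) and is undefined when any true value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute differences."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, as a fraction of the true value."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    relative = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(relative) / len(relative)


# Hypothetical values, invented for this example.
y_true = [2.0, 4.0, 5.0, 10.0]
y_pred = [1.0, 4.0, 7.0, 8.0]

print(rmse(y_true, y_pred))  # sqrt((1 + 0 + 4 + 4) / 4) = 1.5
print(mae(y_true, y_pred))   # (1 + 0 + 2 + 2) / 4 = 1.25
print(mape(y_true, y_pred))  # (0.5 + 0 + 0.4 + 0.2) / 4, roughly 0.275
```

Because RMSE squares each difference before averaging, it penalises large errors more heavily than MAE does.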
When is Mean Absolute Percentage Error useful?
When different output variables have drastically different ranges of values.
E.g. one output might be temperature and another kilocalories. We should therefore normalise the error for each output before summing them up
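A small sketch of why normalising helps, using made-up temperature and kilocalorie values (all numbers and helper names below are assumptions for illustration). Both outputs are off by the same relative amount, yet their absolute errors differ by a factor of 100.

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean of the absolute differences divided by the true value (as a fraction)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical outputs on very different scales.
temp_true, temp_pred = [20.0, 25.0], [22.0, 24.0]              # degrees
kcal_true, kcal_pred = [2000.0, 2500.0], [2200.0, 2400.0]      # kilocalories

print(mae(temp_true, temp_pred))   # 1.5
print(mae(kcal_true, kcal_pred))   # 150.0
print(mape(temp_true, temp_pred))  # about 0.07
print(mape(kcal_true, kcal_pred))  # about 0.07
```

Summing the two MAE values would let the kilocalorie error dominate, whereas the MAPE values are directly comparable.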
What is the Bias Issue?
The idea that the accuracy on the training samples can be a poor (typically over-optimistic) estimator of the accuracy on unseen samples
What is the Variance Issue?
The idea that the accuracy on a new set of test samples can still vary from the true accuracy, depending on the makeup of the test samples
A smaller set of test samples can result in a higher variance
What is the Holdout Method?
Splitting the dataset into a training set and a separate test set
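A minimal holdout split might look like the sketch below. The function name, the 25% test fraction, and the fixed seed are assumptions chosen for illustration; shuffling before splitting is a common precaution, not something the definition requires.

```python
import random


def holdout_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data, then cut it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: 20 sample identifiers.
data = list(range(20))
train, test = holdout_split(data, test_fraction=0.25)

print(len(train), len(test))  # 15 5
```

Every sample ends up in exactly one of the two sets, so the test set stays unseen during training.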
What is Random Sub-Sampling?
- Perform K separate random splits of the dataset, one per experiment
- In each split, randomly select a fixed number of samples for testing and use the rest for training
- Train the classifier from scratch using the training data, and then test to compute an error
- The final error is the average error of all K experiments
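The procedure above can be sketched as follows. The `random_subsampling` helper and its parameters are hypothetical, and `majority_class_error` is a deliberately trivial stand-in for "train a classifier from scratch, then test it"; any function taking `(train, test)` and returning an error would fit.

```python
import random
from collections import Counter


def majority_class_error(train, test):
    """Toy classifier: always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    wrong = sum(1 for _, label in test if label != majority)
    return wrong / len(test)


def random_subsampling(samples, n_test, k, evaluate, seed=0):
    """Run k independent random train/test splits and average the errors."""
    rng = random.Random(seed)
    errors = []
    for _ in range(k):
        shuffled = list(samples)
        rng.shuffle(shuffled)  # a fresh random split for every experiment
        test, train = shuffled[:n_test], shuffled[n_test:]
        errors.append(evaluate(train, test))  # retrain from scratch each time
    return sum(errors) / k, errors


# Hypothetical dataset of (feature, label) pairs: 20 "a" labels and 10 "b" labels.
samples = [(i, "a" if i % 3 else "b") for i in range(30)]
mean_error, errors = random_subsampling(samples, n_test=6, k=5,
                                        evaluate=majority_class_error)
print(len(errors), mean_error)
```

Because each split is drawn independently, some samples may appear in several test sets and others in none.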
What is K-Fold Cross Validation?
- Divide the dataset into K partitions
- For each of the K experiments, use K-1 partitions for training, and estimate the error using the remaining partition
- The final error estimate is the average error of all K experiments
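A minimal sketch of the K-fold procedure, assuming the data is already in random order (no shuffling is done here). The helper names are hypothetical, and `majority_class_error` is a toy stand-in for training and testing a real classifier.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def majority_class_error(train, test):
    """Toy classifier: always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(1 for _, label in test if label != majority) / len(test)


def cross_validate(samples, k, evaluate):
    """Average the error over k experiments, one per held-out partition."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        train = [samples[i] for i in train_idx]
        test = [samples[i] for i in test_idx]
        errors.append(evaluate(train, test))
    return sum(errors) / k


# Hypothetical dataset of (feature, label) pairs.
samples = [(i, "a" if i % 3 else "b") for i in range(10)]
folds = list(k_fold_indices(len(samples), 3))
print([len(test_idx) for _, test_idx in folds])  # [4, 3, 3]
print(cross_validate(samples, 3, majority_class_error))
```

Each index lands in exactly one test fold and in the training set of the other K-1 folds.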
What is an advantage of using K-Fold Cross Validation?
All the examples in the dataset are eventually used for both training and testing