Evaluating Model Performance Flashcards
Describe Accuracy
accuracy = number of correctly classified instances / total number of instances. (Computed for a single class, this is the fraction of that class's instances that were identified correctly.)
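A minimal sketch with scikit-learn (assumed available), using made-up labels purely for illustration:

    from sklearn.metrics import accuracy_score

    y_true = [1, 0, 1, 1, 0, 1]  # hypothetical actual labels
    y_pred = [1, 0, 0, 1, 0, 1]  # hypothetical predictions

    # 5 of the 6 predictions match the actual labels -> accuracy = 5/6
    print(accuracy_score(y_true, y_pred))  # 0.8333...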
Describe a Confusion Matrix
A matrix that tabulates classification results, with the predicted class counts (correct / incorrect) on the rows and the actual class counts on the columns. Conventions vary: scikit-learn, for example, puts actual classes on the rows and predicted classes on the columns.
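A short sketch using scikit-learn's confusion_matrix on the same made-up labels; for binary labels, ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP:

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 1]  # hypothetical actual labels
    y_pred = [1, 0, 0, 1, 0, 1]  # hypothetical predictions

    # Rows are actual classes, columns are predicted classes (sklearn's layout)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(tn, fp, fn, tp)  # 2 0 1 3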
What is Recall?
TP / (TP + FN). Recall in this context is also referred to as the true positive rate or sensitivity.
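A quick check of the formula against scikit-learn, reusing the same made-up labels:

    from sklearn.metrics import recall_score

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1]

    # TP = 3 and FN = 1 for these labels, so recall = 3 / (3 + 1)
    print(recall_score(y_true, y_pred))  # 0.75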
What is Precision?
TP / (TP + FP). Precision is also referred to as the positive predictive value (PPV).
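The same made-up labels through scikit-learn's precision_score:

    from sklearn.metrics import precision_score

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1]

    # TP = 3 and FP = 0 for these labels, so precision = 3 / (3 + 0)
    print(precision_score(y_true, y_pred))  # 1.0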
What is the F1 score?
F1 = 2 * (precision * recall) / (precision + recall). The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0.
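A sketch verifying the formula against scikit-learn's f1_score with the same made-up labels (precision 1.0 and recall 0.75, as computed above):

    from sklearn.metrics import f1_score

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1]

    precision, recall = 1.0, 0.75
    print(2 * (precision * recall) / (precision + recall))  # 0.857...
    print(f1_score(y_true, y_pred))                         # same value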
What is the Mean Absolute Error?
The mean absolute error sums the absolute error of each example and divides by the number of data points: MAE = (1/n) * sum(|y_i - yhat_i|).
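A worked example, assuming scikit-learn and made-up regression values:

    from sklearn.metrics import mean_absolute_error

    y_true = [3.0, -0.5, 2.0, 7.0]  # hypothetical true values
    y_pred = [2.5, 0.0, 2.0, 8.0]   # hypothetical predictions

    # (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
    print(mean_absolute_error(y_true, y_pred))  # 0.5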
What is the Mean Squared Error?
The residual errors (the differences between the predicted and the true values) are squared and then averaged: MSE = (1/n) * sum((y_i - yhat_i)^2).
Some benefits of squaring the residual error are that it automatically makes all the errors positive, it emphasizes larger errors over smaller ones, and the squared term is differentiable, which allows us to use calculus to find minimum or maximum values.
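The same made-up values with the squared-error version:

    from sklearn.metrics import mean_squared_error

    y_true = [3.0, -0.5, 2.0, 7.0]  # hypothetical true values
    y_pred = [2.5, 0.0, 2.0, 8.0]   # hypothetical predictions

    # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375; the largest error dominates
    print(mean_squared_error(y_true, y_pred))  # 0.375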
In model prediction what are the two main sources of errors that a model can suffer from?
Bias due to a model being unable to represent the complexity of the underlying data or variance due to a model that is overly sensitive to the limited data it has been trained on.
When does bias occur?
Bias occurs when a model has enough data but is not complex enough to capture the underlying relationships. As a result, the model consistently and systematically misrepresents the data, leading to low accuracy in prediction. This is known as underfitting: an overly simplified model with high error on the training set. In regression that would mean a low R^2 and a large sum of squared errors (SSE).
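A small sketch of underfitting, assuming NumPy and scikit-learn with synthetic data: a straight line fit to quadratic data leaves a training R^2 near zero:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    X = np.linspace(-3, 3, 100).reshape(-1, 1)
    y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)  # quadratic data

    # A line cannot represent the curvature: high bias, high training error
    model = LinearRegression().fit(X, y)
    print(model.score(X, y))  # training R^2 close to 0 -> underfitting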
When does variance occur?
When we train a model, we typically use a limited number of samples from a larger population (the training set). If we repeatedly train a model with randomly selected subsets of data, we would expect its predictions to differ based on the specific examples given to it. Here variance is a measure of how much the predictions vary for any given test sample.
Some variance is normal, but too much variance indicates that the model is unable to generalize its predictions to the larger population from which training samples were drawn. High sensitivity to the training set is also known as overfitting, and generally occurs when either the model is too complex and/or we do not have enough data to support it.
We can typically reduce the variability of a model's predictions, and increase precision, by training on more data. High variance shows up as a much larger error on the test set than on the training set.
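A sketch of high variance under the same assumptions: a degree-15 polynomial fit to a small synthetic sample tracks the training points closely but generalizes poorly (exact numbers vary with the seed):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(30, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Far too much flexibility for ~22 training points: the model fits noise
    model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
    model.fit(X_train, y_train)
    print(mean_squared_error(y_train, model.predict(X_train)))  # small
    print(mean_squared_error(y_test, model.predict(X_test)))    # much larger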
What can happen more likely when you use too few features?
High bias. You might need several features to fully describe what's going on in the data but may only be using a subset of the necessary features, which produces an overly simplified model with high bias. Think of the model as highly biased toward too few features.
What are patterns to identify high variance?
Using a large number of features, or carefully optimizing performance on the training set.
What is K-Fold Cross Validation?
Split the data into k equally sized bins (folds). Run k separate training/test experiments, picking each bin once as the test set and training on the other k-1 bins. Then average the test-set performance across the k experiments.
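A minimal sketch with scikit-learn, using the bundled iris dataset and logistic regression as stand-ins:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = load_iris(return_X_y=True)

    # Each of the 5 folds serves once as the test set while the model
    # trains on the other 4; the 5 test scores are then averaged.
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
    print(scores.mean())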
What is the curse of dimensionality?
As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially.
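A back-of-the-envelope sketch: if we (hypothetically) want about 10 sample points per dimension to cover the input space at a fixed density, the required sample count is 10^d:

    # 10 points per axis -> 10^d points to fill a d-dimensional grid
    for d in [1, 2, 3, 5, 10]:
        print(d, 10 ** d)  # 10, 100, 1000, 100000, 10000000000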
What is a learning curve in machine learning
A graph that compares a model's performance metric on training and testing data as a function of the number of training instances.
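A sketch of the data behind such a graph, assuming scikit-learn's learning_curve helper with iris and logistic regression as stand-ins; plotting the mean scores against train_sizes produces the learning curve:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    X, y = load_iris(return_X_y=True)

    # Train/validation scores at increasing training-set sizes
    train_sizes, train_scores, test_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y, cv=5,
        train_sizes=np.linspace(0.1, 1.0, 5), shuffle=True, random_state=0)
    print(train_sizes)
    print(train_scores.mean(axis=1))  # typically high, may dip slightly
    print(test_scores.mean(axis=1))   # typically rises with more data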