ML Metrics Flashcards
Offline metrics for classification models
Precision, recall, F1 score, accuracy, ROC-AUC, PR-AUC, confusion matrix
Offline metrics for regression
Mean squared error (MSE)
Mean absolute error (MAE)
Root mean squared error (RMSE)
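A minimal sketch of all three with NumPy (values are illustrative):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # illustrative ground-truth values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # illustrative predictions

mse = np.mean((y_true - y_pred) ** 2)   # mean squared error
mae = np.mean(np.abs(y_true - y_pred))  # mean absolute error
rmse = np.sqrt(mse)                     # root mean squared error

print(f"MSE={mse:.3f} MAE={mae:.3f} RMSE={rmse:.3f}")  # MSE=0.375 MAE=0.500 RMSE=0.612
```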
Offline metrics for ranking systems
- Mean reciprocal rank (MRR)
- Mean average precision (mAP)
- Normalized discounted cumulative gain (nDCG)
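A minimal sketch of MRR; the input format (one relevance list per query, ordered by the model's ranking) is an assumption for illustration:

```python
def mean_reciprocal_rank(ranked_relevance):
    """MRR: mean over queries of 1/rank of the first relevant result."""
    reciprocal_ranks = []
    for results in ranked_relevance:  # one bool list per query, in ranked order
        rr = 0.0
        for rank, relevant in enumerate(results, start=1):
            if relevant:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Query 1: first relevant hit at rank 2 -> 1/2; query 2: rank 1 -> 1/1
print(mean_reciprocal_rank([[False, True, False], [True, False]]))  # 0.75
```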
Online metrics for ad click prediction
- Click-through rate (CTR)
- Ad revenue
Online metrics for harmful content detection
- Number of reports
- Actioned reports
Online metrics for video recommendations
- Click-through rate
- Total watch time
- Number of completed videos
Types of Loss Functions
Mean squared error
Categorical cross-entropy loss
Binary cross-entropy loss
Mean squared error
- Measures the average squared difference between the predicted outputs and the true outputs
- MSE = (1/n) Σ (y_i - ŷ_i)^2
- Used to optimize the model parameters during training
- What we're trying to minimize when we train a model
precision
positive predictive value
probability a sample classified as positive is actually positive
TP/(TP+FP)
recall
same as true positive rate
true positives / all actual positives
TP / (TP + FN)
also called sensitivity
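A minimal sketch of both, computed straight from confusion-matrix counts (counts are illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall (TPR, sensitivity) = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts: 80 true positives, 20 false positives, 40 false negatives
p, r = precision_recall(tp=80, fp=20, fn=40)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```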
What’s the best metric when you have a large number of negative samples
Precision and recall
Precision is not affected by a large number of negative samples because it measures the fraction of true positives out of all predicted positives (TP + FP).
Precision measures the probability of correct detection of positive values, while FPR, TPR, and ROC measure the ability to distinguish between classes.
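A small worked demonstration: adding many true negatives inflates accuracy, but precision and recall never touch TN, so they are unchanged (counts are made up for illustration):

```python
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # uses TN
    precision = tp / (tp + fp)                  # ignores TN
    recall = tp / (tp + fn)                     # ignores TN
    return accuracy, precision, recall

# Same classifier behavior on positives; only the number of true negatives grows
print(metrics(tp=80, fp=20, fn=20, tn=80))      # (0.80, 0.80, 0.80)
print(metrics(tp=80, fp=20, fn=20, tn=99_880))  # (0.9996, 0.80, 0.80)
```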
Highest value of F1
1.0 indicating perfect precision and recall
Lowest value of F1
0 if either precision or recall is 0
AUC range
0 to 1
ROC
- true positive rate (recall) on the y axis
- false positive rate on the x axis
- captures the performance of a classification model at all classification thresholds (probability thresholds)
- does not depend on class distribution!
- receiver operating characteristic curve
AUC
- area under the ROC curve
- used to evaluate a binary classification model
- quantifies the ability of the model to distinguish between classes (the probability that a randomly chosen positive is ranked above a randomly chosen negative)
- ranges from 0 to 1
AUC of 0
A model that is 100% wrong
AUC of 1
A model that is 100% correct
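A minimal sketch with scikit-learn (labels and scores are illustrative):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]             # illustrative binary labels
y_scores = [0.1, 0.4, 0.35, 0.8]  # illustrative predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)  # points on the ROC curve
print(roc_auc_score(y_true, y_scores))              # 0.75 = area under that curve
```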
What’s the best metric when you have a large number of positive samples
ROC-AUC is a better metric
What metric should you use when detection of both classes is equally important
ROC-AUC
F1
- used to evaluate the performance of a binary classification model
- combines precision and recall into a single measure
- harmonic mean of precision and recall, which provides a balanced measure of the model's accuracy
- F1 = 2 * (precision * recall) / (precision + recall)
- F1 is 0 if either precision or recall is 0
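A minimal sketch of the harmonic mean (inputs are illustrative):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 if either is 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.80, 0.67))  # ~0.73
print(f1(1.00, 0.00))  # 0.0
```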
true positive rate
aka recall
true positives / all actual positives
TP / (TP + FN)
Offline Metrics
Score the model when building it
Before the model is put into production (computed on train, eval, and test datasets)
Examples of offline metrics: ROC-AUC, F1, R^2, MSE, intersection over union
online metrics
Scores from the model once it is running in production and serving traffic
Domain specific: things like click-through rate or minutes spent watching a video
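As a sketch, CTR is just clicks over impressions (the helper and numbers are illustrative):

```python
def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions (0 when there are no impressions)."""
    return clicks / impressions if impressions else 0.0

print(click_through_rate(clicks=32, impressions=1_000))  # 0.032, i.e. a 3.2% CTR
```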