ML Metrics Flashcards
Offline metrics for classification models
- Precision
- Recall
- F1 score
- Accuracy
- ROC-AUC
- PR-AUC
- Confusion matrix
Offline metrics for regression
- Mean squared error (MSE)
- Mean absolute error (MAE)
- Root mean squared error (RMSE)
Offline metrics for ranking systems
- MRR
- mAP
- nDCG
Online metrics for ad click prediction
- Click-through rate (CTR)
- Ad revenue
Online metrics for harmful content detection
- Number of reports
- Actioned reports
Online metrics for video recommendations
- Click-through rate
- Total watch time
- Number of completed videos
Types of Loss Functions
Mean squared error
Categorical cross-entropy loss
Binary cross-entropy loss
Mean squared error
- Measures the average squared difference between the predicted output and the true output
- Used as a loss function to optimize the model parameters during training
- What we're trying to minimize when we train a regression model
- MSE = (1/n) * Σ (y_i − ŷ_i)^2
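A minimal sketch of MSE alongside MAE and RMSE, using NumPy (the arrays are made-up example values):

```python
import numpy as np

# Made-up example values for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)   # mean squared error
mae = np.mean(np.abs(y_true - y_pred))  # mean absolute error
rmse = np.sqrt(mse)                     # root mean squared error
print(mse, mae, rmse)  # 0.375 0.5 0.612...
```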
Precision
- Positive predictive value
- Probability that a sample classified as positive is actually positive
- TP / (TP + FP) (see the code sketch after the recall card)
Recall
- Same as the true positive rate
- True positives / all actual positives
- TP / (TP + FN)
- Also called the sensitivity of the classifier
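A quick sketch of both formulas, precision and recall, from raw confusion-matrix counts (the counts here are invented):

```python
# Invented confusion-matrix counts for illustration
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)  # TP / (TP + FP) = 0.8
recall = tp / (tp + fn)     # TP / (TP + FN) ≈ 0.667
print(precision, recall)
```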
What’s the best metric when you have a large number of negative samples
Precision and recall
Precision is not affected by a large number of negative samples because it measures the fraction of true positives out of all predicted positives (TP + FP).
Precision measures the probability of correct detection of positive values, while FPR, TPR, and ROC measure the ability to distinguish between the classes.
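A small numeric demonstration of this point (counts are invented): piling on true negatives moves accuracy and FPR, but precision and recall do not budge.

```python
# Invented counts; only the number of true negatives varies
tp, fp, fn = 10, 5, 5

for tn in (10, 10_000):
    precision = tp / (tp + fp)                  # unaffected by tn
    recall = tp / (tp + fn)                     # unaffected by tn
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # inflated by tn
    fpr = fp / (fp + tn)                        # shrinks as tn grows
    print(tn, precision, recall, round(accuracy, 3), round(fpr, 5))
```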
Highest value of F1
1.0, indicating perfect precision and recall
Lowest value of F1
0 if either precision or recall is 0
AUC range
0 to 1
ROC
- Receiver operating characteristic curve
- True positive rate (recall) on the y-axis, false positive rate on the x-axis
- Captures the performance of a classification model at all classification thresholds (probability thresholds)
- Does not depend on class distribution!
AUC
- Area under the ROC curve
- Used to evaluate a binary classification model
- Quantifies the ability of the model to distinguish between the classes (the probability that a random positive sample is scored higher than a random negative one)
- Ranges from 0 to 1
AUC of 0
A model that is 100% wrong
AUC of 1
A model that is 100% correct
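A minimal sketch computing ROC-AUC with scikit-learn (labels and scores are made up):

```python
from sklearn.metrics import roc_auc_score

# Made-up labels and predicted probabilities
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# 3 of the 4 positive/negative pairs are ranked correctly -> 0.75
print(roc_auc_score(y_true, y_score))
```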
What’s the best metric when you have a large number of positive samples
ROC is a better metric
What metric should you use when detection of both classes is equally important
ROC
F1
- Used to evaluate the performance of a binary classification model
- Combines precision and recall into a single measure
- Harmonic mean of precision and recall, which provides a balanced measure of the model's accuracy
- F1 = 2 * (precision * recall) / (precision + recall)
- F1 is 0 if either precision or recall is 0
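One way to sketch the harmonic-mean formula (the function name is mine, not a library call):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 if either is 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.8, 0.5))  # ~0.615, pulled toward the weaker of the two
```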
True positive rate
- aka recall
- True positives / all actual positives
- TP / (TP + FN)
Offline metrics
- Score the model while you are building it, before it is put into production (on the train, eval, and test datasets)
- Examples: ROC, AUC, F1, R^2, MSE, intersection over union
Online metrics
- Scores from the model once it is running in production and serving traffic
- Domain specific: things like click-through rate or minutes spent watching a video
MRR
- mean reciprocal rank
- only considers the rank of the first relevant item
- not a good measure of the quality of the list as a whole
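A minimal MRR sketch (the helper name and the example ranks are mine):

```python
def mrr(first_relevant_ranks: list[int]) -> float:
    """Mean reciprocal rank: average of 1/rank of the first
    relevant item, taken over all queries (ranks are 1-based)."""
    return sum(1 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

# First relevant item at ranks 1, 3, and 2 across three queries
print(mrr([1, 3, 2]))  # (1 + 1/3 + 1/2) / 3 ≈ 0.611
```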
mAP
- mean average precision
- good for ranking problems
- works well for binary relevance (relevant or irrelevant).
- For continuous relevance scores use nDCG
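A sketch of one common mAP variant for binary relevance (helper names and example lists are made up; here AP averages precision@i over the relevant positions that were retrieved):

```python
def average_precision(relevances: list[int]) -> float:
    """AP for one ranked list with binary relevance (1 = relevant)."""
    hits, score = 0, 0.0
    for i, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            score += hits / i  # precision@i at each relevant position
    return score / hits if hits else 0.0

# mAP = mean of AP over all queries
queries = [[1, 0, 1, 0], [0, 1, 1, 0]]
print(sum(average_precision(q) for q in queries) / len(queries))
```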
nDCG
- winner, winner for ranking problems
- continuous relevance score
- shows how good the ranking is compared to the ideal ranking
- takes into account the position of the relevant item in a ranked list
- Ranges from 0 to 1; higher values indicate better performance
nDCG acronym
normalized discounted cumulative gain
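A minimal nDCG sketch with the standard log2 position discount (the relevance scores are made up):

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: position i (0-based) is
    discounted by log2(i + 2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances: list[float]) -> float:
    """DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Graded relevance of a ranked list; ideal order would be [3, 2, 1, 0]
print(ndcg([3, 1, 2, 0]))  # ≈ 0.97
```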
Cross entropy
- how close the model’s predicted probabilities are to the ground truth label.
- CE is zero for an ideal system that predicts probability 0 for the negative class and 1 for the positive class.
- The lower the CE, the better the model's predictions.
- Good for ad click prediction
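A sketch of binary cross entropy as it would apply to click prediction (labels and probabilities are invented):

```python
import math

def binary_cross_entropy(y_true: list[int], y_prob: list[float]) -> float:
    """Average negative log-likelihood of the true labels."""
    eps = 1e-15  # clamp probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Made-up click labels and predicted click probabilities
print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))  # ≈ 0.28
```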
Normalized cross entropy (NCE)
- Ratio of our model's CE to the CE of the background CTR.
- Low NCE indicates the model outperforms the baseline.
- NCE ≥ 1 indicates that the model is not performing better than the baseline.
- Good for ad click prediction
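A sketch of NCE under the usual reading of "CE of the background CTR": a baseline that predicts the average click rate for every impression (all numbers are invented; reuses a binary CE helper):

```python
import math

def cross_entropy(y_true: list[int], y_prob: list[float]) -> float:
    eps = 1e-15
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(y_true, y_prob)
    ) / len(y_true)

# Invented click labels and model probabilities
clicks = [1, 0, 0, 1, 0]
model_probs = [0.7, 0.2, 0.1, 0.6, 0.3]

# Baseline: predict the background CTR for every impression
ctr = sum(clicks) / len(clicks)
baseline = [ctr] * len(clicks)

nce = cross_entropy(clicks, model_probs) / cross_entropy(clicks, baseline)
print(nce)  # < 1 means the model beats the CTR baseline
```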