ML Metrics Flashcards

1
Q

Offline metrics for classification models

A

Precision, recall, F1 score, accuracy, ROC-AUC, PR-AUC,
confusion matrix
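
A minimal sketch of how these can be computed, assuming scikit-learn is available; the toy labels and scores below are made up for illustration:

    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score, average_precision_score,
                                 confusion_matrix)

    # toy ground-truth labels, hard predictions, and positive-class probabilities
    y_true  = [0, 0, 1, 1, 1]
    y_pred  = [0, 1, 1, 1, 0]
    y_score = [0.1, 0.6, 0.8, 0.9, 0.4]

    print(precision_score(y_true, y_pred))           # TP / (TP + FP)
    print(recall_score(y_true, y_pred))              # TP / (TP + FN)
    print(f1_score(y_true, y_pred))                  # harmonic mean of precision and recall
    print(accuracy_score(y_true, y_pred))            # fraction of correct predictions
    print(roc_auc_score(y_true, y_score))            # ROC-AUC (uses scores, not hard labels)
    print(average_precision_score(y_true, y_score))  # PR-AUC (average precision)
    print(confusion_matrix(y_true, y_pred))          # rows = actual, columns = predicted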

2
Q

Offline metrics for regression

A

Mean squared error (MSE)
Mean absolute error (MAE)
Root mean squared error (RMSE)
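
A minimal sketch, assuming scikit-learn and NumPy; the values are illustrative only:

    import numpy as np
    from sklearn.metrics import mean_squared_error, mean_absolute_error

    y_true = [3.0, 5.0, 2.5, 7.0]
    y_pred = [2.5, 5.0, 4.0, 8.0]

    mse  = mean_squared_error(y_true, y_pred)   # mean of squared errors
    mae  = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
    rmse = np.sqrt(mse)                         # RMSE is the square root of MSE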

3
Q

Offline metrics for ranking systems

A
  • MRR
  • mAP
  • nDCG
4
Q

Online metric for ad click prediction

A
  • Click-through rate (CTR)
  • Ad revenue
5
Q

Online metric for harmful content detection

A
  • Number of reports
  • Actioned reports
6
Q

Online metric for video recommendations

A
  • Click-through rate
  • Total watch time
  • Number of completed videos
7
Q

Types of Loss Functions

A

Mean squared error
Categorical cross-entropy loss
Binary cross-entropy loss

8
Q

Mean squared error

A
  • Measures the average squared difference between the predicted output and the true output
  • Used to optimize the model parameters during training
  • What we're trying to minimize when we train a model
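
For reference, the usual formula, where n is the number of samples, y_i the true value, and ŷ_i the prediction:

    \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
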
9
Q

precision

A

positive predictive value

probability a sample classified as positive is actually positive

TP/(TP+FP)

10
Q

recall

A

same as true positive rate
true positives / total positives
TP / (TP+FN)

sensitivity of the classification

11
Q

What’s the best metric when you have a large number of negative samples

A

Precision and recall

Precision is not affected by a large number of negative samples because it measures the fraction of true positives out of the number of predicted positives (TP + FP).

Precision measures the probability of correct detection of positive values, while FPR, TPR, and ROC measure the ability to distinguish between classes.

12
Q

Highest value of F1

A

1.0 indicating perfect precision and recall

13
Q

Lowest value of F1

A

0 if either precision or recall is 0

14
Q

AUC range

A

0 to 1

15
Q

ROC

A
  • true positive rate (recall) on the y axis, false positive rate on the x axis
  • captures the performance of a classification model at all classification thresholds (probability thresholds)
  • does not depend on class distribution!
  • receiver operating characteristic curve
16
Q

AUC

A
  • area under the ROC curve
  • used to evaluate a binary classification model
  • quantifies the ability of the model to distinguish between classes

AUC ranges from 0 to 1
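
A minimal sketch that traces the ROC curve and computes its area, assuming scikit-learn; the labels and scores are toy values:

    from sklearn.metrics import roc_curve, roc_auc_score

    y_true  = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]   # predicted positive-class probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
    auc = roc_auc_score(y_true, y_score)               # area under that curve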

17
Q

AUC of 0

A

A model that is 100% wrong

18
Q

AUC of 1

A

A model that is 100% correct

19
Q

What’s the best metric when you have a large number of positive samples

A

ROC is a better metric

20
Q

What metric should you use when detection of both classes is equally important

21
Q

F1

A
  • used to evaluate the performance of a binary classification model
  • combines precision and recall into a single measure
  • harmonic mean of precision and recall, which provides a balanced measure of the model's accuracy
  • F1 is 0 if either precision or recall is 0
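
As a worked formula, with P = precision and R = recall:

    F_1 = 2 \cdot \frac{P \cdot R}{P + R}
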
22
Q

true positive rate

A

aka recall
true positives / all positives
TP / (TP + FN)

23
Q

Offline Metrics

A

Scores computed while building the model
Computed before the model is put into production (on the train, eval, and test datasets)
Examples of offline metrics: ROC, AUC, F1, R^2, MSE, intersection over union

24
Q

online metrics

A

Scores from the model once it is running in production and serving traffic

Domain specific: things like click-through rate or minutes spent watching a video.

25
Q

MRR

A
  • mean reciprocal rank
  • only considers the rank of the *first* relevant item
  • not a good measure of the quality of the list as a whole
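
A from-scratch sketch (the mrr helper below is illustrative, not a library function); each query is a ranked list of binary relevance labels:

    def mrr(ranked_relevances):
        # mean reciprocal rank over queries
        total = 0.0
        for rels in ranked_relevances:
            for rank, rel in enumerate(rels, start=1):
                if rel:                  # first relevant item found
                    total += 1.0 / rank
                    break                # later relevant items are ignored
        return total / len(ranked_relevances)

    # first relevant item at rank 2 and rank 1 -> (1/2 + 1) / 2 = 0.75
    print(mrr([[0, 1, 0], [1, 0, 0]]))
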
26
Q

mAP

A
  • mean average precision
  • good for ranking problems
  • works well for binary relevance (relevant or irrelevant)
  • for continuous relevance scores, use nDCG
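
A from-scratch sketch (the helper name is illustrative, not a library function); relevance labels are binary, per the card:

    def average_precision(relevances):
        # AP for one ranked list of binary relevance labels (1 = relevant)
        hits, total = 0, 0.0
        for k, rel in enumerate(relevances, start=1):
            if rel:
                hits += 1
                total += hits / k   # precision@k, counted only at relevant positions
        return total / hits if hits else 0.0

    # mAP is the mean of per-query AP values
    queries = [[1, 0, 1, 0], [0, 1, 1]]
    print(sum(average_precision(q) for q in queries) / len(queries))
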
27
Q

nDCG

A
  • winner, winner for ranking problems
  • continuous relevance score
  • shows how good the ranking is compared to the ideal ranking
  • takes into account the position of the relevant item in a ranked list
  • ranges from 0 to 1; higher values indicate better performance
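
One common formulation (an exponential gain, 2^rel_i - 1, is also widely used), where rel_i is the relevance of the item at position i and IDCG is the DCG of the ideal ordering:

    \mathrm{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i + 1)}, \qquad \mathrm{nDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
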
28
Q

nDCG acronym

A

normalized discounted cumulative gain

29
Q

Cross entropy

A
  • how close the model's predicted probabilities are to the ground truth label
  • CE is zero if we have an ideal system that predicts 0 for the negative classes and 1 for the positive classes
  • the lower the CE, the higher the accuracy of the prediction
  • good for ad click prediction
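
For binary labels y_i and predicted click probabilities p_i over N examples, the binary cross-entropy is:

    \mathrm{CE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]
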
30
Q

Normalized cross entropy

A
  • ratio of our model's CE and the CE of the background CTR
  • low NCE indicates the model outperforms the baseline
  • NCE ≥ 1 indicates that the model is not performing better than the baseline
  • good for ad click prediction
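
One common way to write this, assuming the baseline is a model that always predicts the background CTR p (the card does not spell the exact form out, so treat this as an assumption):

    \mathrm{NCE} = \frac{\mathrm{CE}_{\text{model}}}{-\left( p \log p + (1 - p) \log(1 - p) \right)}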