ML Metrics Flashcards

1
Q

Offline metrics for classification models

A

Precision, recall, F1 score, accuracy, ROC-AUC, PR-AUC,
confusion matrix

2
Q

Offline metrics for regression

A

Mean squared error (MSE)
Mean absolute error (MAE)
Root mean squared error (RMSE)

3
Q

Offline metrics for ranking systems

A
  • MRR
  • mAP
  • nDCG
4
Q

Online metrics for ad click prediction

A
  • Click-through rate
  • Ad revenue
5
Q

Online metrics for harmful content detection

A
  • Number of reports
  • Actioned reports
6
Q

Online metrics for video recommendations

A
  • Click-through rate
  • Total watch time
  • Number of completed videos
7
Q

Types of Loss Functions

A

Mean squared error
Categorical cross-entropy loss
Binary cross-entropy loss

8
Q

Mean squared error

A
  • Measures the average squared difference between the predicted output and the true output
  • Used to optimize the model parameters during training
  • What we’re trying to minimize when we train the model
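
A minimal NumPy sketch of the idea; the arrays and values are illustrative, not from any particular model:

    import numpy as np

    y_true = np.array([3.0, 5.0, 2.5, 7.0])   # ground-truth targets (illustrative)
    y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # model predictions (illustrative)

    # MSE = mean of the squared differences between predictions and targets
    mse = np.mean((y_true - y_pred) ** 2)
    print(mse)  # 0.875
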
9
Q

precision

A

Also called positive predictive value

Probability that a sample classified as positive is actually positive

Precision = TP / (TP + FP)

10
Q

recall

A

Same as the true positive rate
True positives / all actual positives
Recall = TP / (TP + FN)

Sensitivity of the classification
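
A minimal sketch computing both precision and recall from raw counts; the labels are illustrative:

    import numpy as np

    y_true = np.array([1, 0, 1, 1, 0, 1])  # ground-truth labels (illustrative)
    y_pred = np.array([1, 0, 0, 1, 1, 1])  # predicted labels (illustrative)

    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    precision = tp / (tp + fp)   # TP / (TP + FP)
    recall = tp / (tp + fn)      # TP / (TP + FN)
    print(precision, recall)     # 0.75 0.75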

11
Q

What’s the best metric when you have a large number of negative samples

A

Precision and recall

Precision is not affected by a large number of negative samples because it measures the fraction of true positives out of all predicted positives (TP + FP).

Precision measures the probability of correct detection of positive values, while FPR, TPR, and ROC measure the ability to distinguish between the classes.

12
Q

Highest value of F1

A

1.0 indicating perfect precision and recall

13
Q

Lowest value of F1

A

0 if either precision or recall is 0

14
Q

AUC range

A

0 to 1

15
Q

ROC

A
  • receiver operating characteristic curve
  • true positive rate (recall) on the y-axis, false positive rate on the x-axis
  • captures the performance of a classification model at all classification thresholds (probability thresholds)
  • does not depend on class distribution!
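
A minimal sketch using scikit-learn's roc_curve, which sweeps the classification threshold and returns the points of the curve; the labels and scores are illustrative:

    import numpy as np
    from sklearn.metrics import roc_curve

    y_true = np.array([0, 0, 1, 1])             # ground-truth labels (illustrative)
    y_score = np.array([0.1, 0.4, 0.35, 0.8])   # predicted probabilities (illustrative)

    # Returns false positive rate (x axis) and true positive rate / recall (y axis)
    # at each threshold
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print(fpr, tpr)
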
16
Q

AUC

A
  • area under the ROC curve
  • used to evaluate a binary classification model
  • Quantifies the ability of the model to separate the positive and negative classes: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative example

AUC ranges from 0 to 1
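
A minimal sketch using scikit-learn's roc_auc_score; the labels and scores are illustrative:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 0, 1, 1])             # ground-truth labels (illustrative)
    y_score = np.array([0.1, 0.4, 0.35, 0.8])   # predicted probabilities (illustrative)

    # 1.0 = perfect separation, 0.5 = random ranking, 0.0 = 100% wrong
    auc = roc_auc_score(y_true, y_score)
    print(auc)  # 0.75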

17
Q

AUC of 0

A

A model that is 100% wrong

18
Q

AUC of 1

A

A model that is 100% correct

19
Q

What’s the best metric when you have a large number of positive samples

A

ROC is a better metric

20
Q

What metric should you use when detection of both classes is equally important

A

ROC

21
Q

F1

A
  • used to evaluate the performance of a binary classification model
  • combines precision and recall into a single measure
  • harmonic mean of precision and recall, which provides a balanced measure of the model’s performance: F1 = 2 · precision · recall / (precision + recall)
  • F1 is 0 if either precision or recall is 0
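
A minimal sketch of the harmonic-mean formula; the precision and recall values are illustrative:

    precision = 0.75
    recall = 0.60

    # Harmonic mean of precision and recall; defined as 0 if either is 0
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    print(f1)  # about 0.667
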
22
Q

true positive rate

A

aka recall
true positives / all actual positives
TP / (TP + FN)

23
Q

Offline Metrics

A

Score the model when building it
Before the model is put into production (train, eval, and test datasets)
Examples of offline metrics: ROC, AUC, F1, R^2, MSE, intersection over union

24
Q

online metrics

A

Scores from the model once it is running in production and serving traffic

Domain specific: things like click-through rate or minutes spent watching a video

25
Q

MRR

A
  • mean reciprocal rank
  • only considers the rank of the first relevant item
  • not a good measure of the quality of the list as a whole
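
A minimal sketch of MRR, assuming we already know the (1-based) rank of the first relevant item for each query; the ranks are illustrative:

    import numpy as np

    # Rank of the first relevant item for each query (illustrative)
    first_relevant_rank = np.array([1, 3, 2])

    # MRR = mean over queries of 1 / rank of the first relevant item
    mrr = np.mean(1.0 / first_relevant_rank)
    print(mrr)  # (1 + 1/3 + 1/2) / 3, about 0.611
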
26
Q

mAP

A
  • mean average precision
  • good for ranking problems
  • works well for binary relevance (relevant or irrelevant).
  • For continuous relevance scores use nDCG
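
A minimal sketch of average precision for a single query with binary relevance labels given in ranked order; mAP is the mean of this value over all queries. The labels are illustrative:

    import numpy as np

    # 1 = relevant, 0 = irrelevant, in the order the model ranked the items (illustrative)
    relevance = np.array([1, 0, 1, 1, 0])

    # Precision@k at each position holding a relevant item, averaged over
    # the total number of relevant items
    hits = np.cumsum(relevance)
    ranks = np.arange(1, len(relevance) + 1)
    average_precision = np.sum((hits / ranks) * relevance) / relevance.sum()
    print(average_precision)  # (1/1 + 2/3 + 3/4) / 3, about 0.806
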
27
Q

nDCG

A
  • winner, winner for ranking problems
  • handles continuous relevance scores
  • shows how good the ranking is compared to the ideal ranking
  • takes into account the position of each relevant item in the ranked list
  • ranges from 0 to 1; higher values indicate better performance
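
A minimal sketch for one ranked list, using a common formulation where the gain is the raw relevance score and the discount is log2(position + 1) (some definitions use 2^relevance - 1 as the gain); the scores are illustrative:

    import numpy as np

    # Graded relevance of the items in the order the model ranked them (illustrative)
    relevance = np.array([3.0, 1.0, 2.0, 0.0])

    def dcg(rels):
        # Sum of each item's gain discounted by the log of its position
        positions = np.arange(1, len(rels) + 1)
        return np.sum(rels / np.log2(positions + 1))

    ideal = np.sort(relevance)[::-1]      # ideal ranking: most relevant items first
    ndcg = dcg(relevance) / dcg(ideal)    # 1.0 means the model's ranking is ideal
    print(ndcg)
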
28
Q

nDCG acronym

A

normalized discounted cumulative gain

29
Q

Cross entropy

A
  • how close the model’s predicted probabilities are to the ground-truth labels
  • CE is zero if we have an ideal system that predicts 0 for the negative classes and 1 for the positive classes
  • The lower the CE, the higher the accuracy of the prediction
  • Good for ad click prediction
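
A minimal sketch of binary cross-entropy over a batch of predictions; the labels and probabilities are illustrative:

    import numpy as np

    y_true = np.array([1, 0, 1, 1])            # ground-truth labels (illustrative)
    y_prob = np.array([0.9, 0.1, 0.8, 0.6])    # predicted probabilities (illustrative)

    eps = 1e-12  # avoid log(0)
    # Approaches 0 as the predicted probabilities approach the true labels
    ce = -np.mean(y_true * np.log(y_prob + eps) + (1 - y_true) * np.log(1 - y_prob + eps))
    print(ce)
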
30
Q

Normalized cross entropy.

A
  • NCE = normalized cross entropy
  • Ratio of our model’s CE to the CE of the background CTR (a baseline that always predicts the average click rate)
  • Low NCE indicates the model outperforms the baseline
  • NCE ≥ 1 indicates that the model is not performing better than the baseline
  • Good for ad click prediction
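
A minimal sketch under one common definition: the model's cross-entropy divided by the cross-entropy of a baseline that always predicts the background (average) CTR. The clicks and predicted CTRs are illustrative:

    import numpy as np

    def cross_entropy(y_true, y_prob, eps=1e-12):
        return -np.mean(y_true * np.log(y_prob + eps) + (1 - y_true) * np.log(1 - y_prob + eps))

    y_true = np.array([1, 0, 0, 1, 0])            # observed clicks (illustrative)
    y_prob = np.array([0.7, 0.2, 0.1, 0.6, 0.3])  # model's predicted CTRs (illustrative)

    background_ctr = y_true.mean()                # baseline: always predict the average CTR
    nce = cross_entropy(y_true, y_prob) / cross_entropy(y_true, np.full_like(y_prob, background_ctr))
    print(nce)  # < 1 means the model beats the baseline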