Model Selection and Evaluation Flashcards

1
Q

What is the definition of Error Rate?

A

The proportion of incorrectly classified samples among the total number of samples. If a out of m samples are misclassified, the error rate is E = a/m, and the accuracy is 1 - E.
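
A tiny Python sketch of this arithmetic (the labels below are made up for illustration):

    # Error rate and accuracy from prediction counts.
    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 1, 1]

    m = len(y_true)                                  # total number of samples
    a = sum(t != p for t, p in zip(y_true, y_pred))  # misclassified samples
    error_rate = a / m                               # E = a / m
    accuracy = 1 - error_rate                        # accuracy = 1 - E
    print(error_rate, accuracy)                      # 0.333..., 0.666...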

2
Q

What is the difference between Training/Empirical Error and Generalization Error?

A

The training error is the error calculated on the training set. The generalization error is the error calculated on new samples.

3
Q

What is Overfitting?

A

When the learner learns the training examples too well, peculiarities of the training set are taken as general properties that all potential samples will have. Almost all learning algorithms implement some mechanism to mitigate overfitting.

4
Q

What is Underfitting?

A

The learner failed to learn the general properties of the training examples.

5
Q

How do we select a model?

A

We use a testing set to estimate the learner’s ability to classify new samples and use the testing error as an approximation to the generalization error.

6
Q

How does the Hold-Out method work?

A

This method splits a dataset D into two disjoint subsets: one as the training set S and the other as the testing set T. It trains a model on the training set S and then calculates the testing error on the testing set T as an estimate of the generalization error.
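
A minimal Python sketch of a hold-out split, assuming a simple random (non-stratified) split; the function name and example data are illustrative:

    import random

    def hold_out_split(D, test_ratio=0.3, seed=0):
        """Split dataset D into disjoint training set S and testing set T."""
        rng = random.Random(seed)
        indices = list(range(len(D)))
        rng.shuffle(indices)
        cut = int(len(D) * (1 - test_ratio))
        S = [D[i] for i in indices[:cut]]   # training set S
        T = [D[i] for i in indices[cut:]]   # testing set T
        return S, T

    # Illustrative data: (feature, label) pairs.
    D = [(x, x % 2) for x in range(10)]
    S, T = hold_out_split(D)
    print(len(S), len(T))  # 7 3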

7
Q

How does K-fold cross-validation work?

A

This method splits a dataset D into k disjoint subsets of similar size. In each trial of cross-validation, we use the union of k-1 subsets as the training set to train a model and the remaining subset as the testing set to evaluate it. We repeat this process k times, using each subset as the testing set exactly once, and average over the k trials to obtain the evaluation result. The most common value for k is 10.
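
A sketch of k-fold cross-validation in plain Python; train and evaluate are hypothetical placeholders for a real learner and its scoring function:

    import random

    def k_fold_cv(D, k, train, evaluate, seed=0):
        rng = random.Random(seed)
        indices = list(range(len(D)))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]        # k disjoint, similar-sized subsets
        scores = []
        for i in range(k):
            test_idx = set(folds[i])
            train_set = [D[j] for j in indices if j not in test_idx]  # union of the other k-1 subsets
            test_set = [D[j] for j in folds[i]]                       # remaining subset as testing set
            model = train(train_set)
            scores.append(evaluate(model, test_set))
        return sum(scores) / k                           # average over the k trials

    # Example with a trivial "majority label" learner on made-up data.
    D = [(x, x % 2) for x in range(20)]
    majority = lambda S: round(sum(y for _, y in S) / len(S))
    accuracy = lambda model, T: sum(y == model for _, y in T) / len(T)
    print(k_fold_cv(D, k=10, train=majority, evaluate=accuracy))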

8
Q

How can we decrease the error introduced by splitting in K-fold cross-validation?

A

We repeat the random splitting p times and average the evaluation results of the p runs of k-fold cross-validation (for example, 10-time 10-fold cross-validation).
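
A brief sketch of the repetition, reusing the k_fold_cv sketch from the previous card and assuming its seed argument controls the random split:

    def repeated_k_fold_cv(D, k, p, train, evaluate):
        # Run k-fold CV p times with different random splits and average the results.
        results = [k_fold_cv(D, k, train, evaluate, seed=r) for r in range(p)]
        return sum(results) / p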

9
Q

What is the Leave-One-Out method?

A

It is a special case of cross-validation. For a dataset D with m samples, it lets k = m, so each subset contains a single sample and each training set has only one sample fewer than the original dataset D. Although its evaluations are often considered accurate, LOO is computationally prohibitive for large datasets.
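
A sketch of leave-one-out in plain Python; train and evaluate are again hypothetical placeholders:

    def leave_one_out(D, train, evaluate):
        # k = m: each testing set holds exactly one sample.
        scores = []
        for i in range(len(D)):
            test_sample = [D[i]]
            train_set = D[:i] + D[i + 1:]       # all samples except the i-th
            model = train(train_set)
            scores.append(evaluate(model, test_sample))
        return sum(scores) / len(D)             # average over the m trials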

10
Q

What is parameter tuning?

A

The process of selecting appropriate values for a learning algorithm's parameters; different parameter settings can lead to models with significantly different performance.

11
Q

How do we split data for parameter selection?

A

Training set for training models, validation set for model selection and parameter tuning, testing set for estimating the generalization ability of the models.
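
A sketch of such a three-way split; the 60/20/20 proportions are an illustrative assumption, not from the source:

    import random

    def train_val_test_split(D, ratios=(0.6, 0.2, 0.2), seed=0):
        # Shuffle once, then cut into three disjoint parts.
        rng = random.Random(seed)
        indices = list(range(len(D)))
        rng.shuffle(indices)
        n_train = int(len(D) * ratios[0])
        n_val = int(len(D) * ratios[1])
        train_set = [D[i] for i in indices[:n_train]]                # train candidate models
        val_set = [D[i] for i in indices[n_train:n_train + n_val]]   # model selection / parameter tuning
        test_set = [D[i] for i in indices[n_train + n_val:]]         # estimate generalization ability
        return train_set, val_set, test_set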

12
Q

What is the performance measure used for Regression Problems?

A

Mean Squared Error (MSE): E(f; D) = (1/m) * Σ_{i=1}^{m} (f(x_i) - y_i)^2.
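
A minimal Python version of this formula, with made-up numbers:

    def mse(y_true, y_pred):
        # E(f; D) = (1/m) * sum_i (f(x_i) - y_i)^2
        m = len(y_true)
        return sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred)) / m

    print(mse([1.0, 2.0, 3.0], [1.5, 1.5, 3.0]))  # 0.1666...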

13
Q

What are the four combinations of the ground-truth class and the predicted class in binary classification problems? How can they be displayed?

A

True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN). TP + FP + TN + FN = total number of samples.
They can be displayed in a confusion matrix.
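
A sketch that counts the four outcomes and lays them out as a confusion matrix (labels 1 = positive, 0 = negative; the example labels are made up):

    def confusion_counts(y_true, y_pred):
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        assert tp + fp + tn + fn == len(y_true)   # the four counts partition the samples
        return tp, fp, tn, fn

    # Confusion matrix layout: rows = ground truth, columns = prediction.
    tp, fp, tn, fn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print([[tp, fn],      # actual positive: predicted positive / predicted negative
           [fp, tn]])     # actual negative: predicted positive / predicted negative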

14
Q

What is Precision? How is it calculated?

A

The proportion of correctly identified positive predictions out of all positive predictions made by the model. P = TP / (TP + FP)

15
Q

What is Recall? How is it calculated?

A

The proportion of correctly identified positive predictions out of all actual positive cases. R = TP / (TP + FN)
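
A tiny sketch computing both precision and recall from the confusion counts (the counts are made up):

    def precision_recall(tp, fp, fn):
        precision = tp / (tp + fp)   # of all positive predictions, how many are correct
        recall = tp / (tp + fn)      # of all actual positives, how many are found
        return precision, recall

    print(precision_recall(tp=2, fp=1, fn=1))  # (0.666..., 0.666...)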

16
Q

What is the difference between Precision and Recall? What is the trade-off between them?

A

They are often in conflict. Precision asks: "Of all the things I said were positive, how many were actually correct?" Recall asks: "Of all the actual positive cases, how many did I correctly find?"
High precision, low recall: you are cautious and only mark something as positive when you are very sure, so you may miss some positives.
High recall, low precision: you cast a wide net, catching most positives but also incorrectly marking some negatives as positive.

17
Q

What is the P-R curve and what is it used for?

A

A Precision-Recall (P-R) curve shows the trade-off between precision and recall at different thresholds.
It evaluates the performance of a model on imbalanced datasets, focusing on positive class predictions.

18
Q

What are the steps to making and evaluating a P-R curve?

A
  1. Rank the samples so that those most likely to be positive are at the top of the list and those least likely to be positive are at the bottom.
  2. Starting from the top of the ranking list, incrementally label the samples as positive and calculate the precision and recall at each increment.
  3. Plotting the precisions on the y-axis against the recalls on the x-axis gives the P-R curve; such plots are called P-R plots (see the sketch after this list).
  4. If the P-R curve of one learner entirely encloses the curve of another learner, then the performance of the first learner is superior.
  5. If the curves intersect, a reasonable solution is to compare the areas under the P-R curves, which, to some extent, represent the proportion of cases in which both precision and recall are relatively high.
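
A sketch of steps 1-3 in plain Python; the labels and scores are made up, and no plotting library is assumed (the function just returns the (recall, precision) points to plot):

    def pr_curve(labels, scores):
        # labels: 1 = positive, 0 = negative; scores: higher means more likely positive.
        order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
        total_pos = sum(labels)
        tp = fp = 0
        points = []
        for i in order:                                    # label one more sample as positive each step
            if labels[i] == 1:
                tp += 1
            else:
                fp += 1
            points.append((tp / total_pos, tp / (tp + fp)))  # (recall, precision)
        return points

    print(pr_curve([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
    # approximately [(0.5, 1.0), (0.5, 0.5), (1.0, 0.67), (1.0, 0.5)]
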
19
Q

What is Break-Even Point?

A

BEP is the point on a P-R curve where precision equals recall. It indicates a balance between the two metrics.

20
Q

What is F1?

A

Another kind of measure: F1 is the harmonic mean of precision and recall, F1 = 2 * P * R / (P + R). Its general form Fβ lets us specify a preference between precision and recall: Fβ = (1 + β^2) * P * R / (β^2 * P + R). When β = 1, it reduces to the standard F1; when β > 1, recall is weighted more heavily; when β < 1, the opposite.
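
A tiny sketch of the Fβ formula above:

    def f_beta(precision, recall, beta=1.0):
        # beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    print(f_beta(0.5, 0.8))            # standard F1 = 2PR / (P + R), approx. 0.615
    print(f_beta(0.5, 0.8, beta=2.0))  # recall weighted more heavily, approx. 0.714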

21
Q

What is a ROC Curve?

A

A Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (Recall) against the False Positive Rate (1 - Specificity) at various thresholds.

22
Q

How do we make a ROC Curve?

A
  1. We sort the samples by the learner's predictions.
  2. We then obtain two measures by gradually moving the cut point from the top of the ranked list toward the bottom.
  3. Plotting those two measures on the x-axis and y-axis gives the ROC curve.
  4. Unlike P-R curves, the y-axis of a ROC curve is the TPR and the x-axis is the FPR:
    TPR (True Positive Rate) = TP / (TP + FN), FPR (False Positive Rate) = FP / (TN + FP)
23
Q

How do we make a ROC Curve (More detailed)?

A
  1. Start by sorting all the samples based on the learner's predictions; let m+ be the number of positive samples and m- the number of negative samples.
  2. Begin with the maximum threshold (predicting all samples as negative), so both TPR and FPR are 0. Mark this point as (0, 0) on the ROC curve.
  3. Gradually lower the threshold, classifying samples as positive in the sorted order. For each true positive, move the mark upward (y-axis) by 1/m+; for each false positive, move the mark right (x-axis) by 1/m-.
  4. Connect these points to form the ROC curve (see the sketch after this list).
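
A sketch of this construction in plain Python, plus the AUC of the resulting curve via the trapezoidal rule; ties in the prediction scores are not handled here, and the labels and scores are made up:

    def roc_points(labels, scores):
        order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
        m_pos = sum(labels)                 # number of positive samples (m+)
        m_neg = len(labels) - m_pos         # number of negative samples (m-)
        x, y = 0.0, 0.0
        points = [(x, y)]                   # maximum threshold: everything predicted negative
        for i in order:                     # lower the threshold one sample at a time
            if labels[i] == 1:
                y += 1 / m_pos              # true positive: move up by 1/m+
            else:
                x += 1 / m_neg              # false positive: move right by 1/m-
            points.append((x, y))
        return points

    def auc(points):
        # AUC = (1/2) * sum over i of (x_{i+1} - x_i) * (y_i + y_{i+1})  (trapezoidal rule)
        return 0.5 * sum((x2 - x1) * (y1 + y2)
                         for (x1, y1), (x2, y2) in zip(points, points[1:]))

    pts = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
    print(pts, auc(pts))  # AUC = 0.75 for this example
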
24
Q

How can we compare learners based on their ROC curves?

A

A learner is better if its ROC curve entirely encloses the other's ROC curve. If the curves intersect, we compare them using the Area Under the Curve (AUC).

25
Q

What is the equation of AUC?

A

Given the ROC curve points {(x_1, y_1), ..., (x_m, y_m)} with x_1 = 0 and x_m = 1, the AUC can be estimated with the trapezoidal rule: AUC = (1/2) * Σ_{i=1}^{m-1} (x_{i+1} - x_i) * (y_i + y_{i+1}).

26
Q

What is ranking loss and how does it relate to AUC?

A

The ranking loss penalizes incorrectly ranked positive-negative pairs: it is the fraction of positive-negative pairs in which the negative sample is ranked above the positive one (ties counted as half). AUC measures ranking quality and satisfies AUC = 1 - ranking loss.
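
A sketch of the ranking loss as the fraction of positive-negative pairs ranked incorrectly (ties counted as half); on the same made-up example as the ROC sketch above, it confirms AUC = 1 - ranking loss:

    def ranking_loss(labels, scores):
        pos = [s for s, l in zip(scores, labels) if l == 1]
        neg = [s for s, l in zip(scores, labels) if l == 0]
        penalty = 0.0
        for sp in pos:
            for sn in neg:
                if sp < sn:
                    penalty += 1.0          # positive ranked below a negative
                elif sp == sn:
                    penalty += 0.5          # tie counts half
        return penalty / (len(pos) * len(neg))

    labels, scores = [1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]
    l_rank = ranking_loss(labels, scores)
    print(l_rank, 1 - l_rank)  # 0.25 and 0.75, matching the AUC of 0.75 above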