Model Selection and Cross-Validation Flashcards

1
Q

What is model selection?

A

1) Choice of learning algorithm, feature extraction, feature selection, and normalization.
2) Hyperparameter tuning.

2
Q

What is model evaluation/assessment?

A

After selecting your model, estimating how well it generalizes to new, unseen data.

3
Q

Holdout method of validation

A

Using separate test and training data.

4
Q

Why do we need separate test data?

A

To get an unbiased estimate of how well the model trained on the training data performs on new data.

5
Q

Variance of an error estimate

A

How much the estimated error tends to differ from the actual error; ideally the difference is close to zero.
Note: a method whose average error is zero can still have a large variance, because errors in the positive and negative directions cancel each other out in the average.
Usually, the more data, the smaller the variance.

6
Q

How to combine model selection and final evaluation

A

Split the data into three parts: a training set for fitting models, a separate validation set for model selection (comparing models and tuning hyperparameters), and an independent test set that is used only once, for the final evaluation (e.g. 50% / 25% / 25%).
7
Q

What is cross-validation?

A

Used when there is too little data to split off a fixed test set. With a data set of n instances, the data is divided into parts (folds); each part is used in turn for testing while the model is trained on the rest, and the error estimates are averaged. Every instance is thus used for both training and testing (in different iterations).

8
Q

When would you use CV?

A

When there is not enough data to set aside a separate test set.

9
Q

Leave-one-out cross-validation.

A

With n instances: leave one record out for testing, train on the remaining n − 1, predict the left-out record, and repeat so that every record is left out once; the errors are then averaged. On each iteration the test record is not part of the training set, so the estimate is not inflated by overfitting to it. Allows using all the data for both training and testing. Does not always work perfectly.
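A minimal sketch of leave-one-out cross-validation with scikit-learn; the data set (iris) and classifier (k-NN) are placeholder choices, not part of the card:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3)

# One fit per instance: train on n - 1 records, test on the single left-out record.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOOCV accuracy estimate:", scores.mean())
```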

10
Q

How to combine model selection and final evaluation when using cross-validation?

A

Use cross-validation within the training set for parameter tuning, and test the final model on independent test data. The tuning estimate has a slight optimistic bias; nested cross-validation combats this.
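A minimal sketch of this scheme: cross-validation inside the training set for tuning, one final score on an independent test set. The data set, model, and parameter grid are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# 5-fold CV on the training set selects the hyperparameter.
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
search.fit(X_train, y_train)

# The held-out test set is touched only once, for the final estimate.
print("chosen parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```

Nested cross-validation would instead wrap the whole search object in an outer cross_val_score call rather than relying on a single fixed test split.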

11
Q

What performance measures are out there for regression and
classification?

A

For regression: mean squared error (or root mean squared error); it is 0 for a correct prediction and grows quadratically with the error, so outliers have a large effect. Mean absolute error is an alternative that is not as sensitive to outliers.
For classification: misclassification rate (1 for an error, 0 for a correct prediction). Baseline: the majority-class predictor.
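A quick sketch of these measures with made-up numbers (the values and the scikit-learn helpers are illustrative, not from the card):

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

# Regression: squared error punishes the outlier (true 10.0) much harder than absolute error.
y_true = np.array([1.0, 2.0, 3.0, 10.0])
y_pred = np.array([1.1, 2.1, 2.9, 4.0])
mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse, "RMSE:", np.sqrt(mse), "MAE:", mean_absolute_error(y_true, y_pred))

# Classification: misclassification rate = 1 - accuracy.
c_true = np.array([0, 0, 1, 1, 1])
c_pred = np.array([0, 1, 1, 1, 0])
print("misclassification rate:", 1 - accuracy_score(c_true, c_pred))

# Majority-class baseline: always predict the most frequent class.
majority = np.bincount(c_true).argmax()
print("baseline accuracy:", np.mean(c_true == majority))
```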

12
Q

What limitations does misclassification rate / classification
accuracy have as a performance measure?

A

A low misclassification rate doesn’t necessarily mean good performance: on very unbalanced data, always predicting the majority class already gives a low misclassification rate.

13
Q

What are cost and confusion matrices?

A

If the costs of different misclassifications differ, use a cost matrix (e.g. calling a bad product good is worse than rejecting a good product).
Confusion matrix: classification results are shown in a table where rows represent true classes and columns represent predicted classes.
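A minimal sketch of both ideas; the labels and the cost values are invented for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = good product, 0 = bad product
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Rows = true classes, columns = predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Cost matrix with the same layout: shipping a bad product as good (row 0, column 1)
# is assumed to cost 10, while rejecting a good product (row 1, column 0) costs 1.
cost = np.array([[0, 10],
                 [1,  0]])
print("total cost:", np.sum(cm * cost))
```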

13
Q

What are true positives, false positives, etc., and how can one
calibrate a classifier to achieve different tradeoffs between them?

A

True positive: predicted positive, really positive. False positive: predicted positive, really negative. False negative: predicted negative, really positive (often the most dangerous). True negative: predicted negative, really negative.
The tradeoff between false positives and false negatives can be adjusted by moving the classifier’s decision threshold.
F-score = 2 × (Precision × Recall) / (Precision + Recall)
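A minimal sketch of trading false positives against false negatives by moving the decision threshold on predicted probabilities; the classifier and data set are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = LogisticRegression(max_iter=5000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# A lower threshold catches more positives (fewer false negatives) at the price of
# more false positives; a higher threshold does the opposite.
for threshold in (0.2, 0.5, 0.8):
    pred = (proba >= threshold).astype(int)
    print(threshold,
          "precision:", round(precision_score(y_te, pred), 3),
          "recall:", round(recall_score(y_te, pred), 3),
          "F1:", round(f1_score(y_te, pred), 3))
```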

14
Q

What is a ROC curve, how to interpret area under ROC curve?

A

TPR = TP / (TP + FN): ability to find positives (recall, sensitivity).
FPR = FP / (FP + TN): rate of negatives identified as positive (1 − specificity).
The ROC curve plots the true positive rate against the false positive rate as the decision threshold is varied. The best possible curve jumps straight to TPR = 1 at FPR = 0; the area under the curve (AUC) is 1 for a perfect ranking and 0.5 for random guessing.
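A minimal sketch of computing a ROC curve and its area with scikit-learn; the classifier and data set are placeholder choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scores = LogisticRegression(max_iter=5000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# The curve traces (FPR, TPR) pairs as the decision threshold is swept.
fpr, tpr, thresholds = roc_curve(y_te, scores)
print("AUC:", roc_auc_score(y_te, scores))   # 1.0 = perfect ranking, 0.5 = random
```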

15
Q

What is a loss function?

A

We have a hypothesis h(x) = y’, learned from training data by an ML algorithm (e.g. k-NN, a linear model, or a decision tree).
The loss (error) function l(y’, y) tells the price we pay for predicting y’ when the true value is y.
The key question is then: what is the expected loss of h on new data?
For example, the zero-one loss: l(y’, y) = 1 if y’ ≠ y, otherwise 0.
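A tiny sketch of the zero-one loss used as the example above (the function name and the sample labels are made up):

```python
def zero_one_loss(y_pred, y_true):
    """Price paid for predicting y_pred when the true value is y_true."""
    return 0 if y_pred == y_true else 1

print(zero_one_loss("spam", "spam"))  # 0: correct prediction costs nothing
print(zero_one_loss("spam", "ham"))   # 1: any mistake costs the same
```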

16
Q

Misclassification rate

A

The average zero-one loss over a data set. If 10% of the instances are classified incorrectly, the misclassification rate is 0.1 (and 0.9 would be the classification accuracy).

17
Q

Estimating the generalization error.

A

The training set estimate is unreliable. Standard approach: randomly assign the data to training and test sets. Averaging several such estimates also helps.

18
Q

Estimators of generalization error.

A

Training set error (resubstitution error)
Test set error (aka hold-out error)
Cross-validation error (leave one out CV, k-fold CV)
Bootstrap error (not recommended)

19
Q

Optimistic bias

A

Systematically estimating the generalization error to be smaller than it really is, e.g. the training set error.

20
Q

Pessimistic bias

A

Systematically estimating the error to be larger than it actually is, e.g. the test set error when a large test set is held out and merged back into the training set before training the final model (which then sees more data than the model that was evaluated).

21
Q

Training set error

A

Also called the resubstitution error. Has a strong optimistic bias, since the model was chosen to fit the training data; in model selection it favors the most complicated models. Don’t use it!

21
Q

Prof’s advice: Never, ever report your model’s performance on its…

A

training data.

22
Q

Test set error also known as…

A

holdout estimate

23
Q

How to create a test set?

A

Randomly assign e.g. 70% of the data to the training set and 30% to the test set.
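A minimal sketch of such a random 70/30 holdout split; the iris data stands in for any feature matrix and label vector:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), "training instances,", len(X_test), "test instances")
```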

24
Q

Overfitting to test set

A

The test set estimate is only unbiased if the test set is used for testing a single hypothesis. Using the test set multiple times for model selection makes the estimate optimistically biased. Solution: use two held-out sets, one for model selection and one for the final evaluation (50% for fitting, 25% for comparing models, 25% for final model evaluation).
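A minimal sketch of the 50/25/25 split described above, done with two consecutive random splits (the data set is a placeholder):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First keep 50% for fitting, then split the remaining half evenly into a
# model-selection (validation) set and a final evaluation (test) set.
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, train_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_fit), len(X_val), len(X_test))
```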

25
Q

What is stratification?

A

For classification, do a random split for each class separately to guarantee a similar distribution of classes in each set.
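A minimal sketch: passing the labels via stratify= makes scikit-learn do the random split per class, so the class proportions are preserved in both parts (data set is illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print("class counts in test set:", np.bincount(y_te))  # roughly equal per class
```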

26
Q

Tricks of the trade for validation.

A

Use stratification for classification (possible but more complicated for regression).
Never assume data is in random order!

27
Q

K-fold Cross-validation.

A

The same basic idea as leave-one-out cross-validation, but instead of leaving out a single record, the data is randomly divided into k folds (usually k = 5 or 10); on each iteration one fold is left out for validation and the rest is used for training. Leave-one-out is the special case where k = n. The actual model is then trained on the entire data set. Much faster than leave-one-out CV.
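A minimal sketch of 5-fold cross-validation; the data set and model are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # random fold assignment
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print("per-fold accuracy:", scores, "mean:", scores.mean())

# The model that is actually used afterwards is trained on the entire data set.
final_model = DecisionTreeClassifier(random_state=0).fit(X, y)
```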

28
Q

K-fold cross-validation analysis

A

Faster than leave-one-out. Slight pessimistic bias. Not completely deterministic: some variation depending on the random fold split.

29
Q

When would you use stratified k-fold cross-validation?

A

Imbalanced classification problems: to guarantee that each fold has roughly the same class distribution as the full data set.
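A minimal sketch of stratified folds on an imbalanced label vector; the data here is synthetic, just to show that each fold keeps the 90/10 class ratio:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)          # 90/10 imbalance

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    print("fold class counts:", np.bincount(y[test_idx]))   # 18 vs 2 in every fold
```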

30
Q

How to reduce variance in K-fold cross-validation?

A

M times K-fold cross-validation. Do fold splitting M times (e.g. M=1000) and take the average. Again computationally expensive.
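A minimal sketch of M-times repeated k-fold cross-validation (here M = 10 repeats of 5 folds rather than 1000, to keep the run cheap; data set and model are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)
print("mean accuracy over", len(scores), "fits:", scores.mean())
```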

31
Q

IID assumption

A

Data is assumed to be independently sampled and identically distributed. This doesn’t always hold: for example, in image processing you might have 10 versions of the same image with different lighting, or you might have spatial or time series data.

32
Q

Precision vs Recall

A

Precision = TP / (TP + FP): the proportion of returned results that are relevant.
Recall = TP / (TP + FN): the proportion of relevant results that were found (FN = positives you didn’t find).
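A tiny sketch computing both quantities straight from the counts in the definitions above; the numbers are invented:

```python
TP, FP, FN = 8, 2, 4
precision = TP / (TP + FP)    # proportion of returned results that are relevant
recall = TP / (TP + FN)       # proportion of relevant results that were found
print("precision:", precision, "recall:", recall)
```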

33
Q

What kind of test gives perfect recall?

A

A test that returns positive for every instance (every result is either a TP or an FP), e.g. declaring that everyone has covid, or a search engine that returns the entire internet. There are no false negatives, so recall is perfect, but precision is poor. This is why it doesn’t make sense to report precision or recall by themselves.