Week 11 - Model assessment and validation Flashcards
Split-sample validation
Split data into two parts: Training set, Test set
A common choice is 80% training, 20% test
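A minimal sketch of split-sample validation, assuming scikit-learn with its built-in diabetes dataset and a linear model (illustrative choices, not part of the card):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# 80% training, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)   # fit on the training set only
print(mean_squared_error(y_test, model.predict(X_test)))   # evaluate on the held-out test set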
Cross-validation (CV)
Split the data into K parts (folds)
Each subset takes a turn as the test set, while the other K-1 subsets are used for training
Collect the predictions from each test fold and estimate accuracy by averaging the errors
Known as K-fold cross-validation
K is typically chosen as 5-10
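A K-fold sketch with K = 5, again assuming scikit-learn and the same illustrative dataset and model:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# cv=5 splits the data into 5 folds; each fold takes one turn as the test set
scores = cross_val_score(LinearRegression(), X, y, scoring="neg_mean_squared_error", cv=5)
print(-scores.mean())   # accuracy estimate: average error over the 5 test folds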
Leave-one-out cross validation (LOOCV)
Cross-validation where K = n
Each observation forms its own test set of size one
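The same sketch adapted to LOOCV, where the number of folds equals the number of observations:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_diabetes(return_X_y=True)

# LeaveOneOut() gives n folds, each holding out a single observation
scores = cross_val_score(LinearRegression(), X, y, scoring="neg_mean_squared_error", cv=LeaveOneOut())
print(-scores.mean())   # LOOCV estimate of prediction error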
Bias-variance trade-off
More flexible models:
Can fit better to the data (low bias)
Harder to estimate, fitted models are ‘noisier’ (high variance)
Less flexible models:
Poorer fit to the data (high bias)
Easier to estimate (low variance)
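An illustrative sketch of the trade-off, using polynomial degree to control flexibility (the simulated data and the degrees tried are assumptions made for the illustration):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 100)   # assumed signal plus noise

x_train, y_train, x_test, y_test = x[:70], y[:70], x[70:], y[70:]

for degree in (1, 4, 15):   # inflexible -> very flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(x_train)),   # training error
          mean_squared_error(y_test, model.predict(x_test)))     # test error

Degree 1 typically shows high error on both sets (high bias); degree 15 fits the training data far better than the test data (high variance).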
Overfitting
A model fits the training data too closely, capturing noise as well as signal
Indicator: wide disparity in performance between the training and test sets
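A sketch of that indicator, assuming a fully grown decision tree on scikit-learn's breast cancer dataset (illustrative choices):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(tree.score(X_train, y_train))   # typically 1.0: the tree memorises the training set
print(tree.score(X_test, y_test))     # noticeably lower: the disparity signals overfitting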
Underfitting
When a model is not flexible enough to capture the underlying pattern
Three-way data split: train/tune/test
Train: the training set is used to fit any version of a model
Tune: the tuning set is used to repeatedly evaluate fitted models while varying the modelling choices, in order to select between different models
Test: the test set is used to estimate the final prediction accuracy of the fitted and tuned model
Cross-validation can be combined with this: the train and tune steps are cross-validated within the training data, and the test set is used only once at the end to estimate final accuracy (see the sketch below)
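A sketch of the cross-validated version, assuming scikit-learn's GridSearchCV with an illustrative model and parameter grid:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hold out the test set first; it is used only once, at the very end
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train/tune: 5-fold CV over the training data selects between candidate models
search = GridSearchCV(KNeighborsClassifier(), param_grid={"n_neighbors": [1, 5, 15]}, cv=5)
search.fit(X_train, y_train)

# Test: a single final estimate of prediction accuracy
print(search.best_params_, search.score(X_test, y_test))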