Week 11 - Model assessment and validation Flashcards

1
Q

Split-sample validation

A

Split the data into two parts: a training set and a test set.

A common split is 80% training, 20% test.
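
A minimal sketch of split-sample validation in Python with scikit-learn; the diabetes dataset, the linear model, and the 80/20 ratio are illustrative assumptions, not part of the course material.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the data as the test set; fit only on the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Prediction accuracy is estimated only on data the model never saw during fitting.
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))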

2
Q

Cross-validation (CV)

A

Split the data into K parts (folds).
Each fold takes a turn as the test set, while the other K-1 folds are used for training.
Collect the predictions on each test fold and estimate accuracy by averaging the errors across folds.

Known as K-fold cross-validation.
K is usually chosen to be between 5 and 10.
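
A sketch of the K-fold procedure with K = 5, assuming scikit-learn and the same illustrative diabetes dataset and linear model as above.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_errors = []
for train_idx, test_idx in kf.split(X):
    # Each fold takes a turn as the test set; the other K-1 folds are used to fit the model.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

# The cross-validation estimate of prediction error is the average over the K folds.
print("5-fold CV MSE:", np.mean(fold_errors))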

3
Q

Leave-one-out cross validation (LOOCV)

A

Cross-validation where K = n,
so each observation forms its own test set (the model is refit n times, each time leaving one observation out).
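
A sketch of LOOCV, again assuming scikit-learn and an illustrative dataset and model; cross_val_score refits the model n times, leaving one observation out each time.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_diabetes(return_X_y=True)

# One split per observation: each test "fold" contains a single data point.
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)
print("LOOCV MSE:", -np.mean(scores))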

4
Q

Bias-variance trade-off

A

More flexible models:
Can fit the data more closely (low bias)
Are harder to estimate; the fitted models are ‘noisier’ (high variance)

Less flexible models:
Fit the data less closely (high bias)
Are easier to estimate (low variance)
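
A small numerical illustration of the trade-off, using only NumPy; the true curve, the noise level, and the two polynomial degrees are assumptions chosen for the demonstration.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
f = np.sin(2 * np.pi * x)          # the "true" underlying function

preds = {1: [], 9: []}             # an inflexible and a flexible polynomial fit
for _ in range(200):
    y = f + rng.normal(scale=0.3, size=x.size)   # a fresh noisy sample each time
    for degree in preds:
        coefs = np.polyfit(x, y, deg=degree)
        preds[degree].append(np.polyval(coefs, x))

for degree, fits in preds.items():
    fits = np.array(fits)
    bias_sq = np.mean((fits.mean(axis=0) - f) ** 2)   # squared bias of the average fit
    variance = np.mean(fits.var(axis=0))              # spread of the fits across samples
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")

With these settings the degree-1 fit shows high bias and low variance, while the degree-9 fit shows the reverse.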

5
Q

Overfitting

A

A model fits the training data too closely, capturing noise as well as the underlying signal.

Indicator: a wide disparity between performance on the training set and performance on the test set.
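
A sketch of that indicator in practice: an unpruned decision tree (an illustrative choice of a very flexible model) is fit to the same assumed diabetes dataset, and training error is compared with test error.

from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# An unpruned tree can reproduce the training data almost perfectly.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

# A wide gap between the two is the overfitting indicator described above.
print(f"train MSE: {train_mse:.1f}   test MSE: {test_mse:.1f}")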

6
Q

Underfitting

A

When a model is not flexible enough to capture the underlying structure of the data

7
Q

Three-way data split: train/tune/test

A

Train: the training set is used to fit any version of a model

Tune: the tuning set is used to repeatedly evaluate fitted models while varying the modelling choices, and to select between different models

Test: the test set is used to estimate the final prediction accuracy of the fitted and tuned model

Cross-validation can also be used for the train/tune stages, with the test set held back and used only once at the end to estimate final accuracy
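
A sketch of that combined workflow: the test set is held out first, the train/tune stages are handled by 5-fold cross-validation over a small parameter grid, and the test set is touched exactly once at the end. The ridge model and its alpha grid are illustrative assumptions.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_diabetes(return_X_y=True)

# Hold out a test set that plays no part in choosing between models.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train + tune: 5-fold cross-validation over the regularisation strength.
search = GridSearchCV(
    Ridge(),
    {"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# Test: a single, final estimate of the tuned model's prediction accuracy.
final_mse = mean_squared_error(y_test, search.best_estimator_.predict(X_test))
print("chosen alpha:", search.best_params_["alpha"], "  test MSE:", round(final_mse, 1))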
