Computational Statistics Flashcards
Validation set method
Split the data into a training set and a validation set, fit the model on the training set, and calculate the MSE on the validation set.
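The card above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the straight-line model, the helper names (`fit_line`, `mse`, `validation_set_mse`), the 30% validation fraction and the toy data are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def validation_set_mse(xs, ys, val_fraction=0.3, seed=0):
    """Random split, fit on the training part, return MSE on the validation part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(xs) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    return mse(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx])

# Toy data (made up for illustration): y = 2x + 1 plus Gaussian noise.
noise = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + noise.gauss(0, 1) for x in xs]
val_error = validation_set_mse(xs, ys)
```

Changing `seed` changes which points land in the validation set, and therefore the reported error.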
Holdout Method
Perform the validation set method several times and choose the model with the best validation error
Validation error
The prediction error calculated on the held-out validation set
Validation set disadvantages
- Validation set is unreliable without much data
- Validation error is highly dependent on the random choice of the validation sample
LOOCV
Leave one out Cross Validation
Leave one out Cross Validation
- Train the model n times, leaving each point out exactly once
- Calculate each model's test error on the left-out point
- Report mean error
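The three steps above can be sketched as follows. The straight-line model and the names `fit_line` and `loocv_mse` are assumptions chosen for illustration; any model with a fit and a predict step would do.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def loocv_mse(xs, ys):
    """Leave-one-out cross-validation: n fits, each tested on its left-out point."""
    n = len(xs)
    squared_errors = []
    for i in range(n):
        # Training data is everything except point i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        # Test error of this model on the single left-out point.
        squared_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    # Report the mean error over all n left-out points.
    return sum(squared_errors) / n

# Tiny hand-checkable example: the three squared errors are 9, 2.25 and 9.
example_error = loocv_mse([0.0, 1.0, 2.0], [0.0, 1.0, 5.0])  # 6.75
```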
Validation set method cost
Cheap: the model is fitted only once
LOOCV cost
Expensive: the model has to be fitted n times
K-Fold Cross Validation
- Divide the data into k folds; each fold is used once as the validation set while the other k-1 folds form the training set
- Train the model on each of the k training sets and measure the error on the corresponding validation set
- Report the average MSE
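A minimal sketch of the procedure above, assuming a straight-line model and shuffled folds. The names `fit_line` and `k_fold_mse` and the default `k=5` are illustrative choices, not part of the original cards.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def k_fold_mse(xs, ys, k=5, seed=0):
    """Average validation MSE over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Fold j takes every k-th shuffled index, so the folds partition the data.
    folds = [idx[j::k] for j in range(k)]
    fold_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold) / len(fold)
        fold_errors.append(fold_mse)
    return sum(fold_errors) / k

# With k equal to the number of points, K-Fold reduces to LOOCV.
loocv_like = k_fold_mse([0.0, 1.0, 2.0], [0.0, 1.0, 5.0], k=3)
```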
Bias of K-Fold validation error
- The K-Fold validation error of the selected model is too optimistic (because the model with the best error is the one selected)
Nested K-Fold Validation
Select the model with K-Fold and report the error of the selected model on a separate test set
Temporal Data
Be careful not to include data from any point later than what the model should predict
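One way to respect this rule is to split by time instead of at random. The sketch below assumes each record is a `(timestamp, x, y)` tuple and that a single cutoff separates training from validation; the name `temporal_split` and the toy records are hypothetical.

```python
def temporal_split(records, cutoff):
    """Train on records strictly before the cutoff, validate on the rest.

    records: list of (timestamp, x, y) tuples (an assumed layout).
    """
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    validation = [r for r in ordered if r[0] >= cutoff]
    return train, validation

# Toy records (made up): timestamps 1..6, deliberately out of order.
records = [(3, 0.3, 1.0), (1, 0.1, 0.5), (6, 0.6, 2.1),
           (2, 0.2, 0.7), (5, 0.5, 1.8), (4, 0.4, 1.2)]
train, validation = temporal_split(records, cutoff=5)
```

Every training timestamp is earlier than every validation timestamp, so the model never sees data from the period it is asked to predict.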
Subset selection
Try different subsets of features and select the subset with the best validation error
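A small exhaustive version of this idea is sketched below. The 1-nearest-neighbour predictor is a stand-in model chosen only because it needs no linear algebra; it is not implied by the card. The names `nn_predict`, `subset_mse` and `best_subset`, and the toy data, are likewise assumptions.

```python
from itertools import combinations

def nn_predict(train_X, train_y, x, features):
    """1-nearest-neighbour prediction using only the given feature indices."""
    def sq_dist(row):
        return sum((row[f] - x[f]) ** 2 for f in features)
    nearest = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))
    return train_y[nearest]

def subset_mse(train_X, train_y, val_X, val_y, features):
    """Validation MSE of the 1-NN model restricted to one feature subset."""
    errors = [(y - nn_predict(train_X, train_y, x, features)) ** 2
              for x, y in zip(val_X, val_y)]
    return sum(errors) / len(errors)

def best_subset(train_X, train_y, val_X, val_y):
    """Try every non-empty feature subset; keep the lowest validation error."""
    n_features = len(train_X[0])
    candidates = [c for size in range(1, n_features + 1)
                  for c in combinations(range(n_features), size)]
    scored = [(subset_mse(train_X, train_y, val_X, val_y, c), c) for c in candidates]
    error, subset = min(scored)
    return subset, error

# Toy data (made up): the target equals feature 0; feature 1 is a distractor.
train_X = [(0.0, 0.0), (1.0, 10.0), (2.0, 0.0), (3.0, 10.0)]
train_y = [0.0, 1.0, 2.0, 3.0]
val_X = [(0.1, 10.0), (2.9, 0.0)]
val_y = [0.1, 2.9]
chosen, chosen_error = best_subset(train_X, train_y, val_X, val_y)
```

The number of subsets grows as 2^p for p features, so exhaustive search like this is only practical for a small number of features.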
Feature
An input variable of the model
Dimensionality Reduction
Transform the features into a lower-dimensional feature space