Chapter 4: Experimental Methods 1 Flashcards
what is model selection?
picking the best from a pool of possible models
what is cross validation error?
the average of the errors measured on each fold
what makes a model more “stable”?
a lower standard deviation of the error across folds
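As a small illustration of the two cards above, the sketch below averages per-fold errors to get the cross validation error and uses their standard deviation as a rough stability measure. The fold errors and the helper name `summarise_folds` are made up for the example.

```python
from statistics import mean, stdev

def summarise_folds(fold_errors):
    """Return (cross validation error, standard deviation across folds)."""
    return mean(fold_errors), stdev(fold_errors)

# Hypothetical per-fold errors for two models (illustrative numbers only).
model_a = [0.10, 0.12, 0.11, 0.09, 0.13]
model_b = [0.02, 0.25, 0.05, 0.20, 0.03]

cv_a, sd_a = summarise_folds(model_a)
cv_b, sd_b = summarise_folds(model_b)

print(f"A: cv error={cv_a:.3f}, std={sd_a:.3f}")
print(f"B: cv error={cv_b:.3f}, std={sd_b:.3f}")
```

Both models have the same cross validation error here, but model A varies far less from fold to fold, so it would be called the more stable of the two.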
what is LOOCV?
leave one out cross validation: the number of folds is the same as the number of examples, so each example is used as the test fold exactly once
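A minimal sketch of how leave one out splits could be generated, assuming examples are addressed by index. The function name `loocv_splits` is hypothetical.

```python
def loocv_splits(n_examples):
    """Yield (train_indices, test_index) pairs, one pair per example."""
    for i in range(n_examples):
        train = [j for j in range(n_examples) if j != i]
        yield train, i

splits = list(loocv_splits(4))
for train, test in splits:
    print("train:", train, "test:", test)
```

With 4 examples this gives 4 folds, each holding out a different single example.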
is the cross validation error a good estimate of future generalisation error?
no, not when it was also used to pick the model: the selected model's cross validation error is an optimistic estimate.
how to choose a model from a pool and get a good estimate of future generalisation error from cross validation?
split the data into folds
keep the last fold as a hold out set
perform cross validation on the remaining folds
select the model with the lowest cross validation error
evaluate the selected model once on the hold out set and report that error as the estimate
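The steps above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the "pool of models" is a set of polynomial degrees fitted with NumPy, and refitting the chosen model on all cross validation folds before the final evaluation is my assumption rather than something the card specifies.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=60)  # synthetic data

def mse(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial of the given degree and return its test MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))

# Split the data into folds and keep the last fold as the hold out set.
folds = np.array_split(rng.permutation(len(x)), 6)
holdout, cv_folds = folds[-1], folds[:-1]

def cv_error(degree):
    """Average validation error over the remaining folds."""
    errors = []
    for k, val_idx in enumerate(cv_folds):
        train_idx = np.concatenate([f for j, f in enumerate(cv_folds) if j != k])
        errors.append(mse(degree, x[train_idx], y[train_idx], x[val_idx], y[val_idx]))
    return float(np.mean(errors))

# Cross validate every candidate and select the best one.
candidates = [1, 3, 5, 9]
cv_errors = {d: cv_error(d) for d in candidates}
best_degree = min(cv_errors, key=cv_errors.get)

# Evaluate the selected model once on the untouched hold out set.
train_idx = np.concatenate(cv_folds)
holdout_error = mse(best_degree, x[train_idx], y[train_idx], x[holdout], y[holdout])
print(best_degree, cv_errors[best_degree], holdout_error)
```

The hold out fold plays no part in choosing `best_degree`, which is why its error is a fairer estimate than the winning cross validation error.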
why do we perform feature scaling?
it speeds up gradient descent: when one or more features take much larger values than the rest, many extra iterations are needed to converge, and scaling avoids them
what are two methods of data normalisation?
zero mean, unit variance
restrict range
give the equation for zero mean, unit variance normalisation
(x - x_mean) / sigma
give the equation for restrict range normalisation
(x - x_min) / (x_max - x_min)
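Both equations can be tried out on a single feature column with the sketch below. It assumes sigma is the population standard deviation and that the feature is not constant (otherwise either formula divides by zero). The function names and the sample values are illustrative.

```python
from statistics import mean, pstdev

def standardise(values):
    """Zero mean, unit variance: (x - x_mean) / sigma."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

def min_max(values):
    """Restrict range to [0, 1]: (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up feature values
z = standardise(heights_cm)
r = min_max(heights_cm)
print(z)
print(r)
```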
in scikit-learn, the two parameters we use to define convergence are
tol
max_iter
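As an example of where these two parameters appear, here is a sketch using `LogisticRegression`. The choice of estimator, the synthetic dataset and the specific values are assumptions made for illustration; the card does not say which model is meant.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# tol: stopping tolerance; max_iter: upper limit on solver iterations.
model = LogisticRegression(tol=1e-4, max_iter=200)
model.fit(X, y)

# n_iter_ reports how many iterations the solver actually used.
print(model.n_iter_)
```

Training stops when the solver's stopping criterion falls below `tol` or when `max_iter` iterations have been used, whichever happens first.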
two resampling methods for class imbalance are
undersampling
oversampling
what is the main method of oversampling for class imbalance?
data augmentation, SMOTE (Synthetic Minority Over-sampling Technique)
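The two resampling ideas can be sketched with plain random sampling, as below. This is deliberately not SMOTE: SMOTE creates new synthetic minority points by interpolating between neighbouring minority examples, whereas this sketch only duplicates existing ones. The function names and the toy data are hypothetical.

```python
import random

def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority examples until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority class, without replacement."""
    return rng.sample(majority, target_size)

rng = random.Random(0)
majority = [("neg", i) for i in range(90)]  # toy majority class
minority = [("pos", i) for i in range(10)]  # toy minority class

oversampled = random_oversample(minority, len(majority), rng)
undersampled = random_undersample(majority, len(minority), rng)
print(len(oversampled), len(undersampled))
```

Oversampling grows the minority class to match the majority, while undersampling shrinks the majority class to match the minority and discards data in the process.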
what are the two methods of dealing with missing data?
data imputation
remove the rows that contain missing values
what are the three methods of data imputation for missing data?
mean imputation
regression imputation
multiple imputation
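Row removal and mean imputation, the two simplest options on these cards, might look like the sketch below. The table, column names and helper functions are invented for the example, and missing values are assumed to be stored as `None`. Regression and multiple imputation are not shown.

```python
from statistics import mean

# Toy table with missing values marked as None (made-up numbers).
rows = [
    {"age": 20.0, "income": 30.0},
    {"age": None, "income": 50.0},
    {"age": 40.0, "income": None},
    {"age": 60.0, "income": 70.0},
]

def drop_incomplete(rows):
    """Remove every row that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(rows[0].keys())
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [{c: (r[c] if r[c] is not None else means[c]) for c in columns} for r in rows]

dropped = drop_incomplete(rows)
imputed = mean_impute(rows)
print(dropped)
print(imputed)
```

Dropping rows halves this small table, while mean imputation keeps all four rows at the cost of filling the gaps with values that were never observed.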