Chapter 4 - Experimental Methods 1 Flashcards

1
Q

what is model selection?

A

picking the best model from a pool of candidate models

2
Q

what is cross-validation error?

A

the average of the validation errors measured on each fold
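
As an illustration, here is a minimal sketch of computing the cross-validation error by averaging the per-fold errors; the synthetic dataset and ridge regression model are illustrative choices, not part of the card.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# train on each training split, record the error on the corresponding validation fold
fold_errors = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

cv_error = np.mean(fold_errors)  # cross-validation error = mean of the fold errors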

3
Q

what makes a model more “stable”?

A

a lower standard deviation of its error across the cross-validation folds
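
A short sketch of judging stability from the spread of the per-fold scores; the dataset and model are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

# the mean is the cross-validation score; a lower standard deviation
# across the folds indicates a more stable model
print(f"accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")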

4
Q

what is LOOCV?

A

leave-one-out cross-validation - the number of folds equals the number of examples, so each validation fold contains a single example
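
A minimal sketch of LOOCV with scikit-learn; the iris dataset and logistic regression are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# LeaveOneOut creates one fold per example, so each validation set has size 1
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print(len(scores), scores.mean())   # 150 folds for iris, and the LOOCV accuracy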

5
Q

is the cross-validation error a good estimate of future generalisation error?

A

no, it is an optimistic estimate, because the model was chosen to minimise that very error

6
Q

how do you choose a model from a pool and still get a good estimate of future generalisation error from cross-validation?

A

split the data into folds
keep the last fold aside as a hold-out set
perform cross-validation on the remaining folds
select the model that performs best across those folds
evaluate the selected model on the hold-out set (see the sketch below)
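
A minimal sketch of this procedure, assuming a synthetic dataset and two illustrative candidate models:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# keep one part of the data aside as a hold-out set
X_cv, X_hold, y_cv, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# cross-validate each candidate on the remaining data and pick the best
cv_scores = {name: cross_val_score(m, X_cv, y_cv, cv=5).mean()
             for name, m in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)

# only now touch the hold-out set, to get an unbiased generalisation estimate
best_model = candidates[best_name].fit(X_cv, y_cv)
print(best_name, best_model.score(X_hold, y_hold))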

7
Q

why do we perform feature scaling?

A

speeds up gradient descent by avoiding many extra iterations that are required when one or more features take on much larger values than the rest

8
Q

what are two methods of data normalisation

A

zero mean, unit variance (standardisation)

restrict range (min-max scaling)

9
Q

give the equation for zero mean, unit variance normalisation

A

x' = (x - x_mean) / sigma, where x_mean is the mean and sigma the standard deviation of the feature
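
A minimal sketch of this normalisation, by hand and with scikit-learn's StandardScaler; the small array is just an illustration.

import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1.0], [2.0], [3.0], [4.0]])

x_manual = (x - x.mean()) / x.std()            # (x - x_mean) / sigma
x_sklearn = StandardScaler().fit_transform(x)  # same result

print(np.allclose(x_manual, x_sklearn))        # True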

10
Q

give the equation for restrict range normalisation

A
x' = (x - x_min) / (x_max - x_min)
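
For illustration, the same normalisation by hand and with scikit-learn's MinMaxScaler; the small array is an illustrative assumption.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([[10.0], [20.0], [30.0], [40.0]])

x_manual = (x - x.min()) / (x.max() - x.min())  # maps the feature onto [0, 1]
x_sklearn = MinMaxScaler().fit_transform(x)     # same result

print(np.allclose(x_manual, x_sklearn))         # True
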
11
Q

in scikit-learn, the two parameters we use to define convergence are

A

tol - the tolerance used in the stopping criterion

max_iter - the maximum number of iterations allowed
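
A minimal sketch of setting both parameters on an iterative scikit-learn estimator; SGDClassifier and the synthetic data are illustrative choices.

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# training stops when the improvement drops below tol, or after max_iter epochs
clf = SGDClassifier(tol=1e-3, max_iter=1000, random_state=0).fit(X, y)
print(clf.n_iter_)   # number of epochs actually run before stopping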

12
Q

two resampling methods for class imbalance are

A

undersampling

oversampling
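
A minimal sketch of both resampling methods, assuming the third-party imbalanced-learn (imblearn) package is installed; the imbalanced synthetic dataset is illustrative.

from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# roughly 90% / 10% class imbalance
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)  # drop majority examples
X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)     # duplicate minority examples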

13
Q

what is the main method of oversampling for class imbalance

A

data augmentation, e.g. SMOTE (Synthetic Minority Over-sampling Technique)
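
A short sketch of SMOTE, again assuming the imbalanced-learn package; unlike random oversampling it synthesises new minority-class examples by interpolating between neighbours.

from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))   # the classes are balanced after resampling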

14
Q

what are the two methods of dealing with missing data

A

data imputation

remove the row
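
A minimal sketch of both options on a small pandas DataFrame with one missing value; the data and the mean-imputation choice are illustrative.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

dropped = df.dropna()                                       # remove the row with the missing value
imputed = SimpleImputer(strategy="mean").fit_transform(df)  # fill it in instead (data imputation)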

15
Q

what are the three methods of data imputation for missing data

A

mean imputation

regression

multiple imputation
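
A minimal sketch of mean imputation and regression-style imputation with scikit-learn; IterativeImputer models each feature from the others and, run repeatedly with sample_posterior=True, can approximate multiple imputation. The small array is illustrative.

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed to expose IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [np.nan, 8.0]])

X_mean = SimpleImputer(strategy="mean").fit_transform(X)    # mean imputation
X_regr = IterativeImputer(random_state=0).fit_transform(X)  # regression-based imputation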
