Validation Flashcards

1
Q

What is validation?

A

Checking how well a model actually performs

2
Q

What is a real effect?

A

A genuine relationship between an attribute and the response; it appears the same in every dataset

3
Q

What is a random effect?

A

Random noise that happens to look like a real effect; it differs from dataset to dataset

4
Q

Which effects do you fit your model on?

A

Both real and random effects. New data will yield fewer correct predictions, because it shares only the real effects with our fit, not the random ones

5
Q

How do you measure a model's performance?

A

Split the data: a larger set to fit the model and a smaller held-out set to measure the model's effectiveness

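The split above can be sketched in a few lines of Python (a minimal illustration, not from the cards; `train_test_split` is a hypothetical helper and the 80/20 ratio is just one common choice):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data, then cut it into train and test lists."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

points = list(range(100))
train, test = train_test_split(points, test_fraction=0.2)
# 80 points to fit the model, 20 held out to measure its effectiveness
```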
6
Q

How do you measure multiple models' performance, and why?

A

Use a train/validation/test split
-compare the models on the validation split, then evaluate the chosen model on the test split
-high-performing models are more likely to have benefited from above-average random effects
-if all models were equally good, the only performance differences we would measure would be random effects

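The three-way split can be sketched the same way (a toy illustration, not from the cards; `three_way_split` is a hypothetical helper and the 70/15/15 ratios are just one reasonable choice):

```python
import random

def three_way_split(data, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle a copy of the data, then cut it into train/validation/test."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

train, val, test = three_way_split(list(range(200)))
# compare candidate models on `val`; report the winner's quality on `test`
```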
7
Q

When do you use a validation and test set, and when do you use only a test set?

A

If you're choosing between models, pick the best one using the validation set, then estimate its quality with the test set. Otherwise, just estimate quality with the test set

8
Q

How much data goes into each split?

A

1 model
-70-90% for training, the rest for testing
comparing models
-50-70% for training; split the rest between validation and test

9
Q

What is random splitting of data, and what is the downside?

A

Randomly assign the data points to each group. The downside: one set could end up with more of a certain type of data than the others

10
Q

What is rotation splitting of data?

A

Take turns assigning points: the 1st goes to training, the 2nd to test, the 3rd to validation, then repeat. This can introduce bias if your data has a periodic structure

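Rotation splitting is just stride-3 slicing (a toy illustration, not from the cards; the 12-point list stands in for an ordered dataset):

```python
data = list(range(12))  # stands in for 12 data points in their stored order

# Take turns: 1st point to training, 2nd to test, 3rd to validation, repeat.
train = data[0::3]  # indices 0, 3, 6, 9
test = data[1::3]   # indices 1, 4, 7, 10
val = data[2::3]    # indices 2, 5, 8, 11

# Downside: if the data repeats with period 3 (say, three sensors logged
# in turn), each split sees only one slice of that structure.
```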
11
Q

What problem does cross validation solve?

A

With a single split, important points may end up only in the training set or only in the test set

12
Q

What is cross-validation?

A

Split the data into equal parts, then test each part against a model trained on all the other parts

13
Q

What is k-fold cross-validation?

A

Split the data into k parts. For each of the k parts:
-train the model on all the other parts
-evaluate it on the one remaining part

Average the k evaluations to estimate the model's quality.

There is no standard number, but k = 10 is common

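The k-fold loop can be sketched as below (a toy illustration, not from the cards; `k_fold_scores` is a hypothetical helper, and the "model" is just the mean of the training values, scored by mean absolute error):

```python
def k_fold_scores(data, k, fit, evaluate):
    """Return the k held-out evaluation scores from k-fold cross-validation."""
    folds = [data[i::k] for i in range(k)]  # k roughly equal parts
    scores = []
    for i in range(k):
        held_out = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = fit(training)               # train on all other parts
        scores.append(evaluate(model, held_out))  # evaluate on the rest
    return scores

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fit = lambda train: sum(train) / len(train)                      # "model" = mean
evaluate = lambda m, test: sum(abs(x - m) for x in test) / len(test)  # MAE
scores = k_fold_scores(data, k=3, fit=fit, evaluate=evaluate)
estimate = sum(scores) / len(scores)  # avg of the k evaluations → 1.5
```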
14
Q

With k-fold cross-validation, which of the k models do you pick as your final model?

A

None. You train a final model on all the data; its estimated accuracy is the average of the k evaluations
