Validation Flashcards
What is validation?
Checking how good a model is, i.e. how well it will perform on new data
What is a real effect?
A real relationship between an attribute and the response; it is the same in all datasets
What is a random effect?
A chance pattern that looks like a real effect but differs from dataset to dataset
Which effects do you fit your model on?
Both real and random effects; a fitted model captures both. New data will have fewer correct predictions because it shares only the real effects with our fit, not the random effects
How do you measure a model's performance?
Split the data: a larger set to fit the model and a smaller set to measure the model's effectiveness
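This split can be sketched in plain Python (the function name, 80/20 ratio, and seed are illustrative, not from the cards):

```python
import random

def train_test_split(data, train_frac=0.8, seed=0):
    """Randomly split data into a training set (to fit the model)
    and a smaller test set (to measure its effectiveness)."""
    rng = random.Random(seed)
    shuffled = data[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)), train_frac=0.8)
print(len(train), len(test))  # 80 20
```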
How do you measure multiple models' performance? And why?
Train/test/validation split
-compare the models on the validation split; evaluate the chosen model on the test split
-high-performing models are more likely to have above-average random effects
-if all models were equally good, the only performance differences we would measure are random effects
When do you use a validation and test set and when do you use only test?
If you're choosing between models, pick the best model using the validation set, then estimate its quality with the test set. Otherwise, estimate quality with the test set alone
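A minimal sketch of choose-on-validation, score-on-test, using toy data and a made-up mean-squared-error helper (all names and the candidate "models" are illustrative):

```python
import random

def mse(model, data):
    """Mean squared error of a predict-function over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def select_and_estimate(models, val, test):
    """Pick the best model on the validation set, then estimate
    its quality on the untouched test set."""
    best = min(models, key=lambda m: mse(m, val))
    return best, mse(best, test)

# Toy data: y = 2x plus small noise; candidates guess different slopes.
rng = random.Random(1)
data = [(x, 2 * x + rng.gauss(0, 0.1)) for x in range(30)]
val, test = data[:15], data[15:]
models = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0)]

best, test_score = select_and_estimate(models, val, test)
```

Validation picks the slope-2 model; the test score is then an estimate untainted by the selection step.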
How much data goes into each split?
One model
-70–90% for training, the rest for testing
Comparing models
-50–70% for training; split the rest between validation and test
What is random splitting and what is its downside?
Randomly assign the data points to each group. Downside: chance can give one set more of a certain type of data than the others
What is rotation splitting?
Take turns assigning points: 1st to training, 2nd to test, 3rd to validation, then repeat. This can introduce bias if your data is structured (e.g. sorted or periodic)
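Rotation splitting is just a stride-3 assignment, sketched below (function name is illustrative):

```python
def rotation_split(data):
    """Assign points in turn: 1st to training, 2nd to test, 3rd to
    validation, then repeat. Simple, but if the data has structure
    (e.g. sorted or periodic) the three sets can end up biased."""
    train = data[0::3]
    test = data[1::3]
    validation = data[2::3]
    return train, test, validation

train, test, val = rotation_split(list(range(9)))
print(train, test, val)  # [0, 3, 6] [1, 4, 7] [2, 5, 8]
```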
What problem does cross-validation solve?
With a single split, important points may show up only in the training set or only in the test set, skewing the fit or the evaluation
What is cross-validation?
Split the data into equal parts, then evaluate each part with a model trained on all the others
What is k-fold cross-validation?
Split the data into k parts, then for each of the k parts:
-train the model on the other k − 1 parts
-evaluate it on the one remaining part
Average the k evaluations to estimate the model's quality.
There is no standard number for k, but k = 10 is common
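The steps above can be sketched generically; the toy "model" (the mean of y) and the scoring function are placeholders, not part of the cards:

```python
def k_fold_scores(data, k, fit, score):
    """k-fold cross-validation: for each of the k folds, train on the
    other k-1 folds, evaluate on the held-out fold, and collect the
    k evaluations."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held_out = folds[i]
        rest = [p for j, fold in enumerate(folds) if j != i for p in fold]
        model = fit(rest)
        scores.append(score(model, held_out))
    return scores

# Toy example: the "model" is just the mean of y; score is mean abs error.
data = [(x, float(x)) for x in range(20)]
fit = lambda pts: sum(y for _, y in pts) / len(pts)
score = lambda m, pts: sum(abs(m - y) for _, y in pts) / len(pts)

scores = k_fold_scores(data, 5, fit, score)
estimate = sum(scores) / len(scores)  # avg of the k evaluations
```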
With k-fold cross-validation, which model do you pick as your final model?
None of the k. You train a final model on all the data; its estimated accuracy is the average of the k evaluations