week 5 - model evaluation Flashcards
What is ‘overhyping’?
Overhyping involves repeatedly revisiting the test set, e.g. tuning the model again after each evaluation
This is another form of data leakage
What is permutation testing?
You randomise the labels on the train and validation sets (but keep the same predictive data)
You then re-run the analysis and calculate performance scores many times to build a null distribution, expecting that the model will no longer be able to classify because you have destroyed the label mapping
You can then test for a significant difference between the null distribution and the real performance to obtain a p-value (the proportion of null scores at least as good as the real score)
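The procedure above can be sketched in a few lines of numpy. This is a minimal illustration, not a full pipeline: the "model" is a hypothetical one-feature threshold rule, and the data are synthetic, chosen only so the permutation test has a real signal to detect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples whose labels are weakly predictable from one feature
X = rng.normal(size=100)
y = (X + rng.normal(scale=1.0, size=100)) > 0  # a genuine X -> y mapping exists

def accuracy(X, y):
    # Trivially simple "model": predict class 1 whenever the feature is positive
    return np.mean((X > 0) == y)

real_score = accuracy(X, y)

# Null distribution: shuffle the labels (destroying the mapping) and re-score
n_perm = 1000
null_scores = np.array([accuracy(X, rng.permutation(y)) for _ in range(n_perm)])

# p-value: fraction of null scores at least as good as the real score
# (the +1 terms are the standard small-sample correction)
p_value = (np.sum(null_scores >= real_score) + 1) / (n_perm + 1)
```

In practice you would refit the model on each permuted label set rather than re-applying a fixed rule; the scoring and p-value logic stay the same.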
Why is it important to evaluate models over a range of measures?
Because if a single measure is the target, it can be gamed.
E.g. non-significant z-scores are often not published (publication bias)
how can you mitigate variability in the dataset?
You can use stratification against potential confounds, e.g. age and gender
Or you could regress out confounds, by regressing the feature on the confound and retaining the residuals (the portion of the data that is free from the confound variable). However, if the relationship with the confound is weak, this can introduce noise into your data.
You can also use specialised methods, e.g. ComBat, to remove the effects of scanner site
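Regressing out a confound can be sketched with ordinary least squares: fit the feature on the confound, then keep the residuals. A minimal numpy sketch, using synthetic data where a hypothetical `age` confound is mixed into the measured feature (in a real pipeline you would fit the regression on the training set only, to avoid leakage):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200
age = rng.uniform(20, 80, size=n)            # confound variable
signal = rng.normal(size=n)                  # variance of interest
feature = 0.05 * age + signal                # measured feature mixes both

# Regress the confound out of the feature and keep the residuals
design = np.column_stack([np.ones(n), age])  # intercept + confound
beta, *_ = np.linalg.lstsq(design, feature, rcond=None)
residuals = feature - design @ beta

# The residuals are (linearly) uncorrelated with the confound
r = np.corrcoef(age, residuals)[0, 1]
```

Note this only removes the *linear* component of the confound; nonlinear confound effects survive the regression.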
How do you ensure data is representative?
Is your dataset representative in terms of demographics?
Are your data folds representative? Have you applied stratification?
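Stratified fold assignment can be sketched directly: within each class, deal the samples round-robin across folds so every fold keeps the same class proportions. The function name `stratified_folds` is illustrative; libraries such as scikit-learn provide equivalents.

```python
import numpy as np

def stratified_folds(y, n_folds, seed=0):
    """Assign each sample to a fold so class proportions match across folds."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        # Deal this class's samples round-robin across the folds
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

# Imbalanced labels: 60 of class 0, 30 of class 1
y = np.array([0] * 60 + [1] * 30)
folds = stratified_folds(y, n_folds=3)
# Each fold gets 20 class-0 and 10 class-1 samples
```

The same idea extends to stratifying on confounds like age by binning them first.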
How can you evaluate whether a model generalizes to new data?
Internal validity = good performance on new, unseen data from a similar source to the train and validation sets
External validity = generalization to data from other sources, e.g. from a different scanner