week 5 - model evaluation Flashcards
What is ‘overhyping’?
Overhyping involves repeatedly revisiting the test set, e.g. tuning the model again after each evaluation
This is another form of data leakage
What is permutation testing?
You randomise the labels on the train and validation sets (but keep the same predictive data)
You then re-run the analysis and calculate performance scores many times to build a null distribution, expecting that the model will no longer be able to classify because you have destroyed the label mapping
You can then test for a significant difference between the null distribution and the real performance to obtain a p-value (the proportion of null scores at least as good as the real score)
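The procedure above can be sketched in a few lines of numpy. This is a minimal illustration, not a full pipeline: the "model" is a hypothetical one-feature threshold rule, and the data are synthetic, chosen only so the permutation test has a real signal to detect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples whose labels are weakly predictable from one feature
X = rng.normal(size=100)
y = (X + rng.normal(scale=1.0, size=100)) > 0  # a genuine X -> y mapping exists

def accuracy(X, y):
    # Trivially simple "model": predict class 1 whenever the feature is positive
    return np.mean((X > 0) == y)

real_score = accuracy(X, y)

# Null distribution: shuffle the labels (destroying the mapping) and re-score
n_perm = 1000
null_scores = np.array([accuracy(X, rng.permutation(y)) for _ in range(n_perm)])

# p-value: fraction of null scores at least as good as the real score
# (the +1 terms are the standard small-sample correction)
p_value = (np.sum(null_scores >= real_score) + 1) / (n_perm + 1)
```

In practice you would refit the model on each permuted label set rather than re-applying a fixed rule; the scoring and p-value logic stay the same.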
Why is it important to evaluate models over a range of measures?
Because if a single measure is the target, it can be gamed.
E.g. non-significant z-scores are often not published (publication bias)
how can you mitigate variability in the dataset?
You can use stratification against potential confounds, e.g. age and gender
Or you could regress out confounds, by regressing the feature on the confound and retaining the residuals (the portion of the data that is free from the confound variable). However, if the relationship with the confound is weak, this can introduce noise into your data.
You can also use specialised methods, e.g. ComBat, to remove the effects of scanner site
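Regressing out a confound can be sketched with ordinary least squares: fit the feature on the confound, then keep the residuals. A minimal numpy sketch, using synthetic data where a hypothetical `age` confound is mixed into the measured feature (in a real pipeline you would fit the regression on the training set only, to avoid leakage):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200
age = rng.uniform(20, 80, size=n)            # confound variable
signal = rng.normal(size=n)                  # variance of interest
feature = 0.05 * age + signal                # measured feature mixes both

# Regress the confound out of the feature and keep the residuals
design = np.column_stack([np.ones(n), age])  # intercept + confound
beta, *_ = np.linalg.lstsq(design, feature, rcond=None)
residuals = feature - design @ beta

# The residuals are (linearly) uncorrelated with the confound
r = np.corrcoef(age, residuals)[0, 1]
```

Note this only removes the *linear* component of the confound; nonlinear confound effects survive the regression.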
How do you ensure data is representative?
Is your dataset representative in terms of demographics?
Are your data folds representative? Have you applied stratification?
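Stratified fold assignment can be sketched directly: within each class, deal the samples round-robin across folds so every fold keeps the same class proportions. The function name `stratified_folds` is illustrative; libraries such as scikit-learn provide equivalents.

```python
import numpy as np

def stratified_folds(y, n_folds, seed=0):
    """Assign each sample to a fold so class proportions match across folds."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        # Deal this class's samples round-robin across the folds
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

# Imbalanced labels: 60 of class 0, 30 of class 1
y = np.array([0] * 60 + [1] * 30)
folds = stratified_folds(y, n_folds=3)
# Each fold gets 20 class-0 and 10 class-1 samples
```

The same idea extends to stratifying on confounds like age by binning them first.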
How can you evaluate whether a model generalizes to new data?
Internal validity = good performance on new, unseen data from a similar source to the train and validation sets
External validity = generalization to data from other sources, e.g. from a different scanner