Lecture 5 Flashcards
Describe the dead salmon experiment
Dead salmon experiment
The brain of a dead Atlantic salmon was scanned in an fMRI machine. The task administered to the salmon was an open-ended mentalizing task, and the imaging data were analyzed with a standard fMRI analysis pipeline
Result: “Several active voxels were discovered in a cluster located within the salmon’s brain cavity”
Of course, we should not expect any voxel to 'activate.' This shows that the statistics used to analyze the images need to be corrected, and thus we need to be more conservative when setting our decision rules
While the probability of a false positive at any single voxel was acceptably low, the probability of a false positive within the brain as a whole—with its many voxels—was not
What is the issue with multiple hypothesis testing?
When testing multiple hypotheses, (seemingly) significant results can be observed purely by chance.
This implies that, when testing multiple hypotheses, the decision rule p < 0.05 might be too loose, and we might need a more conservative decision rule than 0.05.
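A minimal simulation sketch (hypothetical numbers: 10,000 truly null, independent tests) illustrates the problem: at p < 0.05, many 'significant' results appear even though no effect exists anywhere.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

m = 10_000                       # number of voxels / hypotheses, all truly null
n = 20                           # observations per test
data = rng.normal(size=(m, n))   # pure noise: no voxel truly "activates"

# One-sample t-test of mean = 0 for every voxel
t, p = stats.ttest_1samp(data, popmean=0.0, axis=1)

n_sig = np.sum(p < 0.05)
print(f"'Significant' voxels at p < 0.05: {n_sig} (~{100 * n_sig / m:.1f}%)")
# Roughly 5% of the 10,000 truly null tests come out 'significant' by chance.
```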
What is FWER?
The Type I error rate applies to a single hypothesis; the family-wise error rate (FWER) is its generalization to testing multiple hypotheses.
When multiple hypotheses are tested in total, the FWER is the probability of making at least one false positive.
It guards against any false positive across all tests.
If the FWER is controlled well, the otherwise high chance of a false positive is greatly reduced.
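Under the illustrative assumption of independent tests, the FWER at a per-test level alpha grows rapidly with the number of tests m, since FWER = 1 - (1 - alpha)^m:

```python
# FWER for m independent tests, each at per-test level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^m
alpha = 0.05
for m in (1, 10, 100, 10_000):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:>6d} tests -> FWER = {fwer:.3f}")
# m = 1 -> 0.050;  m = 10 -> 0.401;  m = 100 -> 0.994;  m = 10,000 -> ~1.000
```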
What are the two methods of controlling FWER?
Method 1. Bonferroni correction: If you aim to achieve 5% FWER, reject each hypothesis with p < (0.05 / # of tests)
The threshold depends on the number of tests; the method is very simple and theoretically guaranteed to control FWER at 5%.
Can be conservative: with 50 tests, the per-test threshold becomes 0.05 / 50 = 0.001, and with many more tests it becomes tiny. Requiring such a small per-voxel p-value is very conservative, but it is appropriate when considering the impact on the whole brain.
It also explains why we observed active voxels in the dead salmon experiment: the uncorrected threshold of p < 0.001 was 'too loose' - it would only control FWER at 5% if there were just 50 voxels (since 0.001 = 0.05 / 50), far fewer than a whole brain contains.
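A minimal sketch of applying the Bonferroni correction to an array of p-values (here, hypothetical p-values from 10,000 truly null t-tests):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n = 10_000, 20
p = stats.ttest_1samp(rng.normal(size=(m, n)), popmean=0.0, axis=1).pvalue

fwer_target = 0.05
bonferroni_threshold = fwer_target / m     # 0.05 / 10,000 = 5e-6 per test
rejected = p < bonferroni_threshold

print(f"Per-test threshold: {bonferroni_threshold:.1e}")
print(f"Voxels surviving Bonferroni: {rejected.sum()}")  # almost always 0 under the null
```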
Method 2: Random field theory
Neuroimaging researchers seek better methods for controlling FWER, as Bonferroni would be 'too conservative'.
In the fMRI literature, people typically used a threshold determined by random field theory (RFT), which is based on a mathematical theory of smooth images.
Idea: Setting a threshold based on RFT would be less conservative than Bonferroni when working with neuroimaging data
RFT-based FWER control is implemented in popular neuroimaging software (e.g., FSL, SPM, AFNI)
However, this method does not work well in practice: it has been shown to yield a false positive rate higher than the nominal 5%.
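The sketch below is not RFT itself, just a toy illustration of the underlying intuition: for spatially smooth null images, the threshold actually needed to control the maximum statistic at 5% is lower than the Bonferroni threshold (the image shape, smoothing width, and simulation counts are arbitrary, hypothetical choices):

```python
import numpy as np
from scipy import stats
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
shape = (40, 40, 40)                  # toy "brain" of 64,000 voxels
m = np.prod(shape)
n_sims = 500

def null_max_threshold(smooth_sigma):
    """95th percentile of the maximum z-value over a null image."""
    maxes = []
    for _ in range(n_sims):
        img = rng.normal(size=shape)
        if smooth_sigma > 0:
            img = gaussian_filter(img, smooth_sigma)
            img = img / img.std()     # re-standardize: voxels ~N(0, 1), but spatially correlated
        maxes.append(img.max())
    return np.percentile(maxes, 95)

bonferroni_z = stats.norm.ppf(1 - 0.05 / m)          # z cut-off implied by Bonferroni
print(f"Bonferroni z-threshold:           {bonferroni_z:.2f}")
print(f"Needed threshold, unsmoothed:     {null_max_threshold(0):.2f}")  # close to Bonferroni
print(f"Needed threshold, smoothed image: {null_max_threshold(3):.2f}")  # noticeably lower
```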
Based on the chart, what are the four possible categories for the m hypotheses?
When m hypotheses are tested in total, each of them will fall into one of four possible categories:
A voxel does not activate and you didn't reject the null hypothesis (U) - true negative
A voxel does not activate but you rejected the null hypothesis (V) - false positive (Type I error)
A voxel activates but you didn't reject the null hypothesis (W) - false negative (Type II error)
A voxel activates and you rejected the null hypothesis (S) - true positive
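A hypothetical simulation of these four counts, assuming 1,000 of 10,000 voxels truly activate and an uncorrected p < 0.05 decision rule:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n = 10_000, 20
active = np.zeros(m, dtype=bool)
active[:1_000] = True                          # assume 1,000 voxels truly activate

# Truly active voxels get a shifted mean; the rest are pure noise
data = rng.normal(loc=np.where(active, 1.0, 0.0)[:, None], scale=1.0, size=(m, n))
p = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue
reject = p < 0.05                              # uncorrected decision rule

U = np.sum(~active & ~reject)   # true negatives
V = np.sum(~active & reject)    # false positives (Type I errors)
W = np.sum(active & ~reject)    # false negatives (Type II errors)
S = np.sum(active & reject)     # true positives
print(f"U = {U}, V = {V}, W = {W}, S = {S}, R = V + S = {V + S}")
```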
What is FDR?
Ideally, we want to minimize the number of false positives (V)
FDR controls the expected proportion of false positives among the hypotheses you rejected: FDR = E[V / R] (taken as 0 when R = 0).
If FDR is controlled at 5%, approximately 5% of the statistically significant voxels are expected to be false positives.
It is more 'lenient' than FWER control, at the expense of allowing a controlled proportion of false positive findings.
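A minimal sketch of FDR control using one standard procedure (Benjamini-Hochberg) as implemented in statsmodels, on the same hypothetical setup as the sketch above:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Same hypothetical setup as above: 1,000 of 10,000 voxels truly activate
rng = np.random.default_rng(0)
m, n = 10_000, 20
active = np.arange(m) < 1_000
data = rng.normal(loc=np.where(active, 1.0, 0.0)[:, None], size=(m, n))
p = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

# Benjamini-Hochberg: control the expected proportion of false positives
# among the rejected hypotheses at 5%
reject_fdr, p_adjusted, _, _ = multipletests(p, alpha=0.05, method='fdr_bh')

V = np.sum(~active & reject_fdr)    # false positives among the rejections
R = reject_fdr.sum()                # total rejections
print(f"Rejections: {R}, false positives: {V}, observed proportion: {V / max(R, 1):.3f}")
```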
Describe the problem of implausibly high brain-behavior correlations in neuroimaging data
Neuroimaging data are noisy, with low reliability. Behavioral data are also noisy (e.g., reaction time, accuracy).
(But somehow) reported correlations between these two noisy measures are often above 0.8, even with n < 25 in most studies. This should be suspicious: measurement noise attenuates correlations, so very high correlations between two low-reliability measures are implausible.
Passing a stringent threshold is not easy, but it can still happen by chance (at the 5% rate), and existing software often fails to control FWER well ('cluster failure').
If a feature passes the threshold by chance, the reported sample correlation of the 'surviving' features will be (seemingly) very high, even though no population-level correlation exists.
Be aware that the sample effect sizes of the effects that survive multiple-comparison correction will be severely inflated.
Combined with publication bias, the reported effect sizes for brain-behavior associations could have been greatly inflated.
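A toy simulation of this selection-driven inflation (all numbers hypothetical): every brain feature has a true correlation of exactly zero with behavior, yet the features that happen to pass a stringent threshold show large sample correlations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20            # participants (small sample, as in many studies)
m = 50_000        # brain features (voxels/regions), all with TRUE correlation = 0

behavior = rng.normal(size=n)
brain = rng.normal(size=(m, n))    # independent of behavior by construction

# Pearson correlation of each feature with behavior (via z-scores)
bz = (behavior - behavior.mean()) / behavior.std()
fz = (brain - brain.mean(axis=1, keepdims=True)) / brain.std(axis=1, keepdims=True)
r = fz @ bz / n

# Convert to p-values via the t-distribution and apply a stringent threshold
t = r * np.sqrt((n - 2) / (1 - r**2))
p = 2 * stats.t.sf(np.abs(t), df=n - 2)
survivors = np.abs(r)[p < 0.001]

print(f"Features passing p < 0.001: {survivors.size}")
print(f"Mean |r| among survivors:   {survivors.mean():.2f}")   # typically around 0.7
# Despite a true correlation of exactly zero, the 'reported' correlations look large.
```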