Week 6: MCC-ROI analysis Flashcards
Definition of the p-value
- P(T >= t | H0); the probability of obtaining a result as extreme as, or more extreme than, the observed statistic, assuming H0 is true (see the sketch below)
- Expresses the surprise of observing the data if the null hypothesis were actually true
- Can only be used to refute H0; it does not provide evidence for the truth of H0
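A minimal sketch of computing the one-sided p-value P(T >= t | H0) for a t-statistic with scipy; the statistic value and degrees of freedom are made-up illustration values:

```python
from scipy import stats

# Hypothetical observed t-statistic and degrees of freedom (illustration values only)
t_obs, df = 2.3, 20

# One-sided p-value: probability under H0 of a result at least as extreme as t_obs
p_value = stats.t.sf(t_obs, df)  # survival function = 1 - CDF
print(f"P(T >= {t_obs} | H0) = {p_value:.4f}")
```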
Power
- The chance that a testing procedure correctly rejects H0 when there is a true effect; the proportion of true positives we will detect; one minus the Type II error rate (see the simulation sketch below)
- Varies as a function of: 1) the size of the true effect; 2) the efficiency of the statistical procedure; and 3) the sample size
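A rough simulation sketch of power as the fraction of true effects detected; the effect size, sample size, and number of simulations are arbitrary illustration values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, effect, n, n_sims = 0.05, 0.5, 30, 5_000  # arbitrary illustration values

# Simulate many experiments in which H0 is false and count how often we reject it
rejections = 0
for _ in range(n_sims):
    sample = rng.normal(loc=effect, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    rejections += p < alpha

print(f"Estimated power: {rejections / n_sims:.2f}")  # 1 - Type II error rate
```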
Type I error
the probability of rejecting the null hypothesis given that it is true. The test is designed to keep the type I error rate below a prespecified bound called the significance level, usually denoted by the Greek letter α
Type II error
the probability of failing to reject the null hypothesis when it is actually false.
Expected number of false positives given the H0
e.g., if our alpha level is set to 0.05, then we are implying that it is acceptable to have a 5% probability of incorrectly rejecting the true null hypothesis
(think of the normal distribution under H0 and the small tail to the right of the critical value: that tail area is the amount of risk we take of committing a Type I error; the same reasoning applies to the Type II error, but using the Ha distribution)
High sensitivity leads to…
…few false negatives
Low specificity leads to…
…many false positives
Levels of inference
- voxel level
- cluster level
- peak level
- set level
Voxel level
testing each individual voxel for significance and retaining the ones above a certain threshold (a height threshold on the statistic value); gives the best spatial specificity IF the threshold is picked correctly; we can say something about a specific voxel.
Cluster level
- takes into account the spatial information available in the images, by finding connected clusters of activated voxels and testing the significance of each
- has two stages: 1) defining clusters by applying a cluster-forming (height) threshold, and 2) retaining those clusters whose extent exceeds a second, cluster-size threshold (see the sketch below)
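A minimal sketch of the two stages on a toy 2D statistic image, using scipy.ndimage.label for the connected-components step; the image and both thresholds are made-up illustration values, and in practice the cluster-size cutoff would come from RFT or permutation rather than a fixed number:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
stat_img = rng.normal(size=(32, 32))   # toy "statistic image" (illustration only)
cluster_forming_z = 2.3                # stage 1: cluster-forming (height) threshold
cluster_size_k = 5                     # stage 2: cluster-extent (size) threshold

# Stage 1: threshold the image and label connected clusters of supra-threshold voxels
supra = stat_img > cluster_forming_z
labels, n_clusters = ndimage.label(supra)

# Stage 2: retain only clusters whose size reaches the extent threshold
sizes = ndimage.sum(supra, labels, index=np.arange(1, n_clusters + 1))
retained = [i + 1 for i, size in enumerate(sizes) if size >= cluster_size_k]
print(f"{n_clusters} clusters found, {len(retained)} retained")
```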
Why would we generally expect the fMRI signal to be spatially extended?
Because 1) the brain regions that are activated in fMRI are often much larger than the size of a single voxel and 2) fMRI data are often spatially smoothed, which results in a spreading of the signal across many voxels in the image
Using cluster-level inference gives better…, but also worse spatial… . Picking a low cluster-forming threshold will cause a …, while picking a very high threshold will cause …
sensitivity; specificity; very large cluster that encompasses most of the brain; far fewer and smaller clusters
When a 1,000 voxel cluster is marked as statistically significant, all we can conclude is that…
…one or more voxels within that cluster show evidence against the null. This is not a problem when clusters are small, but it is when a cluster is very large; the only remedy is to raise the cluster-forming threshold, but doing so after the fact is not scientifically sound
Peak level
similar to cluster level: first we define the clusters (with a cluster-forming threshold), but here we retain those clusters whose peak (maximum statistic) exceeds a further threshold
Set level
asks “is there any significant activation anywhere?”; it tests whether there is activation anywhere across the whole brain, but it cannot localise the activation; it is an omnibus test with no localising power whatsoever
Types of error rates (+ error rate definition)
Error rate = a measure of the degree of prediction error of a model with respect to the true model; the rates below quantify different ways of counting false positives across many tests.
* per comparison error rate (PCER)
* family-wise error rate (FWER)
* false discovery rate (FDR)
Multiple comparison (MC) problem
if a statistic image has 100,000 voxels, and we declare all voxels with P < 0.05 to be “significant,” then on average 5% of the 100,000 voxels – 5,000 voxels – will be false positives. This problem is referred to as the multiple testing problem and is a critical issue for fMRI analysis.
Standard hypothesis tests are designed only to control the ‘per comparison rate’ and are not meant to be used repetitively for a set of related tests.
To account for this multiplicity, we have to measure false positive risk over the whole set of tests we will conduct: the error rate × the number of independent tests (voxels) we are going to run (see the simulation below)
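A quick simulation of the 100,000-voxel example above; under H0 everywhere, p-values are uniform, so thresholding at P < 0.05 yields roughly 5,000 false positives:

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, alpha = 100_000, 0.05

# Under a true H0 at every voxel, p-values are uniform on [0, 1]
p_values = rng.uniform(size=n_voxels)
false_positives = np.sum(p_values < alpha)

print(f"False positives: {false_positives} (expected ~{int(alpha * n_voxels)})")
```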
PCER
- Individual voxel’s error rate; for each voxel considered independently, the proportion of truly null voxels that come out as false positives; the probability of a Type I error in the absence of any multiple hypothesis testing correction.
- example: if 10 significance tests were each conducted at 0.05 significance level, then the per-comparison error rate would be 0.05
- by ensuring that PCER <= alpha, the probability of making an error at any single voxel is controlled at alpha
FWER
- the probability of making at least one Type I error among a family of comparisons.
- increases as the number of comparisons increases, because the chance of getting at least one false positive becomes higher (for independent tests, FWER = 1 - (1 - alpha)^n; see the sketch below)
- in this case, the alpha level represents the amount of risk we take in accepting false positives throughout the whole map
- by ensuring that FWER <= alpha, the probability of making one or more errors in the family is controlled by alpha
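A short sketch of how fast the family-wise error rate grows with the number of independent tests at a per-comparison level of 0.05:

```python
# FWER for n independent tests, each at per-comparison level alpha: 1 - (1 - alpha)**n
alpha = 0.05
for n in (1, 10, 100, 1000):
    fwer = 1 - (1 - alpha) ** n
    print(f"n = {n:>4}: P(at least one false positive) = {fwer:.3f}")
```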
Ways to deal with the MC problem
- Bonferroni correction
- RFT correction
- Parametric correction
- Non-parametric correction
Difference between FWER and FDR
They refer to two different things!
FWER tells us the probability of making one or more false discoveries (Type I errors) when performing multiple hypothesis tests; FDR tells us the expected proportion of significant tests (not of all tests) that will be Type I errors.
Put another way: a level 0.05 FWE procedure is correct 95% of the time (no more than 5% of experiments examined can have any false positives), whereas a level 0.05 FDR procedure produces results that are 95% correct (in the long run, the average FDR will be no more than 5%).
FDR’s greater sensitivity comes at the cost of greater false positive risk (see the comparison sketch below)
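A minimal sketch contrasting Bonferroni (FWER control) with Benjamini-Hochberg (FDR control) on the same set of p-values, using statsmodels' multipletests; the p-values are made-up illustration values:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Made-up p-values: a few "real" effects mixed in with null tests (illustration only)
pvals = np.array([0.001, 0.008, 0.012, 0.039, 0.041, 0.22, 0.48, 0.63, 0.81, 0.97])

# FWER control: Bonferroni (probability of ANY false positive kept <= 0.05)
reject_bonf = multipletests(pvals, alpha=0.05, method="bonferroni")[0]

# FDR control: Benjamini-Hochberg (expected PROPORTION of false discoveries <= 0.05)
reject_fdr = multipletests(pvals, alpha=0.05, method="fdr_bh")[0]

print("Bonferroni rejects:", reject_bonf.sum())  # stricter: fewer discoveries
print("BH-FDR rejects:   ", reject_fdr.sum())    # more sensitive, higher FP risk
```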
Important questions
What statistic are we working with, and what error rate are we controlling?
Spatial smoothing can cause…
…spatially correlated data, so neighbouring voxels are not independent tests
Classic way to deal with MC
Bonferroni correction; adjusts the alpha value by dividing it by the number of independent tests
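For example, reusing the whole-brain example above: with alpha = 0.05 and 100,000 voxels, the Bonferroni-corrected per-voxel threshold is 0.05 / 100,000 = 5 × 10^-7.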