Lecture 11: Group Level Analyses and Statistics Flashcards
In most studies, we want to be able to say something general about our whole sample of participants

Group level analyses in sensor space
To do this we can
average across individuals
This helps us to
reduce noise and get a clearer picture of the brain’s response, and visualise our effects

We also conduct, in group level analyses in sensor space,
statistical tests to make comparisons, e.g., between conditions of an experiment.
We can do this with the different types of results we have, both in sensor space, e.g., below..
Group level analyses in sensor space are easier than group level analyses in source space as
sensor 1 is the same sensor across participants
We can do group level analyses in (2)
- Sensor space
- Source space
We can do group level analyses in source space
Diagram of group level analyses in source space across participants - (3)
- Activity map at a single frequency (e.g., alpha)
- Activity map at a single timepoint
- ROI time course
We can do group analyses in source space where we can also
pull out ROI (Scout) time courses and do statistics on these values
For group level analyses, source data must be transformed into
a shared MNI space
Transforming MEG source data into group space is useful as we are
able to average source localised data across participants in a common coordinate space (e.g., MNI or a group-averaged brain)
How does transforming MEG source data into group space work?
Works by inflating each hemisphere and then aligning them to a template (easier to align spheres than folding patterns)
Transforming MEG source data into group space allows us to do
group-level visualisation and statistics in source space
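A minimal sketch of this morphing step, assuming MNE-Python and a FreeSurfer subjects directory; the subject name and file names are hypothetical placeholders, not from the lecture:

```python
# Sketch only: morph one participant's source estimate onto the fsaverage
# template so it can be averaged with other participants. Paths and subject
# names are placeholder assumptions.
import mne

subjects_dir = "/path/to/freesurfer/subjects"        # hypothetical
stc = mne.read_source_estimate("sub-01_task-meg")    # hypothetical per-participant estimate

# The morph uses the inflated/spherical surface registration to align this
# participant's cortex to the fsaverage template, as described above.
morph = mne.compute_source_morph(stc, subject_from="sub-01",
                                 subject_to="fsaverage",
                                 subjects_dir=subjects_dir)
stc_group = morph.apply(stc)   # now on fsaverage vertices, comparable across participants
```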
For group level statistics, our statistical tests might be - (2)
- Parametric
- Non-parametric
In parametric tests there are, in terms of assumptions,
stronger assumptions, including normally distributed data
Parametric tests for group level statistics - (2)
Usually t-tests
Statistical significance is then calculated based on the distribution of the test statistic
Group level statistics can be non-parametric, which have fewer assumptions - (3)
Including non-parametric versions of standard tests, e.g., the Mann-Whitney U-test
Safer as neuroimaging data may not be normally distributed (especially when doing many tests)
But less power
For group level statistics, instead of standard parametric or non-parametric tests, we can use a resampling-based approach - (2)
Includes permutation and boot-strapping methods
Avoids assumptions about the form of the data, and is therefore non-parametric and quite robust
In resampling-based approaches for group-level statistics - (7)
Say we want to compare values for conditions A and B
We calculate our test statistic as normal, e.g., a t-statistic
But then we build a null distribution using our data
On each resampling iteration, we scramble the condition (group) labels of the data, and recalculate the test statistic
The distribution of these resampled statistics will still center on 0 (i.e., no difference) but doesn’t have to be parametric (normally distributed)
We calculate a p-value by comparing our original t-statistic to the distribution of the resampled t-statistics
Increasingly popular but much slower to run
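A minimal sketch of such a resampling test, using NumPy with made-up example data (not the lecture’s own code); here the scrambling is done by randomly swapping which condition counts as A or B within each participant:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0.6, 0.2, size=20)   # made-up values, one per participant, condition A
b = rng.normal(0.5, 0.2, size=20)   # condition B

def t_paired(diff):
    # Paired t-statistic for a vector of within-participant differences
    return diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

observed = t_paired(a - b)

# Build the null distribution: on each iteration, randomly swap the A/B labels
# within each participant (equivalent to flipping the sign of the difference)
# and recalculate the test statistic.
n_perm = 5000
null = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1, 1], size=len(a))
    null[i] = t_paired(signs * (a - b))

# Two-tailed p-value: how often a resampled statistic is at least as extreme
# as the observed one.
p = np.mean(np.abs(null) >= np.abs(observed))
```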
How do we apply group-level statistics?
The test is repeated independently at each time point/sensor/location/frequency
E.g., when comparing the time course of activity in two conditions, or testing a single condition - (3)
In each participant, subtract the time courses for each condition
Compare the difference to 0 with a t-test at each time point at the group level
For a single condition, simply compare to 0 without a subtraction (see the sketch below)
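A minimal sketch of this mass-univariate approach with NumPy/SciPy; the array shape (participants × time points) and the data are assumptions for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Per-participant difference time courses (condition A minus condition B),
# shape (n_participants, n_time_points); made-up data for illustration.
diff = rng.normal(0.0, 1.0, size=(20, 1500))

# One-sample t-test against 0 at every time point, across participants.
t_vals, p_vals = stats.ttest_1samp(diff, popmean=0.0, axis=0)
significant = p_vals < 0.05   # uncorrected - see the correction methods below
```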
How to apply group-level statistics
The test is repeated independently at each time point/sensor/location/frequency
E.g., for sensors - (3), the same as at each time point
Same for topographies/current density maps across the whole head
Calculate difference and compare to 0 or compare single condition to 0
Do t-tests across the group at each sensor/vertex
How do we apply group level statistics for multivariate analyses - (2)
For multivariate analyses, decoding accuracy over time would be tested with a t-test vs. chance at each time point
Time-frequency plots in two conditions would be compared with a t-test at each time and frequency pair
The problem with neuroimaging data - (7)
Neuroimaging data are ‘big’ – they involve measurements at many spatial locations and points in time
In an MEG study we might easily have something like 248 sensors, 1,500 time points, and perhaps six frequency bands, or 50 or 100 different frequencies
If we multiply these together, we are quickly making millions of comparisons!
That is 248 × 1,500 = 372,000 t-tests in an event-related sensor space analysis
Potentially even more in source space (e.g., 15,000 vertices * 1,500 time points * 6 frequency bands)
And that’s just for a single condition – we have three conditions and three contrasts between them
Could be 6 contrasts * 15,000 vertices * 1,500 time points * 50 frequencies…
MEG multiple comparison problem - (7)
The more tests we run, the greater our chances of getting a false positive (also called a Type 1 error)
This is where we get a significant effect, even though there is no true difference
This likelihood is known as the familywise error rate, given by: FWER = 1 − (1 − ⍺)^m
Where ⍺ is the threshold for significance (e.g., 0.05) and m is the number of tests
Even with just 100 tests, the FWER approaches 1 (as shown in the lecture plot)
Clearly, we will end up with some false positives if we have 1000 or 1 million comparisons
We have no way to tell which effects are real and which are errors
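A quick numerical check of the FWER formula:

```python
import numpy as np

alpha = 0.05
m = np.array([1, 10, 100, 372_000])        # numbers of tests
fwer = 1 - (1 - alpha) ** m
# -> [0.05, 0.40, 0.99, 1.0 (to machine precision)]: with hundreds of tests
# or more, a false positive is essentially guaranteed.
```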
How to solve the MEG multiple comparison problem - (3)
Bonferroni correction (Adjust the threshold for significance)
False discovery rate correction (Accept that we will make errors, but control their rate)
Cluster correction (Take into account correlations across space/time/frequency)
In Bonferroni correction - (6)
We adjust the threshold for significance (⍺) by dividing it by the number of tests
If we have 8 tests, ⍺ = 0.05/8 = 0.0063
Count a test as significant if its p-value is lower than this
This keeps the familywise error rate at or below ⍺ (i.e., still 5% likelihood of a false positive)
Very conservative
Dramatically reduces our ability to detect true effects (aka power) so we need a very large sample size
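A minimal Bonferroni sketch with made-up p-values:

```python
import numpy as np

p_vals = np.array([0.001, 0.008, 0.04, 0.20])   # made-up p-values from 4 tests
alpha = 0.05
m = len(p_vals)

# Compare each p-value to the adjusted threshold alpha / m ...
significant = p_vals < alpha / m                 # here: [True, True, False, False]
# ... or, equivalently, adjust the p-values and keep the usual threshold.
p_adjusted = np.minimum(p_vals * m, 1.0)
```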
In false discovery rate (FDR) - (5)
Less conservative – more power
The familywise error rate is the chance of any false positives across all tests
The false discovery rate is the proportion of false positives expected across all significant tests
We fix the FDR at a known level (e.g., 0.05), meaning we accept that there will be some number of false positives
But we don’t let them get out of hand, e.g., 5% of all our significant results might be false, but no more
How does FDR work? - (5)
We rank order the list of p-values from all tests
Set the FDR, e.g., 0.05, and use it to make different significance thresholds for the lowest p-value, the second lowest, etc.
Compare each p-value to the significance threshold for that list position
Our smallest p-values are judged against a harsher threshold
Our biggest p-values are judged against a laxer threshold
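This ranked-threshold procedure is the Benjamini-Hochberg method; a minimal sketch with NumPy and made-up p-values:

```python
import numpy as np

def fdr_bh(p_vals, q=0.05):
    """Return a boolean mask of the tests that survive FDR correction."""
    p = np.asarray(p_vals)
    m = p.size
    order = np.argsort(p)
    # Rank-specific thresholds: the k-th smallest p-value is compared to
    # (k / m) * q, so the smallest p-values face the harshest threshold.
    thresholds = (np.arange(1, m + 1) / m) * q
    below = p[order] <= thresholds
    passed = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.max(np.where(below)[0])   # largest rank that passes
        passed[order[:k_max + 1]] = True     # keep everything up to that rank
    return passed

mask = fdr_bh([0.001, 0.008, 0.039, 0.041, 0.20])   # -> first two survive
```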
In MEG, cluster correction is different from FDR and Bonferroni correction, as those assume something that is not the case - (3)
They assume our different tests are independent, and they are not (at all!)
Real effects are likely to extend over several contiguous time points, frequencies, sensors, or brain locations
Also our sampling is arbitrary – we can sample the cortex with 1500 vertices or 150,000 (same goes for time and frequency)
In MEG, cluster correction - (4)
- Cluster correction identifies ‘clusters’ = contiguous samples that are all individually significant without correction
- Controls the false positive rate at this cluster (not individual test) level
- Takes into account the correlations across space, time and frequency inherent in MEG signals
- Gives better statistical power
Steps of cluster correction - identifying clusters - (2)
- Perform a test at each sample (usually a t-test)
- Identify clusters of adjacent significant tests
After identifying clusters in MEG cluster correction, we need to
choose significant clusters
Steps of choosing significant clusters - (4)
- In each cluster, sum the t-statistic across all samples
- Find the cluster with the largest summed t-statistic
- Generate a null distribution (the blue histogram) by resampling the largest cluster’s data with random condition labels – on each resample recalculate the summed t-statistic
- Compare each cluster to this null distribution. Those with a summed t-statistic falling outside the 95% limits are retained as significant
Summary of all steps of MEG cluster correction - (6)
- Perform a test at each sample (usually a t-test)
- Identify clusters of adjacent significant tests
- In each cluster, sum the t-statistic across all samples
- Find the cluster with the largest summed t-statistic
- Generate a null distribution (the blue histogram) by resampling the largest cluster’s data with random condition labels – on each resample recalculate the summed t-statistic
- Compare each cluster to this null distribution. Those with a summed t-statistic falling outside the 95% limits are retained as significant
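A simplified sketch of a 1D cluster-based permutation test with NumPy/SciPy, for a set of per-participant difference time courses. This follows the common "max cluster sum" variant (it permutes the whole dataset and records the largest summed statistic on each iteration, rather than resampling only the largest cluster's data), so it differs slightly from the exact recipe above; full implementations exist in toolboxes such as MNE-Python and FieldTrip:

```python
import numpy as np
from scipy import stats

def clusters_1d(t_vals, threshold):
    """Indices of contiguous runs of time points where |t| exceeds the threshold."""
    sig = np.abs(t_vals) > threshold
    clusters, current = [], []
    for i, s in enumerate(sig):
        if s:
            current.append(i)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters

def cluster_test(diff, n_perm=1000, alpha=0.05, seed=0):
    """diff: (n_participants, n_times) array of condition differences."""
    rng = np.random.default_rng(seed)
    n = diff.shape[0]
    threshold = stats.t.ppf(1 - alpha / 2, df=n - 1)   # cluster-forming threshold

    # 1-2) Test at each sample and identify clusters of adjacent significant tests
    t_obs, _ = stats.ttest_1samp(diff, 0.0, axis=0)
    obs_clusters = clusters_1d(t_obs, threshold)
    obs_sums = [np.abs(t_obs[c]).sum() for c in obs_clusters]

    # 3-5) Null distribution of the largest summed t-statistic under random
    # sign flips (i.e., shuffled condition labels within participants)
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        flipped = diff * rng.choice([-1, 1], size=(n, 1))
        t_perm, _ = stats.ttest_1samp(flipped, 0.0, axis=0)
        sums = [np.abs(t_perm[c]).sum() for c in clusters_1d(t_perm, threshold)]
        null_max[i] = max(sums) if sums else 0.0

    # 6) Keep clusters whose summed statistic beats the 95th percentile of the null
    cutoff = np.quantile(null_max, 1 - alpha)
    return [c for c, s in zip(obs_clusters, obs_sums) if s > cutoff]
```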
Pros of cluster correction - (3)
+ Maintains statistical power and doesn’t ‘shrink’ clusters
+ Takes correlations across time/space/frequency into account
+ Non-parametric
Cons of cluster correction - (2)
- It is quite slow because of the resampling
- Can’t make strong claims about the edges of a cluster, e.g., the significant start and end times in a time course (as these wouldn’t necessarily be significant on their own after correction), i.e., clusters might ‘grow’
Summary of group level analyses and statistics - (6)
Get group averages to visualise MEG results
Run simple statistical tests e.g., t-tests to analyse MEG data
But run lots of them - one at each time point/ sensor/ location/ frequency
Need to address the multiple comparisons problem
We can change the alpha level or use cluster correction
Also use ROIs, test a priori hypotheses, pre-register, etc.
Which of the following statements about source estimation is FALSE?
A. Sensor space analyses do not require estimating a forward model
B. Timing of changes in brain activity can only be determined in sensor space
C. Source space analyses provide information about where in the brain activity is occurring
D. Inverse model is harder to compute than forward model
B - timing of changes in brain activity can be determined in both sensor and source space analyses
While sensor space analysis provides information from the sensors directly, source space analysis localizes brain activity and can also reveal temporal dynamics. Both methods can provide insights into the timing of brain activity changes, albeit from different perspectives.
Why do we digitise the participant’s head shape before collecting MEG data?
A. To build a detailed 3D model of the participant’s head
B. To check that the head is the right size to fit in the MEG scanner
C. To calculate the location of the participant’s head in the helmet
D. To use in coregistering the data with structural MRI scans
D.
The false discovery rate is:
A. The proportion of false positives across all tests
B. The proportion of true positives across all tests
C. The proportion of false positives across all significant tests
D. The proportion of true positives across all significant tests
C.
In the cluster correction method considered in class, how is the largest cluster defined?
A. The chosen cluster has the largest absolute value of the summed test statistic
B. The chosen cluster has the longest duration and/or largest spatial extent
C. The chosen cluster contains the largest single value of the test statistic
D. The chosen cluster has the smallest summed p-value
A.