Week 6: MCC-ROI analysis Flashcards

1
Q

Definition of the p-value

A
  • P(T >= t | H0); the probability of observing a result as extreme as, or more extreme than, the observed statistic, given that H0 is true
  • Expresses the surprise of observing the data if the null hypothesis were actually true
  • Can only be used to refute H0 and doesn’t provide evidence for the truth of H0 (see the sketch below)
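A minimal sketch of this definition in Python, assuming a hypothetical one-sided t-test (t_obs and df are made-up numbers):

    from scipy import stats

    t_obs, df = 2.3, 18               # hypothetical observed t statistic and degrees of freedom
    p_value = stats.t.sf(t_obs, df)   # P(T >= t_obs | H0): upper-tail probability
    print(p_value)                    # ~0.017: the "surprise" of data this extreme under H0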
2
Q

Power

A
  • The chance that a testing procedure correctly rejects H0 when there is a true effect; the proportion of true positives that we will detect; one minus the Type II error rate
  • Varies as a function of: 1) the size of the true effect; 2) the efficiency of the statistical procedure; and 3) the sample size (see the simulation sketch below)
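A quick Monte Carlo sketch of power for a two-sample t-test; the effect size, group size, and alpha are hypothetical:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, effect, alpha, n_sims = 20, 0.8, 0.05, 5000   # hypothetical settings
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(effect, 1, n)                 # group with a true effect
        b = rng.normal(0, 1, n)                      # null group
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    print(rejections / n_sims)                       # empirical power, ~0.7 here

Increasing n or the effect size in this sketch raises the empirical power, matching factors 1) and 3) above.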
3
Q

Type I error

A

the probability of rejecting the null hypothesis given that it is true. The test is designed to keep the Type I error rate below a prespecified bound called the significance level, usually denoted by the Greek letter α

4
Q

Type II error

A

the probability of failing to reject the null hypothesis when it is actually false.

5
Q

Expected number of false positives given the H0

A

e.g., if our alpha level is set to 0.05, then we are implying that it is acceptable to have a 5% probability of incorrectly rejecting the true null hypothesis
(think of the null distribution under H0 and of the little tail on the right: that tail area is the amount of Type I error risk we are taking > same reasoning for the Type II error, but think of the Ha distribution)
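A small sketch (hypothetical sizes) showing that when H0 is true everywhere, roughly an alpha fraction of tests still comes out “significant”:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_tests, n, alpha = 10_000, 30, 0.05     # hypothetical: 10,000 null voxels
    data = rng.normal(0, 1, (n_tests, n))    # pure noise, so H0 is true everywhere
    p = stats.ttest_1samp(data, 0, axis=1).pvalue
    print((p < alpha).mean())                # ~0.05: the expected false positive rate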

6
Q

High sensitivity leads to…

A

…few false negatives

7
Q

Low specificity leads to…

A

…many false positives

8
Q

Levels of inference

A
  • voxel level
  • cluster level
  • peak level
  • set level
9
Q

Voxel level

A

testing each individual voxel for significance and retaining the ones above a certain threshold (think of a y-axis threshold); gives the best spatial specificity IF the threshold is picked correctly; we can say something about a specific voxel.

10
Q

Cluster level

A
  • takes into account the spatial information available in the images, by finding connected clusters of activated voxels and testing the significance of each
  • has two stages: 1) defining clusters using a cluster-defining (height) threshold and 2) retaining those clusters according to another threshold on their size (see the sketch below)
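A minimal sketch of the two stages using scipy.ndimage; the statistic map and both thresholds are hypothetical:

    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(2)
    stat_map = rng.normal(0, 1, (20, 20, 20))    # hypothetical 3D statistic image
    cluster_defining = 2.3                       # stage 1: height threshold
    labels, n_clusters = ndimage.label(stat_map > cluster_defining)
    sizes = np.bincount(labels.ravel())[1:]      # voxel count per connected cluster
    extent = 10                                  # stage 2: size threshold for retaining
    print(sizes[sizes >= extent])                # clusters surviving both stages

In a real analysis the stage-2 threshold would come from RFT or a permutation distribution rather than a fixed number.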
11
Q

Why would we generally expect the fMRI signal to be spatially extended?

A

Because 1) the brain regions that are activated in fMRI are often much larger than the size of a single voxel and 2) fMRI data are often spatially smoothed, which results in a spreading of the signal across many voxels in the image

12
Q

Using cluster-level inference gives better …, but also worse spatial … . Picking a low cluster-defining threshold will cause a …, while picking a very high threshold will cause …

A

sensitivity; specificity; a very large cluster that encompasses most of the brain; far fewer and smaller clusters

13
Q

When a 1,000 voxel cluster is marked as statistically significant, all we can conclude is that…

A

…one or more voxels within that cluster have evidence against the null. This is not a problem when clusters are small, but it is when a cluster is large; the only solution is to increase the cluster-defining threshold, but choosing it after seeing the results is not scientifically sound

14
Q

Peak level

A

similar to the cluster level: first we define the clusters (e.g., with a cluster-defining threshold), but here we retain those clusters whose peak statistic value exceeds a certain threshold

15
Q

Set level

A

asks “is there any significant activation anywhere?”; it tests whether there is any activation anywhere across the whole brain, but cannot localise it; it is an omnibus test with no localising power whatsoever

16
Q

Types of error rates (+ error rate definition)

A

Error rate = a measure of how often a testing procedure produces false positives; the rates below differ in the set of tests over which that risk is measured.
* per comparison error rate (PCER)
* family-wise error rate (FWER)
* false discovery rate (FDR)

17
Q

Multiple comparison (MC) problem

A

if a statistic image has 100,000 voxels, and we declare all voxels with P < 0.05 to be “significant,” then on average 5% of the 100,000 voxels – 5,000 voxels – will be false positives. This problem is referred to as the multiple testing problem and is a critical issue for fMRI analysis.

Standard hypothesis tests are designed only to control the ‘per comparison rate’ and are not meant to be used repetitively for a set of related tests.

To account for this multiplicity, we have to measure false positive risk over the whole set of tests we will conduct: the per-test error rate times the number of independent tests (voxels) gives the expected number of false positives

18
Q

PCER

A
  • Individual voxel’s error rate; for each voxel independently, the % of null voxels that are FP; the probability of a Type I error in the absence of any multiple hypothesis testing correction.
  • example: if 10 significance tests were each conducted at the 0.05 significance level, then the per-comparison error rate would be 0.05
  • by ensuring that PCER <= alpha, the probability of a Type I error at each single voxel is controlled by alpha
19
Q

FWER

A
  • the probability of making at least one Type I error among a family of comparisons.
  • increases as the number of comparisons increases, because the chance of getting a false positive becomes higher.
  • in this case, the alpha level represents the amount of risk we take in accepting false positives throughout the whole map
  • by ensuring that FWER <= alpha, the probability of making one or more errors in the family is controlled by alpha (see the formula below)
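For m independent tests each run at level \alpha, the family-wise error rate follows directly (a standard result, in LaTeX):

    \mathrm{FWER} = 1 - (1 - \alpha)^{m}

For example, m = 100 independent tests at \alpha = 0.05 give \mathrm{FWER} = 1 - 0.95^{100} \approx 0.994, which is why the risk grows with the number of comparisons.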
20
Q

Ways to deal with the MC problem

A
  • Bonferroni correction
  • RFT correction
  • Parametric correction
  • Non-parametric correction
21
Q

Difference between FWER and FDR

They refer to two different things!

A

FWER tells us the probability of making one or more false discoveries (Type I errors) when performing multiple hypothesis tests; FDR tells you the expected proportion of significant tests (not of all tests) that will be Type I errors.

Put another way, where a level 0.05 FWE procedure is correct 95% of the time (no more than 5% of experiments examined can have any false positives), a level 0.05 FDR procedure produces results that are 95% correct (in the long run, the average FDR will be no more than 5%).

FDR’s greater sensitivity comes at the cost of greater false positive risk
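A minimal sketch of the Benjamini–Hochberg step-up procedure, the classic FDR-controlling method (the p-values are hypothetical; statsmodels offers the same via multipletests(..., method='fdr_bh')):

    import numpy as np

    def benjamini_hochberg(p, q=0.05):
        """Return a boolean mask of discoveries at FDR level q."""
        p = np.asarray(p)
        order = np.argsort(p)
        ranks = np.arange(1, p.size + 1)
        passes = p[order] <= q * ranks / p.size    # step-up criterion: p_(i) <= q*i/m
        k = ranks[passes].max() if passes.any() else 0
        mask = np.zeros(p.size, bool)
        mask[order[:k]] = True                     # reject the k smallest p-values
        return mask

    print(benjamini_hochberg([0.001, 0.008, 0.04, 0.2, 0.9]))  # [True True False False False]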

22
Q

Important questions

A

Which statistic are we working with, and which error rate are we controlling?

23
Q

Spatial smoothing can cause…

A

…correlated data

24
Q

Classic way to deal with MC

A

Bonferroni correction; adjusts the alpha value by dividing it by the number of independent tests
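A one-step sketch with hypothetical numbers:

    n_tests, alpha = 100_000, 0.05
    alpha_bonferroni = alpha / n_tests    # per-voxel threshold: 5e-07
    # equivalently, adjust each p-value upward: p_adj = min(p * n_tests, 1.0)
    print(alpha_bonferroni)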

25
Q

BUT, due to spatial smoothing, it is hard to assess the number of independent tests (voxels) we are dealing with… this makes Bonferroni…

A

fail: with smoothed (correlated) data there are fewer independent tests than voxels, so dividing alpha by the voxel count over-corrects and costs power.

26
Q

Why does more smoothing of the fMRI signal increase the chance of false positives?

A

Smoothing is intended to improve the signal-to-noise ratio (SNR). If there is less noise, there will be fewer false positives. However, if the smoothing is excessive, the size of clusters can be exaggerated, leading to more false positives. So, we need to strike a balance: enough smoothing to improve SNR, but not so much that we are spreading activation. The expected size of the clusters is typically a good guide to the right amount of smoothing. The range of smoothing used in fMRI analysis is usually quite restricted as well; it is rare to find spatial smoothing kernels outside the 4–8 mm range.

27
Q

RFT

A
  • method that accounts for the spatial dependence between voxels when correcting for MC (and hence adjusting our p-value) > post-hoc test!
  • accounts for the degree of smoothness in the data
28
Q

In RFT, smoothness is defined by…

A

…[FWHM_x, FWHM_y, FWHM_z]; this is NOT the size of the Gaussian kernel applied, but the combination of the intrinsic and applied smoothing (see below).
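Because convolving Gaussians adds their variances, the combined smoothness along each axis can be written as (a standard result under Gaussian assumptions, in LaTeX):

    \mathrm{FWHM}_{\mathrm{combined}} = \sqrt{\mathrm{FWHM}_{\mathrm{intrinsic}}^{2} + \mathrm{FWHM}_{\mathrm{applied}}^{2}}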

29
Q

RESEL

A
  • a virtual voxel of size FWHM_x × FWHM_y × FWHM_z
  • the number of resels is given by R = V / (FWHM_x × FWHM_y × FWHM_z), i.e., the search volume divided by the volume of one resel; e.g., a 1,000,000 mm³ search volume with 10 mm smoothness along each axis contains 1,000 resels
30
Q

In RFT

Through the smoothness and resels parameters, we can adjust the p-value to account for the smoothness in the data: how?

A

there is a formula in the book (a standard version is given below); this formula shows that, for a given statistic value t and search volume V:
1. as the product of the FWHMs increases (hence more smoothing), the RESEL count R decreases and so does the corrected P-value, producing increased significance; the intuition is that greater smoothness means a less severe multiple testing problem, so a less stringent correction is necessary.
2. as the search volume in RESELs grows, so does the corrected P-value, producing decreased significance for the same statistic value. This should also make sense, as the larger the search volume, the more severe the multiple testing problem.
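For reference, a standard version of that formula — the expected Euler characteristic approximation for a smooth 3D Gaussian (Z) statistic field, stated here as the textbook RFT result rather than the book’s exact notation — is:

    P_{\mathrm{FWE}}(Z_{\max} > z) \approx E[\mathrm{EC}] = R \,(4\ln 2)^{3/2}\,(2\pi)^{-2}\,(z^{2}-1)\,e^{-z^{2}/2}

where R is the RESEL count; both behaviours on this card follow because the corrected P-value is proportional to R.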

31
Q

RFT limitations

A
  • Requires estimating parameters
  • Data must be sufficiently smooth: if you do not spatially smooth the data enough, RFT will not work
32
Q

Parametric tests

A

making parametric assumptions about the data to approximate P-values

33
Q

Non-parametric tests

A

use the data themselves to obtain empirical null distributions of the test statistic of interest and estimate the FWE-corrected p-values

34
Q

Permutations tests

A
  • Using just a single voxel, suppose you have two groups of ten subjects, high performers (H) and low performers (L), each of whose BOLD response data you wish to compare. Under the null hypothesis of no group difference, the group labels are arbitrary, and one could randomly select ten subjects to be the H group, reanalyze the data, and expect similar results.
  • principle of the permutation test: repeatedly shuffling the assignment of experimental labels to the data, and analyzing the data for each shuffle to create a distribution of statistic values that would be expected under the null hypothesis (see the sketch below)
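A minimal single-voxel sketch of this label-shuffling idea; the group data are hypothetical:

    import numpy as np

    rng = np.random.default_rng(3)
    high = rng.normal(1.0, 1, 10)              # hypothetical H-group BOLD responses
    low = rng.normal(0.0, 1, 10)               # hypothetical L-group BOLD responses
    observed = high.mean() - low.mean()

    pooled = np.concatenate([high, low])
    n_perm = 10_000
    null_stats = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)     # reassign the group labels at random
        null_stats[i] = shuffled[:10].mean() - shuffled[10:].mean()

    p_value = (null_stats >= observed).mean()  # one-sided permutation p-value
    print(p_value)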
35
Q

How to account for the MC with permutation tests and obtain the FWE-corrected P-values?

A
  • A FWE-corrected P-value is found by comparing a particular statistic value to the distribution of the maximal statistic across the whole image > why?
  • With repeated permutation, a distribution of the maximum statistic (for voxels, the highest intensity; for clusters, the biggest size) is constructed, and the FWE-corrected P-value is the proportion of maxima in the permutation distribution that are as large as or larger than the observed statistic value (see the sketch below)
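Extending the idea to many voxels, a sketch of max-statistic FWE correction (sizes are hypothetical; sign-flipping is used here as the permutation scheme for a one-sample design):

    import numpy as np

    rng = np.random.default_rng(4)
    n_sub, n_vox, n_perm = 16, 500, 2000
    data = rng.normal(0, 1, (n_sub, n_vox))            # hypothetical data, H0 true

    def t_map(x):                                      # one-sample t at every voxel
        return x.mean(0) / (x.std(0, ddof=1) / np.sqrt(len(x)))

    observed = t_map(data)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1, 1], size=n_sub)[:, None]
        max_null[i] = t_map(data * signs).max()        # keep only the image-wide maximum

    # FWE-corrected p-value per voxel: proportion of maxima >= the observed value
    p_fwe = (max_null[None, :] >= observed[:, None]).mean(axis=1)
    print(p_fwe.min())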
36
Q

ROI analysis

A
  • reduces the stringency of the correction for multiple testing.
  • it is crucial that the ROI is defined independently of the statistical analysis of interest.
37
Q

Circularity

it refers to ROI analysis methods

A
  • the issue lies in trying to derive the magnitude of an activation effect using a ROI derived from the same data.
  • arises when one selects a subset of noisy variables (e.g., voxels) from an initial analysis for further characterization
  • When a voxel exceeds the threshold (and is thus selected for further analysis), this can be due either to signal or to noise
  • In the case in which there is no signal, the only voxels that will exceed the threshold will be the ones that have a very strong positive noise value.
  • If we then estimate the mean intensity of only those voxels that exceed the threshold, they will necessarily have a large positive value; there is no way that it could be otherwise, since they were already selected on the basis of exceeding the threshold
  • In the case where there is both true signal and noise, the mean of the voxels that exceed threshold will be inflated by the positive noise values, since those voxels with strong negative noise contributions will not reach threshold
  • Thus, the mean effect size for voxels reaching threshold will over-estimate the true effect size (see the simulation sketch below)
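A small pure-noise simulation of this selection bias; all numbers are hypothetical:

    import numpy as np

    rng = np.random.default_rng(5)
    noise = rng.normal(0, 1, 100_000)     # no true signal anywhere
    threshold = 2.0
    selected = noise[noise > threshold]   # voxels picked by the initial analysis
    print(noise.mean())                   # ~0: the true effect size
    print(selected.mean())                # ~2.37: "effect" inflated purely by selection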
38
Q

Forward inference

A

what brain activity/region is associated with a given experimental condition

39
Q

Reverse inference

A

what cognitive process/behavior etc. is occurring given the brain activity.

40
Q

Double dipping (another way to refer to circular analysis)

think of training vs test set

A

the use of the same dataset for selection and selective analysis will give distorted descriptive statistics and invalid statistical inference whenever the results statistics are not inherently independent of the selection criteria under the null hypothesis

41
Q

Validity

A

Internal validity: Degree to which the observed effect can be related to a cause (or could there be an alternative cause?).
Construct validity: Are we measuring the supposed underlying construct?
External validity: Do my results generalize?

42
Q

Reliability

A

Degree to which it is possible to replicate reported results