Lecture 8 Flashcards
Explain: One of the fundamental principles of statistics is that there is a tradeoff between model assumptions and model performance
If you use tests that make strong assumptions about the population, they will be more powerful when those assumptions are true. But they will be prone to undesired results (e.g. inflated Type 1 error, low power) when the assumptions are false.
If you use tests that assume less about the population, they will be robust to inflated Type 1 error, but they will be (slightly) less powerful than tests with heavy assumptions.
In other words, the extra power of a strong-assumption test comes at the cost of having to make (and justify) those assumptions.
Some ‘commonly used’ assumptions include:
Independence of samples
Normality of the population-level data
Sample size → ∞ (i.e. sufficiently many)
Constant variance across groups (in two-sample t-test / ANOVA)
Some explicit violations:
A sample with a non-normal distribution may still come from a normal population, so normality is hard to check from the sample alone. Additionally, floor and ceiling effects skew the data. Both situations put the assumption of normality in doubt or violate it outright.
When normality is not met and the sample size is small, there can be inflated or deflated power
Paired t-test and independent two-sample t-test both test whether the population means are the same. Then why should I use the paired t-test for repeated measures?
A two-sample t-test assumes the two groups are independent of each other, while the paired t-test assumes the two samples come from the same subjects. The two-sample t-test additionally assumes constant variance across the groups, which the paired t-test does not. The constant-variance assumption increases power when it is true, but makes the test more prone to Type 1 errors when it is false. Since the paired t-test does not make this assumption, it is more flexible for repeated measures.
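A minimal sketch (not from the lecture) of this difference in practice, using simulated repeated-measures data and SciPy; the sample size, effect size, and variance components are made-up illustrative values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical repeated-measures data: the same 20 subjects measured twice,
# with a strong within-subject component and a true mean difference of 0.5.
subject_effect = rng.normal(0, 2, size=20)
before = subject_effect + rng.normal(0, 1, size=20)
after = subject_effect + 0.5 + rng.normal(0, 1, size=20)

# Independent two-sample t-test: ignores the pairing, so the large
# between-subject variance inflates the standard error.
t_ind, p_ind = stats.ttest_ind(after, before)

# Paired t-test: works on the within-subject differences, removing the
# subject-to-subject variability.
t_rel, p_rel = stats.ttest_rel(after, before)

print(f"two-sample: t = {t_ind:.2f}, p = {p_ind:.3f}")
print(f"paired:     t = {t_rel:.2f}, p = {p_rel:.3f}")
```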
Two alternative methods are permutation tests and bootstrapping.
Permutation tests have exact control of the Type 1 error rate when distributional assumptions are invalid or the sample size is small. They also have exact control of the family-wise error rate.
Bootstrapping provides valid standard errors and confidence intervals, and a more robust power calculation.
Permutation
A permutation test uses only your data to generate the null distribution of the test statistic, which is sufficient to compute p-values.
It is done by ‘breaking the relationship’ you are testing under the hypothesis. If the null hypothesis is really true (no difference between population means), then swapping the group labels should not make any difference.
The null distribution generated by permutation would be similar to the parametric (t or F) null distribution if the model assumptions are true.
The permutation test is distribution-free: no assumption about normality or sample size is needed, and it is guaranteed to control the Type 1 error rate.
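A minimal sketch of a two-sample permutation test for a difference in means (the data and the number of permutations are assumed placeholders, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for two groups.
group_a = rng.normal(0.0, 1.0, size=15)
group_b = rng.normal(0.8, 1.0, size=15)

observed = group_a.mean() - group_b.mean()
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

# Build the null distribution by repeatedly shuffling the group labels,
# which "breaks the relationship" between group membership and outcome.
n_perm = 10_000
null_stats = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    null_stats[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# Two-sided p-value: how often a permuted statistic is at least as extreme
# as the observed one (the +1 keeps the p-value strictly above zero).
p_value = (np.sum(np.abs(null_stats) >= abs(observed)) + 1) / (n_perm + 1)
print(f"observed diff = {observed:.2f}, permutation p = {p_value:.4f}")
```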
Can we use GLMs for permutation?
Since permutation breaks the relationship to obtain ‘null’ statistics, and almost every common statistical test is just a GLM, we can construct an equivalent permutation test for almost every test.
Permutation test in multiple comparisons
Permutation to control FWER
1. We start with the complete null hypothesis, assuming that all null hypotheses are true.
2. By permuting the IV, we force independence between the IV and each of the DVs. At the same time, the ordering within the DVs is preserved.
3. We select the most extreme test statistic across the DVs.
4. Repeat steps 2 and 3 many times to collect the most extreme test statistic from each permutation; these maxima form the null distribution used to set the FWER-controlling threshold (see the sketch below).
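A minimal sketch of this max-statistic procedure, assuming one binary IV and several DVs; the data, the use of a mean-difference statistic, and the number of permutations are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a binary IV (group label) and 5 DVs for 40 subjects.
n, n_dv = 40, 5
iv = np.repeat([0, 1], n // 2)
dv = rng.normal(0, 1, size=(n, n_dv))
dv[iv == 1, 0] += 1.0  # a true effect on the first DV only

def max_abs_stat(labels, outcomes):
    """Most extreme |mean difference| across all DVs for the given labels."""
    diffs = outcomes[labels == 1].mean(axis=0) - outcomes[labels == 0].mean(axis=0)
    return np.max(np.abs(diffs))

observed = np.abs(dv[iv == 1].mean(axis=0) - dv[iv == 0].mean(axis=0))

# Permute the IV only: this forces independence between the IV and every DV
# while preserving the ordering (and correlation structure) within the DVs.
n_perm = 5_000
max_null = np.array([max_abs_stat(rng.permutation(iv), dv) for _ in range(n_perm)])

# The 95th percentile of the max-statistic null distribution is the
# FWER-controlling threshold; adjusted p-values compare each observed
# statistic to the same distribution of maxima.
threshold = np.quantile(max_null, 0.95)
adj_p = (max_null[None, :] >= observed[:, None]).mean(axis=1)
print("threshold:", round(threshold, 3))
print("adjusted p-values:", np.round(adj_p, 4))
```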
Bonferroni vs Permutation
Permutation guarantees exact control of the FWER, while Bonferroni provides a conservative control of the FWER.
When the number of comparisons is moderate, Bonferroni gives results similar to permutation.
When working with a very large number of comparisons, Bonferroni gives conservative (yet valid) results, which decreases power.
Permutation requires the full data to find the threshold, while Bonferroni only requires the p-value for each comparison.
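A minimal numpy illustration of the Bonferroni side of this comparison; the p-values are made up, and in practice the permutation threshold would come from the max-statistic distribution sketched above:

```python
import numpy as np

# Hypothetical unadjusted p-values from m = 5 comparisons.
p_values = np.array([0.001, 0.012, 0.030, 0.200, 0.650])
alpha, m = 0.05, len(p_values)

# Bonferroni needs only the p-values: compare each to alpha / m
# (equivalently, multiply each p-value by m and compare to alpha).
reject = p_values < alpha / m
p_adjusted = np.minimum(p_values * m, 1.0)

print("Bonferroni-adjusted p-values:", p_adjusted)
print("rejected at FWER 0.05:", reject)
```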
Bootstrapping
The shape of the population distribution is unknown, but the sample distribution (not “sampling distribution”) approximates the shape of the population distribution.
In statistics, we assume that each sample is drawn randomly from the population, with replacement.
The bootstrap is a procedure that uses the given sample (i.e. your data) to create a new distribution, called the bootstrap distribution, that approximates the sampling distribution for a statistic (e.g., sample mean).
The bootstrap distribution can be used to quantify the uncertainty of a statistic (e.g., sample mean, t-statistic, etc.)
Key idea of bootstrapping: The original sample approximates the population from which it was drawn. So resamples from this sample approximate what we would get if we took many samples from the population. The bootstrap distribution of a statistic, based on many resamples, approximates the sampling distribution of the statistic, based on many samples
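A minimal sketch of a bootstrap for the sample mean, assuming a small made-up dataset; the number of resamples and the percentile-interval choice are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed sample (the stand-in for the unknown population).
data = rng.exponential(scale=2.0, size=30)

# Resample the data with replacement many times and recompute the statistic;
# the resulting bootstrap distribution approximates the sampling distribution
# of the sample mean.
n_boot = 10_000
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(n_boot)
])

# Bootstrap standard error and a 95% percentile confidence interval.
se = boot_means.std(ddof=1)
ci_low, ci_high = np.quantile(boot_means, [0.025, 0.975])
print(f"mean = {data.mean():.2f}, bootstrap SE = {se:.2f}, "
      f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```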
Pros and cons of bootstrapping
Pros:
You can use it to determine the probability of observing values of variables that come from any distribution
The lack of assumptions about a variable’s distribution makes the bootstrapped probability estimates more accurate
Far better than trying to pigeon-hole data into a distribution that doesn’t describe them
Minimizes the influence of outliers without trivializing their inferential value
Cons:
It is processor- and time-intensive.
It requires individual-level data. Traditional methods only require summary statistics to build CI or compute power.
For example, the 95% CI for the mean is (x̄ − 1.96 × s/√n, x̄ + 1.96 × s/√n), and power is determined by the effect size, α, and the sample size.
Bootstrapping, on the other hand, requires access to the full data (a drawback if data privacy is a critical issue).
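A minimal illustration of the contrast, with made-up summary statistics: the formula-based CI needs only x̄, s, and n, while a bootstrap CI (as sketched earlier) needs the raw observations:

```python
import math

# Hypothetical summary statistics reported in a paper.
x_bar, s, n = 5.2, 1.8, 40

# Traditional normal-approximation 95% CI built from summary statistics alone.
margin = 1.96 * s / math.sqrt(n)
print(f"95% CI from summary stats: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")

# A bootstrap CI cannot be computed here: it needs the individual-level data.
```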
Permutation vs Bootstrapping purpose and procedures
Permutation
* purpose: test hypotheses with valid control of Type 1 error or family-wise error.
* procedure: permute the orderings to generate ‘null statistics’
Bootstrapping
* purpose: constructing confidence intervals
* procedure: resample observations (or ‘residuals’) with replacement