L06-L07 Parametric Tests Flashcards
Outline the procedure for hypothesis testing.
1) Define the problem.
- No. of groups & which samples were compared?
- Two groups: Independent or paired? Equal variances?
- Outcome of interest?
- Type of data? Normally distributed?
- Two-tailed or one-tailed test?
2) State null hypothesis H0 & alternative hypothesis H1
- H0: there is NO effect/difference
- H1: there is some effect/difference
3) Compute test statistic
4) Find the p-value for the computed test statistic
5) Compare the p-value with the chosen significance level (alpha), usually 0.05
6) State conclusion
- If p < alpha: Reject H0; the result is statistically significant at the given significance level
- If p >= alpha: Fail to reject H0; the result is NOT statistically significant at the given significance level
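Steps 5 and 6 amount to a single comparison. Below is a minimal Python sketch that assumes the p-value has already been obtained from statistical software. The function name `decide` is hypothetical, and the two p-values are the ones quoted in the example conclusions of this set.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule of steps 5-6: compare the p-value with alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < alpha:
        return "Reject H0: statistically significant at alpha = %g" % alpha
    # p >= alpha (including p exactly equal to alpha) means we fail to reject H0
    return "Fail to reject H0: not statistically significant at alpha = %g" % alpha


print(decide(0.005))  # paired example, p = 0.005
print(decide(0.08))   # independent example, p = 0.08
```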
State the assumptions when using parametric tests.
1) Samples are drawn from normally distributed populations.
2) The populations from which the samples are drawn have equal variances (homogeneity of variance)
State the purpose behind the hypothesis testing of paired samples t-test.
To test the H0 that the mean of the underlying population of within-pair differences is zero.
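The paired t statistic is the mean within-pair difference divided by its standard error, t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom. Below is a minimal sketch using only Python's standard library. `paired_t` and the data are hypothetical, and the sketch returns only the statistic, not the p-value. If SciPy is installed, `scipy.stats.ttest_rel` returns both.

```python
import math
import statistics


def paired_t(before, after):
    """Paired samples t statistic on d = after - before; returns (t, df)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical paired measurements on 4 subjects
before = [5.0, 6.0, 7.0, 8.0]
after = [4.0, 5.5, 6.0, 7.5]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Only the differences enter the calculation, which is why the test's normality assumption concerns the population of differences.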
State the assumptions when using paired samples t-tests.
1) Samples are randomly sampled from their populations.
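2) The two underlying populations are paired (each value in one sample is matched to one value in the other), so the test is performed on the within-pair differences and no separate equal-variance assumption is needed.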
3) The population of differences in values for each pair is normally distributed
Example of how to write the conclusion of paired & independent samples t-tests.
Paired:
At a significance level of 0.05, there is a statistically significant mean difference of 0.36 ± 0.41 mmol/L (p = 0.005) between the LDL cholesterol level after a 2-week diet with oat bran cereal (4.08 ± 1.06 mmol/L) and that after a 2-week diet with corn flakes (4.44 ± 0.97 mmol/L).
Independent:
At a significance level of 0.05, there is no statistically significant difference between the mean dissolution rates (measured by % dissolution after 15 min in this study) of the two formulations (57.45 ± 4.76% vs 61.33 ± 5.30%, p = 0.08).
State the purpose behind the hypothesis testing of independent samples t-test.
To test the H0 that the two population means corresponding to the two random samples are equal (i.e. no mean difference).
State the assumptions when using independent samples t-tests.
1) Samples are randomly sampled from their populations.
2) The two underlying populations are independent.
3) The two underlying populations are normally distributed.
4) The two underlying populations have equal variances.
- If variances are not significantly different (i.e. p >= 0.05 for F test or Levene’s test for equality of variances), use the independent samples t-test assuming equal variances (pooled variance).
- If variances are significantly different (i.e. p < 0.05 for F test or Levene’s test for equality of variances), use the independent samples t-test for unequal variances (Welch’s t-test).
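Both versions of the independent samples t statistic can be sketched with the standard library. Some assumptions apply here. `pooled_t` and `welch_t` are hypothetical names. The degrees of freedom for the unequal-variance version use the Welch-Satterthwaite approximation. p-values are not computed; with SciPy, `scipy.stats.ttest_ind(x, y, equal_var=False)` gives the Welch version.

```python
import math
import statistics


def pooled_t(x, y):
    """Independent samples t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Independent samples t statistic for unequal variances (Welch); returns (t, df)."""
    n1, n2 = len(x), len(y)
    a = statistics.variance(x) / n1  # squared standard error of mean 1
    b = statistics.variance(y) / n2  # squared standard error of mean 2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
t1, df1 = pooled_t(x, y)
t2, df2 = welch_t(x, y)
print(f"pooled: t = {t1:.3f}, df = {df1}; Welch: t = {t2:.3f}, df = {df2:.2f}")
```

With equal group sizes the two statistics coincide; only the degrees of freedom differ.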
What statistical tests are used to test for equality of variances between two independent samples?
F test or Levene’s test
- F test is more restrictive since it requires normal populations & ONLY 2 groups can be compared.
- Levene’s test is more widely used since it is less sensitive to departures from normality & can also be used to test equality of variances when > 2 groups are being compared.
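Levene’s test is essentially a one-way ANOVA performed on the absolute deviations of each value from its group mean, which is why it extends to more than 2 groups. Below is a minimal sketch of the classic mean-centred statistic. `levene_w` is a hypothetical name and no p-value is computed. Note that `scipy.stats.levene` centres on the median by default (the Brown-Forsythe variant); pass `center='mean'` to match this version.

```python
import statistics


def levene_w(*groups):
    """Mean-centred Levene statistic; returns (W, df_between, df_within)."""
    k = len(groups)
    if k < 2:
        raise ValueError("need at least 2 groups")
    # Absolute deviation of each value from its own group mean
    z = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    z_means = [statistics.mean(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n_total
    # One-way ANOVA on the deviations: between- vs within-group sums of squares
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    w = ((n_total - k) / (k - 1)) * between / within
    return w, k - 1, n_total - k


# Hypothetical data
w, df_b, df_w = levene_w([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"W = {w:.3f} on ({df_b}, {df_w}) df")
```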
What is the assumption behind the use of F test?
Populations from which samples are obtained must be normal.
Explain the purpose behind the use of F / Levene’s test.
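Designed to test if (at least) two population variances are equal. The F test compares the ratio of two sample variances; Levene’s test applies an F-type (ANOVA) statistic to the absolute deviations of values from their group means.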
- If two samples come from populations that have equal variances, the ratio of the sample variances will be close to 1.
- For independent samples t-tests: Numerator is the LARGER of the two sample variances, thus all F values are >= 1
- F test is ALWAYS a one-tailed test!
- For one-way ANOVA: Numerator is the between-group variance & denominator is the within-group variance, thus all F values are non-negative and large values point to differing means
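Below is a minimal sketch of the two-sample variance ratio. It follows the rule that the larger sample variance goes in the numerator. `variance_ratio_f` and the data are hypothetical. The function returns F together with the numerator and denominator degrees of freedom (each group's n - 1); it does not look up a p-value.

```python
import statistics


def variance_ratio_f(x, y):
    """F statistic for equality of two variances, larger variance on top (F >= 1)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1


# Hypothetical data: sample variances are 2.5 and 10
f, df_num, df_den = variance_ratio_f([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"F = {f:.2f} on ({df_num}, {df_den}) df")
```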
State the purpose behind the hypothesis testing of one-way ANOVA.
“One-way” indicates that there is one independent variable of interest.
ANOVA = analysis of variance
- Based on comparing estimates of spread or dispersion (variances)
To test the H0 that all the population means corresponding to the random samples are equal.
- H0: All the means of the underlying populations are the same.
- H1: Not all the means of the underlying populations are the same, or equivalently, the means of at least two of the underlying populations are different.
State the assumptions when using one-way ANOVA.
1) Samples are randomly sampled from their populations.
2) The underlying populations are independent.
3) The underlying populations are normally distributed.
4) The underlying populations have equal variances.
- If variances are not significantly different (i.e. p >= 0.05 for Levene’s test for equality of variances), continue with one-way ANOVA.
- If variances are significantly different (i.e. p < 0.05 for Levene’s test for equality of variances), use Welch’s ANOVA instead.
Why do we need to use one-way ANOVA for comparing more than 2 samples, rather than conducting all possible independent samples t-tests?
To control the overall probability of making a Type I error (i.e. false positive) at the predetermined significance level (alpha).
- i.e. to ensure the overall alpha = the predetermined level (e.g. 0.05)
Assuming we compare the means among 3 groups & we conduct 3 pairwise comparisons using independent samples t-tests, with alpha = 0.05 for each test:
- By the multiplicative rule (treating the 3 tests as independent), the probability of failing to reject H0 in all 3 tests when H0 is indeed true (i.e. making the correct conclusion) = (1 - 0.05)^3 = 0.857
- Consequently, the overall probability of rejecting H0 in at least one of the t-tests when in fact, H0 is indeed true (i.e. making a Type I error) = 1 - 0.857 = 0.143 > 0.05
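The same calculation generalises to any number of groups. With g groups there are g(g - 1)/2 pairwise t-tests, and the overall Type I error rate grows quickly. This sketch relies on the multiplicative rule used above, so it assumes the tests are independent. `familywise_error` is a hypothetical name.

```python
import math


def familywise_error(alpha: float, n_groups: int) -> float:
    """P(at least one Type I error) over all pairwise t-tests, assuming independent tests."""
    k = math.comb(n_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** k


for g in (3, 4, 5):
    print(g, "groups:", math.comb(g, 2), "tests, overall alpha =", round(familywise_error(0.05, g), 3))
```

For 3 groups it reproduces the 0.143 worked out above. For 5 groups (10 comparisons) it is already about 0.40.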
When the means of more than two groups, randomly sampled from underlying populations with equal variances, are compared, how many sources of variation are there? Name these sources of variation.
Total variation is made up of two components:
1) Within-group variation
- Variation of individual values around their population means
2) Between-group variation
- Variation of population means around the overall mean
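The two sources of variation partition the total sum of squares, SS_total = SS_between + SS_within. One-way ANOVA compares them as F = MS_between / MS_within, where each MS = SS / df. Below is a minimal standard-library sketch with hypothetical data. `anova_partition` is a hypothetical name. If SciPy is available, `scipy.stats.f_oneway` is the usual library route.

```python
import statistics


def anova_partition(*groups):
    """Split total SS into between- and within-group SS, then form F = MSB / MSW."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # Within-group: individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Between-group: group means around the overall (grand) mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f": ms_between / ms_within,
        "df": (k - 1, n_total - k),
    }


# Hypothetical data: three groups with well-separated means
res = anova_partition([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(res)
```

Here the group means are far apart relative to the spread within each group, so F is large (19 for these data).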
If between-group variability is large compared with within-group variability in one-way ANOVA, this means the underlying population means are _____.
different