Hypothesis testing Flashcards
What is a type I error?
Rejecting the null hypothesis (concluding that there is an effect) when it is actually true (there is no effect)
A false positive
Thinking you have found gold but it is actually an aluminium can
What is the typical type I error rate (significance level) in the biological sciences?
0.05 / 5%
What is a type II error?
Failing to reject a false null hypothesis (when there is an effect)
A false negative
A metal detector doesn’t beep when there is gold underground
Test statistic for one-sample t-test
t = (measured mean - hypothesised value of the mean) / SE of the mean
What does a one-sample t-test do?
Tests whether the mean of the sample of data is equal to a hypothesised mean
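A minimal sketch of the one-sample t statistic, using only the Python standard library. The function name and the measurements are made up for illustration; the SE of the mean is taken as the sample SD divided by the square root of n.

```python
import math
import statistics

def one_sample_t(sample, hypothesised_mean):
    """Return the one-sample t statistic for `sample`."""
    n = len(sample)
    mean = statistics.mean(sample)
    # SE of the mean: sample SD (n - 1 denominator) divided by sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (mean - hypothesised_mean) / se

# Hypothetical measurements, for illustration only
heights = [5.1, 4.9, 5.3, 5.0, 5.2]
t_one_sample = one_sample_t(heights, 5.0)
print(round(t_one_sample, 3))
```

The resulting t would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.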
Paired t-test
Special case of the one-sample t-test
We have a pair of measurements per subject (before and after for example)
What does a paired t-test do?
Tests whether the mean of the set of differences is equal to zero
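A minimal sketch showing why the paired t-test is a special case of the one-sample t-test: take the per-subject differences and test their mean against zero. The function name and the before/after values are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Return the paired t statistic: a one-sample t on the differences."""
    # One difference per subject (after minus before)
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    # Hypothesised mean difference under the null hypothesis is zero
    return (statistics.mean(differences) - 0) / se

# Hypothetical blood pressure readings for five subjects
before = [120, 135, 128, 140, 132]
after = [115, 130, 126, 133, 131]
t_paired = paired_t(before, after)
print(round(t_paired, 3))
```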
Two-sample t-test test statistic
t = (measured difference between the two means - difference between the two means expected under the null hypothesis) / SE of the difference between the two means
What does the two-sample t-test do?
Tests if the means of two samples of data are the same or significantly different
Two-sample t-tests assuming equal variances are robust for sample sizes…
>= 30
Per group
If there are _____ in each of the two groups, then it is still OK to use the two-sample t-test when the SDs of the groups differ by up to ______ fold
Similar numbers
Three
What is the t-test where equal variances are not assumed?
Welch’s t-test
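A minimal sketch contrasting the two-sample t statistic under equal variances (pooled SE) with Welch's version (separate variances). It assumes the difference expected under the null hypothesis is zero; function names and data are hypothetical, and the degrees of freedom needed for a p-value are not computed here.

```python
import math
import statistics

def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

def welch_t(x, y):
    """Two-sample t statistic without assuming equal variances."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    # Each group contributes its own variance to the SE
    se = math.sqrt(vx / nx + vy / ny)
    return (statistics.mean(x) - statistics.mean(y)) / se

# Hypothetical groups with unequal sizes and spreads
group_a = [10, 12, 14, 16]
group_b = [9, 10, 11]
t_pooled = pooled_t(group_a, group_b)
t_welch = welch_t(group_a, group_b)
print(round(t_pooled, 3), round(t_welch, 3))
```

When both groups have the same size, the two statistics coincide; they differ only in the degrees of freedom used for the p-value.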
What is the test statistic for an ANOVA?
F = treatment mean square / error mean square
(explained variance (signal) / unexplained variance (noise))
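A minimal sketch of the F ratio for a one-way ANOVA, computed from the treatment and error mean squares. The function name and the three groups are hypothetical.

```python
import statistics

def one_way_f(groups):
    """Return F = treatment mean square / error mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations
    # Explained variation: group means around the grand mean
    ss_treatment = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Unexplained variation: observations around their own group mean
    ss_error = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n_total - k)
    return ms_treatment / ms_error

# Hypothetical data: three groups of three observations
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_f(groups)
print(round(f_stat, 3))
```

Here the treatment sum of squares is 26 on 2 degrees of freedom and the error sum of squares is 6 on 6 degrees of freedom, so F = 13 / 1 = 13.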
What does an ANOVA do?
Tests whether the means of more than two groups are all the same, or whether at least one differs significantly
What do hypothesis tests involve?
A test statistic
A distribution from which we expect the test statistic to come if the null hypothesis is true
Calculation of a p-value
What does a p-value tell us?
The probability of obtaining the observed value of the test statistic, or a more extreme value, if the null hypothesis is true
What is an interaction?
When the effect of one variable on the outcome variable differs depending on the level of another variable
Example: the effect of a drug on blood pressure differs depending on the type of diet the patient eats
Assumptions of a t-test
Data are normally distributed
Assumptions of ANOVA
The error (residuals) is normally distributed
Variance of the unexplained variation is constant throughout the dataset (homogeneity of variance of the error/residuals)
Chi Squared Test statistic
χ² = sum( (observed - expected)² / expected )
How do you calculate the expected value in Chi Squared?
Expected = (row total × column total) / grand total
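A minimal sketch of the expected counts and the chi-squared statistic for a contingency table stored as a list of rows. Function names and the 2 x 2 table are hypothetical.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2 x 2 table of counts
table = [[10, 20], [30, 40]]
expected = expected_counts(table)
chi2 = chi_squared(table)
print(expected, round(chi2, 4))
```

For this table the expected counts are 12, 18, 28 and 42, and every cell contributes 4 / expected to the statistic.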
Assumptions of a Chi Squared Test
All expected counts are >= 5
Mean of counts = the variance of the counts
The Poisson distribution can be approximated by a Normal distribution, which is true as long as the mean is >= 5
If not all expected counts are >= 5 in a Chi Squared Test, when is it still reasonable to run the test?
As long as all expected counts are >= 1
And at least 80% of the expected counts are >= 5
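A minimal sketch of the rule of thumb above as a check on a table of expected counts: every count at least 1, and at least 80% of them at least 5. The function name is hypothetical.

```python
def counts_allow_chi_squared(expected):
    """Check the relaxed expected-count rule for a chi-squared test."""
    flat = [e for row in expected for e in row]
    # Proportion of cells whose expected count is at least 5
    share_at_least_5 = sum(e >= 5 for e in flat) / len(flat)
    return min(flat) >= 1 and share_at_least_5 >= 0.8

# Hypothetical tables of expected counts
ok_table = [[3, 10, 10], [10, 10, 10]]   # 5 of 6 cells >= 5
bad_table = [[3, 10], [10, 10]]          # only 3 of 4 cells >= 5
print(counts_allow_chi_squared(ok_table), counts_allow_chi_squared(bad_table))
```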
How to check for normally distributed residuals?
Plot histogram of the residuals and check the shape
Plot the normal Q-Q plot of the residuals (the points should lie close to the line and should not bend away from it in any systematic fashion)
How to check the homogeneity of residuals/error?
Residuals vs fitted values plot
Look for a similar scatter of positive and negative residuals across all fitted values, with no obvious differences in spread between the groups (the residuals shouldn’t get bigger as the fitted values get bigger, for example)
Bonferroni correction
overall significance level (for example 0.05) / number of tests conducted
This gives a new, stricter threshold to compare each individual p-value against
Why do we correct our p-value for multiple comparisons?
When we do a statistical test, there is a 5% chance that we will reject a true null hypothesis (get a false positive)
If we conduct multiple analyses, there is a 5% chance each time that this will happen, so the overall chance of at least one false positive gets bigger (for 3 tests it is roughly 5 + 5 + 5% = 15%; exactly 1 - 0.95^3 = 14.3% if the tests are independent)
We need to correct for this so that our results still only have 5% error overall
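A minimal sketch of the arithmetic behind multiple comparisons, assuming the tests are independent. Adding 5% per test is a convenient upper bound; the chance of at least one false positive is 1 - (1 - alpha)^k. Function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests

alpha = 0.05
n_tests = 3
uncorrected = familywise_error_rate(alpha, n_tests)
threshold = bonferroni_threshold(alpha, n_tests)
corrected = familywise_error_rate(threshold, n_tests)
print(round(uncorrected, 4), round(threshold, 4), round(corrected, 4))
```

With three tests the uncorrected rate is about 14.3%, the Bonferroni threshold is about 0.0167 per test, and the corrected overall rate stays just under 5%.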
What can we do if our data don’t fit the assumptions of a statistical test?
Transform the data
What types of transformation can we do?
Log the data
Square the data
Square root the data
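A minimal sketch of how log and square root transformations pull in a long right tail, judged here by a simple skewness measure (mean cubed z-score, an assumption of this example). The data are made up; squaring works in the opposite direction and is typically tried on left-skewed data.

```python
import math
import statistics

def skewness(values):
    """Mean cubed z-score: positive for a long right tail."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed measurements (all positive, so log is defined)
raw = [1, 2, 2, 3, 4, 8, 20, 55]
logged = [math.log(v) for v in raw]
rooted = [math.sqrt(v) for v in raw]

for name, data in [("raw", raw), ("sqrt", rooted), ("log", logged)]:
    print(name, round(skewness(data), 2))
```

For these values the log transformation reduces the skew the most, the square root less so. The transformed data (or the residuals from a model fitted to them) would still need checking against the assumptions of the test.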
Non-parametric alternative to two-sample t-test
Mann-Whitney U Test
Null hypothesis of Mann-Whitney U Test
The two groups being compared come from the same distribution, with the same median
It doesn’t matter what the shape of that distribution is
How to conduct a Mann-Whitney U Test
- Convert the values into ranks
- Find the sum of ranks in both samples
- Calculate U for both samples
- Test statistic (U) = minimum of U1 and U2
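The steps above can be sketched as follows, assuming the common convention U = rank sum - n(n + 1)/2 for each sample and average ranks for ties. Function names and data are hypothetical, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, U1, U2) where U is the smaller of U1 and U2."""
    ranks = average_ranks(list(x) + list(y))  # rank the pooled values
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])   # sum of ranks in sample 1
    r2 = sum(ranks[n1:])   # sum of ranks in sample 2
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return min(u1, u2), u1, u2

# Hypothetical samples
sample_1 = [1, 3, 5]
sample_2 = [2, 4, 6, 8]
u, u1, u2 = mann_whitney_u(sample_1, sample_2)
print(u, u1, u2)
```

A useful check is that U1 + U2 always equals n1 * n2, here 3 * 4 = 12.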
Assumptions of the Mann-Whitney U Test
The two groups being compared follow the same shape of distribution (if one is left-skewed, the other group should be left-skewed too)