HYPOTHESIS TESTING - LEARNING OUTCOMES Flashcards
What are the general measures we would look at when studying continuous data with a normal distribution?
We would look at the mean and standard deviation followed by the mean difference between the two groups - preferably with the associated 95% confidence interval.
What does hypothesis testing essentially test for?
Hypothesis testing calculates how likely it is that we would see a difference as large as the one observed between two groups if the null hypothesis (no difference) were actually true - in other words, how likely it is that the observed difference happened by chance.
Summarise hypothesis testing.
We start by specifying the study hypothesis and the null hypothesis. We assume the null hypothesis is true and proceed to calculate the probability of getting the observed difference by chance - this is what is termed the p-value.
What does a small p-value indicate?
A small p-value implies that the observed difference would be unlikely if the null hypothesis were true - a statistically significant difference between our two groups. In this case we can reject the null hypothesis.
What does a large p-value indicate?
A large p-value tells us that there is no evidence of a difference between the groups. In this case we would fail to reject the null hypothesis (not the same as saying we accept the null hypothesis, there may still be an effect there but our study is not powerful enough to detect it).
What is accepted as a small p-value?
Convention is to use p=0.05 as a cut-off. Less than 0.05 we would term a significant difference; more than 0.05 we would say there is no evidence of a difference. However, 0.05 is not a magic figure. It is better to give the actual figure and let readers make up their own minds.
What is a type I error?
A type I error is to reject the null hypothesis when it is actually true. This is essentially a false positive result. The probability of this type of error (when the null hypothesis is true) is the alpha level (the significance cut-off).
What is a type II error?
A type II error is the failure to reject the null hypothesis when it is actually false. This is essentially a false negative result. The type II error rate is very much dependent on the sample size and power of the study.
How do we decide which hypothesis or statistical test to carry out?
It depends on the type of outcome variable, the number of groups you are analysing, and a number of other criteria about the data (e.g. whether the groups are paired or independent, and whether the data follow a normal distribution).
Give an example of a paired sample.
Measurements before and after an event - e.g. heart rate before and after a period of exercise.
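As an illustrative sketch (Python with scipy and made-up heart-rate numbers; the card itself does not name a test), paired data like this would be analysed with a paired test such as the paired t-test:

```python
from scipy import stats

# Hypothetical paired data: heart rate before and after exercise
before = [72, 68, 75, 80, 66]
after = [88, 85, 90, 96, 82]

# ttest_rel treats the samples as paired (same subjects measured twice)
t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)
```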
What 3 things does the p-value depend on?
- How big the observed difference is
- Sample size
- Variability of measurement
Sample size and variability in measurement are related to each other in terms of the standard error where standard error is the standard deviation divided by the square root of your sample size.
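A minimal sketch of that relationship in Python with numpy (illustrative only, not from the original card):

```python
import numpy as np

def standard_error(sample):
    """Standard error of the mean = standard deviation / sqrt(n)."""
    sample = np.asarray(sample, dtype=float)
    return sample.std(ddof=1) / np.sqrt(len(sample))

# With the same variability, a bigger sample gives a smaller standard error
small = [5.1, 4.9, 5.3, 5.0, 4.8]
print(standard_error(small))
print(standard_error(small * 4))  # same values repeated: larger n, smaller SE
```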
How is the T-statistic calculated?
T-statistic = Observed mean difference / Standard error of the difference between means
Look this value up in tables of the t-distribution (close to the normal distribution for large samples) or use a stats programme to do this for you.
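A sketch of the same calculation in Python with scipy (made-up data; the flashcards themselves use SPSS):

```python
import numpy as np
from scipy import stats

group1 = np.array([5.1, 4.9, 5.3, 5.0, 4.8])
group2 = np.array([5.6, 5.4, 5.8, 5.5, 5.3])

# By hand: observed mean difference / standard error of the difference
mean_diff = group1.mean() - group2.mean()
se_diff = np.sqrt(group1.var(ddof=1) / len(group1) + group2.var(ddof=1) / len(group2))
print(mean_diff / se_diff)

# scipy computes the same t-statistic and looks up the p-value for you
# (equal_var=False matches the unpooled standard error used above)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(t_stat, p_value)
```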
What assumptions are required for a t-test to be applicable?
- The outcome should be continuous and must follow a normal distribution
- The variance of the two groups is equal. SPSS will test this for you automatically using Levene’s test.
What is the first thing we need to look at in the output table for the difference between two means in SPSS?
We first need to look at and interpret the Levene’s test results before we even think about interpreting the output from a t-test.
For Levene’s test the null hypothesis is that there is no difference in variance between the two groups.
Essentially a significant result for Levene’s test means that there is a difference in variance between the two groups.
If the result from the Levene’s test is not significant then we fail to reject our null hypothesis of no difference between variance and we can use the top line of SPSS output.
(Levene’s test more than 0.05 then use top line).
If the result from the Levene’s test is significant then we reject our null hypothesis of no difference between variance and use the bottom line of SPSS output representing the adjusted t-test that hasn’t assumed equal variance.
(Levene’s test less than 0.05 then use bottom line).
Often the two lines of output will give you the same conclusion, but there may be a difference in confidence intervals.
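The same top-line/bottom-line logic can be mirrored outside SPSS; a sketch in Python with scipy (made-up data, illustrative only):

```python
from scipy import stats

group1 = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group2 = [13.0, 12.4, 14.1, 11.2, 13.6, 12.8]

# Levene's test: null hypothesis = no difference in variance
_, levene_p = stats.levene(group1, group2)

# Levene p > 0.05 -> assume equal variances ("top line");
# Levene p < 0.05 -> Welch's adjusted t-test ("bottom line")
equal_var = levene_p > 0.05
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=equal_var)
print(f"Levene p = {levene_p:.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```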
We would want to back up the significance results from a t-test with a measure of effect - which measure of effect would we most likely use for this?
Mean difference with the associated confidence intervals
When our data aren’t drawn from a normal distribution and we therefore can’t use a t-test, we can attempt to use data transformation. Describe the general principles of data transformation.
Firstly you can try transforming the data with an algorithm and then carry out the analysis on the transformed data. You can’t assume the transformation is going to work and give you a variable that follows a normal distribution - you still need to check those assumptions, e.g. plot a histogram to check normality. If the transformed data look ok you can then use a parametric test (e.g. t-test) on the transformed variable.
What sort of data transformations can we try and when are they appropriate?
The transformations we may try are dependent upon what the data look like, e.g.:
- Moderate positive skew: logarithm
- Strong positive skew: reciprocal
- Weak positive skew: square root
- Moderate negative skew: square
- Strong negative skew: cube
- Unequal variation: log, reciprocal, or square root
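A sketch of the workflow in Python with numpy/scipy (simulated positively skewed data; the Shapiro-Wilk test stands in for the histogram check mentioned above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=50)  # positively skewed data

# Moderate positive skew -> try a log transform
transformed = np.log(skewed)

# Never assume the transform worked: re-check normality afterwards
print(stats.shapiro(skewed).pvalue)       # small p: evidence against normality
print(stats.shapiro(transformed).pvalue)  # larger p: consistent with normality
```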
What can we do with continuous data not drawn from a normal population if transforming the data proves to be unsuccessful?
We can use non-parametric tests. These are tests designed so that you don’t have to worry about the underlying distribution of the data.
What are the advantages and disadvantages of using non-parametric tests?
Advantages:
- Make no assumptions about underlying distribution of data
Disadvantages:
- Less powerful than parametric tests
- Difficult to get confidence intervals
How do we describe skewed variables?
If the data are not normally distributed then:
- We need to present the medians, not the means
- We need to present the range or interquartile range, not the standard deviation
- If we are comparing two groups of skewed (non-normal) data then we need to present the difference between the two medians (but we can’t easily get 95% confidence intervals)
(Median and IQR)
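A minimal sketch of these summaries in Python with numpy (made-up skewed data):

```python
import numpy as np

data = [2.0, 2.2, 2.5, 3.1, 3.4, 4.0, 9.7]  # skewed by one large value

median = np.median(data)
q1, q3 = np.percentile(data, [25, 75])  # interquartile range
print(median, q1, q3)  # report these rather than the mean and SD
```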
What is the non-parametric equivalent of the t-test?
The Wilcoxon rank sum test or the Mann-Whitney U test (two equivalent formulations of the same test).
These tests are appropriate if you have two independent groups with a continuous variable that isn’t following a normal distribution.
How does the Wilcoxon rank sum test work?
It ranks the data and then works on the ranks of the data rather than the raw values themselves. Most non-parametric tests work on ranks.
For example:
We have two independent groups, group 1 and group 2, where group 1 is the smaller group.
We rank all observations in ascending order.
The sum of the ranks for group 1 = test statistic T.
Look up T in a Wilcoxon rank sum table of critical values to get the p-value.
In practice you would do this using SPSS.
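In scipy the same test is available as mannwhitneyu; a sketch with made-up data (illustrative only):

```python
from scipy import stats

group1 = [3.1, 2.8, 4.0, 3.5, 2.9]           # the smaller group
group2 = [4.2, 5.1, 3.9, 4.8, 5.5, 4.4]

# Mann-Whitney U test (equivalent to the Wilcoxon rank sum test);
# it works on the ranks of the pooled observations
u_stat, p_value = stats.mannwhitneyu(group1, group2, alternative='two-sided')
print(u_stat, p_value)
```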
What is the Kruskal-Wallis test?
Another non-parametric test. It is an extension of the Mann-Whitney test for use when you have more than 2 groups to compare. It is the non-parametric equivalent of one-way ANOVA.
Use it when you have a continuous, skewed outcome variable with more than 2 independent exposure groups.
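A sketch in Python with scipy (three made-up groups, illustrative only):

```python
from scipy import stats

group_a = [2.1, 3.4, 2.8, 3.0]
group_b = [4.5, 4.1, 5.0, 4.7]
group_c = [3.2, 3.9, 3.6, 4.0]

# Kruskal-Wallis: non-parametric comparison of more than 2 independent groups
h_stat, p_value = stats.kruskal(group_a, group_b, group_c)
print(h_stat, p_value)
```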
What is Spearman’s correlation coefficient?
A non-parametric correlation coefficient for comparing two continuous variables where at least one of them does not follow a normal distribution.
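A sketch in Python with scipy (made-up data showing a monotonic but non-linear relationship):

```python
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 9.2, 15.8, 31.0, 66.5]  # increases with x, but skewed

# Spearman's correlation works on ranks, so normality is not required
rho, p_value = stats.spearmanr(x, y)
print(rho, p_value)
```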
What is the general method of assessing association between two categorical variables?
Chi-squared test
What are we really looking at in a hypothesis test to look at association between two categorical variables?
We are essentially looking at how likely it is that we would get the difference that we have observed (in the odds ratio for example) if the truth was that there was no association between our two variables.
Each percentage in our results table is subject to sampling error. We need to assess whether the differences between them could be due to chance. We conduct a chi-squared test to get a p-value, and this p-value tells us how likely a difference this large is to have occurred by chance if there is truly no association.
Describe the stages of the chi-squared test.
- State the null hypothesis - no association between the two variables.
- Calculate the test statistic - how close are the observed values in the table to the values expected were there no true association?
Expected number = (row total x column total) / overall total
The chi-squared test will compare the expected numbers under the null hypothesis to the numbers we actually got to see if there is a significant difference.
For each cell we then subtract the expected (E) from the observed (O), then square and divide by E:
(O-E)^2 / E
Then sum all cells to give the chi-squared statistic. The larger the value of the chi-squared statistic, the less consistent the data are with the null hypothesis.
- Obtain a p-value - refer value of chi-squared to table of chi-squared distribution.
Degrees of freedom = (number of rows - 1) x (number of columns - 1)
- Interpret the p-value
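The stages above can be worked through by hand or with scipy; a sketch using a hypothetical 2x2 table (illustrative only):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: exposure (rows) vs. outcome (columns)
observed = np.array([[30, 70],
                     [50, 50]])

# Expected number = (row total x column total) / overall total
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
expected = row_tot * col_tot / observed.sum()

# Sum of (O - E)^2 / E over all cells
chi2 = ((observed - expected) ** 2 / expected).sum()
print(chi2)

# scipy gives the same statistic plus the p-value and degrees of freedom
# (correction=False turns off the Yates continuity correction for 2x2 tables)
chi2_stat, p_value, dof, exp = stats.chi2_contingency(observed, correction=False)
print(chi2_stat, p_value, dof)
```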
What is the general formula for calculating the expected numbers under the null hypothesis for a categorical variable results table?
How do we calculate the chi-square test statistic?
Expected number = (row total x column total) / overall total
The chi-squared test will compare the expected numbers under the null hypothesis to the numbers we actually got to see if there is a significant difference.
For each cell we then subtract the expected (E) from the observed (O), then square and divide by E:
(O-E)^2 / E
Then sum all cells to give the chi-squared statistic. The larger the value of the chi-squared statistic, the less consistent the data are with the null hypothesis.