Hypothesis Testing Flashcards
To have a clear understanding of:
- Sample and population
- Hypothesis testing
- P-values and confidence intervals
- Testing a hypothesis referring to a single mean and a single proportion
- Interpretation of statistical output for each test
Who is included in a population?
Current members and future members.
Why is a sample of a population used?
Because it is not feasible to collect data on the entire population, a sample is used to tell us about the theoretical population.
What is needed before collecting data from a sample?
The theoretical population must be defined in order to understand how generalisable the inferences from the sample are. The sample must be representative of the population.
What are the key considerations for a random sample?
- Each individual from the population must have an equal chance of being included.
- The inclusion of one individual must not affect the inclusion of another.
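As a minimal sketch of drawing a simple random sample, the snippet below uses Python's standard `random` module. The population of 1,000 ID numbers, the sample size of 50 and the seed are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 1,000 people.
population_ids = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw can be reproduced

# rng.sample draws without replacement: every ID has the same chance of
# selection, and the choice does not depend on any characteristic of the
# individual behind the ID.
sample_ids = rng.sample(population_ids, k=50)

print(len(sample_ids), len(set(sample_ids)))  # 50 IDs, all distinct
```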
What is stratified random sampling?
A method in which populations are broken down into smaller divisions and samples taken from these in order to ensure the sample is representative of the entire population.
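One possible way to sketch stratified random sampling is to draw the same fraction from every stratum (proportional allocation). The function name `stratified_sample`, the age bands and the 10% fraction are assumptions made for this example, not part of the original card.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        k = round(len(members) * fraction)  # sample size for this stratum
        sample.extend((name, m) for m in rng.sample(members, k))
    return sample

# Hypothetical population split into three age bands.
strata = {
    "under_40": list(range(600)),
    "40_to_64": list(range(300)),
    "65_plus": list(range(100)),
}

sample = stratified_sample(strata, fraction=0.1)
counts = {name: sum(1 for s, _ in sample if s == name) for name in strata}
print(counts)  # {'under_40': 60, '40_to_64': 30, '65_plus': 10}
```

Because each stratum contributes in proportion to its size, the sample keeps the same age mix as the population.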
Why is the normal distribution important in medical research?
- To understand the central tendency and variability of the data to allow for summarising and interpretation of large datasets.
- For use of parametric statistical tests such as t-tests and ANOVA.
- For predictive modelling and risk assessment.
- For quality control in order to set limits and identify outliers.
What is the standard error?
A measure of the variability between sample means, i.e. the standard deviation of the sampling distribution of the mean. It is estimated as SD / √n.
It quantifies how far the mean measured from a sample is likely to be from the true mean of the theoretical population.
What is the standard deviation?
A measure of how far, on average, individual data points lie from the mean.
It describes the variability of the individual observations in the population/sample, not the variability of the mean.
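The difference between the standard deviation and the standard error can be seen in a short calculation. This is only a sketch: the ten blood pressure readings are made-up values, and the standard error is estimated with the usual SD / √n formula.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg) from 10 patients.
readings = [118, 122, 125, 130, 121, 135, 128, 119, 124, 128]

n = len(readings)
mean = statistics.mean(readings)
sd = statistics.stdev(readings)  # spread of individual readings (n - 1 denominator)
se = sd / math.sqrt(n)           # expected spread of the sample mean itself

print(f"mean = {mean}, SD = {sd:.2f}, SE = {se:.2f}")
```

The SD stays roughly the same as more patients are added, whereas the SE shrinks because the sample mean becomes a more precise estimate.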
What does the null hypothesis state?
That there is no difference between populations.
What does the alternate hypothesis state?
That there is a difference between populations.
What do you have to assume about a sample in order to test a hypothesis?
- The sample is representative.
- The sample is independent.
- The sample has homogeneous variance.
- The sample is normally distributed.
What can you assume if the sample size is > 30?
That the distribution of the sample mean is approximately normal (no matter the distribution of the data).
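This behaviour can be checked with a small simulation. The sketch below assumes exponentially distributed data (strongly right-skewed, with mean 1 and SD 1), a sample size of 40 and 5,000 repeated samples; all of these are illustrative choices.

```python
import random
import statistics

rng = random.Random(1)

def sample_mean(n):
    # Exponential data with rate 1: right-skewed, population mean 1, SD 1.
    return statistics.mean(rng.expovariate(1.0) for _ in range(n))

n = 40
means = [sample_mean(n) for _ in range(5000)]

mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)
theoretical_se = 1.0 / n ** 0.5  # population SD / sqrt(n)

# For a normal distribution about 95% of values lie within 1.96 SDs of the centre.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3))
print(round(within, 3))
```

Even though the raw data are skewed, the sample means cluster around the population mean with a spread close to SD / √n, and roughly 95% of them fall within 1.96 standard errors.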
When should you use the normal distribution?
If the sample size is > 30 and assumptions are met.
When should you use the t-distribution?
If the sample size is ≤ 30 and assumptions are met.
What should you do if assumptions are not met?
First, try to transform the data.
If the transformed data still does not meet the assumptions, use a non-parametric test.
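As an illustration of a transformation, the sketch below applies a log transform to right-skewed data and compares the skewness before and after. The data are simulated from a log-normal distribution, and the `skewness` helper is written here for the example; both are assumptions. The log transform is only one of several possible choices.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: mean cubed deviation divided by SD cubed."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

rng = random.Random(11)

# Hypothetical right-skewed measurements (log-normal by construction).
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 2), round(skewness(logged), 2))
```

A skewness near zero after the transform suggests the transformed data are closer to symmetric, which is one of the things to check before using a parametric test.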
What is the limiting factor for continuous data?
The accuracy of the instrument used to measure the value.
What is categorical data known as if it can only take 2 distinct values?
Dichotomous or binary.
What is a type I error?
A false positive - the null hypothesis is rejected when it is true.
e.g., you say there is a difference between groups when there isn’t one.
What is a type II error?
A false negative - the null hypothesis is not rejected when it is false.
e.g., you say there isn’t a difference between groups when there is one.
What is the power of a test?
The probability of rejecting the null hypothesis when it is false.
i.e., the probability of not making a type II error (1 - β).
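The type I error rate and the power of a test can both be estimated by simulation: repeat a study many times and count how often the null hypothesis is rejected, first when it is true and then when it is false. The sketch below assumes a two-sided z-test with known SD, a sample size of 25 and a true difference of 0.5 SD; these values are hypothetical.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(7)
Z_CRIT = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 5% test

def rejects_null(true_mean, n=25, null_mean=0.0, sigma=1.0):
    """Simulate one study and run a two-sided z-test of H0: mean == null_mean."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    z = (mean(sample) - null_mean) / (sigma / n ** 0.5)
    return abs(z) > Z_CRIT

trials = 4000

# H0 is true here, so every rejection is a false positive (type I error).
type_i_rate = sum(rejects_null(true_mean=0.0) for _ in range(trials)) / trials

# H0 is false here, so the rejection rate estimates the power.
power = sum(rejects_null(true_mean=0.5) for _ in range(trials)) / trials

print(round(type_i_rate, 3), round(power, 3))
```

Under these assumptions the type I error rate should land near 0.05 and the power near 0.70, so the type II error rate is roughly 0.30.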
What is the p-value?
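The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. It is a measure of the strength of the evidence against the null hypothesis.
A small p-value indicates that the observed data would be unlikely if the null hypothesis were true.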
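For a test statistic that follows the standard normal distribution, a two-sided p-value can be computed as sketched below. The function name `two_sided_p_value` is chosen for this example, and the approach assumes a z statistic rather than a t statistic.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p_value(1.96), 3))  # 0.05
print(round(two_sided_p_value(3.0), 4))   # much smaller: stronger evidence against H0
```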
What is a confidence interval?
The range of values estimated from a sample within which the true population value is likely to be found.
What does a 95% confidence interval mean?
That if random samples were drawn 100 times and a 95% CI calculated from each, about 95 of the 100 intervals would be expected to contain the true population parameter.
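The repeated-sampling interpretation can be illustrated by simulation: draw many samples from a population whose mean is known, build a 95% CI from each, and count how many intervals contain the true mean. The population values (mean 50, SD 10), the sample size of 100 and the 2,000 repeats are hypothetical.

```python
import random
from statistics import mean, stdev

rng = random.Random(3)
TRUE_MEAN, TRUE_SD, N = 50.0, 10.0, 100

def ci_contains_true_mean():
    """Draw one sample, build its 95% CI and check it against the true mean."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = mean(sample)
    se = stdev(sample) / N ** 0.5
    return m - 1.96 * se <= TRUE_MEAN <= m + 1.96 * se

repeats = 2000
coverage = sum(ci_contains_true_mean() for _ in range(repeats)) / repeats

print(round(coverage, 3))  # expected to be close to 0.95
```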
How do you calculate the 95% CI?
95% CI for the mean is (x̄ - 1.96 × SE) to (x̄ + 1.96 × SE), where x̄ is the sample mean and SE is the standard error of the mean.
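A minimal sketch of this calculation is shown below. The helper name `ci95_mean` and the 35 heart rate values are made up for illustration. The 1.96 multiplier comes from the normal distribution, so the sketch assumes a reasonably large sample; for small samples a multiplier from the t-distribution would be used instead.

```python
import math
import statistics

def ci95_mean(values):
    """95% confidence interval for the mean using the normal multiplier 1.96."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical resting heart rates (beats per minute) for 35 people.
heart_rates = [70 + (i % 7) for i in range(35)]

low, high = ci95_mean(heart_rates)
print(f"95% CI: {low:.2f} to {high:.2f}")
```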
What test should you use to compare one group to a known value?
Parametric = one-sample t-test.
Non-parametric = sign test.
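The sign test is simple enough to sketch with the standard library alone. The version below is a two-sided exact test against a hypothesised median; the function name and the pain scores are assumptions made for the example. Values equal to the hypothesised median are dropped, which is a common convention but not the only one.

```python
from math import comb

def sign_test_p_value(values, hypothesised_median):
    """Two-sided exact sign test against a hypothesised median."""
    above = sum(v > hypothesised_median for v in values)
    below = sum(v < hypothesised_median for v in values)
    n = above + below  # values equal to the median are dropped
    k = min(above, below)
    # Under H0 each remaining value is above or below with probability 0.5,
    # so the smaller count follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical pain scores compared with a known reference median of 5.0.
scores = [5.1, 6.3, 5.8, 7.0, 6.1, 6.6, 5.9, 6.4, 6.8, 4.9]
p = sign_test_p_value(scores, 5.0)
print(round(p, 4))
```

Here 9 of the 10 scores lie above the reference value, so the p-value is small and the data are hard to reconcile with a median of 5.0.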
What test should you use to compare 2 paired groups?
Parametric = paired t-test.
Non-parametric = Wilcoxon signed-rank test.
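The paired t-test works on the within-pair differences, which the sketch below makes explicit. It computes only the t statistic and the degrees of freedom; turning these into a p-value needs the t-distribution, which is not in Python's standard library. The function name and the before/after blood pressure values are hypothetical.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for a paired t-test."""
    diffs = [a - b for a, b in zip(after, before)]  # one difference per pair
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)     # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical systolic blood pressure (mmHg) before and after treatment.
before = [140, 152, 138, 145, 150, 148]
after = [135, 147, 136, 140, 144, 145]

t, df = paired_t_statistic(before, after)
print(round(t, 2), df)
```

The absolute value of t is then compared with the critical value for the relevant degrees of freedom (2.571 for a two-sided 5% test with 5 degrees of freedom).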
What test should you use to compare 2 independent groups?
Parametric = unpaired t-test.
Non-parametric = Wilcoxon rank-sum test (also known as the Mann-Whitney U test).
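The idea behind the rank-based test for two independent groups can be sketched as follows: pool the values, rank them, and compare the rank sum of one group with what would be expected if the groups did not differ. This version uses a normal approximation without a tie correction, which is rough for small samples where an exact test would normally be preferred. The function names and data are assumptions made for the example.

```python
from statistics import NormalDist

def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Rank sum of group_a, z score and two-sided p (normal approximation)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])                            # rank sum of the first group
    expected_w = n1 * (n1 + n2 + 1) / 2
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5   # no tie correction
    z = (w - expected_w) / sd_w
    return w, z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical recovery times (days) in two independent groups.
w, z, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(w, round(z, 2), round(p, 3))
```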