Analysis of epidemiological data Flashcards
Data types
- Qualitative (categorical)
a. Binary (two unordered categories)
b. Nominal (more than two unordered categories)
c. Ordinal (ranked categories)
- Quantitative
a. Continuous (measurements)
i. Interval, e.g. calendar date (zero is arbitrary, so ratios are not meaningful)
ii. Ratio, e.g. age (true zero, so ratios are meaningful)
b. Discrete (counts)
Statistical inference
Process of drawing conclusions about an entire population based on the information in a sample.
- Precision based methods (confidence intervals)
- Hypothesis testing methods
Confidence interval - definition
In statistical inference, one wishes to estimate population parameters using observed sample data. A confidence interval gives an estimated range of values which is likely, at a stated level of confidence (e.g. 95%), to include the unknown population parameter.
- The endpoints of the interval take values that depend on the random sample selected
- If we select another random sample its mean and standard deviation would be different so we would obtain a different confidence interval
- If 95% confidence intervals are calculated from 100 independent random samples, on average 95 of them would contain the true population mean and 5 would not
- The wider the confidence interval, the less precise the estimate
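The repeated-sampling interpretation above can be checked by simulation. The sketch below is an illustration under assumed settings (normal population with hypothetical mean 10 and SD 2, samples of 50, seed 42): it draws many samples, builds mean +/- 1.96*SE for each, and reports the fraction of intervals that contain the true mean.

```python
import random
from statistics import NormalDist, fmean, stdev


def ci_coverage(n_intervals=1000, n=50, true_mean=10.0, sd=2.0, seed=42):
    """Fraction of 95% CIs (mean +/- 1.96*SE) that contain the true mean.

    All parameters are illustrative assumptions, not values from the source.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = fmean(sample)
        se = stdev(sample) / n ** 0.5  # SE of the mean = s / sqrt(n)
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / n_intervals


print(ci_coverage())  # close to 0.95
```

Because the SE uses the sample SD, the 1.96 multiplier gives coverage slightly below 95% for small samples; a t multiplier corrects this.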
Confidence interval - calculation
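estimate +/- 1.96*SE (95% confidence interval)
where the estimate is a mean, proportion or ratio and SE is calculated as:
- Mean: s/sqrt(n), where s is the sample standard deviation
- Proportion: sqrt [p(1-p)/n]
- Odds ratio: the interval is calculated on the log scale, ln(OR) +/- 1.96*SE, where SE of ln(OR) = sqrt [1/a+1/b+1/c+1/d]; the two limits are then exponentiated back to the OR scale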
Assumptions:
- Sample is drawn randomly
- Observations within a sample are independent
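- Sampled population is normally distributed (or the sample is large enough for the sampling distribution of the estimate to be approximately normal)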
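The formulas above translate directly into code. The sketch below uses hypothetical helper names and assumes the 2x2 layout a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases, so OR = ad/bc. The odds-ratio interval is built on the log scale and exponentiated back.

```python
import math

Z_95 = 1.96  # multiplier for a 95% confidence interval


def ci_proportion(events, n, z=Z_95):
    """95% CI for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def ci_odds_ratio(a, b, c, d, z=Z_95):
    """OR with 95% CI computed on the log scale (all cells must be non-zero)."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


print(ci_proportion(30, 100))          # roughly (0.210, 0.390)
print(ci_odds_ratio(20, 80, 10, 90))   # OR = 2.25, interval just includes 1
```

Note the OR interval is asymmetric around the OR itself but symmetric around ln(OR).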
Hypothesis testing - definition, method (4)
Hypothesis testing is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample.
- State the null and alternative hypotheses (e.g. population mean is equal to some value, population mean is not equal to some value)
- Decide on the significance level (alpha)
- Draw random sample from population and calculate test statistic
- Decide whether to reject or not reject the null hypothesis by comparing the test statistic with the critical value for alpha (equivalently, the p-value with alpha)
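The four steps can be sketched with a one-sample z-test. This is an illustration only: it assumes the population SD (sigma) is known, uses a two-sided alternative, and decides with the critical-value approach. The function name is hypothetical.

```python
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, assuming a known population SD sigma."""
    # Step 1: H0: population mean == mu0; H1: population mean != mu0
    # Step 2: significance level alpha is chosen by the caller
    # Step 3: compute the test statistic from the random sample
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    # Step 4: reject H0 if |z| exceeds the critical value for alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    return z, abs(z) > z_crit


# Sample mean 52, H0 mean 50, sigma 5, n 25 -> z = 2.0
print(one_sample_z_test([52.0] * 25, mu0=50.0, sigma=5.0))  # (2.0, True)
```

With alpha = 0.01 the critical value rises to about 2.58 and the same sample no longer leads to rejection.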
Null hypothesis
The null hypothesis (H0) is a statement about a population parameter, such as the population mean, that is assumed to be true; the burden is on the researcher to show that there is ample evidence to reject it. If the null hypothesis is true, the sample statistic will equal the hypothesized population parameter on average. Based on the outcome of the hypothesis test we reject or do not reject the null (we never accept the null, since it cannot be proven).
Alternative hypothesis
An alternative hypothesis (H1) is a statement that directly contradicts the null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis.
p-value
Probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, given that the null hypothesis is true.
Compared with the pre-defined significance level, alpha: if p ≤ alpha the null hypothesis is rejected, otherwise it is not rejected.
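For a z statistic, the p-value is a tail area of the standard normal distribution. The sketch below is a hypothetical helper (the `alternative` argument is an assumption of this example) that returns the two-sided or one-sided p-value, which is then compared with alpha.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="two-sided"):
    """P(statistic as or more extreme than z | H0), standard normal reference."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    if alternative == "greater":
        return 1 - std_normal.cdf(z)  # upper tail only
    if alternative == "less":
        return std_normal.cdf(z)  # lower tail only
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


alpha = 0.05
p = p_value_from_z(2.0)
print(round(p, 4), "reject H0" if p <= alpha else "do not reject H0")
```

A z of 1.96 gives a two-sided p-value of about 0.05, which is why 1.96 is the 95% critical value.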
Type I error
Rejecting the null hypothesis when it is actually true; the probability of this error is alpha. Example: RCT testing new treatment vs standard treatment. We declare the new treatment more effective when in fact it is not (challenges the status quo, with potential for harm if the product is introduced, since patients would not receive the proven treatment)
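The link between alpha and the type I error rate can be demonstrated by simulating data for which H0 is true and counting how often a test wrongly rejects it. This sketch assumes a two-sided one-sample z-test on standard normal data (true mean 0, known SD 1); the trial count, sample size and seed are illustrative.

```python
import random
from statistics import NormalDist, fmean


def simulate_type_i_error(n_trials=2000, n=30, alpha=0.05, seed=1):
    """Estimate the type I error rate when H0 (mean = 0) is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        # Data generated under H0, so every rejection is a type I error
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / n ** 0.5)  # known sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


print(simulate_type_i_error())  # close to alpha = 0.05
```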
Type II error
Failing to reject the null hypothesis when it is actually false; the probability of this error is beta. Example: RCT testing new treatment vs standard treatment. We declare the new treatment not more effective when in fact it is (a missed opportunity, but generally more acceptable since it maintains the status quo and patients still receive the proven treatment)
Statistical power - definition, how to increase
Probability of rejecting the null hypothesis when it is false (power = 1 - beta), e.g. in a clinical trial, the probability of detecting a difference in outcome between animals receiving and not receiving a new treatment, given that a difference exists.
Can increase power by:
- decreasing beta (type II error), since power = 1 - beta; the approaches below achieve this
- increasing alpha (e.g. 0.1 instead of the typical 0.05), a trade-off because it raises the type I error rate
- increasing sample size (decreases standard error)
- increasing the effect size the study is designed to detect (larger effects are easier to detect)
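The effect of each lever on power can be seen with a normal-approximation power formula. This sketch assumes a two-sided comparison of two means with equal group sizes and a common known SD, and ignores the negligible opposite tail; the function name and inputs are hypothetical.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a mean difference.

    power ~= Phi(|delta| / SE - z_(1 - alpha/2)), SE = sigma * sqrt(2 / n)
    """
    std_normal = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(abs(delta) / se - z_crit)


base = approx_power(delta=0.5, sigma=1.0, n_per_group=64)       # about 0.80
bigger_n = approx_power(delta=0.5, sigma=1.0, n_per_group=100)  # larger sample
higher_alpha = approx_power(0.5, 1.0, 64, alpha=0.10)           # larger alpha
bigger_effect = approx_power(0.8, 1.0, 64)                      # larger effect
print(base, bigger_n, higher_alpha, bigger_effect)
```

Each of the last three settings gives higher power than the baseline, matching the list above.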
Descriptive statistics - summarizing data by data type (3)
Data can be described using measures of position (central tendency) and spread
- Binary/nominal: proportion
- Ordinal: median, IQR (box and whisker plot)
- Continuous: mean, SD (bar chart with error bars) if approximately normal; median, IQR if skewed
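The summary measures above map onto Python's standard-library `statistics` module. This sketch uses hypothetical helper names and assumes binary data coded 0/1 and ordinal categories coded as ranked numbers; note that `statistics.quantiles` uses its default "exclusive" method, so quartiles may differ slightly from other software.

```python
from statistics import fmean, median, quantiles, stdev


def summarize_binary(values):
    """Proportion of 1s in 0/1-coded binary data."""
    return sum(values) / len(values)


def summarize_ordinal(values):
    """Median and interquartile range (Q1, Q3) for numerically coded ranks."""
    q1, _, q3 = quantiles(values, n=4)
    return median(values), (q1, q3)


def summarize_continuous(values):
    """Mean and sample standard deviation."""
    return fmean(values), stdev(values)


print(summarize_binary([1, 0, 1, 1]))                # 0.75
print(summarize_ordinal([1, 2, 2, 3, 3, 3, 4, 5]))
print(summarize_continuous([2, 4, 4, 4, 5, 5, 7, 9]))
```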
Statistical tests - compare 2 groups (unpaired data)
Continuous (normally distributed) data: unpaired t-test
Ordinal data/non-normal data: Mann-Whitney U test (non-parametric equivalent)
Nominal (binary) data: Fisher's exact test (chi-square test for large samples)
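Fisher's exact test for a 2x2 table can be computed from the hypergeometric distribution with only `math.comb`. This sketch (hypothetical function name, table laid out as [[a, b], [c, d]]) computes the two-sided p-value by summing the probabilities of all tables with the same margins that are no more likely than the observed one, with a small tolerance for floating-point ties.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)  # feasible top-left values
    p_observed = table_prob(a)
    p = sum(
        table_prob(x)
        for x in range(lo, hi + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(p, 1.0)


print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.486
```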
Statistical tests - compare 2 groups (paired)
Continuous (normally distributed) data: paired t-test
Ordinal data/non-normal data: Wilcoxon signed-rank test
Nominal (binary) data: McNemar's test
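McNemar's test uses only the discordant pairs of a paired 2x2 table: b pairs positive on the first measurement only and c pairs positive on the second only. This sketch (hypothetical function name) computes the chi-square statistic (b - c)^2 / (b + c), optionally with a continuity correction, and its 1-df p-value via the identity P(chi-square_1 > x) = erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def mcnemar(b, c, continuity=False):
    """McNemar chi-square (1 df) and p-value from discordant pair counts b and c."""
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards continuity correction
    chi2 = diff ** 2 / (b + c)
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


print(mcnemar(10, 5))  # chi2 about 1.67, p about 0.20
```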
Statistical tests - compare 3 or more groups (independent/unpaired)
Continuous (normally distributed) data: one-way ANOVA
Ordinal data/non-normal data: Kruskal-Wallis test
Nominal (categorical) data: chi-square test
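One-way ANOVA compares variation between group means with variation within groups. This sketch (hypothetical function name) computes only the F statistic and its degrees of freedom; obtaining the p-value would need the F distribution, which the Python standard library does not provide.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """F statistic and (df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```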