Stats Flashcards
Tests
The unpaired t-test is best for comparing the means of 2 independent groups where the data is continuous and normally distributed.
The paired t-test is appropriate where the 2 groups are matched, for example pre-op and post-op measurements in the same patients.
The Mann–Whitney U test is best used to compare the medians of 2 independent groups where the data is continuous but not normally distributed.
The Wilcoxon signed-rank test is used to compare the medians of 2 matched groups where the data is continuous but not normally distributed.
McNemar’s is appropriate for assessing the difference in proportions of categorical variables between two matched groups.
The 1-way ANOVA is used to compare 3 or more sample means where the data is continuous and normally distributed. It can also be used to compare 2 sample means, although the t-test already serves this purpose.
The Kruskal–Wallis test is used to compare 3 or more independent groups where the data is continuous but not normally distributed. It is often considered the nonparametric equivalent of the 1-way ANOVA.
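The cards above amount to a small decision table. A minimal sketch of that table in Python follows; the function name choose_test and its parameters are hypothetical, chosen for illustration, and it only covers the situations listed on these cards.

```python
def choose_test(n_groups, paired, normally_distributed, categorical=False):
    """Return the test these flashcards suggest for a given study design.

    Hypothetical helper: it only encodes the cases listed on the cards
    and raises ValueError for anything else.
    """
    if categorical:
        # Proportions of a categorical variable in two matched groups.
        if n_groups == 2 and paired:
            return "McNemar's test"
        raise ValueError("not covered by these flashcards")
    if n_groups == 2:
        if paired:
            return ("paired t-test" if normally_distributed
                    else "Wilcoxon signed-rank test")
        return ("unpaired t-test" if normally_distributed
                else "Mann-Whitney U test")
    if n_groups >= 3 and not paired:
        return ("1-way ANOVA" if normally_distributed
                else "Kruskal-Wallis test")
    raise ValueError("not covered by these flashcards")


# Continuous, skewed data from 3 independent groups:
print(choose_test(3, paired=False, normally_distributed=False))
```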
Case control studies
Positive features of case-control studies include:
- relatively quick and cheap to perform
- efficient for rare diseases
- useful for diseases with long latency
Negative features include:
- recall bias
- selection bias in the choice of controls
- inefficient for rare exposures
- the temporal sequence of events is unclear
Bias
Selection bias occurs due to systematic differences between baseline characteristics of the groups that are compared. Randomisation with allocation concealment aims to make the baseline characteristics of the groups comparable.
Performance bias refers to systematic differences in the way the two groups are treated other than the intervention of interest.
Attrition bias refers to systematic differences between the groups in the participants who withdraw from the study.
Detection bias occurs where there are differences in the way outcomes are determined.
Reporting bias refers to systematic differences in reported and unreported findings.
Error
A type-2 error is a false negative: the null hypothesis is not rejected when it is in fact false. The probability of a type-2 error is denoted by beta. The power of a test is equal to 1-beta. Powers of 80% or 90% are commonly used in trial design.
A type-1 error is a false positive: the null hypothesis is rejected when it is in fact true. The probability of rejecting a true null hypothesis is the significance level (alpha). This is commonly set to 0.05 or 0.01.
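To see how alpha, beta and power fit together, here is a rough sketch using only the Python standard library. It assumes a two-sided comparison of two equal-sized groups, uses a normal approximation, and ignores the negligible chance of rejecting in the wrong direction. The function name approx_power and the example numbers (standardised effect size 0.5, 64 per group) are illustrative assumptions, not values from these cards.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation only; effect_size is the difference in means
    divided by the common standard deviation (an assumed input).
    """
    z = NormalDist()  # standard normal: mean 0, SD 1
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = effect_size * sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit)


power = approx_power(effect_size=0.5, n_per_group=64)
beta = 1 - power  # probability of a type-2 error
print(f"power = {power:.2f}, beta = {beta:.2f}")  # power is roughly 0.8
```

Lowering alpha from 0.05 to 0.01 in this sketch reduces the power for the same sample size, which is the usual trade-off between type-1 and type-2 errors.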
Standard error is the standard deviation of the sampling distribution of a statistic. The statistic is most commonly the sample mean, in which case it is referred to as the “standard error of the mean” and represents how far the sample mean is likely to be from the true population mean.
In contrast, the standard deviation is the degree of dispersion of the population itself around the mean.
Variance is the average of the squared differences from the mean. The standard deviation is the square root of the variance.
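A small worked example of variance, standard deviation and standard error of the mean, using Python's statistics module. The eight values are made up for illustration. The standard error is estimated here as the sample standard deviation divided by the square root of n.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration

m = mean(values)                         # 5
population_variance = pvariance(values)  # mean squared difference from m: 4
population_sd = pstdev(values)           # square root of the variance: 2.0

# When the values are a sample, the SD is estimated with n - 1 in the
# denominator, and the standard error of the mean shrinks with sqrt(n).
sample_sd = stdev(values)
sem = sample_sd / sqrt(len(values))

print(m, population_variance, population_sd)
print(round(sample_sd, 3), round(sem, 3))
```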
The normal distribution or “bell curve” is symmetrical about the mean. The mean, mode and median are equal. About 68% of the population lies within 1 standard deviation of the mean and about 95% lies within 2 standard deviations (more precisely, 1.96). A standard normal distribution is a specific case where the mean is 0 and the standard deviation is 1.
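The 68% and 95% figures can be checked with NormalDist from Python's standard library. The helper name coverage is hypothetical, chosen for illustration.

```python
from statistics import NormalDist


def coverage(k, mu=0.0, sigma=1.0):
    """Fraction of a normal population within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


print(round(coverage(1), 4))     # about 0.68
print(round(coverage(2), 4))     # about 0.95
print(round(coverage(1.96), 4))  # the cut-off that gives 0.95
```

The result does not depend on mu or sigma, since the interval is defined in units of standard deviations.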
The Pearson correlation coefficient describes the linear association between pairs of continuous variables. The association is quantified within the range -1 to 1. A correlation coefficient of 0 denotes no linear association, although a non-linear relationship may still exist. It does not take into account confounders and effect modifiers.
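A minimal sketch of the Pearson coefficient computed from its definition (covariance divided by the product of the standard deviations). The function name pearson_r and the data are illustrative assumptions. The second example shows a coefficient of 0 for a variable that is completely determined by the other, because the relationship is curved rather than linear.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


# Perfect linear association: y = 2x
linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Perfect but non-linear association: y = x squared
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(linear, curved)  # 1.0 0.0
```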