Stats Flashcards
The p value
the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
Type I error
the null hypothesis is rejected when it is true, i.e. showing a difference between two groups when none exists. The probability of a Type I error is the significance level (α)
(False positive)
Type II error
the null hypothesis is not rejected (accepted) when it is false, i.e. failing to spot a difference when one really exists
(False negative)
power of a study
probability of (correctly) rejecting the null hypothesis when it is false
* power = 1 - the probability of a type II error
* power can be ↑ by increasing the sample size
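The effect of sample size on power can be sketched numerically. This is a minimal sketch, assuming a two-sided comparison of two means with equal group sizes, a known common SD and the normal approximation; the function `two_sample_power` and the numbers are hypothetical. It shows power (1 - β) rising as the number of patients per group increases.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-sample z-test.

    Hypothetical sketch: equal group sizes, known common SD, normal
    approximation; the tiny chance of rejecting in the wrong direction
    is ignored.
    """
    z = NormalDist()                          # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)         # about 1.96 for alpha = 0.05
    se_diff = sd * sqrt(2 / n_per_group)      # SE of the difference in means
    return z.cdf(abs(delta) / se_diff - z_crit)

# Hypothetical true difference of 5 units with an SD of 10
for n in (10, 50, 100):
    print(n, round(two_sample_power(delta=5, sd=10, n_per_group=n), 2))
```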
Correlation tests
* Parametric (normally distributed data): Pearson’s coefficient
* Non-parametric: Spearman’s coefficient
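Spearman’s coefficient is Pearson’s coefficient calculated on the ranks of the data, which is why it suits monotonic but non-linear or non-normal data. A minimal plain-Python sketch (the function names and data are illustrative; tied values receive the average of their ranks):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1            # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]   # always increasing, but not linearly
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because y rises every time x rises, the rank correlation is perfect even though the linear (Pearson) correlation is not.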
Parametric tests
* Student’s t-test - paired or unpaired
* Pearson’s product-moment coefficient - correlation
Non-parametric tests
- Mann-Whitney - unpaired data
- Wilcoxon matched-pairs - compares two sets of observations on a single sample
- Chi-squared test - used to compare proportions or percentages
- Spearman, Kendall rank – correlation
- McNemar’s test - used on paired nominal data (a 2×2 table) to determine whether the row and column marginal frequencies are equal
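As an example of comparing proportions, below is a minimal sketch of the chi-squared test for a 2×2 table, assuming the Pearson statistic without Yates’ continuity correction and adequately large expected counts; `chi_squared_2x2` and the counts are hypothetical. With 1 degree of freedom the statistic is the square of a standard normal variable, so the p value can be taken from the normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def chi_squared_2x2(a, b, c, d):
    """Chi-squared test (no continuity correction) for the 2x2 table

                  outcome   no outcome
        group 1      a          b
        group 2      c          d
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    chi2 = 0.0
    for obs, r, k in zip(observed, row_totals, col_totals):
        expected = r * k / n                  # expected count if H0 is true
        chi2 += (obs - expected) ** 2 / expected
    # 1 degree of freedom: P(chi2 >= x) = 2 * (1 - Phi(sqrt(x)))
    p = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p

# Hypothetical data: 30/100 vs 15/100 patients had the outcome
chi2, p = chi_squared_2x2(30, 70, 15, 85)
print(round(chi2, 2), round(p, 3))
```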
Funnel Plot
primarily used to detect possible publication bias in meta-analyses. Funnel plots are usually drawn with treatment effects on the horizontal axis and study size on the vertical axis; marked asymmetry of the funnel suggests publication bias.
Central Limit Theorem (CLT)
the sampling distribution of the mean tends towards a normal distribution as the sample size increases, irrespective of the distribution of the population from which the samples were drawn.
The mean of the sampling distribution of means is equal to the mean of the original population
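The theorem can be checked by simulation. This minimal sketch assumes a skewed exponential population with mean 1 and SD 1, samples of size 50 and 2000 repetitions (all hypothetical choices). The mean of the sample means sits close to the population mean, their spread is close to SD / √n, and about 95% of them fall within 1.96 of those spreads, as expected for a normal distribution.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the simulation is reproducible

def sample_mean(n):
    """Mean of n draws from a skewed exponential population (mean 1, SD 1)."""
    return mean(random.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]
se = 1 / sqrt(n)  # population SD / sqrt(n)

print(round(mean(means), 3))   # close to the population mean of 1
print(round(stdev(means), 3))  # close to 1 / sqrt(50), about 0.141
# For a normal distribution about 95% of values lie within 1.96 SD of the mean
within = sum(abs(m - 1) <= 1.96 * se for m in means) / len(means)
print(round(within, 3))
```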
Confidence Interval (CI):
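describes the range of values around an estimate (e.g. a mean, an odds ratio or a standard deviation) within which the true population value is likely to lie.
95% CI → if the study were repeated many times, about 95% of the intervals calculated in this way would contain the true value. 95% CI = mean ± 1.96 × SE (Standard Error), often approximated as mean ± 2 × SE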
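A minimal sketch of a 95% CI for a mean, using mean ± 1.96 × SE (roughly mean ± 2 × SE). It assumes a sample large enough for the normal multiplier; small samples would normally use a t-distribution multiplier instead. `ci95_mean` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci95_mean(values):
    """Approximate 95% CI for a mean: mean ± 1.96 × SE (normal approximation)."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))   # standard error of the mean
    z = NormalDist().inv_cdf(0.975)          # about 1.96
    return m - z * se, m + z * se

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
low, high = ci95_mean(data)
print(round(low, 2), round(high, 2))
```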
Normal Distribution
Also known as the Gaussian or ‘bell-shaped’ distribution. It describes the spread of many biological and clinical measurements
Standard deviation
The standard deviation (SD) represents the average amount by which each observation in a sample differs from the sample mean
* SD = square root (variance)
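A minimal sketch of SD = square root (variance) using Python’s `statistics` module; the data are hypothetical. `variance()` and `stdev()` use the sample (n - 1) denominator, whereas `pstdev()` treats the data as the whole population and divides by n.

```python
from math import sqrt
from statistics import pstdev, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
var = variance(data)              # sample variance, (n - 1) denominator
sd = sqrt(var)                    # SD = square root (variance)
print(round(var, 3), round(sd, 3), round(stdev(data), 3))
# pstdev() uses an n denominator, treating the data as the whole population
print(pstdev(data))
```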
Positively skewed distribution:
mean > median > mode
Negatively skewed distribution
mean < median < mode
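The ordering of mean, median and mode can be checked on a small hypothetical data set with a long right tail; mirroring the data produces a negatively skewed set and reverses the order.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: most values are small, with a long right tail
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]
print(mean(data), median(data), mode(data))
assert mean(data) > median(data) > mode(data)

# Mirroring the data gives a negatively skewed set, and the order reverses
mirrored = [-x for x in data]
assert mean(mirrored) < median(mirrored) < mode(mirrored)
```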
The Standard Error of the Mean (SEM)
is a measure of the spread expected for the mean of the observations, i.e. how accurately the calculated sample mean estimates the true population mean
* SEM = SD / square root (n), where n is the sample size
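A minimal sketch of the SEM, assuming the usual formula SEM = SD / √n; the function name `sem` and the data are hypothetical. Repeating the same values to quadruple n roughly halves the SEM, showing why larger samples give more precise estimates of the mean.

```python
from math import sqrt
from statistics import stdev

def sem(values):
    """Standard error of the mean: SD / square root (n)."""
    return stdev(values) / sqrt(len(values))

sample = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical sample, n = 8
bigger = sample * 4                     # same values repeated, n = 32
print(round(sem(sample), 3), round(sem(bigger), 3))
```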
Relative Risk (RR)
ratio of risk in the experimental group (experimental event rate, EER) to risk in the control group (control event rate, CER)
EER / CER
Control event rate
(Number who had particular outcome with the control) / (Total number who had the control)
Experimental event rate
rate at which events occur in the experimental group
(Number who had particular outcome with the intervention) / (Total number who had the intervention)
Relative risk reduction (RRR)
calculated by dividing the absolute risk reduction by the control event rate
RRR = (CER - EER) / CER
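The event rates, RR and RRR can be computed together from trial counts. This is a minimal sketch; `risk_summary` and the trial numbers are hypothetical. Note that RRR = 1 - RR follows directly from the two formulas.

```python
def risk_summary(events_control, total_control, events_exp, total_exp):
    """Return CER, EER, relative risk and relative risk reduction."""
    cer = events_control / total_control     # control event rate
    eer = events_exp / total_exp             # experimental event rate
    rr = eer / cer                           # relative risk = EER / CER
    rrr = (cer - eer) / cer                  # relative risk reduction
    return cer, eer, rr, rrr

# Hypothetical trial: 40/200 events with control, 30/200 with the intervention
cer, eer, rr, rrr = risk_summary(40, 200, 30, 200)
print(round(cer, 2), round(eer, 2), round(rr, 2), round(rrr, 2))
```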
The Hazard Ratio (HR)
similar to relative risk but is used when risk is not constant with respect to time. It is typically used when analysing survival over time
Numbers Needed to Treat
Numbers needed to treat (NNT) is a measure that indicates how many patients would require an intervention to ↓ the expected number of outcomes by 1. It is rounded up to the next whole number
NNT = 1 / (CER - EER), or 1 / Absolute Risk Reduction
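A minimal sketch of the NNT calculation with rounding up, assuming the intervention lowers the event rate (otherwise the function raises an error). `nnt` and the trial numbers are hypothetical; exact fractions are used so that floating-point error cannot push a whole-number NNT up by one.

```python
from fractions import Fraction
from math import ceil

def nnt(events_control, total_control, events_exp, total_exp):
    """NNT = 1 / (CER - EER), rounded up to the next whole number."""
    cer = Fraction(events_control, total_control)
    eer = Fraction(events_exp, total_exp)
    arr = cer - eer                          # absolute risk reduction
    if arr <= 0:
        raise ValueError("the intervention does not reduce the event rate")
    return ceil(1 / arr)                     # exact fractions avoid rounding errors

# Hypothetical trial: CER = 40/200 = 0.20, EER = 31/200 = 0.155
print(nnt(40, 200, 31, 200))   # 1 / 0.045 = 22.2..., rounded up
```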
Odds Ratio
ratio of the odds of a particular outcome with experimental treatment to the odds of that outcome with control
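A minimal sketch of the odds ratio from a 2×2 table; `odds_ratio` and the counts are hypothetical. The relative risk for the same data is printed alongside because the two measures differ noticeably when the outcome is common.

```python
def odds_ratio(events_exp, no_events_exp, events_control, no_events_control):
    """Odds ratio = (odds of outcome with treatment) / (odds with control)."""
    odds_exp = events_exp / no_events_exp
    odds_control = events_control / no_events_control
    return odds_exp / odds_control

# Hypothetical counts: 30 of 100 treated vs 45 of 100 controls had the outcome
or_value = odds_ratio(30, 70, 45, 55)
rr_value = (30 / 100) / (45 / 100)  # relative risk for comparison
print(round(or_value, 3), round(rr_value, 3))
```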
Pre-test probability
The proportion of people with the target disorder in the population at risk at a specific time (point prevalence) or time interval (period prevalence)
For example, the prevalence of rheumatoid arthritis in the UK is 1%
Post-test probability
The proportion of patients with that particular test result who have the target disorder
Post-test probability = post-test odds / (1 + post-test odds)
Pre-test odds
The odds that the patient has the target disorder before the test is carried out
Pre-test odds = pre-test probability / (1 - pre-test probability)
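The probability and odds conversions can be sketched as two small functions. To get from pre-test to post-test odds this sketch assumes the standard relation post-test odds = pre-test odds × likelihood ratio, which is not stated on these cards; the likelihood ratio of 10 is hypothetical and the 1% pre-test probability mirrors the prevalence example.

```python
def prob_to_odds(p):
    """Odds = probability / (1 - probability)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

pre_test_probability = 0.01          # e.g. a prevalence of 1%
likelihood_ratio = 10                # hypothetical positive likelihood ratio
pre_test_odds = prob_to_odds(pre_test_probability)
post_test_odds = pre_test_odds * likelihood_ratio   # assumed Bayes relation
post_test_probability = odds_to_prob(post_test_odds)
print(round(post_test_probability, 3))
```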
The incidence
is the number of new cases per population in a given time period.
Prevalence
Total number of cases per population at a particular point in time.
Sensitivity
A sensitivity of 100% means that the test recognises all sick people as such. Thus a negative result in a highly sensitive test is used to rule out the disease.
TP / (TP + FN): the percentage of sick patients whom the test correctly identifies
Positive predictive value
ratio of true positives to combined true and false positives
TP / (TP + FP): how many of the test +ve samples are actually sick
Specificity
TN / (TN + FP): the percentage of healthy patients whom the test correctly identifies as healthy
Negative predictive value
TN / (TN + FN): how many of the test –ve samples are actually healthy
Screening: Wilson and Jungner Criteria
1. The condition should be an important public health problem
2. There should be an acceptable treatment for patients with recognised disease
3. Facilities for diagnosis and treatment should be available
4. There should be a recognised latent or early symptomatic stage
5. The natural history of the condition, including its development from latent to declared disease, should be adequately understood
6. There should be a suitable test or examination
7. The test or examination should be acceptable to the population
8. There should be an agreed policy on whom to treat as patients
9. The cost of case-finding (including diagnosis and subsequent treatment of patients) should be economically balanced in relation to possible expenditure on medical care as a whole
10. Case-finding should be a continuous process and not a ‘once and for all’ project
Randomised controlled trial
Participants randomly allocated to intervention or control group (e.g. standard treatment or placebo)
* Practical or ethical problems may limit use
Superiority trial
whilst this may seem the natural aim of a trial, one problem is the large sample size needed to show a significant benefit over an existing treatment
Equivalence
an equivalence margin is defined (-delta to +delta) on a specified outcome. If the confidence interval of the difference between the two drugs lies entirely within the equivalence margin, the drugs may be assumed to have a similar effect
Non-inferiority
Similar to equivalence trials, but only the lower limit of the confidence interval needs to lie within the equivalence margin (i.e. above -delta). Smaller sample sizes are needed for these trials. Once a drug has been shown to be non-inferior, larger studies may be performed to show superiority
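The equivalence and non-inferiority rules can be expressed as checks on the confidence interval of the difference. This minimal sketch assumes the difference is defined as new drug minus standard drug and that a higher outcome value is better, so -delta is the "worse" side of the margin; `assess_margin` and the intervals are hypothetical.

```python
def assess_margin(ci_lower, ci_upper, delta):
    """Compare a CI for the difference (new drug - standard drug) with a margin.

    Assumes a higher outcome value is better, so -delta is the
    'worse' side of the margin.
    """
    return {
        # the whole CI must lie inside (-delta, +delta)
        "equivalent": -delta < ci_lower and ci_upper < delta,
        # only the lower limit has to stay above -delta
        "non_inferior": ci_lower > -delta,
    }

# Hypothetical CIs with a margin of delta = 2
print(assess_margin(-1.0, 1.5, delta=2))   # inside the margin on both sides
print(assess_margin(-1.0, 3.0, delta=2))   # upper limit beyond +delta
print(assess_margin(-3.0, 1.0, delta=2))   # lower limit beyond -delta
```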