Statistics/Epidemiology Flashcards
Statistical inference
- process of inferring features of the population from observation of a sample
Biases
selection bias: study groups differ with respect to determinants of outcome other than those studied
- best overcome with randomization
measurement bias: methods of measurement consistently different between groups
- e.g. recall bias
confounding bias: two variables travel together (the confounder is associated with both the exposure and the outcome), so the effect of one is confused with the effect of the other
Standard error of the mean (SEM)
definition: measure of the spread of sample means around the population mean (the SD of the sampling distribution of the mean)
i.e. indicates how precisely a sample mean estimates the population mean
formula: SEM = SD of sample / √n (n = sample size)
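A minimal Python sketch of the SEM formula, assuming the sample SD is calculated with the usual n − 1 denominator (`statistics.stdev`). The function name and the sample values are hypothetical, chosen only for illustration.

```python
import math
import statistics


def standard_error_of_mean(sample: list[float]) -> float:
    """SEM = sample SD / sqrt(n), using the sample SD (n - 1 denominator)."""
    sd = statistics.stdev(sample)
    return sd / math.sqrt(len(sample))


# Hypothetical sample of four measurements
values = [4.0, 6.0, 8.0, 10.0]
print(round(standard_error_of_mean(values), 3))  # 1.291
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.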
Confidence Interval
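definition: range of values within which the true population value (e.g. the population mean) is believed to lie, with a stated level of confidence
i.e. a 99% CI means that if the study were repeated many times, about 99% of the intervals calculated this way would contain the population mean
formula: 95% CI = sample mean ± 1.96 × SE; 99% CI = sample mean ± 2.58 × SE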
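A minimal sketch of a normal-approximation CI in Python. It derives the multiplier (about 1.96 for 95%, about 2.58 for 99%) from the standard normal distribution rather than hard-coding it. The function name and the example mean and SE are hypothetical. For small samples, a critical value from the t distribution would normally replace z.

```python
from statistics import NormalDist


def confidence_interval(mean: float, se: float, level: float = 0.99) -> tuple[float, float]:
    """Normal-approximation CI: mean +/- z * SE, z = two-sided critical value."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # ~1.96 for 95%, ~2.58 for 99%
    return mean - z * se, mean + z * se


# Hypothetical: sample mean 120, SE 2
low, high = confidence_interval(120.0, 2.0, level=0.99)
print(f"99% CI: {low:.2f} to {high:.2f}")  # 99% CI: 114.85 to 125.15
```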
Z scores
definition: compares a sample mean with a known population mean by expressing the difference between the means in units of SE
formula: Z= (sample mean- pop. mean)/SEM
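The Z formula as a short Python sketch. The SEM is first built from a hypothetical sample SD and sample size; all numbers and the function name are illustrative only.

```python
import math


def z_score(sample_mean: float, population_mean: float, sem: float) -> float:
    """Z = (sample mean - population mean) / SEM: the difference in units of SE."""
    return (sample_mean - population_mean) / sem


# Hypothetical: n = 25, sample SD = 10, so SEM = 10 / sqrt(25) = 2
sem = 10 / math.sqrt(25)
print(z_score(105.0, 100.0, sem))  # 2.5
```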
Null hypothesis
H0: states there is no difference between the samples or populations being compared
i.e. P1 − P2 = 0, or P1 = P2
Statistical significance
purpose: measures how strong the evidence for a difference between 2 groups is, i.e. whether the difference could have arisen by chance alone
significance level= alpha
- commonly used levels are 5%, 1% and 0.1%
- the smaller the level, the stronger the evidence required before a difference is accepted as unlikely to be due to chance
P values
- the probability of observing a difference at least as large as the one in the study sample if there were truly no difference in the population (i.e. if H0 were true)
- expresses the strength of the evidence against H0 as a probability
- common thresholds: p < 0.05 (5%), p < 0.01 (1%), p < 0.001 (0.1%)
- conventionally considered significant if p < 0.05
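A minimal sketch of how a two-sided p value follows from a z score, assuming a z test where the statistic is standard normal under H0. The function name and the z values are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) under H0, using the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical z values
for z in (1.0, 2.5, 3.5):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f}, significant at alpha = 0.05: {p < 0.05}")
```

A z of about 1.96 gives p ≈ 0.05, which is why 1.96 also appears in the 95% CI.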
Type I and type II errors
type I (alpha) error: false positive
- the probability of detecting a difference when there is none
- usually set at 0.05
type II (beta) error: false negative
- the probability of not detecting a difference when one exists
- usually set at 0.2 (sometimes 0.1)
power: the probability of detecting a difference when one exists; depends on significance level, size of difference and sample size
- power = 1 − beta
- the larger the power, the smaller the type II error
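A minimal sketch showing how power depends on significance level, size of difference and sample size. It assumes a two-sided one-sample z test with known SD and ignores the tiny chance of rejecting in the wrong direction. The function name, difference and SD are hypothetical.

```python
import math
from statistics import NormalDist


def approximate_power(difference: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided one-sample z test."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)  # e.g. ~1.96 for alpha = 0.05
    se = sd / math.sqrt(n)               # standard error of the mean
    return std.cdf(abs(difference) / se - z_crit)


# Hypothetical: detect a 5-unit difference when SD = 15
for n in (25, 50, 100):
    print(n, round(approximate_power(5, 15, n), 2))
```

Larger n, a larger difference or a less strict alpha all raise power (i.e. lower beta).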
Student's t-test
use: to compare the means of two small samples
t value = observed difference in means / SE of the difference in means
paired t-test: used to compare two sets of paired observations (e.g. the same subjects before and after treatment)
degrees of freedom: no. of independently varying quantities that can be assigned to a distribution (e.g. n1 + n2 − 2 for an unpaired two-sample t-test)
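A minimal sketch of both t statistics. The unpaired version assumes equal variances (pooled SD); the paired version treats the within-pair differences as one sample. Function names and data are hypothetical. It stops at t and df: the p value would then come from a t table or a statistics library.

```python
import math
import statistics


def unpaired_t(sample1: list[float], sample2: list[float]) -> tuple[float, int]:
    """Two-sample t with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of difference in means
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se_diff
    return t, n1 + n2 - 2


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Paired t: mean of within-pair differences / its SE; df = n - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


t, df = unpaired_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.2f}, df = {df}")  # t = -2.00, df = 8
```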
Chi square
use: to test for differences in proportions (frequencies of categorical data) between two or more groups, comparing observed with expected counts using the chi-square distribution
χ² = Σ (observed − expected)² / expected
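A minimal sketch of the χ² formula for a contingency table of counts. Expected counts are computed as row total × column total / grand total. No continuity (Yates) correction is applied. The function name and the 2×2 counts are hypothetical.

```python
def chi_square(observed: list[list[float]]) -> tuple[float, int]:
    """Chi-square statistic and df = (rows - 1) * (columns - 1) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical 2x2 table: rows = treatment / control, columns = event / no event
table = [[20, 80],
         [40, 60]]
chi2, df = chi_square(table)
print(f"chi2 = {chi2:.2f}, df = {df}")  # chi2 = 9.52, df = 1
```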
Correlation
correlation coefficient (r): describes the strength and direction of the linear relationship between two variables
- can range from -1 to +1
degree of association (based on |r|)
- 0.8-1.0 strong
- 0.5-0.8 moderate
- 0.2-0.5 weak
- 0-0.2 negligible
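A minimal sketch of Pearson's r and the association bands listed above. Which band a boundary value such as 0.5 belongs to is an assumption here (lower bounds treated as inclusive). Function names and data are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson r = sum of cross-products / sqrt(sum of squares x * sum of squares y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def degree_of_association(r: float) -> str:
    """Label |r| using the flashcard bands (lower bounds inclusive: an assumption)."""
    size = abs(r)
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "negligible"


# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f} ({degree_of_association(r)})")  # r = 0.77 (moderate)
```

On Python 3.10+, `statistics.correlation(x, y)` computes the same Pearson coefficient.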
Regression
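definition: describes the relationship between 2 variables, i.e. how the value of one (Y, dependent) varies with the other (X, independent)
formula: Y = a + bX (a = intercept, b = slope or regression coefficient)
values: the slope b can take any value from −infinity to +infinity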
- slope of 0 represents no relationship
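A minimal sketch of fitting Y = a + bX by ordinary least squares. The function name and data are hypothetical; the data lie exactly on a line so the result is easy to check.

```python
def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit Y = a + bX by ordinary least squares; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Y per unit change in X
    a = mean_y - b * mean_x  # intercept: the line passes through (mean X, mean Y)
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"Y = {a} + {b}x")  # Y = 1.0 + 2.0x
```

On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept.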
Rates
incidence= no. of new cases in a given period/population at risk during this period
prevalence = total no. of existing cases in a population at one time / total population at that time
crude mortality rate = no. of deaths in 1 yr / total mid-year population × 1000
proportionate mortality rate = no. of deaths due to a cause in a period of time / total no. of deaths in the same period × 100
standardised mortality ratio (SMR) = observed deaths in population / expected deaths in population × 100
- if > 100, more deaths are occurring than expected
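The rate formulas above written as small Python functions, as a sketch. The function names and the town figures are hypothetical. The expected deaths for the SMR would normally come from applying standard age-specific rates to the study population; here they are simply assumed.

```python
def incidence(new_cases: int, population_at_risk: int) -> float:
    """New cases in a period / population at risk during that period."""
    return new_cases / population_at_risk


def prevalence(existing_cases: int, total_population: int) -> float:
    """All existing cases at one time / total population at that time."""
    return existing_cases / total_population


def crude_mortality_rate(deaths_in_year: int, mid_year_population: int) -> float:
    """Deaths per 1000 population per year."""
    return deaths_in_year / mid_year_population * 1000


def proportionate_mortality_rate(deaths_from_cause: int, total_deaths: int) -> float:
    """Percentage of all deaths in the period that are due to one cause."""
    return deaths_from_cause / total_deaths * 100


def standardised_mortality_ratio(observed_deaths: float, expected_deaths: float) -> float:
    """Observed / expected deaths x 100; > 100 means more deaths than expected."""
    return observed_deaths / expected_deaths * 100


# Hypothetical town: 50,000 people at mid-year, 400 deaths, 320 deaths expected
print(crude_mortality_rate(400, 50_000))       # deaths per 1000 per year
print(standardised_mortality_ratio(400, 320))  # 125.0
```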
Meta analysis
definition: analysis of pooled data from two or more similar studies to reach an overall conclusion
- results expressed as odds ratio or relative risk
Measure of effect
absolute risk: incidence of the outcome in a group (e.g. in the exposed)
relative risk (RR): incidence in exposed / incidence in non-exposed
- measures strength of association between exposure and outcome
attributable risk: incidence exposed − incidence non-exposed
absolute risk reduction (ARR): incidence in control − incidence in treated (exposed)
relative risk reduction (RRR): (1 − RR) × 100%
- i.e. percentage of the baseline (control) risk removed by the treatment/exposure
number needed to treat (NNT): 1/ARR
- number needed to treat to prevent one event
odds ratio (OR): odds of the event in exposed / odds of the event in non-exposed, where odds = prob of an event / (1 − prob of an event)
- used in case-control studies
hazard ratio (HR): ratio of hazard (event) rates between two groups; the survival-study equivalent of RR
- HR > 1 suggests the group in the numerator (e.g. exposed) experiences the event at a higher rate than the comparison group
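A minimal sketch computing these measures from hypothetical 2×2 counts, taking "exposed" as the treated group and "non-exposed" as the control group. The HR is left out because it needs time-to-event (survival) data rather than simple counts. The function name and dictionary keys are hypothetical.

```python
def effect_measures(events_exposed: int, total_exposed: int,
                    events_unexposed: int, total_unexposed: int) -> dict:
    """Measures of effect from a 2x2 table (exposed = treated, unexposed = control)."""
    risk_exposed = events_exposed / total_exposed        # absolute risk in exposed
    risk_unexposed = events_unexposed / total_unexposed  # absolute risk in non-exposed
    rr = risk_exposed / risk_unexposed
    arr = risk_unexposed - risk_exposed
    # Odds = events / non-events, equivalent to p / (1 - p)
    odds_exposed = events_exposed / (total_exposed - events_exposed)
    odds_unexposed = events_unexposed / (total_unexposed - events_unexposed)
    return {
        "relative_risk": rr,
        "attributable_risk": risk_exposed - risk_unexposed,
        "absolute_risk_reduction": arr,
        "rrr_percent": (1 - rr) * 100,
        # NNT = 1 / ARR, only meaningful when treatment lowers risk (ARR > 0)
        "nnt": 1 / arr if arr > 0 else None,
        "odds_ratio": odds_exposed / odds_unexposed,
    }


# Hypothetical trial: 10/100 events on treatment vs 20/100 on control
for name, value in effect_measures(10, 100, 20, 100).items():
    print(f"{name}: {value:.3f}" if value is not None else f"{name}: n/a")
```

With these counts RR = 0.5, ARR = 0.1, RRR = 50% and NNT = 10, which shows why NNT is 1/ARR rather than 1/RRR.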
Strength of study designs for cause and effect
From strongest to weakest:
- clinical trial
- cohort study
- case-control study
- cross-sectional study
- case studies
- case report
Validity
sensitivity= TP/ (TP+FN) x100
- ability to correctly detect people with disease
- SnOut: with a highly sensitive test, a negative result rules out the diagnosis
specificity= TN/ (TN+FP) x100
- ability to correctly detect people without disease
- SpIn: with a highly specific test, a positive result rules in the diagnosis
positive predictive value (PPV) = TP/(TP + FP) × 100
- proportion of those with a positive test who actually have the disease
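A minimal sketch of the validity formulas, plus negative predictive value (NPV) from the formulas list. The function name and the hypothetical counts (100 people with disease, 200 without) are illustrative only.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV (as %) from test-vs-disease counts."""
    return {
        "sensitivity": tp / (tp + fn) * 100,  # positives among those with disease
        "specificity": tn / (tn + fp) * 100,  # negatives among those without disease
        "ppv": tp / (tp + fp) * 100,          # diseased among those testing positive
        "npv": tn / (tn + fn) * 100,          # disease-free among those testing negative
    }


# Hypothetical: 100 people with disease, 200 without
results = diagnostic_accuracy(tp=90, fp=40, fn=10, tn=160)
for name, value in results.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity are properties of the test. PPV and NPV also depend on how common the disease is: with the same sensitivity and specificity, a lower prevalence gives a lower PPV.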
Normal distribution
- Mode=median=mean
Proportion of data within ± each SD of the mean
1 SD: 68%
2 SD: 95%
3 SD: 99.7%
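A minimal sketch that recovers these proportions from the standard normal distribution with Python's `statistics.NormalDist`. The function name is hypothetical. The exact values are about 68.3%, 95.4% and 99.7%; exactly 95% lies within ±1.96 SD.

```python
from statistics import NormalDist


def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {proportion_within(k):.1%}")
```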

Study design
cross-sectional study
- measures the frequency of a disease or risk factor (RF) in a population at a single point in time
- cannot determine causation
cohort study
- observational study following a group over time for the development of disease
- good for common diseases
- can be prospective or retrospective
case-control study
- compares people with the disease (cases) with people without it (controls) to identify differences in past exposure
- good for rare diseases

Clinical trials
phase 1: pharmacology/toxicity (safety)
phase 2: treatment efficacy
phase 3: comparison with the gold standard (current standard treatment)
phase 4: post-marketing surveillance

Formulas
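PPV: a/(a+b)
NPV: d/(c+d)
(a = TP, b = FP, c = FN, d = TN; rows = test result, columns = disease status)
NNT: 1/|a/(a+b) − c/(c+d)| = 1/ARR
(a, b = events/no events in one group; c, d = events/no events in the other group)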

Bioavailability
