Biostats ABC Flashcards
Power analysis explained
4 components
- 3 are known and you're solving for the one that is not
1) effect size
2) significance level = alpha = type 1 error rate = probability of finding an effect that is not there = typically 0.05 (5%) (the cutoff the p-value is compared against)
3) power = 1 - beta, where beta = type 2 error rate = probability of finding no effect when one actually is there (failing to reject a false null) = typically beta 0.2 (power 0.8, an 80% chance of identifying a true effect)
4) sample size (n) - usually what you are solving for
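As a rough sketch of how the four components fit together, the snippet below solves for sample size given the other three. It assumes a two-sided comparison of two group means with a standardized effect size (Cohen's d) and uses the normal approximation, so dedicated software based on the t distribution may return a slightly larger n. The function name `n_per_group` is illustrative, not from any package.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (two-sided, two-sample means).

    effect_size: standardized difference between groups (Cohen's d).
    Assumption: normal approximation, equal group sizes.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(0.5))             # medium effect, alpha 0.05, power 0.80
print(n_per_group(0.2))             # smaller effect needs many more subjects
print(n_per_group(0.5, power=0.9))  # more power also needs more subjects
```

Smaller effect sizes, a stricter alpha, or higher power each push the required n upward.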
how do you calculate effect size
estimated from literature
clinical significance?
why use case control
rare outcomes
retrospective
why use cohort
start with exposure
can be prospective or retrospective
accurate
free of error or bias
Precise
minimal effects from chance
types of bias
recall
reporting - subjects in one group more likely to report prior events
selection (food diary)
inter/intra observer
confounding variable
when a characteristic or variable related to the outcome is not distributed the same in the study group vs the control group (by chance or bias)
sensitivity
of everyone with the disease this % will test positive
True positive/all with disease
A/(A+C)
Disease on top
exposure/test on sides
Does not change with prevalence
specificity
of everyone without the disease this % will test negative
true negative / all without disease
D/(B+D)
of all the patients without the disease, x% had a negative test
does not change with prevalence
PPV
if the test is positive the chance the patient actually has the disease
- increases with prevalence
A/(A+B)
NPV
the probability that if the test is negative the subject actually does not have the disease
D/(C+D)
- decreases with prevalence
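The four measures above can be sketched from a 2x2 table with disease on top and test result on the sides (A = true positive, B = false positive, C = false negative, D = true negative). The second function uses Bayes' theorem to show how PPV and NPV move with prevalence while sensitivity and specificity stay fixed. The counts are made up for illustration and the function names are hypothetical.

```python
def diagnostic_metrics(a, b, c, d):
    """a = true pos, b = false pos, c = false neg, d = true neg."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV at a given prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


m = diagnostic_metrics(80, 30, 20, 170)  # illustrative counts only
print(m)
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(m["sensitivity"], m["specificity"], prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Running the loop shows PPV rising and NPV falling as prevalence increases.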
ROC curve
x-axis: false positive rate (1 - specificity)
y-axis: true positive rate (sensitivity)
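A minimal sketch of how the points on an ROC curve could be generated: sweep a cutoff across the test scores and record the false positive rate and true positive rate at each one. The scores and labels are invented, both function names are hypothetical, and the area under the curve uses a simple trapezoid rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    labels: 1 = disease, 0 = no disease.
    A test is called positive when score >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # illustrative test values
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts)
print(round(auc(pts), 3))
```

An area of 0.5 corresponds to a test no better than chance, and 1.0 to perfect separation.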
two types of experimental studies
randomized/non-randomized
two types of observational studies
analytical vs descriptive
cohort
case control
cross-sectional
cross-sectional study
looks at prevalence, not incidence
temporal relationship can be unclear
stats for cohort study
true incidence rate
attributable risk
relative risk
case control studies
careful of control group and recall bias
can only calculate an odds ratio - when the outcome is rare it is very similar to the RR
consider more stringent inclusion criteria to reduce confounding (preeclampsia with severe features requiring delivery vs preeclampsia)
major problem with non-randomized experimental studies
selection bias
randomized controlled trials: pluses and minuses
avoid confounding and selection bias
external validity can be a concern - volunteers can differ from the population
ratio
numerator is not included in the denominator
MMR
proportion
numerator is included in the denominator
- prevalence (proportion)
dimensionless
Rate
numerator is included in the denominator and time is taken into consideration
- incidence rate
relative risk
Frequency of the outcome in the exposed group / frequency of the outcome in the unexposed group
odds ratio
case control- odds of exposure among the cases/ odds of exposure in controls
cohort/cross sectional/RCT
- ratio of the odds of disease in the exposed vs the unexposed. Approximates the RR when the prevalence of the outcome is <5-10%
what is confidence interval
precision of study results.
descriptive - ecological correlational studies
look for associations - trend analysis, healthcare planning, hypothesis generation
correlation studies
measure the association between exposure and outcome with the correlation coefficient r
- cannot address causation, cannot control for confounding
information bias
incorrect determination of exposure, outcome, or both
- misclassification
3 types of bias
selection - Berkson (hospital-based sampling: admission rates differ by exposure and disease), Neyman (selection inherently excludes pts), unmasking (different management or workup when exposure is known), nonrespondent
informational - observation, classification, or measurement: ascertainment bias, recall bias
confounding
control for confounding
restriction - decreased external validity
matching - difficult recruitment, cannot measure the effect of the confounder
stratification - post hoc restriction - Mantel-Haenszel - if the adjusted effect differs from the crude effect then confounding is present - multivariable logistic regression
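The stratification idea can be sketched as follows: compute the crude odds ratio from the pooled table, then the Mantel-Haenszel pooled odds ratio across strata of the suspected confounder, and compare the two. The counts are invented so that the exposure has no effect within either stratum, and the function names are hypothetical. Each table is (a, b, c, d) = exposed with disease, exposed without, unexposed with disease, unexposed without.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from one 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata; each stratum is an (a, b, c, d) tuple."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Illustrative data: confounder absent, confounder present
strata = [(5, 95, 20, 380), (80, 320, 20, 80)]

# Crude table: add the strata together cell by cell
crude = tuple(sum(cells) for cells in zip(*strata))

print("crude OR:", round(odds_ratio(*crude), 2))
print("stratum ORs:", [round(odds_ratio(*s), 2) for s in strata])
print("Mantel-Haenszel OR:", round(mantel_haenszel_or(strata), 2))
```

Here the crude odds ratio is well above 1 while the adjusted one is 1, which is the pattern the card describes as evidence of confounding.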
what does the p value measure
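chance - the probability of a result at least this extreme if the null hypothesis is true; compared against alpha (type 1 error, false positive)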
2 risks that an association is not causal when statistical significance is met
bogus- bias
indirect- confounding
or real!
causal criteria
cause is before effect
strong association (RR >3, OR >4)
consistency
dose response
specificity of association (only one outcome)
biologic plausibility
experimental evidence
analogy (similar to other associations)
nested case-control
within a cohort
what type of study is a before after study
most like cohort
stats in cohort studies
RR
hazard ratio (Cox proportional hazards - dichotomous outcomes)
survival curves (Kaplan-Meier; log-rank test compares curves)
incidence rate
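For the survival curves mentioned above, here is a bare-bones sketch of the Kaplan-Meier estimate: at each time where an event occurs, the running survival probability is multiplied by the fraction of at-risk subjects who did not have the event. The follow-up times are invented and `kaplan_meier` is a hypothetical helper, not a library call. Comparing two curves with the log-rank test is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each subject
    events: 1 = event observed, 0 = censored (lost to follow-up or study ended)
    Returns [(time, survival probability)] at each time with at least one event.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        had_event = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if had_event:
            survival *= 1 - had_event / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


times = [2, 3, 3, 5, 8, 9]   # illustrative follow-up times
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.3f}")
```

Censored subjects leave the at-risk count without dropping the curve, which is how loss to follow-up is handled in this estimate.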
bias in cohort
exposure status can change (may want to quantify exposure)
loss to follow-up
likely selection bias
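Interval / ordinal / nominal
Interval - continuous numeric scale with equal spacing between values
ordinal - ordered categories or ranks (discrete)
nominal - unordered categories (e.g. yes/no)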
unpaired t test
interval
normal distribution
2 groups
independent
paired t test
interval
normal
2 groups
dependent
ANOVA
Interval
normal
> 2 groups
independent
repeated measure ANOVA
interval
normal
>2 groups
dependent
Wilcoxon signed-rank test / sign test
ordinal (or non-parametric (non-normal))
2 groups
dependent
Mann-Whitney U / Wilcoxon rank-sum
ordinal (or non-parametric (non-normal))
2 groups
independent
compares medians
Kruskal-Wallis test
ordinal (or non-parametric (non-normal))
>2 groups
independent
Friedman two-way ANOVA
ordinal (or non-parametric (non-normal))
> 2 groups
related
Chi-square / Fisher's exact (small numbers)
nominal / categorical
independent
2 or more groups
with any proportion
Cochran's Q
nominal /categorical
dependent
> 2 groups
McNemar Chi Square
nominal /categorical
dependent
2 groups
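The test-selection cards above boil down to three questions: what type of data, how many groups, and whether the groups are independent or dependent. One way to rehearse them is a small lookup, sketched below. The function `choose_test` and its argument names are hypothetical, and the mapping simply restates the cards (interval data is assumed normal unless `normal=False`).

```python
def choose_test(data_type, n_groups, paired, normal=True):
    """Look up a test from the cards.

    data_type: "interval", "ordinal", or "nominal"
    n_groups:  number of groups being compared (2 or more)
    paired:    True for dependent / related samples
    normal:    only used for interval data; non-normal data uses the rank tests
    """
    if n_groups < 2:
        raise ValueError("need at least 2 groups to compare")
    if data_type == "interval" and not normal:
        data_type = "ordinal"
    more_than_two = n_groups > 2
    table = {
        ("interval", False, False): "unpaired t test",
        ("interval", False, True): "paired t test",
        ("interval", True, False): "ANOVA",
        ("interval", True, True): "repeated measures ANOVA",
        ("ordinal", False, False): "Mann-Whitney U / Wilcoxon rank-sum",
        ("ordinal", False, True): "Wilcoxon signed-rank / sign test",
        ("ordinal", True, False): "Kruskal-Wallis",
        ("ordinal", True, True): "Friedman",
        ("nominal", False, False): "chi-square / Fisher's exact",
        ("nominal", False, True): "McNemar",
        ("nominal", True, False): "chi-square / Fisher's exact",
        ("nominal", True, True): "Cochran's Q",
    }
    return table[(data_type, more_than_two, paired)]


print(choose_test("interval", 2, paired=False))
print(choose_test("interval", 3, paired=True, normal=False))
print(choose_test("nominal", 2, paired=True))
```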
Shapiro-Wilk
tests for normality in data
prevalence
how many people have something
incidence
how many people who were at risk for something (didn't have it before) newly got it
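will the odds ratio overestimate or underestimate the RR
overestimates the size of the effect (the OR is farther from 1 than the RR; the gap is small when the outcome is rare)
OR = odds of exposure in cases / odds of exposure in controls
A/C = odds of exposure in those with disease
B/D = odds of exposure in those without disease
OR = (A/C) / (B/D) = AD/BC
Odds ratio of 1 - same risk
Odds ratio <1 - protective
Odds ratio >1 - increased risk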
relative risk (incidence)
incidence of disease in those exposed A/(A+B)
divided by
incidence of disease in those not exposed C/(C+D)
RR 1- no association
RR 2- double risk
RR 0.5 - half the risk
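To see how the odds ratio and relative risk relate, the sketch below computes both from the same 2x2 table (A = exposed with disease, B = exposed without, C = unexposed with disease, D = unexposed without). Two made-up tables share an RR of 2: one with a rare outcome and one with a common outcome. Function names are illustrative.

```python
def relative_risk(a, b, c, d):
    """Incidence in the exposed divided by incidence in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """(A/C) / (B/D), which simplifies to AD/BC."""
    return (a * d) / (b * c)


rare = (20, 980, 10, 990)      # outcome in 2% of exposed, 1% of unexposed
common = (400, 600, 200, 800)  # outcome in 40% of exposed, 20% of unexposed

for name, table in (("rare", rare), ("common", common)):
    print(f"{name}: RR {relative_risk(*table):.2f}, OR {odds_ratio(*table):.2f}")
```

With the rare outcome the two measures nearly match; with the common outcome the odds ratio sits noticeably farther from 1.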
Number needed to treat
1 / absolute risk reduction (the attributable risk, i.e. the difference in risk between groups)
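A short sketch of the arithmetic, taking NNT as the reciprocal of the absolute difference in risk between the control and treated groups and rounding up to a whole patient. The risks are invented and the function name is hypothetical. The inner `round` is only there to absorb floating-point noise before rounding up.

```python
from math import ceil


def number_needed_to_treat(risk_control, risk_treated):
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    arr = risk_control - risk_treated
    if arr <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    # round() guards against results like 5.000000000000001 becoming 6
    return ceil(round(1 / arr, 9))


print(number_needed_to_treat(0.20, 0.15))  # risk falls from 20% to 15%
print(number_needed_to_treat(0.10, 0.07))  # risk falls from 10% to 7%
```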
Positive likelihood ratio
sensitivity / (1 - specificity)
True positive rate / false positive rate
negative likelihood ratio
(1 - sensitivity) / specificity
false negative rate / true negative rate
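Both likelihood ratios can be computed directly from sensitivity and specificity, as sketched below. The second function is an addition not covered by the cards: it shows the usual way a likelihood ratio is applied, converting a pre-test probability to odds, multiplying by the ratio, and converting back. The numbers and function names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR)."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, lr):
    """Apply a likelihood ratio: probability -> odds -> times LR -> probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.3f}")
print(f"after a positive test: {post_test_probability(0.20, lr_pos):.3f}")
print(f"after a negative test: {post_test_probability(0.20, lr_neg):.3f}")
```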
STROBE guidelines are used for
observational studies:
cohort
case control
cross sectional
CONSORT is used for what studies
RCT
PRISMA is used for what studies
systematic reviews and meta-analyses
important stat for meta-analysis
measures of consistency (heterogeneity between studies)
power is
1-beta
population attributable risk
incidence in the total population - incidence in the unexposed (attributable risk is incidence in exposed - incidence in unexposed)
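As a sketch of the difference between the two measures: attributable risk is the risk difference between exposed and unexposed, while population attributable risk compares the whole population's incidence with the unexposed incidence, so it also depends on how common the exposure is. Counts are invented, function names are hypothetical, and the table is A = exposed with disease, B = exposed without, C = unexposed with disease, D = unexposed without.

```python
def attributable_risk(a, b, c, d):
    """Incidence in the exposed minus incidence in the unexposed."""
    return a / (a + b) - c / (c + d)


def population_attributable_risk(a, b, c, d):
    """Incidence in the total population minus incidence in the unexposed."""
    incidence_total = (a + c) / (a + b + c + d)
    return incidence_total - c / (c + d)


table = (30, 70, 30, 270)  # illustrative: 100 exposed, 300 unexposed
print(f"AR  {attributable_risk(*table):.3f}")
print(f"PAR {population_attributable_risk(*table):.3f}")
```

In this example a quarter of the population is exposed, so the population attributable risk is a quarter of the attributable risk.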
statistical test for association between 2 continuous variables
Pearson's correlation coefficient (or linear regression)
statistical test for association between 2 categorical variables
chi-square test (Fisher's exact for small numbers)
statistical test for association between 2 ordinal or non-normally distributed variables
Spearman's rank correlation
regression models by outcome type: continuous, binary (categorical), count, survival time
continuous - linear regression
binary (categorical) - logistic regression
count- poisson regression
survival time- cox proportional hazards regression