Biostatistics Flashcards
distribution terms
mean
median
mode
skew
mean
average value of a dataset
calculated by summing all values and dividing by the number of values
mean limitations
misleading in skewed distributions or distributions with outliers
median
middle value when a dataset is ordered from lowest to highest
when is median ideal
skewed distributions as it is not influenced by outliers
mode
the value that occurs most frequently in a dataset
most useful for categorical (nominal) data; not influenced by outliers
skew
describes asymmetry in a distribution
positive skew
the right tail (higher values) is longer
many low values and a few extremely high values
mean > median > mode
negative skew
left tail (lower values) is longer
many high values and a few extremely low values
mean < median < mode
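The positive-skew ordering can be checked with a minimal Python sketch using the standard-library statistics module. The dataset (lengths of hospital stay in days) is made up for illustration and is not from the deck.

```python
import statistics

# Hypothetical right-skewed data: many short stays and one very long stay (days)
stays = [1, 2, 2, 2, 3, 3, 4, 5, 6, 30]

mean = statistics.mean(stays)      # pulled toward the long right tail
median = statistics.median(stays)  # middle value, resistant to the outlier
mode = statistics.mode(stays)      # most frequent value

print(mean, median, mode)
```

In this sample the single 30-day stay pulls the mean above the median, which in turn sits above the mode.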
incidence
number of new cases of a condition in a given period
useful for assessing risk and evaluating interventions aimed at preventing disease
prevalence
total disease cases (new + pre-existing) in a population at one point in time divided by the total population
useful for planning health resource allocation and understanding disease burden
affected by disease duration and survival rates (longer duration or better survival raises prevalence)
point prevalence
percentage of people with the condition at one specific point in time
better reflects the burden of chronic conditions
lifetime prevalence
percent of individuals that ever had the condition at some point in their life
higher than point prevalence for chronic conditions
sensitive to survivorship and disease duration
key differences incidence vs prevalence
incidence assesses new case development over time
prevalence assesses existing disease cases at one time point
incidence excludes pre-existing cases, prevalence includes them
incidence assesses risk, while prevalence assesses burden
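A small worked example may help separate the two measures. The sketch below uses invented numbers for a hypothetical town, and it assumes the common convention of dividing new cases by the population at risk (people free of disease at the start) to express incidence as a proportion.

```python
# Hypothetical numbers for a town followed for one year
population = 10_000
existing_cases_at_start = 200   # pre-existing cases
new_cases_during_year = 50      # cases that developed during follow-up

# Incidence: new cases among those at risk (disease-free at the start)
at_risk = population - existing_cases_at_start
incidence = new_cases_during_year / at_risk

# Point prevalence at year end: all cases (new + pre-existing) / total population,
# assuming for simplicity that nobody recovered, died, or moved away
prevalence = (existing_cases_at_start + new_cases_during_year) / population

print(f"incidence: {incidence:.4f}, prevalence: {prevalence:.4f}")
```

Incidence counts only the 50 new cases, while prevalence counts all 250 existing cases.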
sensitivity
proportion of people with the disease who test positive on the assessment
conceptualized as the true positive rate
sensitivity formula
sensitivity = true positives / (true positives + false negatives)
high sensitivity
correctly identifies a high proportion of people who actually have the disease (few false negatives)
sensitivity example
Lyme disease screening test with 95% sensitivity would correctly identify 95% of people with Lyme disease
specificity
defined as the proportion of people without the disease who test negative on the assessment
also conceptualized as the true negative rate
specificity formula
specificity = true negatives / (true negatives + false positives)
high specificity
correctly rules out most people who do not have the disease (few false positives)
specificity example
a cognitive screening test for dementia with 98% specificity would generate few false positive results, correctly identifying 98% of patients without dementia as testing negative
positive predictive value (PPV)
defined as the probability that a person with a positive test result truly has the underlying disease
positive predictive value depends on
sensitivity, specificity, and disease prevalence
formula for positive predictive value
PPV = true positives/(true positives + false positives)
high positive predictive value
high probability of reflecting the true presence of disease
positive predictive value example
if a suicide risk screening test has a PPV of 90%, then 90% of patients screening positive are truly at high risk for suicide
negative predictive value
probability that a person with a negative test result truly does NOT have the underlying disease
negative predictive value depends on
sensitivity, specificity, and disease prevalence
negative predictive value formula
NPV = true negatives / (true negatives + false negatives)
high negative predictive value
a negative result reliably rules out the presence of disease
negative predictive value example
if a screening test for CJD has an NPV of 97%, then 97% of patients screening negative truly do not have CJD, and only 3% actually have it
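The four formulas above can be computed together from a 2x2 table. The sketch below is illustrative only: the function names and all counts are hypothetical. It also shows why PPV depends on prevalence, by applying the same test (90% sensitivity, 90% specificity) to a common and a rare condition.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def counts_for(prevalence, sensitivity, specificity, n=10_000):
    """Expected 2x2 counts for a hypothetical population of size n."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity
    fn = diseased - tp
    tn = healthy * specificity
    fp = healthy - tn
    return tp, fp, fn, tn

# Same test, different prevalence (20% vs 1%)
common = diagnostic_metrics(*counts_for(0.20, 0.90, 0.90))
rare = diagnostic_metrics(*counts_for(0.01, 0.90, 0.90))

print(f"PPV at 20% prevalence: {common['ppv']:.2f}")
print(f"PPV at 1% prevalence:  {rare['ppv']:.2f}")
```

Sensitivity and specificity are the same in both scenarios, but PPV drops sharply when the condition is rare because false positives outnumber true positives.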
case report/series
detailed description of a single clinical case or small group of cases
mainly descriptive with no comparisons to a control group
used to illustrate unique cases without evidence of causality
hypothesizes about ideas that can be investigated further with better quality research
case report/series example
a report of an individual patient diagnosed with Wilson’s disease that describes their symptoms, diagnosis, and treatment response
case-control study
compares cases (with an outcome) to controls (without outcome) to identify factors associated with the outcome
case-control study design
retrospective design: starts with the outcome and then investigates exposures
case-control study design useful for
studying rare diseases or outcomes with long latency periods
case-control study primary statistics
odds ratios quantifying the level of association
case-control study example
a study comparing the prevalence of chemical exposure at Camp Lejeune between patients diagnosed with Parkinson’s disease and healthy controls without the diagnosis
cross-sectional study
analyzes the relationship between exposures and outcomes at a single point in time
cross-sectional study useful for
disease prevalence and studying multiple outcomes
cross-sectional study cannot determine
temporal sequence between exposure and outcome
cross-sectional study primary statistics
prevalence ratios/odds ratios
cross-sectional study example
a study surveying the prevalence of essential tremor in octogenarians at a single point in time
cohort study
follows population prospectively to quantify outcome risk
groups are defined by exposure status
cohort study establishes
temporal relationship between predictors and outcomes
cohort study compared to cross-sectional study
more expensive and time-intensive
cohort study primary statistics
risk ratios quantifying relative risk
cohort study example
a multi-year study following a group of children into adulthood to track rates of diagnosis of multiple sclerosis and to identify predictive factors
randomized controlled trial
gold standard experimental study in which participants are randomly allocated to study groups
highest internal validity due to randomization minimizing bias
randomized controlled trial establishes
causality between intervention and outcome
randomized controlled trial primary statistics
risk ratios comparing outcomes between groups
randomized controlled trial example
a trial randomly assigning patients with amyotrophic lateral sclerosis to receive either a new medication or placebo, to compare treatment efficacy
case report/series advantages
describe unique cases in detail
case report/series disadvantages
no control group, limited generalizability, susceptible to bias
case report/series statistics used
descriptive only
case-control advantages
good for rare outcomes, retrospective
case control disadvantages
prone to bias (recall, selection), doesn’t determine individual risk
case control statistics used
odds ratio
cross-sectional advantages
easy, provides snapshot of prevalence
cross-sectional disadvantages
no temporal relationship between exposure and outcome
cross-sectional statistics used
prevalence, chi-square
cohort advantages
can determine individual risk and incidence
cohort disadvantages
expensive, time-consuming, loss to follow-up
cohort statistics used
risk ratios
randomized controlled trial advantages
gold standard, minimizes bias
randomized controlled trial disadvantages
very expensive, time intensive, may not reflect real world effectiveness
randomized controlled trial statistics used
risk ratios, NNT, NNH
efficacy trials
measure whether interventions produce the intended result under ideal/controlled conditions
tight inclusion criteria and close monitoring
maximize internal validity
effectiveness trials
examine whether intervention works under real-world conditions
broader inclusion, more variability in delivery/adherence
prioritize generalizability and applicability
crossover trials
participants receive a sequence of different treatments
useful when:
- disease course is stable
- treatment effects are short-term or reversible
reduces the sample size required, since each participant serves as their own control
must account for treatment carryover effects
naturalistic studies
investigate interventions under routine clinical practice conditions
broad inclusion criteria, less frequent monitoring
findings on effectiveness complement efficacy data from controlled trials
twin studies
compare trait frequency between identical vs fraternal twins
estimate genetic components of disease by parsing genetic versus environmental effects
meta-analysis
statistically synthesizes data from multiple smaller studies to increase statistical power
can assess consistency or heterogeneity across studies
at risk for selection or publication bias
meta-analysis example
a statistical analysis combining data from multiple studies examining the efficacy of antiplatelets for stroke prevention to determine the overall treatment effect size across trials
association studies
correlate genetic variants and other biomarkers to disease states
pragmatic trials
emphasize applicability by testing interventions in typical “real world” practice settings with more heterogeneous patients and conditions
odds ratio
quantifies the association between an exposure and an outcome, comparing the odds of the outcome occurring in the exposed group to the odds of the outcome occurring in the unexposed group
odds ratio formula
OR = (A/B) / (C/D)
A = cases exposed
B = controls exposed
C = cases unexposed
D = controls unexposed
odds ratio interpretation
OR > 1 means exposure increases odds
OR < 1 means exposure decreases odds
OR = 1 means no association between exposure and outcome
odds ratio use
in case-control studies as a proxy for relative risk
does not provide information about actual risk or incidence and does not imply causation
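The odds ratio formula can be tried on a small 2x2 table. This is a minimal sketch with invented counts for a hypothetical case-control study; the function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR = (A/B) / (C/D).

    a = cases exposed, b = controls exposed,
    c = cases unexposed, d = controls unexposed.
    """
    return (a / b) / (c / d)

# Hypothetical counts: 40 exposed cases, 20 exposed controls,
# 60 unexposed cases, 80 unexposed controls
or_value = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {or_value:.2f}")
```

Here the OR is greater than 1, so the exposure is associated with higher odds of the outcome in this made-up sample.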
relative risk
compares the risk of an outcome among an exposed group to the risk of an unexposed group
provides information about the actual likelihood of the outcome occurring
relative risk formula
RR = incidence in exposed / incidence in unexposed
incidence = number with disease / total number in the group (with + without disease)
relative risk interpretation
RR > 1: increased risk in the exposed group
RR < 1: decreased risk in the exposed group
RR = 1: equal risk in both groups
relative risk use
directly compares incidence (risk) between exposed and unexposed groups
used frequently in cohort studies
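A short sketch of the relative risk calculation follows, using invented cohort counts. It assumes incidence in each group is the number who developed the disease divided by everyone in that group; the function name is illustrative.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """RR = incidence in exposed / incidence in unexposed."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the disease
rr = relative_risk(30, 100, 10, 100)
print(f"RR = {rr:.1f}")
```

With these numbers the exposed group has about three times the risk of the unexposed group.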
absolute risk reduction (ARR)
the difference in outcome rates between control and experimental groups
absolute risk reduction (ARR) formula
ARR = control event rate - experimental event rate
absolute risk increase (ARI)
the increase in event rates in the experimental group compared to control
absolute risk increase (ARI) formula
ARI = experimental event rate - control event rate
absolute risk reduction/increase info
provides a direct measure of the benefit or harm
relative risk reduction (RRR)/increase (RRI)
expresses the absolute risk reduction or increase as a percentage of the control event rate
makes interpretation of efficacy easier clinically
relative risk reduction (RRR)/increase (RRI) formula
RRR = |ARR|x100 / control event rate
RRI = |ARI|x100 / control event rate
number needed to treat (NNT)/harm (NNH)
number of people who need to be treated for one additional patient to benefit (NNT) or be harmed (NNH)
number needed to treat (NNT)/harm (NNH) formula
NNT = 1/ARR
NNH = 1/ARI
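The ARR, RRR, and NNT formulas can be chained on one hypothetical trial. The event counts below are invented, and exact fractions are used so that rounding NNT up to a whole patient (the usual convention) is not disturbed by floating-point error.

```python
from fractions import Fraction
import math

# Hypothetical trial: 20 of 100 control patients and 15 of 100 treated patients have the event
control_event_rate = Fraction(20, 100)
experimental_event_rate = Fraction(15, 100)

arr = control_event_rate - experimental_event_rate  # absolute risk reduction
rrr_percent = abs(arr) * 100 / control_event_rate   # relative risk reduction, in percent
nnt = math.ceil(1 / arr)                            # rounded up to a whole patient

print(f"ARR = {float(arr):.2f}, RRR = {float(rrr_percent):.0f}%, NNT = {nnt}")
```

A 5 percentage point absolute reduction corresponds to a 25% relative reduction here, which is why the relative figure alone can make a treatment look more impressive than the absolute figure does.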
hazard ratio
compares the hazard (rate of an event) between groups over time
hazard ratio formula
HR = treatment hazard rate / control hazard rate
hazard ratio interpretation
HR > 1: increased rate of outcome with exposure
HR < 1: decreased rate of outcome with exposure
hazard ratio use
used in survival analysis
attributable risks
used to determine how much disease burden in a population can be attributed to a risk factor
attributable risk percent/proportion
proportion of disease in the exposed group attributable to the exposure
population attributable risk percent
proportion of disease in the whole population attributable to the exposure
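The deck does not give formulas for these two measures. The sketch below uses the standard textbook definitions, which are an addition here rather than part of the original cards: attributable risk percent = (incidence in exposed - incidence in unexposed) / incidence in exposed, and population attributable risk percent = (incidence in population - incidence in unexposed) / incidence in population. All incidence values are hypothetical.

```python
def attributable_risk_percent(incidence_exposed, incidence_unexposed):
    """Share of disease among the exposed that is attributable to the exposure."""
    return (incidence_exposed - incidence_unexposed) / incidence_exposed * 100

def population_attributable_risk_percent(incidence_population, incidence_unexposed):
    """Share of disease in the whole population that is attributable to the exposure."""
    return (incidence_population - incidence_unexposed) / incidence_population * 100

# Hypothetical: 30% incidence in exposed, 10% in unexposed, 25% of the population exposed
incidence_exposed = 0.30
incidence_unexposed = 0.10
incidence_population = 0.25 * incidence_exposed + 0.75 * incidence_unexposed

ar_pct = attributable_risk_percent(incidence_exposed, incidence_unexposed)
par_pct = population_attributable_risk_percent(incidence_population, incidence_unexposed)
print(f"AR% = {ar_pct:.1f}, PAR% = {par_pct:.1f}")
```

The population figure is smaller than the exposed-group figure because only a quarter of this hypothetical population is exposed.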
hypothesis testing
in research, statistical analyses evaluate hypotheses about treatment effects. This involves stating a null and an alternative hypothesis
null hypothesis (H0)
asserts there is no true difference between groups or no effect of treatment
essentially the “status quo” scenario
default position unless evidence indicates otherwise
null hypothesis (H0) form
“there is no difference between treatment A and B”
alternative hypothesis (H1)
what the investigator hopes to support with study data
asserts there is true difference or treatment effect
contradicts the null hypothesis
alternative hypothesis (H1) form
“there is a difference between treatment A and B”
p-value
probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true
low p-value
reject the null hypothesis
typical threshold for p-value
p ≤ 0.05
p-value limitation
a statistically significant result does not necessarily imply clinical importance
in studies with very large samples, even tiny differences can be statistically significant yet lack clinical significance or practical importance
confidence intervals
range of values expected to contain the true parameter
help assess clinical significance beyond statistical hypotheses
confidence interval influenced by the size and variability of the sample
wider intervals -> less precision, less confidence in observed effect size
narrower intervals -> greater confidence in point estimate
95% CI
if the study were repeated many times, about 95% of the intervals calculated this way would contain the true value
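The effect of sample size on interval width can be illustrated with a short sketch. It assumes a normal approximation for the 95% CI of a mean (mean ± about 1.96 standard errors); for small samples a t distribution would normally be used instead. The function name and the blood pressure numbers are hypothetical.

```python
import math
from statistics import NormalDist

def ci_95_for_mean(sample_mean, sample_sd, n):
    """Approximate 95% CI for a mean using the normal distribution."""
    z = NormalDist().inv_cdf(0.975)          # about 1.96
    standard_error = sample_sd / math.sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Same mean and SD (hypothetical systolic blood pressure), different sample sizes
small_low, small_high = ci_95_for_mean(125, 10, n=25)
large_low, large_high = ci_95_for_mean(125, 10, n=400)

print(f"n=25:  {small_low:.2f} to {small_high:.2f}")
print(f"n=400: {large_low:.2f} to {large_high:.2f}")
```

The larger sample gives a much narrower interval around the same point estimate.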
type 1 error
incorrectly concluding a difference/effect is real, when it is not
type 1 error equivalent to
false positive result/false rejection of null hypothesis when it was true
type 1 error probability
probability determined by alpha level (typically 0.05 or 5%)
type 2 error
failing to detect a true effect or difference -> false negative finding
concluding there is no effect when one actually exists
type 2 error determined by
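the power of the study (power = 1 - beta, where beta is the type 2 error probability), which depends on sample size, effect size, variability, and alpha level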
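The link between sample size, power, and type 2 error can be sketched numerically. The function below is a simplified approximation for a two-sided, two-sample z-test with equal group sizes and known standard deviation; it is an illustrative assumption, not a method from the deck, and the effect size and SD are invented.

```python
import math
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (equal group sizes)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value when the true difference is delta
    # (the negligible chance of rejecting in the wrong direction is ignored)
    return NormalDist().cdf(abs(delta) / standard_error - z_crit)

# Hypothetical true difference of 5 units with SD 10, at several sample sizes
for n in (16, 64, 128):
    power = approx_power(delta=5, sd=10, n_per_group=n)
    print(f"n per group = {n}: power = {power:.2f}, type 2 error = {1 - power:.2f}")
```

As the sample size per group grows, power rises and the type 2 error probability falls.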
type 3 error
asking the wrong research question entirely
no meaningful answer, regardless of the statistical findings
may represent a flawed study design or mismatched hypotheses
regression analysis
models the relationship between one or more independent (predictor) variables and a dependent (outcome) variable
regression analysis determines
determines how strongly/weakly one variable predicts or influences another
regression analysis quantifies
quantifies the effect size for each predictor
regression analysis example
an analysis that would test which factors are significantly associated with a higher or lower likelihood of developing Guillain-Barre syndrome, after controlling for other variables
chi-square test
compares observed and expected frequencies between categorical variables
chi-square test determines
determines whether differences between observed and expected frequencies are likely due to chance alone
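The comparison of observed and expected frequencies can be written out by hand. This is a minimal sketch of the Pearson chi-square statistic (without continuity correction) for a contingency table; the function name and the counts are hypothetical.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = disease / no disease
table = [[30, 70],
         [10, 90]]
chi2 = chi_square_statistic(table)
print(f"chi-square = {chi2:.2f}")
```

For a 2x2 table (1 degree of freedom), a statistic above roughly 3.84 corresponds to p < 0.05, so this made-up table would count as a statistically significant association.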
T-test
compares means between two groups
can be paired or independent samples
T-test determines
determines whether the difference between group means is statistically significant
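For independent samples, the test statistic is the difference in means divided by its standard error. The sketch below computes Welch's version of the t statistic (which does not assume equal variances) with the standard library; the data are invented. Converting the statistic to a p-value needs a t distribution, which a statistics package such as SciPy provides.

```python
import math
import statistics

def welch_t_statistic(group_a, group_b):
    """t statistic for two independent samples without assuming equal variances."""
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical symptom scores in two treatment groups
treatment_a = [5, 6, 7, 8, 9]
treatment_b = [1, 2, 3, 4, 5]
t_stat = welch_t_statistic(treatment_a, treatment_b)
print(f"t = {t_stat:.2f}")
```

A larger absolute t value means the observed difference is large relative to the variability within the groups.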
ANOVA
compares means across more than two groups
ANOVA determines
determines whether at least one group mean differs from the others (null hypothesis: all group means are equal)
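One-way ANOVA compares variability between group means with variability within groups, summarized as an F statistic. The sketch below computes F by hand for three hypothetical groups; the function name and data are illustrative, and turning F into a p-value would require an F distribution from a statistics package.

```python
import statistics

def one_way_anova_f(*groups):
    """F statistic for one-way ANOVA: between-group / within-group mean squares."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical outcome scores in three treatment groups
f_stat = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F = {f_stat:.1f}")
```

A large F indicates that the group means differ by more than the within-group scatter would suggest; it does not say which groups differ.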
two-way ANOVA
determines the effects of two independent categorical variables and any interaction between those variables