Biostats Flashcards
Cross-sectional study
Collects data from a group of people to assess disease frequency at a particular point in time
May show risk association, but not causality
“What’s happening?”
Measures prevalence
Case-control study
Compares group with disease to a group without disease
Looks for prior exposure/risk
Retrospective
“What happened?”
Measures odds ratio: OR = [(a/c)/(b/d)] = (ad)/(bc)
Cohort study
Compares initially disease-free people in two groups to see who develops disease: one with exposure/risk, and one without exposure/risk
Can show if exposure/risk increases disease likelihood
Retrospective OR prospective
“Who will develop/developed disease?”
Measures relative risk: RR = [a/(a+b)]/[c/(c+d)]
Twin concordance, adoption studies
Measure heritability and environmental influence
Mono- vs dizygotic twins
Siblings with biological vs adoptive parents
Clinical trial phase goals
I: Is it safe?
II: Does it work?
III: Is it as good as or better than current treatments?
IV: Can it stay?
Odds ratio
Odds that a group with disease was exposed to a risk divided by the odds that the group without the disease was exposed
OR = (a/c)/(b/d) = (ad)/(bc)
Typically used for case-control studies
Relative risk
Risk of developing disease in the exposed group divided by risk in the unexposed group
RR = [a/(a+b)]/[c/(c+d)]
Typically used in cohort studies
If prevalence is low, OR ~ RR
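A minimal sketch of the OR and RR formulas above, assuming the usual 2x2 layout (a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease). The function names and counts are made up for illustration only.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c)/(b/d) = (a*d)/(b*c)."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts for a rare outcome (1% of exposed, 0.5% of unexposed).
a, b, c, d = 10, 990, 5, 995
print("OR:", odds_ratio(a, b, c, d))
print("RR:", relative_risk(a, b, c, d))
```

With a rare outcome like this one, the two values come out close to each other (both near 2), which is the "OR ~ RR" point on the card.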
Attributable risk
Difference in risk between exposed and unexposed groups, i.e. proportion of disease occurrences attributable to an exposure
AR = a/(a+b) - c/(c+d)
Relative risk reduction
Proportion of risk reduction attributable to an intervention as compared to a control
RRR = 1 - RR = 1 - [a/(a+b)]/[c/(c+d)]
Absolute risk reduction
Difference in risk attributable to the intervention as compared to the control
ARR = c/(c+d) - a/(a+b) = -AR
Number needed to treat
NNT = 1/ARR (treat has more letters than harm)
Number needed to harm
NNH = 1/AR
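The risk-difference cards above (AR, RRR, ARR, NNT, NNH) can be sketched together. This assumes the same 2x2 layout, with a and b as the exposed or treated group and c and d as the unexposed or control group. The function names and trial counts are hypothetical.

```python
def attributable_risk(a, b, c, d):
    """AR = a/(a+b) - c/(c+d)."""
    return a / (a + b) - c / (c + d)


def relative_risk_reduction(a, b, c, d):
    """RRR = 1 - RR."""
    return 1 - (a / (a + b)) / (c / (c + d))


def absolute_risk_reduction(a, b, c, d):
    """ARR = c/(c+d) - a/(a+b) = -AR."""
    return c / (c + d) - a / (a + b)


def number_needed_to_treat(a, b, c, d):
    """NNT = 1/ARR."""
    return 1 / absolute_risk_reduction(a, b, c, d)


def number_needed_to_harm(a, b, c, d):
    """NNH = 1/AR."""
    return 1 / attributable_risk(a, b, c, d)


# Hypothetical trial: 10/100 events with treatment, 20/100 with control.
trial = (10, 90, 20, 80)
print("RRR:", relative_risk_reduction(*trial))
print("ARR:", absolute_risk_reduction(*trial))
print("NNT:", number_needed_to_treat(*trial))

# Hypothetical harmful exposure: 30/100 events exposed, 10/100 unexposed.
harm = (30, 70, 10, 90)
print("AR:", attributable_risk(*harm))
print("NNH:", number_needed_to_harm(*harm))
```

In practice NNT and NNH are usually rounded up to the next whole person.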
Bias in recruiting participants
Selection, sampling, referral, allocation bias
E.g. Berkson bias - study population is from a hospital and less healthy than the general population
Healthy worker effect - study population is healthier than the general population (opposite of Berkson)
Non-response - nonrespondents differ from participants meaningfully
Randomize to reduce
Procedure bias
Subjects in different groups are not treated the same
Includes detection bias: Those with a risk factor undergo greater diagnostic scrutiny than those without the risk
Use blinding and placebos to reduce
Recall bias
Awareness of disorder alters recall by subjects
Common in retrospective studies
Decrease time from exposure to follow-up to reduce
Observer-expectancy bias
Researcher’s belief in a treatment’s efficacy changes outcomes
AKA Pygmalion effect or self-fulfilling prophecy
Use blinding and placebos to reduce
Confounding bias
Factor is related to both exposure and outcome, but is not on the causal pathway
Reduce with multiple/repeat studies, matching of patients with similar characteristics in both control and treatment groups, crossover studies where subjects act as their own controls
Lead-time bias
Early detection is confused with increased survival
Especially important for studies of long-term chronic disease
Reduce by measuring back-end survival (adjusting survival for disease severity at time of diagnosis)
Hawthorne effect
AKA observer effect
Subjects tend to change their behavior when they know they’re being observed
alpha definition
Probability of making a type I error (finding a difference between control and experimental groups when one does not exist)
beta definition
Probability of making a type II error (stating there is no difference between control and experimental groups when one does exist)
beta increases as alpha decreases
Power
1 - beta (probability of detecting a difference when one exists)
Increases as beta decreases, e.g. with increased precision, increased effect size, or INCREASED SAMPLE SIZE
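To make the alpha/beta/power relationships above concrete, here is a rough sketch. It assumes a one-sided, one-sample z-test with known standard deviation, which is a simplification chosen for illustration and not something the cards specify. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a one-sided, one-sample z-test with known sigma.

    effect: true difference from the null mean (assumed positive)
    sigma:  population standard deviation
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under the null
    shift = effect * sqrt(n) / sigma        # mean of the test statistic under the alternative
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical numbers: a 5-unit true effect with sigma = 15.
for n in (20, 50, 100):
    print(n, round(z_test_power(5, 15, n), 3))

# A stricter alpha lowers power (raises beta) at the same sample size.
print(z_test_power(5, 15, 50, alpha=0.05) > z_test_power(5, 15, 50, alpha=0.01))
```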
t-test
Checks differences between the MEANS OF 2 GROUPS
E.g. BP between males/females
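As an illustration of comparing two group means, the sketch below computes a t statistic by hand. It uses Welch's form (unequal variances), which is one common variant and an assumption here, since the card does not say which t-test is meant. The blood pressure values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(group1, group2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), using sample variances."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


# Hypothetical systolic BP readings for two groups.
group_a = [120, 125, 130, 135, 140]
group_b = [110, 115, 120, 125, 130]
print("t:", welch_t_statistic(group_a, group_b))
```

The statistic would then be compared against a t distribution to get a p-value, which is left out of this sketch.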
ANOVA
Checks differences between the MEANS OF AT LEAST 3 GROUPS
E.g. BP between members of 3 ethnic groups
Chi-square test
Checks differences between 2 or more PERCENTAGES OR PROPORTIONS OF CATEGORICAL OUTCOMES
E.g. Percentage of members of 3 ethnic groups with HTN
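A small sketch of the chi-square statistic for a table of counts, here 3 groups by HTN yes/no. The function name and the counts are hypothetical. Only the statistic and degrees of freedom are computed, and converting them to a p-value is not shown.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows are 3 groups, columns are [HTN, no HTN].
counts = [[30, 70], [40, 60], [50, 50]]
chi2, dof = chi_square_statistic(counts)
print("chi-square:", chi2, "df:", dof)
```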
Ordinal data
Data ordered by a position on a scale
Usually categorical - cannot perform arithmetic with these
E.g. Runners finishing in 1st, 3rd, 5th places
Qualitative - Non-parametric
Interval data
Data measured along a scale in which each position is equidistant
Quantitative - Parametric
Differences between data points are meaningful (steps are equal in size), but there is no true zero, so ratios are not meaningful
E.g. Happiness scale from 1-10 or Runners finishing a 5k between 18:00-18:59, 19:00-19:59, 20:00-20:59, etc.
Nominal data
Data differentiated by a simple naming system
Usually categorical - E.g. “employee”
May have a number assigned, but is not ordinal (E.g. Runner’s ID number or an athlete’s jersey number)
Qualitative - Non-parametric
Ratio data
Data in which numbers are multiples of each other and can be mathematically compared. Zero has a meaning on the scale used for this data
E.g. Runner’s finishing time for a race
Quantitative - Parametric
Continuous data
Measured along a continuous scale allowing for infinitely fine subdivision
Vs. discrete data, which falls into distinct bins (as in the binned finishing-time example under interval data)
Parametric data
Quantitative, forms predictable distributions (e.g. normal)
Can use arithmetic to gain insight into the datasets
Non-parametric data
Qualitative, does not assume any distribution
Likelihood ratio for a positive test
Sensitivity/(1-Specificity)
Likelihood ratio for a negative test
(1-Sensitivity)/Specificity
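The two likelihood ratio formulas above, as a quick sketch. The function names and the example sensitivity and specificity are made up.

```python
def lr_positive(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def lr_negative(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


# Hypothetical test with 90% sensitivity and 80% specificity.
print("LR+:", lr_positive(0.9, 0.8))
print("LR-:", lr_negative(0.9, 0.8))
```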
Sensitivity
Chance a test detects disease when it is present
(True-positive rate)
a/(a+c)
TP/(TP+FN)
Specificity
Chance a test indicates no disease when none is present
(True-negative rate)
d/(b+d)
TN/(TN+FP)
Positive predictive value
Proportion of positive test results that are true positives
a/(a+b)
TP/(TP+FP)
Negative predictive value
Proportion of negative test results that are true negatives
d/(c+d)
TN/(TN+FN)
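The four test-performance cards above (sensitivity, specificity, PPV, NPV) can be sketched from one 2x2 table, taking a = TP, b = FP, c = FN, d = TN as the formulas imply. The function names and counts are hypothetical.

```python
def sensitivity(tp, fp, fn, tn):
    """TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tp, fp, fn, tn):
    """TN / (TN + FP)."""
    return tn / (tn + fp)


def positive_predictive_value(tp, fp, fn, tn):
    """TP / (TP + FP)."""
    return tp / (tp + fp)


def negative_predictive_value(tp, fp, fn, tn):
    """TN / (TN + FN)."""
    return tn / (tn + fn)


# Hypothetical screening results: TP=90, FP=40, FN=10, TN=160.
table = (90, 40, 10, 160)
print("Sensitivity:", sensitivity(*table))
print("Specificity:", specificity(*table))
print("PPV:", positive_predictive_value(*table))
print("NPV:", negative_predictive_value(*table))
```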
Incidence
New cases occurring during a particular time period
N(new cases)/N(at risk)
Prevalence
Proportion of a population affected by a disease at a given point in time
N(w/disease)/N(population)
Increases w/ incidence
Decreases w/ death or recovery of affected individuals
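Incidence and prevalence as plain ratios, following the two formulas above. The function names and counts are invented for illustration.

```python
def incidence(new_cases, at_risk):
    """N(new cases) / N(at risk) over a given time period."""
    return new_cases / at_risk


def prevalence(with_disease, population):
    """N(w/disease) / N(population) at a given point in time."""
    return with_disease / population


# Hypothetical town: 50 new cases this year among 10,000 people at risk,
# and 300 people living with the disease out of 10,500 residents.
print("Incidence:", incidence(50, 10_000))
print("Prevalence:", prevalence(300, 10_500))
```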
Standard error of the mean
Used for samples of a population
SEM = s/sqrt(n), where s = stddev of sample
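A short sketch of the SEM formula above, using the sample standard deviation from Python's standard library. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample):
    """SEM = s / sqrt(n), where s is the sample standard deviation."""
    return stdev(sample) / sqrt(len(sample))


# Hypothetical sample of systolic BP readings.
sample = [120, 130, 125, 135, 140]
print("SEM:", standard_error_of_mean(sample))
```

Because n sits under a square root in the denominator, quadrupling the sample size only halves the SEM if s stays the same.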
Correlation coefficient
r
Always between -1 and 1
Closer to -1 = stronger negative correlation; closer to 1 = stronger positive correlation; 0 = no linear correlation
Coefficient of determination
r^2
Always between 0 and 1
Represents the proportion of variance in the dependent variable (y) explained by the independent variable (x) in a linear fit:
y = a + bx
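For the r and r^2 cards above, a minimal sketch that computes the Pearson correlation coefficient by hand and squares it. The function name and the data points are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print("r:", r)
print("r^2:", r ** 2)
```

For these points r^2 works out to about 0.6, i.e. roughly 60% of the variance in y is accounted for by the linear relationship with x.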