Biostats & Epi for PM Flashcards
Fetal Death Rate Equation
total number of fetal deaths in a given time period/(total number of live births + fetal deaths during the same period of time) x 1000 (dividing by live births alone gives the fetal death ratio)
Infant Mortality Rate Equation
total number of deaths of infants (<1 y/o) in a time period/total number of live births during the same period x 1000
Maternal Mortality Rate Equation
deaths due to pregnancy related illness in a given time period/total number of live births during the same period of time x 100,000
Neonatal Mortality Rate Equation
total number of deaths of neonates (<28 days old) in a given time period/total number of live births during the same period of time x 1000
Perinatal Mortality Rate Equation
(neonatal deaths + fetal deaths in a given time period)/(total number of live births + fetal deaths during the same time period) x 1000
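A minimal sketch of how the rate equations above are applied. All counts are hypothetical (not real data), and the helper name `rate_per` is an illustration only.

```python
def rate_per(events, denominator, multiplier=1000):
    """Return events / denominator, scaled by the multiplier (e.g. per 1,000)."""
    return events / denominator * multiplier


# Hypothetical counts for one region in one year (assumed values, not real data).
live_births = 50_000
infant_deaths = 300      # deaths at <1 year old
neonatal_deaths = 200    # deaths at <28 days old
fetal_deaths = 250
maternal_deaths = 10     # deaths due to pregnancy-related illness

infant_mortality = rate_per(infant_deaths, live_births)                 # per 1,000 live births
neonatal_mortality = rate_per(neonatal_deaths, live_births)             # per 1,000 live births
maternal_mortality = rate_per(maternal_deaths, live_births, 100_000)    # per 100,000 live births
# Perinatal rate: fetal deaths appear in both numerator and denominator.
perinatal_mortality = rate_per(neonatal_deaths + fetal_deaths,
                               live_births + fetal_deaths)

print(round(infant_mortality, 2), round(neonatal_mortality, 2),
      round(maternal_mortality, 2), round(perinatal_mortality, 2))
```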
ecological fallacy definition
an association at the population level is not necessarily true at the individual level
studies with ecological fallacy
ecological studies (group-level data, often cross-sectional)
vital statistics recorded (4)
birth, death, marriage, divorce
length bias definition
slower-progressing, less aggressive disease is more likely to be detected by screening, so screen-detected cases appear to have a better prognosis
non differential bias is the same as
random error
lead time bias definition
appearance that early diagnosis of a disease prolongs survival, when it only moves the date of diagnosis earlier without delaying death
Hawthorne effect definition
individual behavior changes when a person knows they are being observed
regression to the mean definition
the further a value is from the mean, the more likely future recordings are closer to the mean
Neyman bias definition
selective survival bias
surviving cases included in a study have different exposures than the cases who died before they could be included
When does stratification reduce confounding?
analysis stage
3 ways to reduce confounding during the design stage
randomization
restriction
matching
3 ways to reduce confounding during the analysis stage
standardization
stratification
statistical modeling
Bayes theorem equation
(prevalence)(sensitivity)/[(prevalence)(sensitivity) + (1-prevalence)(1-specificity)] (= positive predictive value)
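As a sketch, Bayes theorem applied to a diagnostic test gives the positive predictive value: true positives divided by all positives. The prevalence, sensitivity, and specificity values below are assumed for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes theorem for a diagnostic test: P(disease | positive test)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Same hypothetical test (90% sensitive, 95% specific) at two prevalences.
ppv_low = positive_predictive_value(0.01, 0.90, 0.95)
ppv_high = positive_predictive_value(0.20, 0.90, 0.95)

print(round(ppv_low, 3), round(ppv_high, 3))
```

With the test held fixed, the PPV rises as prevalence rises.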
incidence density definition
number of new cases of a disease per summation of time that each person is at risk of a disease in a specified time and place
incidence density equation
new cases/sum of person-time
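A short sketch of the incidence density calculation. The follow-up times are hypothetical, and person-years is an assumed unit; any consistent unit of person-time works.

```python
def incidence_density(new_cases, person_time_at_risk):
    """New cases divided by the summed time each person was at risk."""
    return new_cases / sum(person_time_at_risk)


# Hypothetical cohort: years each of five people was followed while at risk.
person_years = [2.0, 5.0, 3.5, 1.5, 4.0]
new_cases = 2

rate = incidence_density(new_cases, person_years)
print(rate, "cases per person-year")
```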
central limit theorem definition
when a large number of mutually independent random variables are averaged, the distribution of the sample mean approaches a normal distribution (rule of thumb: n >= 30)
IQ mean and SD
100 +/- 15
z-score definition
how many standard deviations are between an observed value and the mean
z-score equation
(observed value - mean) / standard deviation
rule of addition equation
P(event 1 or event 2) = P(event 1) + P(event 2) - P(event 1 and event 2)
used for non-mutually exclusive events
standardized mortality ratio equation
observed # of deaths/expected # of deaths x 100
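A sketch of the ratio above. It assumes the expected deaths are obtained by applying a reference population's stratum-specific death rates to the study population; the strata, rates, and counts are hypothetical.

```python
def standardized_mortality_ratio(observed_deaths, expected_deaths):
    """Observed deaths / expected deaths x 100."""
    return observed_deaths / expected_deaths * 100


# Hypothetical study population sizes and reference death rates by age group.
study_population = {"<40": 2000, "40-64": 1500, "65+": 500}
reference_rates = {"<40": 0.001, "40-64": 0.005, "65+": 0.03}

# Expected deaths if the study population died at the reference rates.
expected = sum(study_population[age] * reference_rates[age]
               for age in study_population)
observed = 30

smr = standardized_mortality_ratio(observed, expected)
print(round(expected, 1), round(smr, 1))
```

A value above 100 means more deaths were observed than expected.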
direct adjustment
when the study population's stratum-specific (e.g., age-specific) rates are applied to a standard population to produce comparable adjusted rates
null hypothesis definition
there is no difference between the variables being tested
type 1 error definition
when a null hypothesis is rejected when it is actually true (ex. false-positives)
type 2 error definition
when a false null hypothesis is not rejected (ex. false negatives)
confidence interval equation
mean +/- 1.96(std dev/sq root N)
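A minimal sketch of the 95% confidence interval formula, using only the Python standard library. The sample values are hypothetical, and the sample standard deviation stands in for the population value; with a sample this small a t-based multiplier would normally replace 1.96.

```python
import math
import statistics


def confidence_interval_95(values):
    """Return (lower, upper) as mean +/- 1.96 * (std dev / sqrt(N))."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin


# Hypothetical systolic blood pressure readings (assumed values).
sample = [118, 122, 130, 125, 121, 127, 119, 124, 126]
low, high = confidence_interval_95(sample)
print(round(low, 1), round(high, 1))
```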
as prevalence increases, PPV _____ and NPV ____
increases, decreases
power equation
1 - beta = 1 - the probability of failing to reject the null when the null is false (type 2 error)
3 ways to increase power
increase sample size
decrease beta
increase alpha (relax the threshold for rejecting H0)
NNT equation
1/ARR = 1/(risk in untreated group - risk in treated group)
NNH Equation
1/absolute risk increase
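A sketch of both calculations, taking the absolute risk reduction as untreated risk minus treated risk and the absolute risk increase as exposed risk minus unexposed risk. The risks are hypothetical; by convention the result is rounded up to a whole person when reported.

```python
def number_needed_to_treat(untreated_risk, treated_risk):
    """1 / absolute risk reduction."""
    absolute_risk_reduction = untreated_risk - treated_risk
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment does not reduce risk")
    return 1 / absolute_risk_reduction


def number_needed_to_harm(exposed_risk, unexposed_risk):
    """1 / absolute risk increase."""
    absolute_risk_increase = exposed_risk - unexposed_risk
    if absolute_risk_increase <= 0:
        raise ValueError("exposure does not increase risk")
    return 1 / absolute_risk_increase


# Hypothetical risks: 20% of untreated vs 15% of treated patients have the event.
nnt = number_needed_to_treat(0.20, 0.15)
# Hypothetical risks: 6% of exposed vs 2% of unexposed people are harmed.
nnh = number_needed_to_harm(0.06, 0.02)
print(round(nnt, 2), round(nnh, 2))
```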
9 components to determine causality
- consistency of association
- strength of association
- specificity
- temporal factors
- coherence of explanation
- biological plausibility
- experimental evidence from a controlled trial
- dose-response relationship
- analogy
Standard error equation
std dev/sq root n
internal validity definition
how well a study's results reflect the true association in the population studied (free of bias and confounding)
external validity definition
how well the results of a study are generalizable to a different population
degrees of freedom equation
(rows-1)(columns-1)
chi squared equation
sum of [(observed - expected)^2 / expected] over all cells
expected = (row total)(column total)/grand total
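A sketch of the chi-squared statistic for a contingency table, where each cell's expected count is (row total)(column total)/grand total and the degrees of freedom are (rows-1)(columns-1). The table is hypothetical.

```python
def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2x2 table: rows = exposed / unexposed, columns = disease / no disease.
table = [[30, 20],
         [10, 40]]
statistic, df = chi_squared(table)
print(round(statistic, 2), df)
```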
Kappa equation
(observed agreement - chance agreement)/(total number - chance agreement)
observed: agreed true + agreed false
cell agreement due to chance = (row total)(column total)/(total number)
chance agreement = TT chance + FF chance
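A sketch of kappa computed from counts for two raters, as (observed agreement - chance agreement)/(total number - chance agreement). The counts and argument names are hypothetical.

```python
def kappa(both_true, a_true_b_false, a_false_b_true, both_false):
    """Agreement between raters A and B beyond what chance alone would give."""
    total = both_true + a_true_b_false + a_false_b_true + both_false
    observed = both_true + both_false  # agreed true + agreed false

    a_true = both_true + a_true_b_false
    b_true = both_true + a_false_b_true
    a_false = total - a_true
    b_false = total - b_true

    # Cell agreement due to chance = (row total)(column total)/(total number).
    chance_true = a_true * b_true / total
    chance_false = a_false * b_false / total
    chance = chance_true + chance_false

    return (observed - chance) / (total - chance)


# Hypothetical ratings of 100 films by two readers.
k = kappa(both_true=40, a_true_b_false=10, a_false_b_true=5, both_false=45)
print(round(k, 2))
```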
F test
part of ANOVA
confounder definition
3rd variable associated with the exposure and the outcome
obscures or distorts the relationship between the exposure and outcome
effect modifier definition
changes the relationship between exposures and outcomes
intervening variable definition
a mechanism by which a causal variable leads to an outcome
necessary cause definition
required for disease to occur but may not invariably lead to disease
sufficient cause definition
invariably leads to a disease
coefficient of determination definition
the proportion of variation of a dependent variable that can be explained by an independent variable
3 examples of time-series analysis
cohort studies
epidemic studies
longitudinal data
McNemar’s Test definition
chi-sq test for non-independent (paired) data; analyzes matched pairs or before-and-after measurements of the same variable
Mann-Whitney U test definition
compares the medians of two independent groups; the nonparametric version of the t-test
attributable risk equation
a/(a+b) - c/(c+d)
relative risk equation
[a/(a+b)]/[c/(c+d)]
OR equation
(a/c)/(b/d)
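A sketch of the three 2x2 measures above. It assumes the conventional cell layout, which the cards do not spell out: a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease. The counts are hypothetical.

```python
def two_by_two_measures(a, b, c, d):
    """Return (attributable risk, relative risk, odds ratio) for a 2x2 table."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    attributable_risk = risk_exposed - risk_unexposed
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a / c) / (b / d)  # algebraically equal to ad/bc
    return attributable_risk, relative_risk, odds_ratio


# Hypothetical cohort: 100 exposed (30 diseased), 100 unexposed (10 diseased).
ar, rr, odds = two_by_two_measures(a=30, b=70, c=10, d=90)
print(round(ar, 2), round(rr, 2), round(odds, 2))
```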
25th percentile calculation
position in the ordered data = (n+1)/4
sign test definition
nonparametric test that compares dichotomous differences in data from matched, otherwise identical pairs; ignores magnitude of difference
Nonparametric version of t-test
mann-whitney U test
wilcoxon rank-sum test
Nonparametric version of paired t-test
Wilcoxon signed rank test
sign test
Nonparametric version of ANOVA
Kruskal-wallis test
Nonparametric version of Pearson correlation
spearman correlation
chi-sq
nominal (unordered) categorical variable example
group names, M/F
ordinal variable definition
group names with an order, ex. cancer stage
continuous variable definition
measurements, ex. height/weight
discrete numeric variable example
counts, ex. number of crashes at an intersection
interval variable definition
continuous variable with no true zero
ratio variable definition
continuous variable with a true 0
variance equation
average squared distance from the mean
standard deviation equation
square root of variance
right skew effect on measures of central tendency
mean > median
tail goes to the right
left skew effect on measures of central tendency
mean < median
tail goes to the left
geometric mean for skewed data equation
geometric mean = e^(mean of the natural logs of the values)
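A sketch of the geometric mean as the exponential of the mean of the natural logs. The values are hypothetical and must all be positive.

```python
import math


def geometric_mean(values):
    """exp(mean of ln(x)); defined only for positive values."""
    logs = [math.log(value) for value in values]
    return math.exp(sum(logs) / len(logs))


# Hypothetical right-skewed data spanning orders of magnitude.
titers = [1, 10, 100]
gm = geometric_mean(titers)
arithmetic = sum(titers) / len(titers)
print(round(gm, 2), round(arithmetic, 2))
```

For right-skewed data like this, the geometric mean sits well below the arithmetic mean, which the largest value pulls upward.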
coefficient of variation equation
ratio of std dev to the mean x 100
SD/mean x 100
2 uses of coefficient of variation
- compare relative data spread for 2 variables
- evaluate precision of the measurement of a single variable
z score definition
number of standard deviations a value is away from the mean
percentile of z=0
50th percentile
percentile of z=1
84th percentile
percentile of z=2
97.7th percentile (the 97.5th percentile corresponds to z = 1.96)
z score equation
z = (obs value - population mean) / population std dev
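A sketch that computes a z score as (observed value - mean) / standard deviation and converts it to a percentile with the standard normal distribution, using `statistics.NormalDist` from the Python standard library. The IQ mean and SD of 100 and 15 come from the card above; the observed score of 130 is hypothetical. The exact percentile for z = 2 is about 97.7; 97.5 corresponds to z = 1.96.

```python
from statistics import NormalDist


def z_score(observed, mean, standard_deviation):
    """Number of standard deviations the observed value lies from the mean."""
    return (observed - mean) / standard_deviation


def percentile_from_z(z):
    """Percent of a standard normal distribution falling below z."""
    return NormalDist().cdf(z) * 100


z = z_score(130, 100, 15)  # hypothetical IQ score of 130
for value in (0, 1, 2):
    print(value, round(percentile_from_z(value), 1))
```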
4 types of random samples
simple random sample
stratified random sample
cluster random sample
systematic random sample
central limit theorem
distribution of sample means is approximately normal if the sample size is large enough (rule of thumb: N >= 30)
standard deviation of distribution of the sample mean equation
AKA standard error
std dev/sq root N
95% CI equation
sample mean +/- 1.96 (approximately 2) x (pop sd/sq root sample size)
two sided null and alternative hypotheses
H0: mu1 = mu0
HA: mu1 does not = mu0
one sided null and alternative hypotheses
H0: mu1 >= mu0, HA: mu1 < mu0 OR H0: mu1 <= mu0, HA: mu1 > mu0
3 steps to hypothesis testing
- calculate test statistic
- identify probability distribution of the test statistic
- calculate p-value from test statistic based on probability distribution
How do you reduce type 1 error?
select a smaller alpha
decrease alpha, sample size ___ and Power ___
increases, decreases
to detect smaller differences between samples, sample size should be _____; otherwise power will ____
increased, decrease
4 tests available for continuous outcome/categorical predictor with 2 groups
t-test
Wilcoxon rank sum (NP)
mann-whitney U test
median test
2 tests available for continuous outcome/categorical predictor with >2 groups
ANOVA
kruskal-wallis (NP)
3 tests available for paired continuous outcome/categorical predictor
paired t-test
Wilcoxon signed rank (NP)
sign test
3 tests available for categorical outcome/categorical predictor
chi squared
fisher’s exact test
paired–McNemar’s chi squared
3 tests available for continuous outcome/continuous predictor
Pearson’s
spearman’s (NP)
linear regression
Test for categorical outcome/continuous predictor
logistic regression
When should you use nonparametric tests? (3)
with skewed data (values are converted to ranks, then the ranks are analyzed)
with small sample sizes
with ordinal outcomes
2 sample t-test use and output
use: compare continuous outcome between 2 groups when the data is symmetric or n>15
output: t-statistic –> p-value
Wilcoxon rank-sum test use and output
use: compare continuous outcome between 2 groups when the data is skewed, small n, or ordinal data
output: rank overall –> compare sums of ranks between 2 groups
Median test definition
overall median across entire sample
asks whether each value is > or < median and compares via a 2x2 table and chi-squared
Paired t-test use
compare continuous outcomes in pairs
takes the mean of the within-pair differences, then tests whether it differs from zero with a one-sample t-test
Wilcoxon signed rank use
continuous outcomes in pairs when there are few pairs or data is skewed
Sign test use
continuous outcomes in pairs when you don’t have numbers, only relationships
ANOVA use
comparing continuous outcomes between >2 groups
Kruskal-Wallis use
comparing continuous outcomes between >2 groups when you have skewed sample, small n, ordinal data
compares sums of ranks across groups
Fisher’s exact test use
small sample size for categorical outcome/categorical predictor (any expected cell count <5)
McNemar’s chi-squared use
chi-sq for matched or paired proportions (ex. matched case-control)
r^2 definition
the amount of variability accounted for by the line of best fit
correlation coefficient equation
sq root of r^2 (sign matches the slope of the line of best fit)
r=0.2 is ____ correlation
weak
r=0.4 is _____ correlation
moderate
r=0.8 is _____ correlation
strong
Linear regression use
continuous outcome w continuous predictor
F test equation
MSfitted/MSerror with p-1 and n-p DFs (p = number of model parameters, including the intercept)
Multicollinearity definition
When 2 or more predictor variables are highly correlated
Multicollinearity consequences (2)
increases standard error of beta estimates
can lead to confusion/misleading results
ANCOVA use
used to compare means between groups while controlling for other variables (covariates) that may be unbalanced between groups
logistic regression use
categorical outcome/continuous predictor
betas are estimated from maximum likelihood–model gives the probability of the outcome
Who determines which diseases are notifiable?
Council of State and Territorial Epidemiologists (CSTE)
Sensitivity definition
the proportion of those who have a disease who are correctly identified as having it (SNOUT)
Specificity definition
the proportion of those without a disease who are correctly identified as NOT having it (SPIN)
Multiplication rule equation
P(event 1 and event 2) = P(1) x P(2)
Multiplication rule use
determine the probability of 2 independent events
can also use to test for independence
Addition rule equation (mutually exclusive)
P(1 or 2) = P(1) + P(2)
Addition rule equation (not mutually exclusive)
P(1 or 2) = P(1) + P(2) - P(1 and 2)
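A sketch of the addition and multiplication rules using a standard 52-card deck as an assumed example, with exact fractions from the Python standard library.

```python
from fractions import Fraction

# Assumed example: draw one card from a standard 52-card deck.
p_heart = Fraction(13, 52)
p_king = Fraction(4, 52)
p_heart_and_king = Fraction(1, 52)  # only the king of hearts

# Addition rule (not mutually exclusive): subtract the overlap once.
p_heart_or_king = p_heart + p_king - p_heart_and_king

# Addition rule (mutually exclusive): a card cannot be both a heart and a spade.
p_spade = Fraction(13, 52)
p_heart_or_spade = p_heart + p_spade

# Multiplication rule as a test for independence:
# the events are independent if P(1 and 2) equals P(1) x P(2).
independent = p_heart_and_king == p_heart * p_king

print(p_heart_or_king, p_heart_or_spade, independent)
```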
I^2 Statistic definition
percentage of total variation across study estimates that is due to heterogeneity between studies rather than chance (for meta-analysis)
If >50% –> substantial heterogeneity
Kaplan-Meier curve statistical test
log rank test
Cox proportional hazards test
hazard ratios
Common source outbreak pattern
a group of people become ill after being exposed to a point-source contaminant
Continuous common source outbreak pattern
a common source continuously affects those who come into contact with it
Propagated outbreak pattern
infection is transmitted from one person to another
Mixed outbreak pattern
when a common source outbreak is complicated by person-to-person spread
Meta-analysis output for categorical variables
OR
Meta-analysis output for continuous variables
mean differences
sensitivity + ______ = 1
false negative error rate
specificity + _____ = 1
false positive error rate
ILINet Case Definition (3)
fever >=100°F (37.8°C)
cough and/or sore throat
a positive flu swab is acceptable but not required
How does NHANES get its data?
home interviews and physical examinations (PEs)