Stats + Public Health Flashcards
Comparing differences between means of SEVERAL independent groups
best method? null hypothesis?
ANOVA - analysis of variance
compares MEANS BTWN GROUPS with the VARIABILITY WITHIN GROUPS (the “F test”)
determines whether any of the means are signif diff
null hypothesis is that all groups are simply random samplings of the same population (ie, their means are the same)
rejected if at least 2 of the groups have signif diff means
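A minimal sketch of the F statistic behind ANOVA: variability BETWEEN group means divided by variability WITHIN groups. The function name and the sample numbers are made up for illustration; converting F to a p-value needs an F-distribution table or a stats library, which is not shown here.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups (lists of numbers)."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical data: three small groups
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2))
```

A large F means the group means differ by more than the within-group scatter would explain.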
Tx vs Control group chosen in present
compare for outcome of interest in FUTURE
Clinical trial
Risk factor group vs. no risk factor group chosen in present
Compare disease INCIDENCE in FUTURE
Prospective cohort
Review past records to find…
Risk factor-positive vs. risk factor-negative groups in PAST…
… comparing disease INCIDENCE in PAST
Retrospective Cohort
Select diseased and non-diseased people in present…
compare past records for risk factor exposure
Case-Control Study
Compare risk factor-positive vs. risk factor-negative grps in PRESENT, looking for disease PREVALENCE
Cross-sectional
takes place entirely in present (ex: look at sodium channel mutants vs. non-mutants and measure their BP)
(can be serial BP measurements over a week period… still considered “present”)
what is standard deviation?
a measure of “degree of dispersion” from the mean
for normally distributed data, a FIXED PROPORTION of the observed data points lies within a given number of SDs of the mean
what does a large vs. small standard deviation mean?
large - observations (data points) are spread over a larger range
small - data points are clustered more tightly / vary less
what is the rule for what % of observed data points lie within 1, 2 and 3 standard deviations of the mean?
68 - 95 - 99.7 rule (applies to normally distributed data)
(or remember 70-95-100)
68% lie within 1 SD
95% within 2 SD
99.7% within 3 SD
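A quick check of where these percentages come from, assuming a normal distribution. The helper name is made up; `NormalDist` is from the Python standard library.

```python
from statistics import NormalDist


def fraction_within(k_sd):
    """Fraction of a normal distribution lying within k_sd standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)


for k in (1, 2, 3):
    print(k, "SD:", round(100 * fraction_within(k), 1), "%")
```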
Cohort study
measurement used?
relative risk
risk of outcome in exposed / risk of outcome in unexposed
RR = 1.0 (null value)
RR > 1 - exposure related to incr. risk of outcome
RR < 1 - exposure related to decr. risk of outcome
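A worked sketch of the RR formula above. The function name and the cohort counts are hypothetical.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of outcome in the exposed divided by risk of outcome in the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed develop the outcome
rr = relative_risk(20, 100, 10, 100)
print(round(rr, 2))  # above 1, so exposure is related to increased risk
```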
2 measures of STATISTICAL SIGNIFICANCE that can strengthen findings of a study using RR (cohort study)
when is the result considered statistically significant by these measures? how are the measures related?
95% confidence interval
p-value
when 95% CI does not contain the null value (RR = 1) it is statistically significant
when p-value is <0.05, result is stat signif
when 95% CI does not contain null value, p-value will be <0.05
what is the relationship btwn 95% CI and p-value?
99% CI and p-value?
95% CI not containing null value = p value <0.05
99% CI not containing null value = p value <0.01
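A sketch of checking whether a CI for the RR contains the null value. It assumes the common log-based interval (standard error computed on ln RR), which the cards do not specify; function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def rr_confidence_interval(exposed_cases, exposed_total,
                           unexposed_cases, unexposed_total, level=0.95):
    """Return (RR, lower, upper) using a log-scale normal approximation."""
    rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
    # Standard error of ln(RR)
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / unexposed_cases - 1 / unexposed_total)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%, 2.58 for 99%
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


def excludes_null(lower, upper, null_value=1.0):
    """True when the interval does not contain the null value (RR = 1)."""
    return not (lower <= null_value <= upper)


# Hypothetical cohort: 30/100 exposed vs. 10/100 unexposed with the outcome
rr, lo95, hi95 = rr_confidence_interval(30, 100, 10, 100, level=0.95)
_, lo99, hi99 = rr_confidence_interval(30, 100, 10, 100, level=0.99)
print(excludes_null(lo95, hi95), excludes_null(lo99, hi99))
```

The 99% interval is always wider than the 95% interval, which is why excluding the null at 99% corresponds to the stricter p < 0.01.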
2 broad classes of variables
Qualitative (categorical) - disease status, blood type, etc.
Quantitative - body weight, glucose level, etc.
Test for association btwn TWO CATEGORICAL VARIABLES
both dep./indep. variables are categorical
CHI SQUARE TEST
evaluates assoc. (or lack thereof) btwn 2 categorical variables (eg, statin therapy vs. no statin and low vs. high preprocedural fibrin levels in PCI pts)
(logistic regression could also be used if the DEPENDENT VARIABLE IS DICHOTOMOUS)
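A minimal sketch of the chi-square statistic for a table of counts: compare each observed count with the count expected if the two variables were unrelated. The function name and the counts are made up, and no continuity correction is applied.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under "no association"
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 counts: rows = statin / no statin, columns = high / low fibrin
stat = chi_square_statistic([[30, 70], [10, 90]])
# For a 2x2 table (1 degree of freedom), a statistic above 3.84 corresponds to p < 0.05
print(stat, stat > 3.84)
```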
What is the test for assoc. btwn an INDEPENDENT QUANTitative and DEPENDENT QUALitative variable?
caveat?
Logistic regression
dependent variable MUST BE DICHOTOMOUS
What is the test for assoc. btwn an INDEPENDENT QUALitative and a DEPENDENT QUANTitative variable?
specifically when there are ONLY TWO GROUP MEANS being compared
the TWO SAMPLE T-TEST
(remember “Tea is for TWO”)
ex: compare mean BP (quant) btwn men + women (qual)
test for assoc. btwn INDEPENDENT QUALitative and a DEPENDENT QUANTitative variable?
when there are greater than 2 group means
ANOVA
analysis of variance
When can LINEAR REGRESSION be used to determine if there is assoc. btwn two variables?
when the DEPENDENT VARIABLE IS QUANTITATIVE
whether the independent is quantitative or qualitative
what is the test for assoc. btwn an INDEPENDENT QUANTitative and a DEPENDENT QUANTitative variable?
what value is given in this test?
Correlation analysis
gives a “correlation coefficient” known as “r”, which ranges from -1 to 1: positive = directly correlated, negative = inversely correlated, and the closer to -1 or 1 the stronger the linear relationship
Linear Regression vs. Correlation Coefficient
what are they + how are they different?
LINEAR REGRESSION - models linear relationship (makes a “trend line”) btwn dependent + independent variable (eg, # of cigarettes smoked per day as it relates to # yearly hospitalizations in COPD pts)
CORRELATION COEFFICIENT - a measure of the strength and direction of a linear relationship btwn 2 variables (eg, assoc. btwn estrogen level + breast cancer risk)
CC is reported as a single number describing the strength and direction (negative or positive) of the correlation; LR is a line-of-best-fit made from individual data points
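A sketch showing both outputs from the same data: r as a single number, and the regression line as a slope and intercept. The function name and data points are hypothetical.

```python
import math


def pearson_r_and_line(xs, ys):
    """Return (r, slope, intercept) for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    r = sxy / math.sqrt(sxx * syy)       # correlation coefficient: strength + direction
    slope = sxy / sxx                    # least-squares line of best fit
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical data: cigarettes per day vs. yearly hospitalizations
cigs = [0, 10, 20, 30, 40]
hosp = [1, 2, 2, 4, 5]
r, slope, intercept = pearson_r_and_line(cigs, hosp)
print(round(r, 2), round(slope, 2), round(intercept, 2))
```

The line lets you predict a y value for a given x; r only says how tightly the points follow a line and in which direction.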
What is the two sample t-test often used for?
what value can be calculated from the two sample t-test?
what data is needed to do the test?
to see if the means of 2 populations are equal
gives the P-VALUE … if p < 0.05, null hypothesis is rejected and the means are significantly different
needs the 2 means, the standard deviation of each group, and the sample sizes
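A sketch of the t statistic computed from exactly those inputs (means, SDs, sample sizes). The cards do not say which variant is meant; this assumes the unequal-variance (Welch) form. The function name and numbers are hypothetical, and turning t into a p-value needs a t-distribution table or a stats library, which is not shown.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, approximate degrees of freedom) from summary data."""
    v1 = sd1 ** 2 / n1   # squared standard error of group 1's mean
    v2 = sd2 ** 2 / n2   # squared standard error of group 2's mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data: mean BP 130 (SD 10, n = 25) in men vs. 120 (SD 10, n = 25) in women
t, df = welch_t(130, 10, 25, 120, 10, 25)
print(round(t, 2), round(df, 1))
```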
Case-Control Study
parameter calculable from the results?
Odds Ratio
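A worked sketch of the odds ratio from a case-control 2x2 table: odds of exposure among cases divided by odds of exposure among controls. The function name and counts are hypothetical.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure in cases divided by odds of exposure in controls (ad / bc)."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls


# Hypothetical study: 40 of 100 cases and 20 of 100 controls had the risk factor
print(round(odds_ratio(40, 60, 20, 80), 2))
```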
#1 cause of cancer mortality for both sexes
lung cancer
lung cancer mortality trends (1930s onward)
smoking peaked in the mid-1950s but MORTALITY RATES PEAK 20-50 YEARS AFTER SMOKING ONSET
(chart shows large increase in mortality from 70s onward and then decline from 2000 on due to declining smoking rates)