Stats + Public Health Flashcards
Comparing differences between means of SEVERAL independent groups
best method? null hypothesis?
ANOVA - analysis of variance
compares the VARIABILITY BTWN GROUP MEANS with the VARIABILITY WITHIN GROUPS (the “F test”)
determines whether any of the means are signif diff
null hypothesis is that all groups are simply random samplings of the same population (ie, their means are the same)
rejected if at least 2 of the groups have signif diff means
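A minimal sketch of the F statistic behind one-way ANOVA. The BP readings and the function name `f_statistic` are made up for illustration; they are not from the cards.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of each observation around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical systolic BP readings for three independent groups
groups = [[120, 122, 118], [130, 128, 132], [140, 142, 138]]
f_value = f_statistic(groups)  # 75.0 for this made-up data
```

A large F means the group means differ more than the scatter within groups would explain; the p-value then comes from the F distribution with (k - 1, n - k) degrees of freedom, which is not computed here.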
Tx vs Control group chosen in present
compare for outcome of interest in FUTURE
Clinical trial
Risk factor group vs. no risk factor group chosen in present
Compare disease INCIDENCE in FUTURE
Prospective cohort
Review past records to find…
Risk factor-positive vs. risk factor-negative groups in PAST…
… comparing disease INCIDENCE in PAST
Retrospective Cohort
Select diseased and non-diseased people in present…
compare past records for risk factor exposure
Case-Control Study
Compare risk factor-positive vs. risk factor-negative grps in PRESENT, looking for disease PREVALENCE
Cross-sectional
takes place entirely in present (ex: look at sodium channel mutants vs. non-mutants and measure their BP)
(can be serial BP measurements over a week period… still considered “present”)
what is standard deviation?
a measure of “degree of dispersion” from the mean
SD is a distance from the mean of a data set in which a FIXED PROPORTION of the observed data points lies
what does a large vs. small standard deviation mean?
large - observations (data points) are spread over a larger range
small - data points are clustered more tightly / vary less
what is the rule for what % of observed data points lie within 1, 2 and 3 standard deviations of the mean?
68 - 95 - 99.7 rule
(or remember 70-95-100)
68% lie within 1 SD
95% within 2 SD
99.7% within 3 SD
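The rule can be checked against a standard normal distribution. A small sketch using Python's `statistics.NormalDist`; the helper name `fraction_within` is an illustrative assumption.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Percent of observations within 1, 2 and 3 SDs of the mean
percent_within = {k: 100 * fraction_within(k) for k in (1, 2, 3)}
```

The three values come out near 68, 95 and 99.7, and the rule only holds when the data are approximately normally distributed.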
Cohort study
measurement used?
relative risk
risk of outcome in exposed / risk of outcome in unexposed
RR = 1.0 (null value)
RR > 1 - exposure related to incr. risk of outcome
RR < 1 - exposure related to decr. risk of outcome
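A worked sketch of the RR calculation from cohort counts. The counts and the function name are hypothetical.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """RR = risk of outcome in exposed / risk of outcome in unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the outcome
rr = relative_risk(30, 100, 10, 100)  # about 3, so exposure is tied to increased risk
```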
2 measures of STATISTICAL SIGNIFICANCE that can strengthen findings of a study using RR (cohort study)
when is the result considered statistically significant by these measures? how are the measures related?
95% confidence interval
p-value
when 95% CI does not contain the null value (RR = 1) it is statistically significant
when p-value is <0.05, result is stat signif
when 95% CI does not contain null value, p-value will be <0.05
what is the relationship btwn 95% CI and p-value?
99% CI and p-value?
95% CI not containing null value = p value <0.05
99% CI not containing null value = p value <0.01
2 broad classes of variables
Qualitative (categorical) - disease status, blood type, etc.
Quantitative - body weight, glucose level, etc.
Test for association btwn TWO CATEGORICAL VARIABLES
both dep./indep. variables are categorical
CHI SQUARE TEST
evaluates assoc. (or lack thereof) btwn 2 categorical variables (eg, statin therapy vs. no statin and low vs. high preprocedural fibrin levels in PCI pts)
(logistic regression could also be used if the DEPENDENT VARIABLE IS DICHOTOMOUS)
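A rough sketch of the chi-square statistic for a 2x2 table, without continuity correction. The counts are invented; 3.841 is the usual critical value for 1 degree of freedom at the 0.05 level.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: rows = statin / no statin, columns = low / high fibrin
chi2 = chi_square_2x2(30, 20, 15, 35)
significant = chi2 > 3.841  # critical value for df = 1, alpha = 0.05
```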
What is the test for assoc. btwn an INDEPENDENT QUANTitative and DEPENDENT QUALitative variable?
caveat?
Logistic regression
dependent variable MUST BE DICHOTOMOUS
What is the test for assoc. btwn an INDEPENDENT QUALitative and a DEPENDENT QUANTitative variable?
specifically when there are ONLY TWO GROUP MEANS being compared
the TWO SAMPLE T-TEST
(remember “Tea is for TWO”)
ex: compare mean BP (quant) btwn men + women (qual)
test for assoc. btwn INDEPENDENT QUALitative and a DEPENDENT QUANTitative variable?
when there are greater than 2 group means
ANOVA
analysis of variance
When can LINEAR REGRESSION be used to determine if there is assoc. btwn two variables?
when the DEPENDENT VARIABLE IS QUANTITATIVE
whether the independent is quantitative or qualitative
what is the test for assoc. btwn an INDEPENDENT QUANTitative and a DEPENDENT QUANTitative variable?
what value is given in this test?
Correlation analysis
gives a “correlation coefficient” known as “r”, which ranges from -1 to 1: positive if the variables are directly correlated, negative if inversely correlated
Linear Regression vs. Correlation Coefficient
what are they + how are they different?
LINEAR REGRESSION - models linear relationship (makes a “trend line”) btwn dependent + independent variable (ie, # of cigarettes smoked per day as it relates to # yearly hospitalizations in COPD pts)
CORRELATION COEFFICIENT - a measure of the strength and direction of a linear relationship btwn 2 variables (eg, assoc. btwn estrogen level + breast cancer risk)
CC is reported as a single number describing the strength and direction (negative or positive) of the correlation; LR is a line-of-best-fit made from individual data points
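To make the difference concrete, a minimal sketch that computes both the least-squares trend line and r from the same data. The numbers and the name `fit_line_and_r` are made up for illustration.

```python
from statistics import mean

def fit_line_and_r(xs, ys):
    """Return (slope, intercept) of the best-fit line and the correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx                  # linear regression: change in y per unit x
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5       # correlation: strength + direction only
    return slope, intercept, r

# Hypothetical COPD data: cigarettes per day vs. yearly hospitalizations
cigarettes_per_day = [5, 10, 15, 20, 25]
hospitalizations = [1, 2, 2, 4, 5]
slope, intercept, r = fit_line_and_r(cigarettes_per_day, hospitalizations)
```

The slope and intercept let you predict y from x; r is a single unitless number and predicts nothing on its own.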
What is the two sample t-test often used for?
what value can be calculated from the two sample t-test?
what data is needed to do the test?
to see if the means of 2 populations are equal
gives the P-VALUE … if p < 0.05 null hypothesis rejected and means are statistically different
needs the 2 group means, the standard deviation of each group, and the 2 sample sizes
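A sketch of how the three inputs combine into a t statistic. This assumes the Welch (unequal variance) form of the test, which is one common variant and not necessarily the one the card has in mind. The summary numbers are hypothetical.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary data."""
    v1 = sd1 ** 2 / n1   # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2   # squared standard error of group 2 mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical: mean BP 130 (SD 10, n 50) in men vs. 125 (SD 10, n 50) in women
t, df = welch_t(130, 10, 50, 125, 10, 50)  # t = 2.5 with about 98 df
```

The p-value is then read from a t distribution with `df` degrees of freedom (table or stats library); that lookup is left out of this sketch.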
Case-Control Study
parameter calculable from the results?
Odds Ratio
#1 cause of cancer mortality for both sexes
lung cancer
lung cancer mortality trends (1930s onward)
smoking peaked in the mid-50s but MORTALITY RATES PEAK 20-50 YEARS AFTER SMOKING ONSET
(chart shows large increase in mortality from 70s onward and then decline from 2000 on due to declining smoking rates)
#2 cause of cancer death in women in US?
remember lung is #1 for both sexes
breast cancer
breast cancer mortality trends (1930s on)
reason for changes?
fairly steady (high) rates from 1930-1990
decline after 1990 due to adjuvant chemo/radio and screening
colon cancer mortality trends (1930s on)
reason for changes?
peaked in 1950 and declined since then due to…
surgical + chemo advances
screening
ASPIRIN use
menopausal hormone tx in women
pancreas cancer mortality trends (1930s on)
what affects it?
pancreatic cancer mortality is low simply because INCIDENCE is low (most cases are fatal, but few cases arise)
it is affected primarily by SMOKING and has slowly risen since the 1930s
stomach cancer mortality trends (1930s on)
dramatically + consistently declining since 1930s
likely due to better refrigeration, less food preservation (less salt), better sanitation + housing (less H pylori)
What is “accumulation effect” when studying risk factors and risk reducers?
when a risk factor / reducer only has an effect after a SIGNIFICANT DURATION or INTENSITY of exposure, so whether a study finds a statistically significant effect on outcomes depends on how long / how heavily subjects were exposed
ex: people taking antioxidants for <5 years vs. >5 years and their risk of stroke
What is “lead time bias”?
solution?
when a screening or dx test detects a disease earlier than normal, it may appear that the test results in longer survival after diagnosis
if the test does not actually affect earlier treatment of the disease, however, then the seemingly increased survival time is only due to earlier detection
solution: use life-expectancy to assess benefit
What is the “rare disease assumption”?
diseases with a low incidence rate (“rare”) will also have a low prevalence
when prevalence is low (eg <10%) then the ODDS RATIO is APPROXIMATELY SAME AS RELATIVE RISK
(Odds ratio = case-control studies; RR = experimental, cross-sectional and cohort studies)
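A quick numeric sketch of why the approximation works. Both 2x2 tables are invented; `a, b` are exposed with / without disease and `c, d` are unexposed with / without disease.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) = ad / bc."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))

rare = (20, 980, 10, 990)       # hypothetical: disease in about 1.5% of subjects
common = (400, 600, 200, 800)   # hypothetical: disease in 30% of subjects

rare_or, rare_rr = odds_ratio(*rare), risk_ratio(*rare)
common_or, common_rr = odds_ratio(*common), risk_ratio(*common)
```

With the rare disease the OR (about 2.02) sits next to the RR (2.0); with the common disease the OR (about 2.67) overstates the same RR of 2.0.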
PPO (preferred provider organization)
monthly premiums? copayments + deductibles? Referral required for specialist? Network size? out of network possible?
- HIGH premiums
- HIGH copays/deductibles
- NO REFERRALS required
- LARGE network
- CAN go outside network
HMO (health maintenance organization)
monthly premiums? copayments + deductibles? Referral required for specialist? Network size? out of network possible?
- LOWEST premiums
- LOWEST copays/deductibles
- referral REQUIRED
- SMALL network
- CAN’T go outside network
(often uses “capitation”, payment of predetermined amt)
POS (point of service)
monthly premiums? copayments + deductibles? Referral required for specialist? Network size? out of network possible?
- MEDIUM premiums
- VARIABLE copay/deductible (in vs. out network)
- referral REQUIRED
- SMALL network
- CAN go outside network
3 patient populations who can be on MEDICARE
≥ 65 years old
disabled
end-stage kidney disease
“z scores” for 95% and 99% of a normal distribution
- 1.96 for 95%
- 2.58 for 99%
this ties in with the 68/95/99.7 rule, where approximately 68% of observations lie within 1 SD of the mean, 95% within 2 SD (more precisely 1.96 SD) and 99.7% within 3 SD
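The two cutoffs (about 1.96 and 2.58) can be recovered from the standard normal distribution. A small sketch using `statistics.NormalDist`; the helper name `two_sided_z` is an illustrative assumption.

```python
from statistics import NormalDist

def two_sided_z(confidence):
    """z cutoff that leaves (1 - confidence) / 2 in each tail of a standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

z95 = two_sided_z(0.95)  # rounds to 1.96
z99 = two_sided_z(0.99)  # rounds to 2.58
```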
What is “standard error” and how is it calculated?
SE is the “standard deviation” of MULTIPLE SAMPLE MEANS
it estimates how far the sample mean is likely to be from the actual population mean
calculated as SD/√n
where n is the sample size (the larger the sample size, the smaller the standard error)
How can a confidence interval be calculated?
CI = sample mean +/- [z-score] x [SE]
remember that 1.96 and 2.58 are the z scores for 95% and 99% confidence levels, and SE = SD/√n
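A worked sketch of the CI formula, assuming a large enough sample for the z-based interval to be reasonable. The BP numbers are hypothetical.

```python
from math import sqrt

def confidence_interval(sample_mean, sd, n, z=1.96):
    """CI = sample mean +/- z * SE, where SE = SD / sqrt(n)."""
    se = sd / sqrt(n)
    margin = z * se
    return sample_mean - margin, sample_mean + margin

# Hypothetical: mean systolic BP 120, SD 15, n = 100
low95, high95 = confidence_interval(120, 15, 100)          # roughly 117.06 to 122.94
low99, high99 = confidence_interval(120, 15, 100, z=2.58)  # wider interval
```

Raising the confidence level widens the interval; raising n shrinks SE and narrows it.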
What is Berkson bias?
a form of selection bias that occurs when a STUDY POPULATION IS SELECTED FROM A HOSPITAL, and is thus less healthy than the general population
What is the “Hawthorne effect” as it relates to measurement bias?
participants changing their behavior when aware that they are being observed
can be accounted for by using placebo groups
“HAWTHORNE when they know you SAW THEM”
What is procedure bias?
example, how to reduce
subjects in different study groups are treated differently
ex: pt in treatment group spends more time in highly specialized hospital unit, while placebo pt does not
reduce by blinding
what is “observer expectancy bias”?
researcher’s belief in efficacy will change outcome (Pygmalion effect)
researcher believing tx will work is more likely to document positive outcomes
what is “design bias”?
parts of the study do not fit together to answer the question of interest
ex: a non-comparable control group
What is a “convenience sample” or “sample of convenience”?
a study group / population chosen because it is EASY TO REACH (non-random sampling), eg, the pts on hand when testing a QUALITY IMPROVEMENT INITIATIVE
What is CAPITATION in health care financial management?
what kind of provider network is it used in?
an arrangement in which a payor (individual, gov’t or employer) pays a FIXED, PREDETERMINED FEE to cover all medical services required by a patient
used in HMOs
what is “discounted fee-for-service” payment?
when insurance companies pre-negotiate a fee amount they will pay for each service provided by a provider
what is “global payment” in health care financial management?
most common example?
arrangement in which an insurer pays a provider a SINGLE PAYMENT to cover ALL EXPENSES ASSOCIATED with a SINGLE INCIDENT OF CARE
ex: pay 1 fee for an ELECTIVE SURGERY and all pre/post-op care
what is a “patient-centered medical home”?
model of primary care in which pt has a PERSONAL PHYSICIAN who coordinates + sees pt thru all aspects of care (incl. PREVENTIVE and ACUTE/CHRONIC disease mgmt)
fees can be capitated or fee-for-service
What is the eqn for ATTRIBUTABLE RISK FRACTION or AR PERCENT in exposed pts?
ARexp = (risk exposed - risk unexposed)/risk exposed
if using relative risk: ARexp = (RR - 1)/RR
if percent ARPexp = 100 x (RR-1)/RR
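A short sketch showing that the two forms of the equation agree. The risks are hypothetical and the function names are illustrative.

```python
def ar_percent_from_risks(risk_exposed, risk_unexposed):
    """ARP in exposed = 100 x (risk exposed - risk unexposed) / risk exposed."""
    return 100 * (risk_exposed - risk_unexposed) / risk_exposed

def ar_percent_from_rr(rr):
    """ARP in exposed = 100 x (RR - 1) / RR."""
    return 100 * (rr - 1) / rr

# Hypothetical: 30% risk in exposed, 10% in unexposed, so RR = 3
arp_risks = ar_percent_from_risks(0.30, 0.10)
arp_rr = ar_percent_from_rr(3.0)
```

Both come out near 66.7%, ie, about two thirds of the risk in exposed pts is attributable to the exposure.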
what kind of variables is the TWO SAMPLE T-TEST good for?
independent - qualitative/categorical
dependent - quantitative (continuous numerical)
eg, two groups (independent) treated in different wards for high risk for MI, with measurement of their plasma homocysteine (dependent)
what kind of variables should CHI SQUARE test be used for?
example?
when BOTH indep./dep. variables are QUALITATIVE / CATEGORICAL
used to eval whether EXPECTED FREQ of occurrence fits the OBSERVED FREQ (“goodness of fit”)
ex: eval of Mendelian inheritance of red vs. green seed colors uses chi-square to compare observed and expected proportions of each seed type
What is “late look bias”?
example + solution?
individuals with SEVERE disease are less likely to be uncovered in a survey because they DIE FIRST
ex: pts in an AIDS survey only report mild sx because those with severe sx die
solution: STRATIFY the survey by disease severity
Comparing an old ADHD behavioral mod tx with a new one…
ADHD is more common in boys than girls, so girls and boys are randomized SEPARATELY into the two tx groups
WHAT KIND OF TREATMENT ALLOCATION is this an example of?
stratification
What is OUTCOME-ADAPTED randomization in randomized controlled trials?
ratio of patients randomly assigned to the experimental treatment arm versus the control treatment arm changes from 1:1 over time to randomly assigning a higher proportion of patients to the arm that is doing better
What is Type I Error?
What is the VALUE associated with this?
FINDING a significant effect or difference WHEN THERE IS NOT ONE (aka false positive error)
α (alpha) is the probability of making a type I error
(Alpha = “Accused an innocent man”)
(“p value” is judged against a preset α, usually 0.05; rejecting the null only when p < 0.05 keeps the chance of a Type I Error at or below 5% when the null is actually true)
How does VARIANCE relate to standard deviation?
variance = (SD)^2
What other symbol can be used to represent standard deviation?
σ (sigma)
Hook for remembering what Chi square test is for?
Chi is for categorical variables (both dep + indep)
“CHI-tegorical”
In CORRELATION ANALYSIS…
what is the “coefficient of determination”?
COD = r^2 (correlation coefficient squared)
(quantifies the AMOUNT OF VARIANCE in one variable THAT CAN BE EXPLAINED BY VARIANCE of other variable)
(ie, if correlation coefficient is r = 1 then ALL variance in one variable can be explained by the variance in the other because 1^2 = 1 …. but the further from 1/-1 that r is the less the variance is actually determined by the variance in the other variable)
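A tiny sketch of how quickly the explained share drops as r moves away from 1 or -1. The r values are arbitrary examples.

```python
def coefficient_of_determination(r):
    """COD = r^2, the fraction of variance in one variable explained by the other."""
    return r ** 2

# Arbitrary example correlation coefficients
explained = {r: coefficient_of_determination(r) for r in (1.0, -0.9, 0.5)}
```

Here r = -0.9 still explains about 81% of the variance (the sign drops out), while r = 0.5 explains only 25%.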
A question says:
“we chose the sample size to have an 80% power of detecting a significant difference…”
What type of error does this tell us about and what is the value of that error type?
Type II error (β)
since it has a STATISTICAL POWER (ability to detect a difference when there is one) of 80%, then it has a 20% chance of NOT DETECTING when there IS A DIFFERENCE
so β = 20%
What is meant by the “SIGNIFICANCE LEVEL” of a statistical study?
significance level is EQUAL TO α (alpha)
ie, the probability of making a TYPE I ERROR - finding a significant difference when there is none
(ie, Accusing an innocent man … remember α goes with Accusing)