Stats + Public Health Flashcards

Question 1

Q

Comparing differences between means of SEVERAL independent groups

best method? null hypothesis?

Answer

A

ANOVA - analysis of variance

compares the VARIABILITY BTWN GROUP MEANS with the VARIABILITY WITHIN GROUPS (the “F test”)

determines whether any of the means are signif diff

null hypothesis is that all groups are simply random samplings of the same population (ie, their means are the same)

rejected if at least 2 of the groups have signif diff means
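
A minimal sketch of the one-way ANOVA F statistic, assuming independent groups with roughly equal variances. The function name and the three small groups are hypothetical and chosen only to keep the arithmetic easy to follow. It shows how the variability between group means is weighed against the variability within groups:

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    # F = mean square between / mean square within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups whose means clearly differ
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df_b, df_w)  # 27.0 2 6
```

A large F relative to the F distribution with (2, 6) df leads to rejecting the null hypothesis of equal means. SciPy's `scipy.stats.f_oneway` computes the same statistic together with its p-value.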

Question 2

Q

Tx vs Control group chosen in present

compare for outcome of interest in FUTURE

Answer

A

Clinical trial

Question 3

Q

Risk factor group vs. no risk factor group chosen in present

Compare disease INCIDENCE in FUTURE

Answer

A

Prospective cohort

Question 4

Q

Review past records to find…

Risk factor-positive vs. risk factor-negative groups in PAST…

… comparing disease INCIDENCE in PAST

Answer

A

Retrospective Cohort

Question 5

Q

Select diseased and non-diseased people in present…

compare past records for risk factor exposure

Answer

A

Case-Control Study

Question 6

Q

Compare risk factor-positive vs. risk factor-negative grps in PRESENT, looking for disease PREVALENCE

Answer

A

Cross-sectional

takes place entirely in present (ex: look at sodium channel mutants vs. non-mutants and measure their BP)

(can be serial BP measurements over a one-week period… still considered “present”)

Question 7

Q

what is standard deviation?

Answer

A

a measure of “degree of dispersion” from the mean

for a normal distribution, SD is the distance from the mean within which a FIXED PROPORTION of the observed data points lies

Question 8

Q

what does a large vs. small standard deviation mean?

Answer

A

large - observations (data points) are spread over a larger range

small - data points are clustered more tightly / vary less

Question 9

Q

what is the rule for what % of observed data points lie within 1, 2 and 3 standard deviations of the mean?

Answer

A

68 - 95 - 99.7 rule

(or remember 70-95-100)

68% lie within 1 SD
95% within 2 SD
99.7% within 3 SD
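
The rule only holds for (approximately) normally distributed data. A short check, assuming a normal distribution, computes the exact proportions from the error function in Python's standard library. The helper name is hypothetical:

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```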

Question 10

Q

Cohort study

measurement used?

Answer

A

relative risk

risk of outcome in exposed / risk of outcome in unexposed

RR = 1 - no association btwn exposure and outcome (null value)
RR > 1 - exposure related to incr. risk of outcome
RR < 1 - exposure related to decr. risk of outcome
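
A minimal sketch of the RR calculation from a 2x2 cohort table. The cell labels a-d and the counts are hypothetical:

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 cohort table.

    a = exposed with outcome     b = exposed without outcome
    c = unexposed with outcome   d = unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30/100 exposed vs. 10/100 unexposed develop the outcome
print(round(relative_risk(30, 70, 10, 90), 2))  # 3.0
```

RR = 3 means the exposed group had 3 times the risk of the outcome. A table in which both risks are equal gives RR = 1, the null value.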

Question 11

Q

2 measures of STATISTICAL SIGNIFICANCE that can strengthen findings of a study using RR (cohort study)

when is the result considered statistically significant by these measures? how are the measures related?

Answer

A

95% confidence interval

p-value

when 95% CI does not contain the null value (RR = 1) it is statistically significant

when p-value is <0.05, result is stat signif

when 95% CI does not contain null value, p-value will be <0.05

Question 12

Q

what is the relationship btwn 95% CI and p-value?

99% CI and p-value?

Answer

A

95% CI not containing null value = p value <0.05

99% CI not containing null value = p value <0.01

Question 13

Q

2 broad classes of variables

Answer

A

Qualitative (categorical) - disease status, blood type, etc.

Quantitative - body weight, glucose level, etc.

Question 14

Q

Test for association btwn TWO CATEGORICAL VARIABLES

both dep./indep. variables are categorical

Answer

A

CHI SQUARE TEST

evaluates assoc. (or lack thereof) btwn 2 categorical variables (eg, statin therapy vs. no statin and low vs. high preprocedural fibrin levels in PCI pts)

(logistic regression could also be used if the DEPENDENT VARIABLE IS DICHOTOMOUS)
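
A minimal sketch of the Pearson chi-square test of independence for a 2x2 table, without continuity correction. The function name and counts are hypothetical. Each cell's observed count is compared with the count expected if the two categorical variables were unrelated:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    Rows = levels of one categorical variable, columns = levels of the other:
        [[a, b],
         [c, d]]
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: statin vs. no statin (rows) x low vs. high fibrin (columns)
print(round(chi_square_2x2(20, 10, 10, 20), 2))  # 6.67
```

With 1 degree of freedom, a statistic above about 3.84 corresponds to p < 0.05, so these made-up counts would suggest an association. SciPy's `scipy.stats.chi2_contingency` returns the statistic and p-value directly. Note that by default it applies Yates' continuity correction to 2x2 tables.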

Question 15

Q

What is the test for assoc. btwn an INDEPENDENT QUANTitative and DEPENDENT QUALitative variable?

caveat?

Answer

A

Logistic regression

dependent variable MUST BE DICHOTOMOUS

Question 16

Q

What is the test for assoc. btwn an INDEPENDENT QUALitative and a DEPENDENT QUANTitative variable?

specifically when there are ONLY TWO GROUP MEANS being compared

Answer

A

the TWO SAMPLE T-TEST

(remember “Tea is for TWO”)

ex: compare mean BP (quant) btwn men + women (qual)

Question 17

Q

test for assoc. btwn an INDEPENDENT QUALitative and a DEPENDENT QUANTitative variable?

when there are MORE THAN 2 group means

Answer

A

ANOVA

analysis of variance

Question 18

Q

When can LINEAR REGRESSION be used to determine if there is assoc. btwn two variables?

Answer

A

when the DEPENDENT VARIABLE IS QUANTITATIVE

regardless of whether the independent variable is quantitative or qualitative

Question 19

Q

what is the test for assoc. btwn an INDEPENDENT QUANTitative and a DEPENDENT QUANTitative variable?

what value is given in this test?

Answer

A

Correlation analysis

gives a “correlation coefficient” known as “r”, which ranges from -1 to +1: the sign shows whether the variables are directly (+) or inversely (-) correlated, and the magnitude shows how strong the linear relationship is
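
A minimal sketch of how r is computed, assuming paired quantitative data. The inputs are toy values. Python 3.10+ also provides `statistics.correlation`, which computes the same Pearson coefficient:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross products and sums of squares (the 1/n factors cancel in the ratio)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 (direct)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (inverse)
```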

Question 20

Q

Linear Regression vs. Correlation Coefficient

what are they + how are they different?

Answer

A

LINEAR REGRESSION - models the linear relationship (makes a “trend line”) btwn a dependent + independent variable (eg, # of cigarettes smoked per day as it relates to # yearly hospitalizations in COPD pts)

CORRELATION COEFFICIENT - a measure of the strength and direction of a linear relationship btwn 2 variables (eg, assoc. btwn estrogen level + breast cancer risk)

CC is reported as a single number describing the strength and direction (negative or positive) of the correlation; LR is a line-of-best-fit made from individual data points
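
To make the contrast concrete, here is a minimal sketch of an ordinary least-squares line fit. The function name and the cigarette/hospitalization numbers are hypothetical. Where r is a single summary number, regression returns a slope and intercept that describe the line itself:

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: cigarettes/day (x) vs. yearly hospitalizations (y)
slope, intercept = least_squares_line([0, 10, 20, 30, 40], [1, 2, 2, 3, 4])
print(round(slope, 3), round(intercept, 3))  # 0.07 1.0
```

The slope is related to r by slope = r x (SD of y / SD of x). So r is a unitless summary of strength and direction, while the regression line keeps the units of the data and can be used for prediction. Python 3.10+ offers `statistics.linear_regression` for the same fit.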

Question 21

Q

What is the two sample t-test often used for?

what value can be calculated from the two sample t-test?

what data is needed to do the test?

Answer

A

to see if the means of 2 populations are equal

gives the P-VALUE … if p < 0.05 null hypothesis rejected and means are statistically different

needs the 2 means, the standard deviation of each group, and the sample sizes
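
A minimal sketch of Student's two-sample t statistic computed from exactly those summary inputs. It assumes both populations share the same variance (pooled variance). The function name and BP numbers are hypothetical:

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance assumes both populations share the same variance
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    return t, df


# Hypothetical mean systolic BP: men 130 (SD 10, n = 50) vs. women 125 (SD 10, n = 50)
t, df = two_sample_t(130, 10, 50, 125, 10, 50)
print(round(t, 2), df)  # 2.5 98
```

For about 100 df the two-sided 5% critical value of t is roughly 1.98, so t = 2.5 corresponds to p < 0.05. SciPy's `scipy.stats.ttest_ind_from_stats` takes the same six inputs and returns the t statistic and p-value. Passing `equal_var=False` switches it to Welch's test, which drops the equal-variance assumption.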

Question 22

Q

Case-Control Study

parameter calculable from the results?

Answer

A

Odds Ratio
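
A case-control study fixes the numbers of cases and controls, so disease risk (and therefore RR) cannot be calculated directly. The odds ratio instead compares the odds of exposure in cases with the odds in controls. A minimal sketch with hypothetical counts:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 case-control table.

    a = exposed cases      b = exposed controls
    c = unexposed cases    d = unexposed controls
    """
    # (odds of exposure in cases) / (odds of exposure in controls) = ad / bc
    return (a * d) / (b * c)


# Hypothetical study: 20 of 100 cases and 10 of 170 controls were exposed
print(odds_ratio(20, 10, 80, 160))  # 4.0
```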

Question 23

Q

#1 cause of cancer mortality for both sexes

Answer

A

lung cancer

Question 24

Q

lung cancer mortality trends (1930s onward)

Answer

A

smoking peaked in the mid-1950s, but MORTALITY RATES PEAK 20-50 YEARS AFTER SMOKING ONSET

(chart shows large increase in mortality from 70s onward and then decline from 2000 on due to declining smoking rates)

Question 25

Q

#2 cause of cancer death in women in US?

remember lung is #1 for both sexes

Answer

A

breast cancer

Question 26

Q

breast cancer mortality trends (1930s on)

reason for changes?

Answer

A

fairly steady (high) rates from 1930-1990

decline after 1990 due to adjuvant chemo/radiation therapy and screening

Question 27

Q

colon cancer mortality trends (1930s on)

reason for changes?

Answer

A

peaked in 1950 and declined since then due to…

surgical + chemo advances
screening
ASPIRIN use
menopausal hormone tx in women

Question 28

Q

pancreas cancer mortality trends (1930s on)

what affects it?

Answer

A

pancreatic cancer mortality is low simply because INCIDENCE is low (most cases are fatal, but few cases arise)

it is affected primarily by SMOKING and has slowly risen since the 1930s

Question 29

Q

stomach cancer mortality trends (1930s on)

Answer

A

dramatically + consistently declining since 1930s

likely due to better refrigeration, less salt-based food preservation, and better sanitation + housing (less H. pylori infection)

Question 30

Q

What is “accumulation effect” when studying risk factors and risk reducers?

Answer

A

when a risk factor / reducer only has an effect after a SIGNIFICANT DURATION or INTENSITY of exposure, the amount of exposure a subject has accumulated can determine whether a statistically significant effect on outcomes is seen

ex: people taking antioxidants for <5 years vs. >5 years and their risk of stroke

Question 31

Q

What is “lead time bias”?

solution?

Answer

A

when a screening or dx test detects a disease earlier than it would otherwise be found, the test may appear to lengthen survival after diagnosis

if earlier detection does not actually change the course of the disease, however, the seemingly increased survival time is only due to the earlier diagnosis date

solution: use life expectancy (rather than survival from the time of diagnosis) to assess benefit
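
A worked example with made-up numbers shows how the apparent survival gain can come entirely from the earlier diagnosis date. All variable names and times are hypothetical:

```python
# Hypothetical timeline (years after disease onset) for one patient whose
# disease course is NOT changed by screening, so death occurs at the same time.
death = 10
dx_by_symptoms = 7    # diagnosed when symptoms appear
dx_by_screening = 4   # diagnosed earlier by a screening test

survival_after_symptom_dx = death - dx_by_symptoms      # 3 years
survival_after_screening_dx = death - dx_by_screening   # 6 years
lead_time = dx_by_symptoms - dx_by_screening            # 3 years

# "Survival after diagnosis" doubles, but the patient does not live any longer:
# the whole apparent gain equals the lead time.
print(survival_after_screening_dx - survival_after_symptom_dx == lead_time)  # True
```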

Question 32

Q

What is the “rare disease assumption”?

Answer

A

diseases with a low incidence rate (“rare”) will also have a low prevalence

when prevalence is low (eg <10%) then the ODDS RATIO is APPROXIMATELY SAME AS RELATIVE RISK

(Odds ratio = case-control studies; RR = experimental, cross-sectional and cohort studies)
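
A quick numeric comparison shows why the approximation depends on the outcome being rare. It uses hypothetical cohort-style counts so that both RR and OR can be computed from the same 2x2 table:

```python
def risk_ratio(a, b, c, d):
    """a/b = exposed with/without disease, c/d = unexposed with/without disease."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Same 2x2 layout; cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# Hypothetical cohorts of 10,000 exposed and 10,000 unexposed people
rare = (20, 9980, 10, 9990)         # 0.2% vs. 0.1% develop disease
common = (4000, 6000, 2000, 8000)   # 40% vs. 20% develop disease

for label, table in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(*table), 2), round(odds_ratio(*table), 2))
# rare 2.0 2.0
# common 2.0 2.67
```

With a 0.1-0.2% outcome frequency the OR (about 2.002) is essentially the RR. With a 20-40% frequency, the OR moves noticeably further from 1 than the RR.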

Question 33

Q

PPO (preferred provider organization)

monthly premiums?
copayments + deductibles?
Referral required for specialist?
Network size?
out of network possible?

Answer

A

HIGH premiums
HIGH copays/deductibles
NO REFERRALS required
LARGE network
CAN go outside network

Question 34

Q

HMO (health maintenance organization)

monthly premiums?
copayments + deductibles?
Referral required for specialist?
Network size?
out of network possible?

Answer

A

LOWEST premiums
LOWEST copays/deductibles
referral REQUIRED
SMALL network
CAN’T go outside network

(often uses “capitation”, a predetermined payment per patient)

Question 35

Q

POS (point of service)

monthly premiums?
copayments + deductibles?
Referral required for specialist?
Network size?
out of network possible?

Answer

A

MEDIUM premiums
VARIABLE copay/deductible (in vs. out network)
referral REQUIRED
SMALL network
CAN go outside network

Question 36

Q

3 patient populations who can be on MEDICARE

Answer

A

≥ 65

disabled

end-stage kidney disease

Question 37

Q

“z scores” for 95% and 99% of a normal distribution

Answer

A

1.96 for 95%
2.58 for 99%

this corresponds to the 68-95-99.7 rule, where approximately 68% of observations lie within 1 SD of the mean, 95% within 2 SD (more precisely 1.96 SD) and 99.7% within 3 SD
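
The z values 1.96 and 2.58 can be recovered from the standard normal distribution with Python's `statistics.NormalDist` (Python 3.8+). The helper name is hypothetical:

```python
from statistics import NormalDist


def two_sided_z(confidence):
    """z such that `confidence` of a standard normal distribution lies within +/- z."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)


print(round(two_sided_z(0.95), 2))  # 1.96
print(round(two_sided_z(0.99), 2))  # 2.58
```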

Question 38

Q

What is “standard error” and how is it calculated?

Answer

A

SE is the “standard deviation” of MULTIPLE SAMPLE MEANS

it estimates how far the sample mean is likely to be from the actual population mean

calculated as SD/√n

where n is the sample size (the larger the sample size, the smaller the standard error)
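
A minimal sketch of SE = SD/√n, using a hypothetical SD of 10, shows how SE shrinks as the sample grows:

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


# Same hypothetical SD of 10 with increasing sample sizes
for n in (25, 100, 400):
    print(n, standard_error(10, n))
# 25 2.0
# 100 1.0
# 400 0.5
```

Because n sits under a square root, quadrupling the sample size only halves the SE.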

Question 39

Q

How can a confidence interval be calculated?

Answer

A

CI = sample mean +/- [z-score] x [SE]

remember that 1.96 and 2.58 are the z scores for 95% and 99% confidence levels, and SE = SD/√n
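
A minimal sketch of the formula, assuming the sample is large enough for the normal (z) approximation. For small samples a t value would replace z. The function name and sample values are hypothetical:

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Normal-approximation CI for a mean: mean +/- z * (SD / sqrt(n))."""
    se = sd / math.sqrt(n)
    margin = z * se
    return mean - margin, mean + margin


# Hypothetical sample: mean 120, SD 10, n = 100  ->  SE = 1
low95, high95 = confidence_interval(120, 10, 100)          # z = 1.96 for 95%
low99, high99 = confidence_interval(120, 10, 100, z=2.58)  # z = 2.58 for 99%
print(round(low95, 2), round(high95, 2))  # 118.04 121.96
print(round(low99, 2), round(high99, 2))  # 117.42 122.58
```

The 99% interval is wider than the 95% interval because it uses a larger z.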

Question 40

Q

What is Berkson bias?

Answer

A

a form of selection bias that occurs when a STUDY POPULATION IS SELECTED FROM A HOSPITAL, and is thus less healthy than the general population


Question 41

Q

What is the “Hawthorne effect” as it relates to measurement bias?

Answer

A

participants changing their behavior when aware that they are being observed

can be accounted for by using placebo groups

“HAWTHORNE when they know you SAW THEM”

Question 42

Q

What is procedure bias?

example, how to reduce

Answer

A

subjects in different study groups are treated differently

ex: pt in treatment group spends more time in highly specialized hospital unit, while placebo pt does not

reduce by blinding

Question 43

Q

what is “observer expectancy bias”?

Answer

A

researcher’s belief in a tx’s efficacy can change the outcome (Pygmalion effect)

researcher believing tx will work is more likely to document positive outcomes

Question 44

Q

what is “design bias”?

Answer

A

parts of the study do not fit together to answer the question of interest

ex: a non-comparable control group

Question 45

Q

What is a “convenience sample” or “sample of convenience”?

Answer

A

a study group / population chosen because it is READILY AVAILABLE rather than randomly selected; commonly used when testing a QUALITY IMPROVEMENT INITIATIVE

Question 46

Q

What is CAPITATION in health care financial management?

what kind of provider network is it used in?

Answer

A

an arrangement in which a payor (individual, gov’t or employer) pays a FIXED, PREDETERMINED FEE per patient (per time period) to cover all medical services required by that patient

used in HMOs

Question 47

Q

what is “discounted fee-for-service” payment?

Answer

A

when insurance companies pre-negotiate a DISCOUNTED fee they will pay a provider for each service provided

Question 48

Q

what is “global payment” in health care financial management?

most common example?

Answer

A

arrangement in which an insurer pays a provider a SINGLE PAYMENT to cover ALL EXPENSES ASSOCIATED with a SINGLE INCIDENT OF CARE

ex: pay 1 fee for an ELECTIVE SURGERY and all pre/post-op care

Question 49

Q

what is a “patient-centered medical home”?

Answer

A

model of primary care in which pt has a PERSONAL PHYSICIAN who coordinates + sees pt thru all aspects of care (incl. PREVENTIVE and ACUTE/CHRONIC disease mgmt)

fees can be capitated or fee-for-service

Question 50

Q

What is the eqn for ATTRIBUTABLE RISK FRACTION or AR PERCENT in exposed pts?

Answer

A

ARexp = (risk exposed - risk unexposed)/risk exposed

if using relative risk: ARexp = (RR - 1)/RR

as a percent: ARPexp = 100 x (RR - 1)/RR
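
A minimal sketch of both forms of the attributable risk percent in the exposed, using hypothetical risks. The function names are illustrative only:

```python
def ar_percent_exposed(risk_exposed, risk_unexposed):
    """Attributable risk percent in the exposed, from the two risks."""
    return 100 * (risk_exposed - risk_unexposed) / risk_exposed


def ar_percent_from_rr(rr):
    """Equivalent form using relative risk: 100 * (RR - 1) / RR."""
    return 100 * (rr - 1) / rr


# Hypothetical risks: 30% in exposed, 10% in unexposed (so RR = 3)
print(round(ar_percent_exposed(0.30, 0.10), 1))  # 66.7
print(round(ar_percent_from_rr(3), 1))           # 66.7
```

Both forms agree. Here about two-thirds of the outcomes among exposed pts are attributable to the exposure.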

Question 51

Q

what kind of variables is the TWO SAMPLE T-TEST good for?

Answer

A

independent - qualitative/categorical

dependent - quantitative (continuous numerical)

eg, two groups of pts at high risk for MI treated in different wards (independent), compared on their plasma homocysteine levels (dependent)

Question 52

Q

what kind of variables should CHI SQUARE test be used for?

example?

Answer

A

when BOTH indep./dep. variables are QUALITATIVE / CATEGORICAL

used to eval whether EXPECTED FREQ of occurrence fits the OBSERVED FREQ (“goodness of fit”)

ex: eval of Mendelian inheritance of red vs. green seed colors uses chi-square to compare observed and expected proportions of each seed type
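
A minimal sketch of the goodness-of-fit calculation. The 3:1 expected ratio and the seed counts are hypothetical:

```python
def chi_square_gof(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    chi2 = 0.0
    for obs, ratio in zip(observed, expected_ratio):
        # Expected count for this category under the hypothesized ratio
        expected = total * ratio / ratio_sum
        chi2 += (obs - expected) ** 2 / expected
    return chi2, len(observed) - 1


# Hypothetical cross: 100 seeds, expected 3:1 ratio of the two colors
chi2, df = chi_square_gof([70, 30], [3, 1])
print(round(chi2, 2), df)  # 1.33 1
```

With 1 df the 5% critical value is about 3.84, so a statistic of 1.33 would not reject the hypothesized 3:1 ratio.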

Question 53

Q

What is “late look bias”?

example + solution?

Answer

A

individuals with SEVERE disease are less likely to be uncovered in a survey because they DIE FIRST

ex: pts in an AIDS survey only report mild sx because those with severe sx die
solution: STRATIFY the survey by disease severity

Question 54

Q

Comparing an old ADHD behavioral mod tx with a new one…

ADHD is more common in boys than girls, so girls and boys are randomized SEPARATELY into the two tx groups

WHAT KIND OF TREATMENT ALLOCATION is this an example of?

Answer

A

stratification

Question 55

Q

What is OUTCOME-ADAPTED randomization in randomized controlled trials?

Answer

A

the allocation ratio btwn the experimental and control arms starts at 1:1 and then shifts over time, so that a higher proportion of pts are randomly assigned to whichever arm is doing better
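
Real adaptive trials use prespecified statistical rules. The sketch below is only a hypothetical, simplified rule, not a standard algorithm, that illustrates the idea of shifting allocation toward the arm with better outcomes so far. Both function names and the smoothing are assumptions:

```python
import random


def prob_assign_experimental(succ_exp, n_exp, succ_ctl, n_ctl):
    """Hypothetical allocation rule: probability of assigning the next patient
    to the experimental arm, proportional to each arm's observed success rate.

    The +1 / +2 smoothing keeps allocation at 1:1 before any outcomes are known.
    """
    rate_exp = (succ_exp + 1) / (n_exp + 2)
    rate_ctl = (succ_ctl + 1) / (n_ctl + 2)
    return rate_exp / (rate_exp + rate_ctl)


def assign_next(succ_exp, n_exp, succ_ctl, n_ctl, rng=random):
    """Randomly assign the next patient using the current allocation probability."""
    p = prob_assign_experimental(succ_exp, n_exp, succ_ctl, n_ctl)
    return "experimental" if rng.random() < p else "control"


print(prob_assign_experimental(0, 0, 0, 0))              # 0.5 (starts at 1:1)
print(round(prob_assign_experimental(8, 10, 4, 10), 2))  # 0.64 (favors better arm)
```

Assignment stays random at every step. Only the probability changes as outcomes accumulate.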

Question 56

Q

What is Type I Error?

What is the VALUE associated with this?

Answer

A

FINDING a significant effect or difference WHEN THERE IS NOT ONE (aka false positive error)

α (alpha) is the probability of making a type I error

(Alpha = “Accused an innocent man”)

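(the p-value is judged against the preset α, usually 0.05: p < 0.05 means a result at least this extreme would occur less than 5% of the time if the null hypothesis were true; using this cutoff caps the Type I Error rate at 5%)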

Question 57

Q

How does VARIANCE relate to standard deviation?

Answer

A

variance = (SD)^2
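
A quick check with Python's standard `statistics` module (made-up data) confirms that squaring the SD gives the variance. `statistics.pstdev` and `statistics.pvariance` are the population (divide-by-n) versions:

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical measurements

sd = statistics.stdev(data)           # sample SD (divides by n - 1)
variance = statistics.variance(data)  # sample variance (divides by n - 1)
print(round(sd, 3), round(variance, 3), round(sd**2, 3))  # 1.871 3.5 3.5
```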

Question 58

Q

What other symbol can be used to represent standard deviation?

Answer

A

σ (sigma)

Question 59

Q

Hook for remembering what Chi square test is for?

Answer

A

Chi is for categorical variables (both dep + indep)

“CHI-tegorical”

Question 60

Q

In CORRELATION ANALYSIS…

what is the “coefficient of determination”?

Answer

A

COD = r^2 (correlation coefficient squared)

(quantifies the AMOUNT OF VARIANCE in one variable THAT CAN BE EXPLAINED BY VARIANCE of other variable)

(eg, if r = 1 or r = -1, then r^2 = 1 and ALL the variance in one variable can be explained by the variance in the other; the further r is from 1/-1, the less of the variance in one variable is explained by the other)
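
A tiny illustration with hypothetical r values shows how r^2 behaves. The function name is illustrative only:

```python
def coefficient_of_determination(r):
    """Share of the variance in one variable explained by the other (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return r * r


for r in (1.0, -1.0, 0.8, -0.5, 0.0):
    print(f"r = {r:+.1f} -> r^2 = {coefficient_of_determination(r):.2f}")
# r = +1.0 -> r^2 = 1.00
# r = -1.0 -> r^2 = 1.00
# r = +0.8 -> r^2 = 0.64
# r = -0.5 -> r^2 = 0.25
# r = +0.0 -> r^2 = 0.00
```

Squaring removes the sign, so r = -1 and r = +1 both give r^2 = 1, while an r of 0.8 explains only 64% of the variance.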

Question 61

Q

A question says:

“we chose the sample size to have an 80% power of detecting a significant difference…”

What type of error does this tell us about and what is the value of that error type?

Answer

A

Type II error (β)

since it has a STATISTICAL POWER (ability to detect a difference when there is one) of 80%, then it has a 20% chance of NOT DETECTING when there IS A DIFFERENCE

so β = 20%

Question 62

Q

What is meant by the “SIGNIFICANCE LEVEL” of a statistical study?

Answer

A

significance level is EQUAL TO α (alpha)

ie, the probability of making a TYPE I ERROR - finding a significant difference when there is none

(ie, Accusing an innocent man … remember α goes with Accusing)