Stats + Public Health Flashcards

1
Q

Comparing differences between means of SEVERAL independent groups

best method? null hypothesis?

A

ANOVA - analysis of variance

compares MEANS BTWN GROUPS with the VARIABILITY WITHIN GROUPS (the “F test”)

determines whether any of the means are signif diff

null hypothesis is that all groups are simply random samplings of the same population (ie, their means are the same)

rejected if at least 2 of the groups have signif diff means
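
A minimal sketch of the F test described above, assuming a plain one-way ANOVA and made-up BP readings (the function name `one_way_anova_f` and the data are hypothetical, for illustration only). F is the ratio of variability BETWEEN group means to variability WITHIN groups; a large F argues against the null.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical systolic BP readings for three groups (not real data).
group_a = [120, 122, 118, 121]
group_b = [130, 128, 131, 129]
group_c = [119, 121, 120, 122]

f_stat = one_way_anova_f([group_a, group_b, group_c])
print(f"F = {f_stat:.2f}")
```

Turning F into a p-value needs the F distribution (a table or a statistics library), which this sketch leaves out.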

2
Q

Tx vs Control group chosen in present

compare for outcome of interest in FUTURE

A

Clinical trial

3
Q

Risk factor group vs. no risk factor group chosen in present

Compare disease INCIDENCE in FUTURE

A

Prospective cohort

4
Q

Review past records to find…

Risk factor-positive vs. risk factor-negative groups in PAST…

… comparing disease INCIDENCE in PAST

A

Retrospective Cohort

5
Q

Select diseased and non-diseased people in present…

compare past records for risk factor exposure

A

Case-Control Study

6
Q

Compare risk factor-positive vs. risk factor-negative grps in PRESENT, looking for disease PREVALENCE

A

Cross-sectional

takes place entirely in present (ex: look at sodium channel mutants vs. non-mutants and measure their BP)

(can be serial BP measurements over a week period… still considered “present”)

7
Q

what is standard deviation?

A

a measure of “degree of dispersion” from the mean

SD is a distance from the mean of a data set within which a FIXED PROPORTION of the observed data points lies (for normally distributed data)

8
Q

what does a large vs. small standard deviation mean?

A

large - observations (data points) are spread over a larger range

small - data points are clustered more tightly / vary less

9
Q

what is the rule for what % of observed data points lie within 1, 2 and 3 standard deviations of the mean?

A

68 - 95 - 99.7 rule

(or remember 70-95-100)

68% lie within 1 SD
95% within 2 SD
99.7% within 3 SD
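
A quick check of the rule, assuming the data are normally distributed (the helper name `fraction_within` is made up for this sketch). For a normal distribution, the fraction within k SDs of the mean is erf(k/√2).

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```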

10
Q

Cohort study

measurement used?

A

relative risk

risk of outcome in exposed / risk of outcome in unexposed

RR = 1.0 (null value)
RR > 1 - exposure related to incr. risk of outcome
RR < 1 - exposure related to decr. risk of outcome
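
A small worked example of the RR formula above, using invented cohort counts (the function name and numbers are hypothetical).

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of outcome in exposed divided by risk of outcome in unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 100 exposed vs. 10 of 100 unexposed develop the outcome.
rr = relative_risk(30, 100, 10, 100)
print(f"RR = {rr:.2f}")  # RR > 1, so exposure is related to increased risk here
```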

11
Q

2 measures of STATISTICAL SIGNIFICANCE that can strengthen findings of a study using RR (cohort study)

when is the result considered statistically significant by these measures? how are the measures related?

A

95% confidence interval

p-value

when 95% CI does not contain the null value (RR = 1) it is statistically significant

when p-value is <0.05, result is stat signif

when 95% CI does not contain null value, p-value will be <0.05

12
Q

what is the relationship btwn 95% CI and p-value?

99% CI and p-value?

A

95% CI not containing null value = p value <0.05

99% CI not containing null value = p value <0.01
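
A sketch of why the CI and p-value agree, assuming the common log-based (Wald) interval for RR; this method, the function name and the counts are assumptions for illustration, not something the cards specify. The CI and the p-value are built from the same standard error, so the CI excludes RR = 1 exactly when p falls below the matching cutoff.

```python
import math
from statistics import NormalDist

def rr_confidence_interval(a, n1, c, n0, level=0.95):
    """a/n1 = cases/total in exposed; c/n0 = cases/total in unexposed."""
    rr = (a / n1) / (c / n0)
    # Standard error of ln(RR) under the log (Wald) method.
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)     # 1.96 for 95%, 2.58 for 99%
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    # Two-sided p-value for the null hypothesis RR = 1.
    p_value = 2 * (1 - NormalDist().cdf(abs(math.log(rr)) / se_log_rr))
    return rr, lower, upper, p_value

# Hypothetical cohort: 30/100 exposed vs. 10/100 unexposed with the outcome.
rr, lo95, hi95, p = rr_confidence_interval(30, 100, 10, 100, level=0.95)
_, lo99, hi99, _ = rr_confidence_interval(30, 100, 10, 100, level=0.99)

print(f"RR = {rr:.2f}, 95% CI {lo95:.2f} to {hi95:.2f}, p = {p:.4f}")
print("95% CI excludes 1:", not (lo95 <= 1 <= hi95), "| p < 0.05:", p < 0.05)
print("99% CI excludes 1:", not (lo99 <= 1 <= hi99), "| p < 0.01:", p < 0.01)
```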

13
Q

2 broad classes of variables

A

Qualitative (categorical) - disease status, blood type, etc.

Quantitative - body weight, glucose level, etc.

14
Q

Test for association btwn TWO CATEGORICAL VARIABLES

both dep./indep. variables are categorical

A

CHI SQUARE TEST

evaluates assoc. (or lack thereof) btwn 2 categorical variables (eg, statin therapy vs. no statin and low vs. high preprocedural fibrin levels in PCI pts)

(logistic regression could also be used if the DEPENDENT VARIABLE IS DICHOTOMOUS)
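
A minimal sketch of the chi-square statistic for a 2x2 table, assuming Pearson's version with no continuity correction and invented counts (function name and data are hypothetical). It sums (observed - expected)^2 / expected over the four cells.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 is a squared standard normal,
    # so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = statin / no statin, columns = low / high fibrin.
chi2, p = chi_square_2x2(40, 10, 25, 25)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```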

15
Q

What is the test for assoc. btwn an INDEPENDENT QUANTitative and DEPENDENT QUALitative variable?

caveat?

A

Logistic regression

dependent variable MUST BE DICHOTOMOUS

16
Q

What is the test for assoc. btwn an INDEPENDENT QUALitative and a DEPENDENT QUANTitative variable?

specifically when there are ONLY TWO GROUP MEANS being compared

A

the TWO SAMPLE T-TEST

(remember “Tea is for TWO”)

ex: compare mean BP (quant) btwn men + women (qual)

17
Q

test for assoc. btwn INDEPENDENT QUALitative and a DEPENDENT QUANTitative variable?

when there are greater than 2 group means

A

ANOVA

analysis of variance

18
Q

When can LINEAR REGRESSION be used to determine if there is assoc. btwn two variables?

A

when the DEPENDENT VARIABLE IS QUANTITATIVE

whether the independent is quantitative or qualitative

19
Q

what is the test for assoc. btwn an INDEPENDENT QUANTitative and a DEPENDENT QUANTitative variable?

what value is given in this test?

A

Correlation analysis

gives a “correlation coefficient” known as “r” which is between -1 and 1 depending on if the variables are directly or inversely correlated

20
Q

Linear Regression vs. Correlation Coefficient

what are they + how are they different?

A

LINEAR REGRESSION - models linear relationship (makes a “trend line”) btwn dependent + independent variable (ie, # of cigarettes smoked per day as it relates to # yearly hospitalizations in COPD pts)

CORRELATION COEFFICIENT - a measure of the strength and direction of a linear relationship btwn 2 variables (eg, assoc. btwn estrogen level + breast cancer risk)

CC is reported as a single number describing the strength and direction (negative or positive) of the correlation; LR is a line-of-best-fit made from individual data points
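
A sketch contrasting the two outputs on the same invented data set (function name and numbers are hypothetical): least-squares regression returns a line (slope + intercept), while correlation returns the single number r.

```python
import math

def fit_line_and_r(xs, ys):
    """Return (slope, intercept, r) for a simple least-squares fit of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # the "trend line" from linear regression
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)         # strength and direction, between -1 and 1
    return slope, intercept, r

# Hypothetical data: cigarettes per day vs. yearly hospitalizations.
cigarettes = [0, 5, 10, 15, 20]
hospitalizations = [0, 1, 1, 2, 3]

slope, intercept, r = fit_line_and_r(cigarettes, hospitalizations)
print(f"line: y = {intercept:.2f} + {slope:.2f}x, r = {r:.2f}")
```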

21
Q

What is the two sample t-test often used for?

what value can be calculated from the two sample t-test?

what data is needed to do the test?

A

to see if the means of 2 populations are equal

gives the P-VALUE … if p < 0.05 null hypothesis rejected and means are statistically different

needs the 2 means, the standard deviation of each group, and the sample sizes
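
A sketch of how those three inputs combine, assuming Welch's version of the two sample t-test (unequal variances allowed); the variant, the function name and the BP numbers are assumptions for illustration. It stops at the t statistic and degrees of freedom; the p-value would then come from a t table or a statistics library.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's two-sample t-test from summary statistics."""
    var_mean1 = sd1 ** 2 / n1      # squared standard error of group 1's mean
    var_mean2 = sd2 ** 2 / n2      # squared standard error of group 2's mean
    t = (mean1 - mean2) / math.sqrt(var_mean1 + var_mean2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_mean1 + var_mean2) ** 2 / (
        var_mean1 ** 2 / (n1 - 1) + var_mean2 ** 2 / (n2 - 1)
    )
    return t, df

# Hypothetical summary data: mean BP 130 (SD 10, n 50) in men vs. 125 (SD 12, n 50) in women.
t_stat, df = welch_t(130, 10, 50, 125, 12, 50)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```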

22
Q

Case-Control Study

parameter calculable from the results?

A

Odds Ratio

23
Q

#1 cause of cancer mortality for both sexes

A

lung cancer

24
Q

lung cancer mortality trends (1930s onward)

A

smoking peaked in the mid-50s but MORTALITY RATES PEAK 20-50 YEARS AFTER SMOKING ONSET

(chart shows large increase in mortality from 70s onward and then decline from 2000 on due to declining smoking rates)

25
Q

#2 cause of cancer death in women in US?

remember lung is #1 for both sexes

A

breast cancer

26
Q

breast cancer mortality trends (1930s on)

reason for changes?

A

fairly steady (high) rates from 1930-1990

decline after 1990 due to adjuvant chemo/radio and screening

27
Q

colon cancer mortality trends (1930s on)

reason for changes?

A

peaked in 1950 and declined since then due to…

surgical + chemo advances
screening
ASPIRIN use
menopausal hormone tx in women

28
Q

pancreas cancer mortality trends (1930s on)

what affects it?

A

pancreatic cancer mortality is low simply because INCIDENCE is low (most cases are fatal, but few cases arise)

it is affected primarily by SMOKING and has slowly risen since the 1930s

29
Q

stomach cancer mortality trends (1930s on)

A

dramatically + consistently declining since 1930s

likely due to better refrigeration, less food preservation (less salt), better sanitation + housing (less H pylori)

30
Q

What is “accumulation effect” when studying risk factors and risk reducers?

A

when exposure to a risk factor / reducer requires a SIGNIFICANT DURATION or INTENSITY of exposure, it can affect whether or not exposure has a statistically significant effect on outcomes

ex: people taking antioxidants for <5 years vs. >5 years and their risk of stroke

31
Q

What is “lead time bias”?

solution?

A

when a screening or dx test detects a disease earlier than normal, it may appear that the test results in longer survival after diagnosis

if the test does not actually affect earlier treatment of the disease, however, then the seemingly increased survival time is only due to earlier detection

solution: use life-expectancy to assess benefit

32
Q

What is the “rare disease assumption”?

A

diseases with a low incidence rate (“rare”) will also have a low prevalence

when prevalence is low (eg <10%) then the ODDS RATIO is APPROXIMATELY SAME AS RELATIVE RISK

(Odds ratio = case-control studies; RR = experimental, cross-sectional and cohort studies)
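
A sketch of the approximation with two invented 2x2 tables (function names and counts are hypothetical): when the outcome is rare the OR lands close to the RR, and when it is common the OR overstates it.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed with/without disease; c, d = unexposed with/without disease."""
    return (a * d) / (b * c)

def rr_from_table(a, b, c, d):
    """Relative risk from the same 2x2 layout."""
    return (a / (a + b)) / (c / (c + d))

rare = (20, 980, 10, 990)       # hypothetical: 2% vs. 1% get the disease
common = (400, 600, 200, 800)   # hypothetical: 40% vs. 20% get the disease

or_rare, rr_rare = odds_ratio(*rare), rr_from_table(*rare)
or_common, rr_common = odds_ratio(*common), rr_from_table(*common)

print(f"rare disease:   OR = {or_rare:.2f}, RR = {rr_rare:.2f}")
print(f"common disease: OR = {or_common:.2f}, RR = {rr_common:.2f}")
```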

33
Q

PPO (preferred provider organization)

monthly premiums?
copayments + deductibles?
Referral required for specialist?
Network size?
out of network possible?
A
  • HIGH premiums
  • HIGH copays/deductibles
  • NO REFERRALS required
  • LARGE network
  • CAN go outside network
34
Q

HMO (health maintenance organization)

monthly premiums?
copayments + deductibles?
Referral required for specialist?
Network size?
out of network possible?
A
  • LOWEST premiums
  • LOWEST copays/deductibles
  • referral REQUIRED
  • SMALL network
  • CAN’T go outside network

(often uses “capitation”, payment of predetermined amt)

35
Q

POS (point of service)

monthly premiums?
copayments + deductibles?
Referral required for specialist?
Network size?
out of network possible?
A
  • MEDIUM premiums
  • VARIABLE copay/deductible (in vs. out network)
  • referral REQUIRED
  • SMALL network
  • CAN go outside network
36
Q

3 patient populations who can be on MEDICARE

A

≥ 65 years old

disabled

end-stage kidney disease

37
Q

“z scores” for 95% and 99% of a normal distribution

A
1.96 for 95%
2.58 for 99%

this correlates to the 68/95/99.7 rule where approximately 68% of observations lie within 1 SD of mean, 95% within 2 SD (more precisely 1.96 SD) and 99.7% within 3 SD
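
A short check of where these z scores come from, assuming a two-sided interval on a standard normal distribution (the helper name `two_sided_z` is made up). For 95%, 2.5% is left in each tail, so the cutoff is the 97.5th percentile.

```python
from statistics import NormalDist

def two_sided_z(confidence):
    """z score that leaves (1 - confidence) / 2 in each tail of a standard normal."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.95, 0.99):
    print(f"{level:.0%}: z = {two_sided_z(level):.2f}")
```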

38
Q

What is “standard error” and how is it calculated?

A

SE is the “standard deviation” of MULTIPLE SAMPLE MEANS

it estimates how far the sample mean is likely to be from the actual population mean

calculated as SD/√n

where n is the sample size (the larger the sample size, the smaller the standard error)

39
Q

How can a confidence interval be calculated?

A

CI = sample mean +/- [z-score] x [SE]

remember that 1.96 and 2.58 are the z scores for 95% and 99% confidence levels, and SE = SD/√n
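
A sketch of the CI formula on an invented sample (function name and readings are hypothetical). It uses the z-score form given on this card; for small samples a t multiplier is usually preferred, which this sketch does not do.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """CI = sample mean +/- z * SE, with SE = SD / sqrt(n)."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)                 # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # 1.96 for 95%, 2.58 for 99%
    m = mean(sample)
    return m - z * se, m + z * se

# Hypothetical systolic BP readings.
readings = [118, 122, 125, 130, 120, 127, 124, 126]

low95, high95 = mean_confidence_interval(readings, 0.95)
low99, high99 = mean_confidence_interval(readings, 0.99)
print(f"95% CI: {low95:.1f} to {high95:.1f}")
print(f"99% CI: {low99:.1f} to {high99:.1f}")
```

The 99% interval comes out wider than the 95% one because its z score is larger.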

40
Q

What is Berkson bias?

A

a form of selection bias that occurs when a STUDY POPULATION IS SELECTED FROM A HOSPITAL, and is thus less healthy than the general population

41
Q

What is the “Hawthorne effect” as it relates to measurement bias?

A

participants changing their behavior when aware that they are being observed

can be accounted for by using placebo groups

“HAWTHORNE when they know you SAW THEM”

42
Q

What is procedure bias?

example, how to reduce

A

subjects in different study groups are treated differently

ex: pt in treatment group spends more time in highly specialized hospital unit, while placebo pt does not

reduce by blinding

43
Q

what is “observer expectancy bias”?

A

researcher’s belief in efficacy will change outcome (Pygmalion effect)

researcher believing tx will work is more likely to document positive outcomes

44
Q

what is “design bias”?

A

parts of the study do not fit together to answer the question of interest

ex: a non-comparable control group

45
Q

What is a “convenience sample” or “sample of convenience”?

A

a study group / population chosen because it is READILY AVAILABLE rather than randomly selected (often used in the test of a QUALITY IMPROVEMENT INITIATIVE)

46
Q

What is CAPITATION in health care financial management?

what kind of provider network is it used in?

A

an arrangement in which a payor (individual, gov’t or employer) pays a FIXED, PREDETERMINED FEE to cover all medical services required by a patient

used in HMOs

47
Q

what is “discounted fee-for-service” payment?

A

when insurance companies pre-negotiate a fee amount they will pay for each service provided by a provider

48
Q

what is “global payment” in health care financial management?

most common example?

A

arrangement in which an insurer pays a provider a SINGLE PAYMENT to cover ALL EXPENSES ASSOCIATED with a SINGLE INCIDENT OF CARE

ex: pay 1 fee for an ELECTIVE SURGERY and all pre/post-op care

49
Q

what is a “patient-centered medical home”?

A

model of primary care in which pt has a PERSONAL PHYSICIAN who coordinates + sees pt thru all aspects of care (incl. PREVENTIVE and ACUTE/CHRONIC disease mgmt)

fees can be capitated or fee-for-service

50
Q

What is the eqn for ATTRIBUTABLE RISK FRACTION or AR PERCENT in exposed pts?

A

ARexp = (risk exposed - risk unexposed)/risk exposed

if using relative risk: ARexp = (RR - 1)/RR

if percent ARPexp = 100 x (RR-1)/RR
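
A worked example showing that the two forms of the equation give the same answer, using invented risks (function names and numbers are hypothetical).

```python
def ar_percent_from_risks(risk_exposed, risk_unexposed):
    """ARP in exposed = 100 x (risk exposed - risk unexposed) / risk exposed."""
    return 100 * (risk_exposed - risk_unexposed) / risk_exposed

def ar_percent_from_rr(rr):
    """Same quantity written in terms of relative risk: 100 x (RR - 1) / RR."""
    return 100 * (rr - 1) / rr

# Hypothetical risks: 30% in exposed vs. 10% in unexposed, so RR = 3.
risk_exposed, risk_unexposed = 0.30, 0.10
arp_a = ar_percent_from_risks(risk_exposed, risk_unexposed)
arp_b = ar_percent_from_rr(risk_exposed / risk_unexposed)
print(f"ARP from risks = {arp_a:.1f}%, ARP from RR = {arp_b:.1f}%")
```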

51
Q

what kind of variables is the TWO SAMPLE T-TEST good for?

A

independent - qualitative/categorical

dependent - quantitative (continuous numerical)

eg, two groups (independent) treated in different wards for high risk for MI, with measurement of their plasma homocysteine (dependent)

52
Q

what kind of variables should CHI SQUARE test be used for?

example?

A

when BOTH indep./dep. variables are QUALITATIVE / CATEGORICAL

used to eval whether EXPECTED FREQ of occurrence fits the OBSERVED FREQ (“goodness of fit”)

ex: eval of Mendelian inheritance of red vs. green seed colors uses chi-square to compare observed and expected proportions of each seed type

53
Q

What is “late look bias”?

example + solution?

A

individuals with SEVERE disease are less likely to be uncovered in a survey because they DIE FIRST

ex: pts in an AIDS survey only report mild sx because those with severe sx die
solution: STRATIFY the survey by disease severity

54
Q

Comparing an old ADHD behavioral mod tx with a new one…

ADHD is more common in boys than girls, so girls and boys are randomized SEPARATELY into the two tx groups

WHAT KIND OF TREATMENT ALLOCATION is this an example of?

A

stratification

55
Q

What is OUTCOME-ADAPTED randomization in randomized controlled trials?

A

ratio of patients randomly assigned to the experimental treatment arm versus the control treatment arm changes from 1:1 over time to randomly assigning a higher proportion of patients to the arm that is doing better

56
Q

What is Type I Error?

What is the VALUE associated with this?

A

FINDING a significant effect or difference WHEN THERE IS NOT ONE (aka false positive error)

α (alpha) is the probability of making a type I error

(Alpha = “Accused an innocent man”)

(“p value” is judged against a preset α, usually 0.05; rejecting the null only when p < 0.05 keeps the chance of a Type I Error at 5% when the null hypothesis is true)

57
Q

How does VARIANCE relate to standard deviation

A

variance = (SD)^2
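
A tiny check on an invented data set, assuming population (not sample) formulas for both values.

```python
from statistics import pstdev, pvariance

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

sd = pstdev(values)          # population standard deviation
variance = pvariance(values) # population variance
print(f"SD = {sd}, variance = {variance}, SD squared = {sd ** 2}")
```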

58
Q

What other symbol can be used to represent standard deviation?

A

σ (sigma)

59
Q

Hook for remembering what Chi square test is for?

A

Chi is for categorical variables (both dep + indep)

“CHI-tegorical”

60
Q

In CORRELATION ANALYSIS…

what is the “coefficient of determination”?

A

COD = r^2 (correlation coefficient squared)

(quantifies the AMOUNT OF VARIANCE in one variable THAT CAN BE EXPLAINED BY VARIANCE of other variable)

(ie, if correlation coefficient is r = 1 then ALL variance in one variable can be explained by the variance in the other because 1^2 = 1 …. but the further from 1/-1 that r is the less the variance is actually determined by the variance in the other variable)
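
A sketch showing the "explained variance" reading of r^2 on invented data (function name and numbers are hypothetical): r^2 computed from the correlation coefficient matches 1 minus the share of variance left over after fitting a least-squares line.

```python
def r_squared_two_ways(xs, ys):
    """Return (r^2 from correlation, fraction of variance in ys explained by the fit)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r_squared = sxy ** 2 / (sxx * syy)     # correlation coefficient, squared

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation in ys that the fitted line does NOT account for.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    explained_fraction = 1 - ss_residual / syy
    return r_squared, explained_fraction

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2, explained = r_squared_two_ways(xs, ys)
print(f"r^2 = {r2:.2f}, explained fraction = {explained:.2f}")
```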

61
Q

A question says:

“we chose the sample size to have an 80% power of detecting a significant difference…”

What type of error does this tell us about and what is the value of that error type?

A

Type II error (β)

since it has a STATISTICAL POWER (ability to detect a difference when there is one) of 80%, then it has a 20% chance of NOT DETECTING when there IS A DIFFERENCE

so β = 20%

62
Q

What is meant by the “SIGNIFICANCE LEVEL” of a statistical study?

A

significance level is EQUAL TO α (alpha)

ie, the probability of making a TYPE I ERROR - finding a significant difference when there is none

(ie, Accusing an innocent man … remember α goes with Accusing)