Biostats HY Flashcards
Give an intervention to one group and give placebo to other group then compare /record outcomes
Randomized Controlled Clinical Trial
(RCT)
Compare group of ppl with an uncommon dz (or characteristic) and a group of ppl w/o the dz and look back in time for exposures
Case-Control
(calculate odds Ratio)
Cohort is opposite it looks at exposure first then future dz.
Case control looks dz first then back in time for exposure!
prominent issue with Case Control Studies?
Recall Bias
To do a study on a rare phenomenon (or disease)
_____ studies are typically the best option on NBME exams.
case-control study
Which study looks at 2 groups; one with a risk factor/exposure and one w/o risk factor and then follow into future to see if they develop a particular outcome (disease/adverse effect)
Cohort studies
(calculate relative risk)
Lower P value (<0.05) = higher (2)
confidence & power
(that results are not by chance)
which P-value is better?
P<0.05 or <0.01?
P<0.01
means that if there were no real effect, results this extreme would occur by chance <1% of the time
__% of population with normal distribution should fall within 2 Standard deviations below & above of the mean (average)
95%
Example: SD is 100 and mean is 1000.
2SD below mean = 800
2SD above mean = 1200
95% of population falls within 800-1200
5% must fall outside this range
2.5% less than 800
2.5% higher than 1200
95% of population with normal distribution should fall within __ Standard deviations below & above of the mean (average)
2
Example: SD is 100 and mean is 1000.
2SD below mean = 800
2SD above mean = 1200
95% of population falls within 800-1200
5% must fall outside this range
2.5% less than 800
2.5% higher than 1200
__% of population falls outside 2 Standard deviations below & above of the mean (average)
___% falls above 2 SD of average
___% falls below 2 SD of average
5%
2.5% above 2SD of mean
2.5% below 2SD of mean
Whenever you have 2 confidence intervals overlap in value (or cross each other) that means results are
not significant
(no difference in effectiveness between those two things)
In Ratio derived confidence intervals (Relative risk, Odds ratio) if the confidence interval includes (crosses) the number ___ = not significant
1
(can get ONE, if you divide two of the SAME number)
In Difference derived confidence intervals (Average/ percents/proportions, RRR, Attributable Risk, ARR) if the confidence interval includes (crosses) the number ___ = not significant
0
(can get zero, if you subtract two numbers that are the SAME)
3 Rules for figuring out what a value’s Confidence interval is.
Example: Confidence interval of a Relative Risk of 3.5
- Is it a ratio or a difference?
Relative risk is a ratio, so the CI can’t include 1 (eliminate those ans)
ARR is a difference, so the CI can’t include 0
- Value cannot start or end the CI
(ex: confidence interval can’t start or end with 3.5)
- Value must fall within the CI range of numbers & be nearest the center of the range.
(eliminate all ans that do not include 3.5 within the range)
CI must include the value (ex: 3.5) near the center of the range of numbers, but the value must not start or end the interval, and the interval can’t include the number 1 (ratio) or 0 (difference)
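The elimination rules above can be sketched in Python. This is only an illustration under stated assumptions: the function name `pick_plausible_ci` and the answer choices are hypothetical, and "nearest the center" is read here as the arithmetic midpoint of the interval.

```python
def pick_plausible_ci(value, options, kind):
    """Apply the 3 flashcard rules to a list of (lower, upper) answer options.

    kind is "ratio" (RR, OR: null value 1) or "difference" (ARR, AR: null value 0).
    """
    null_value = 1.0 if kind == "ratio" else 0.0
    survivors = []
    for lower, upper in options:
        if lower <= null_value <= upper:
            continue  # rule 1: CI includes the null value
        if value in (lower, upper):
            continue  # rule 2: value starts or ends the CI
        if not (lower < value < upper):
            continue  # rule 3: value is not inside the CI
        survivors.append((lower, upper))
    if not survivors:
        return None
    # rule 3 (continued): prefer the option whose midpoint is closest to the value
    return min(survivors, key=lambda ci: abs((ci[0] + ci[1]) / 2 - value))

# Hypothetical answer choices for a relative risk of 3.5
choices = [(0.9, 6.0), (3.5, 7.0), (2.1, 5.0), (4.0, 8.0)]
print(pick_plausible_ci(3.5, choices, "ratio"))  # (2.1, 5.0)
```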
Calculate Number Needed to Treat & ARR
ARR = (% of pts who died getting PLACEBO) minus (% of pts who died getting DRUG)
──
NNT= 1 ÷ ARR
Calculate Number Needed to Harm
1 ÷ [(% of pts harmed by Drug) minus (% of pts harmed by Placebo)]
NNH= 1 ÷ AR
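A quick worked sketch of NNT and NNH in Python. It assumes the conventional definitions (ARR = event rate on placebo minus event rate on drug; AR = harm rate on drug minus harm rate on placebo). The function names and the rates are made up for illustration.

```python
def absolute_risk_reduction(control_rate, treated_rate):
    """ARR = event rate without treatment minus event rate with treatment."""
    return control_rate - treated_rate

def number_needed_to_treat(control_rate, treated_rate):
    """NNT = 1 / ARR."""
    return 1 / absolute_risk_reduction(control_rate, treated_rate)

def number_needed_to_harm(treated_harm_rate, control_harm_rate):
    """NNH = 1 / attributable risk (harm rate on drug minus harm rate on placebo)."""
    return 1 / (treated_harm_rate - control_harm_rate)

# Made-up rates: 20% die on placebo vs 15% on drug; 3% harmed on drug vs 1% on placebo
print(round(number_needed_to_treat(0.20, 0.15)))  # 20
print(round(number_needed_to_harm(0.03, 0.01)))   # 50
```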
Calculate Relative Risk
& Relative Risk Reduction (Decreased Relative Risk)
Relative Risk
(% exposed/intervention + dz) ÷ (% unexposed/control + dz)
(ex: 20% of smokers got Lung cancer/ 10% nonsmoker got lung cancer = 2 → aka smoking increases risk of lung cancer 2-fold)
─
RR = rate of outcome in exposed/ rate of outcome of control
RRR= (1– RR)
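The smoking example above, plus a relative risk reduction, as a minimal Python sketch. The function names and the drug/placebo rates are illustrative assumptions, not part of the deck.

```python
def relative_risk(exposed_rate, unexposed_rate):
    """RR = rate of outcome in exposed / rate of outcome in unexposed (control)."""
    return exposed_rate / unexposed_rate

def relative_risk_reduction(treated_rate, control_rate):
    """RRR = 1 - RR, where RR compares the treated group with the control group."""
    return 1 - relative_risk(treated_rate, control_rate)

# 20% of smokers vs 10% of nonsmokers got lung cancer
print(relative_risk(0.20, 0.10))  # 2.0

# Made-up trial: 15% of treated pts vs 20% of control pts had the outcome
print(round(relative_risk_reduction(0.15, 0.20), 2))  # 0.25
```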
What is the Positive & Negative Likelihood ratio formula?
Positive = Sensitivity ÷ (1 – Specificity)
Negative = (1 – Sensitivity) ÷ Specificity
When to use the positive or negative likelihood ratio on exam to calculate correct answer?
+ve LRs tell you how much more likely a phenomenon is when you have a +ve test result.
-ve LRs tell you how much less likely a phenomenon is when you have a -ve test result.
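A short Python sketch of both likelihood ratios, assuming sensitivity and specificity are given as fractions between 0 and 1. The function names and the 80%/90% test are hypothetical.

```python
def positive_lr(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)

def negative_lr(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity

# Hypothetical test with 80% sensitivity and 90% specificity
print(round(positive_lr(0.80, 0.90), 2))  # 8.0
print(round(negative_lr(0.80, 0.90), 2))  # 0.22
```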
Quick way to calculate Odds ratio
Odds ratio
(Expected Outcomes )÷ (Odd Outcomes)
─
Expected: (exposed got disease) x (unexposed no disease)
÷
Odd: (exposed no disease) x (unexposed got disease)
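The "expected ÷ odd" shortcut is the usual cross-product of a 2x2 table. A minimal Python sketch, with made-up counts and a hypothetical function name:

```python
def odds_ratio(exposed_dz, exposed_no_dz, unexposed_dz, unexposed_no_dz):
    """Cross-product odds ratio from the four cells of a 2x2 table."""
    expected = exposed_dz * unexposed_no_dz   # "expected" outcomes
    odd = exposed_no_dz * unexposed_dz        # "odd" outcomes
    return expected / odd

# Made-up counts: 30 exposed with dz, 70 exposed w/o dz, 10 unexposed with dz, 90 unexposed w/o dz
print(round(odds_ratio(30, 70, 10, 90), 2))  # 3.86
```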
How to Calculate Confidence Interval
CI = mean ± Z-score × (SD ÷ √n)
90% Z-Score ≈ 1.5 (exact: 1.645)
95% Z-Score ≈ 2 (exact: 1.96)
99% Z-score ≈ 2.5 (exact: 2.576)
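A minimal Python sketch of a confidence interval for a mean, assuming the standard formula mean ± Z × (SD ÷ √n) and a normal approximation. The function names and the sample values (mean 1000, SD 100, n = 100) are illustrative. It also shows that the exact Z-scores sit close to the rounded values on the card.

```python
from math import sqrt
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided Z-score for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)

def mean_confidence_interval(mean, sd, n, confidence_level=0.95):
    """CI = mean +/- Z * (SD / sqrt(n))."""
    margin = z_score(confidence_level) * sd / sqrt(n)
    return mean - margin, mean + margin

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(1000, 100, 100, level)
    print(f"{level:.0%}: Z = {z_score(level):.3f}, CI = {low:.1f} to {high:.1f}")
```

The interval widens as the confidence level rises, because the Z-score multiplier grows.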
ROC curves (how well a test can distinguish b/w 2 groups)
The best test (highest sensitivity & specificity) lies at the _____ of the graph.
top left corner
Cohort study, 2 groups of individuals are initially identified as “exposed” or “nonexposed” according to their exposure status to a specific risk factor and then followed into future to assess development of the outcome (incidence of disease).
Case-Control = 1 uncommon disease is followed back in time to assess exposure(s)
Cohort = Exposures are followed into future for development of common diseases
68%, 95%, and 99.7% of a normal population lie b/w __, __, & __ SDs of the mean respectively.
1 (68% → 16% in each tail)
2 (95% → 2.5% in each tail)
3 (99.7%)
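The 68/95/99.7 rule can be checked with Python's standard library. This is a small sketch; the helper name `fraction_within` is made up. The exact fractions differ slightly from the rounded figures on the card.

```python
from statistics import NormalDist

def fraction_within(k_sd):
    """Fraction of a normal population lying within k_sd SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)

for k in (1, 2, 3):
    inside = fraction_within(k)
    tail = (1 - inside) / 2  # fraction in each tail
    print(f"{k} SD: {inside:.1%} inside, {tail:.2%} in each tail")
```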
Both tests require measuring a quantitative (numerical) outcome
Between 2+ qualitative (Intervention/Risk Factor) groups
compares means of 2 groups, ___ test.
compares means of 3+ groups, ___ test.
T test
ANOVA (or F) test
Chi-squared test has qualitative terms for both intervention and outcome
When you incorrectly reject the null
(state there is an effect when there is not an effect)
= a Type __ error.
Type 1 error (alpha error)
(aka false positive error)
When you incorrectly accept the null
(state there is no effect when there is an effect)
= a Type ___ error.
Type 2 error (beta error)
(aka false negative error)
Power = ___
1– beta
Statistical power is the probability of stating that there is an association & it’s actually true.
(aka rejecting a false null hypothesis)
Narrower CIs tell you study is more ___.
precise
However, a CI that is narrower only because a lower confidence level was chosen (ex: 90% instead of 95%) should leave you less confident that it contains the true value (less room for error).
Ways to Increase the power of a study (HY!)
Studies with larger sample sizes have greater statistical power, consequently a lower probability of a type II error
- Recruit more people for a study (larger sample size).
- Have a large difference b/w 2 quantities you’re trying to measure (larger effect size).
- Increase measurement precision (how consistent values are)
- Use a less strict significance level (higher α, ex: 0.05 rather than 0.01); a stricter α lowers power.
- Reduce variability so that measured values cluster around 1 value.
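A rough Python sketch of how sample size, effect size, and variability move power. It uses the textbook approximation for a two-sided two-sample z-test with equal group sizes and a known SD, which is a simplifying assumption. The function name and the numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (equal n, known SD)."""
    z = NormalDist()
    standard_error = sd * sqrt(2 / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / standard_error
    # Probability that the test statistic lands beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

baseline = two_sample_power(effect=5, sd=10, n_per_group=50)
print(f"baseline power:     {baseline:.2f}")
print(f"larger sample:      {two_sample_power(5, 10, 100):.2f}")
print(f"larger effect size: {two_sample_power(8, 10, 50):.2f}")
print(f"less variability:   {two_sample_power(5, 5, 50):.2f}")
```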
FYI
The fact that something is statistically significant does not mean that it is clinically significant
Study compares 2+ treatments in the same pts, allowing each pt to serve as their own control
Crossover study
This test has 2 groups divided by ≥2 categorical/qualitative factors (exposure or intervention) and measures the categorical/qualitative outcome observed in each group
Chi squared test
Qualitative = characteristic
Quantitative = numerical values (Temp, BGL, Percentages)
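A minimal Python sketch of the chi-squared statistic for a 2x2 table of counts. It assumes no continuity correction, and the p-value shortcut is valid only for 1 degree of freedom. The function name and the counts are hypothetical.

```python
from math import erfc, sqrt

def chi_squared_2x2(table):
    """table = [[a, b], [c, d]] of observed counts (rows = groups, columns = outcome)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))  # valid for 1 degree of freedom only
    return stat, p_value

# Made-up counts: exposed 30 with outcome / 70 without; unexposed 10 with / 90 without
stat, p = chi_squared_2x2([[30, 70], [10, 90]])
print(round(stat, 2), p < 0.05)  # 12.5 True
```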
Mean is the average.
Median represents:
1. the ___ #
2. the ___#s
Mode represents the # that ____ in the data set
TIP Arrange data in ascending/descending order before making these determinations.
Middle # (odd # data set)
Avg of the 2 middle #s (even # data set)
Mode = # that is repeated most
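Python's `statistics` module handles the sorting for you. A small sketch with made-up data sets:

```python
from statistics import mean, median, mode

odd_set = [7, 3, 5, 3, 10]       # sorted: 3, 3, 5, 7, 10
even_set = [7, 3, 5, 3, 10, 2]   # sorted: 2, 3, 3, 5, 7, 10

print(median(odd_set))   # 5   (middle number)
print(median(even_set))  # 4.0 (average of the 2 middle numbers, 3 and 5)
print(mode(even_set))    # 3   (most repeated number)
print(mean(even_set))    # average of all values (30 / 6)
```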
For a normal distribution, _____
mean = median = mode
HY
bimodal distributions found in (3 illnesses)
Hodgkin’s lymphoma
Suicide
slow/fast acetylators in metabolism
Erroneously thinking that survival has improved when in fact the “apparent survival improvement” arose bc you found the disease earlier.
Lead time bias
80% sensitivity = 20% ____ test result
False negative test result
(tested negative but have dz)
Screening test: High _____ test if Negative rules out dz
Confirmatory test: High ____ test if Positive rules in dz
High Sensitivity test if Negative rules out dz (SN-N-OUT)
High Specificity test if Positive rules in dz (SP-P-IN)
Of all the population with the disease what % will have a (+) test result = ___
Sensitivity
High seNsitivity = Low False Negative rate
Missed ones are False Negative
(pt has disease, but result was negative)
Of all the population without the disease, what % have (–) test results = ___
Specificity
A highly sPecific test has a low false Positive rate.
90% specificity = 10% ____ test result
False positive test result
(tested + but have no disease)
Which of the following points best represents the spot with the highest positive predictive value (PPV)
(aka % of people with +ve tests who have disease)
C
The highest PPV region on a graph, corresponds to the region with the highest sPecificity (C)
Sensitivity of a test represents the % of pts with disease who have _____
PPV of a test represents the % of pts with a +ve test who have _____
(+ve test result)
→ SN = % of ppl w/dz the test marks positive
(disease)
→ PPV = % of +ve test results that are True Positives
Equation for calculating Sensitivity
Sensitivity = (True Positive Test)/ (TP test + FN test)
aka (# of diseased & [+] test ) over (# of diseased regardless of test result)
Equation for calculating Specificity
Specificity = (True Negative test)/ (TN test + FP test)
aka (# of healthy & Neg test) over (# of healthy regardless of test result)
Equation for calculating PPV%
PPV= (TP test)/ (TP test + FP test)
aka [# of pts w/ (dz & + test)] over [total # of positive test regardless if true or not]
NPV= TN/(TN+FN)
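The four equations above, computed together from one 2x2 table in Python. The function name and the counts are made up for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased & (+) test over all diseased
        "specificity": tn / (tn + fp),  # healthy & (-) test over all healthy
        "ppv": tp / (tp + fp),          # true positives over all positive tests
        "npv": tn / (tn + fn),          # true negatives over all negative tests
    }

metrics = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```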
Which of the following points best represents the region of the graph with the highest negative predictive value (NPV)
(aka % of pts w/ NEG test results who are healthy)
B
The highest NPV region on a graph, corresponds to the region with the highest seNsitivity (B), which corresponds to the region that DOES NOT miss anyone with disease.
Specificity of a test represents the % of people ____
NPV of a test represents the % of people ____
SPECIFICITY: % people (w/o disease) who have (–ve test results)
NPV: % people with (-ve test results) who are (w/o disease).
Lowering cutoff value does what (6)
Lowering cutoff (B → A)
↑ SN & NPV & FP
↓ SP & PPV & FN
Increasing the cutoff value does what (6)
Increasing cutoff (B → C)
↑ SP & PPV & FN
↓ SN & NPV & FP
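A toy Python sketch of the cutoff trade-off. It assumes that higher lab values point toward disease and that the test is positive at or above the cutoff. The values and the function name are invented.

```python
def sn_sp_at_cutoff(diseased, healthy, cutoff):
    """Test is called positive when the measured value is >= cutoff."""
    true_pos = sum(v >= cutoff for v in diseased)
    true_neg = sum(v < cutoff for v in healthy)
    return true_pos / len(diseased), true_neg / len(healthy)

diseased = [6, 7, 8, 9, 10]  # made-up lab values in pts with dz
healthy = [3, 4, 5, 6, 7]    # made-up lab values in pts w/o dz

print(sn_sp_at_cutoff(diseased, healthy, 5))  # low cutoff:  (1.0, 0.4)
print(sn_sp_at_cutoff(diseased, healthy, 8))  # high cutoff: (0.6, 1.0)
```

The low cutoff catches every diseased pt (no FN) at the cost of more FP; the high cutoff does the reverse.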
As Prevalence goes up, ____ should increase too
PPV
As Prevalence goes up, ____ should decrease
NPV
(Inversely related PPV/NPV)
Can Prevalence change Sensitivity or Specificity of a test?
No
(but changing cut off values can)
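A short Python sketch of how prevalence moves PPV and NPV while sensitivity and specificity stay fixed. The function name and the 90%/90% test are assumptions for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```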
Prevalence vs Incidence
Prevalence counts all current cases of dz in the total population (pts who live longer stay in the population longer, which increases prevalence)
Incidence counts all new cases in the at-risk population over a period of time
Prevalence decreases with (4)
increased mortality
faster recovery
more vaccine/prevention
Lowering risk factors
incidence decreases with (2)
more vaccine/prevention
lowering risk factors
TIP place Mean, Median, Mode in alphabetical order.
This should help you remember that:
in a Negatively skewed curve (flat portion on the ___): ____
in a Positively skewed curve (flat portion on the ___): _____
Flat left → mean < median < mode.
Flat Right → mean > median > mode.
───
notice how arrow head’s (<) flat part points in direction of the skew’s flat part
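The ordering can be checked on two small made-up data sets in Python:

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 3, 4, 10, 20]     # flat tail on the right (positive skew)
left_skewed = [1, 11, 17, 18, 19, 19, 20]  # flat tail on the left (negative skew)

for label, data in (("flat right", right_skewed), ("flat left", left_skewed)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

For these values, the flat-right set gives mean 6 > median 3 > mode 2, and the flat-left set gives mean 15 < median 18 < mode 19.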
Case-control studies can consider only ____ per study but can evaluate exposure to several risk factors.
1 outcome (ie, disease)
3 actions used to control for confounding variables during the design stage of a study.
Randomization
Matching (same # of pts w & wo risk factor)
Restriction (participation criteria)
Nonrandom treatment assignment may lead to ___ bias
selection bias
Stratified analysis of the extraneous variable can help distinguish whether that variable is a confounding bias or an effect modifier.
It is a confounding bias if ______.
Stratified analysis of both groups yields similar RR (relative risk) no significant difference
If RR between 2 groups are significantly different → Effect Modifier
Because cohort studies measure incidence of disease, they provide a measure of
relative risk of disease
Because a case-control study compares the exposure status of people with a disease (ie, cases) & w/o the disease (ie, controls), it provides a measure of
odds ratio
Types of studies:
A) 2 groups: Disease & no disease
B) 2 groups: Exposed & not Exposed
C) 2 groups: All subjects have disease or don’t have dz (risk factor and outcome are measured simultaneously)
A) Case control
B) Cohort
C) Cross Sectional
By raising the cutoff value, it is harder to get a ___ test result and easier to get a ___ test result.
Harder: positive test result ( ↓ False Positives)
Easier: negative test result ( ↑ False Negatives)
Equation for Accuracy
(probability that an individual will be correctly classified by a test)
(True positives + True negatives) / Total number of individuals tested
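As a one-line Python sketch, with made-up counts and a hypothetical function name:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all tested individuals that the test classifies correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=80, tn=90, fp=10, fn=20))  # 0.85
```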
Probability of having the disease if pt gets negative test results
vs not having the disease if pt gets positive test results
Having dz despite negative test = 100 – NPV
No dz despite positive test = 100 – PPV
Increasing the CI from 95% to 99% does what to the range of numbers?
Makes the range larger/wider
Mean 7 → 95% CI= 4–10
Mean 7 → 99% CI= 2– 12
P value of <0.05 in words mean
If there is no real difference between 2 groups, there is a 5% chance of finding a difference
The probability/likelihood that a patient with a negative test result truly does not have the disease.
Negative predictive value (NPV)
ANOVA test
If p-value is greater than alpha (α) the results are ____.
Not significant (no difference/are similar)
if p-value less than α → significant (difference exists)
correlation coefficient (r)
r < 0 =
r > 0 =
(NEG correlation #) as one variable increases, the other variable decreases
(POS correlation #) both variables increase (or decrease) together.
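A minimal Python sketch of Pearson's r, written out by hand so the sign is easy to trace. The function name and the data are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

dose = [1, 2, 3, 4, 5]
print(pearson_r(dose, [2, 4, 5, 4, 6]) > 0)  # True: both rise together
print(pearson_r(dose, [9, 7, 6, 4, 1]) < 0)  # True: one rises, the other falls
```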
Increasing power
(guess before flipping card)
In a NORMAL DISTRIBUTION
Pt’s above/below the mean by
1SD → is within the __% of the distribution.
2 SD → is within the __% of the distribution.
3 SD → is within the __% of the distribution.
68%
95%
99.7%
The narrower the confidence interval range is, the more ____ the test/results are.
precise
(FYI: Increasing the sample size increases the precision of the study, but does not affect accuracy.)
The best diagnostic accuracy represents the
best compromise between highest sensitivity & specificity
(aka top left corner of ROC curve)
Odds Ratio cannot establish
risk (increase nor decrease)
Calculate Absolute risk reduction (ARR) & Attributable Risk (AR)
ARR = (Control % outcome) – (Intervention % outcome)
AR = (Exposed % outcome) – (Unexposed % outcome)
A ___ study design is best for determining the incidence of a disease.
What about prevalence?
cohort
Prevalence = Cross-Sectional
A ___ study is best for determining odds of developing a disease.
Case-control
The typical example of lead-time bias is prolongation of apparent survival in patients to whom a test is applied, without changing the prognosis of the disease.
FYI