Test 10/24 Flashcards
Dependent variable
the outcome you measure; where you expect the effect of the independent variable to be seen
Independent variable
YOU vary this in the experiment…. want to see effect on dependent variable
Null hypothesis
states there is NO relationship between the proposed independent and dependent variables
STUDY NEEDS TO SHOW EVIDENCE AGAINST THIS and reject the null hypothesis
Ecologic study
Looks at POPULATIONS only
understand relationship between outcome and exposure at the population level
…. analyses in which the presence of a suspected risk factor is measured in different populations and compared with the frequency of disease onset
Ecologic fallacy
when incorrect conclusions are drawn from ecologic data due to an association at the group level that does NOT persist to the individual level
Association is NOT
causation
Normal distribution– relationship of mean, median, mode
they are all equal
standard deviation
measure of how tightly different data points gather around the mean
The number of standard deviations away from the mean a value lies in a normal distribution tells you…..
how likely that value is to occur
Standard deviation deals with
members of a population
standard error
= standard deviation / square root of the sample size (n)
expected variability in the measurement of a population mean across multiple samples/trials
standard error deals with
samples (groups of individuals, aka sample means)
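The SD vs. SE distinction above can be sketched in a few lines of Python (the sample values are hypothetical, for illustration only):

```python
import math

# Hypothetical sample of measurements (illustrative values, not from the notes)
sample = [118, 122, 130, 125, 119, 128, 121, 126]
n = len(sample)
mean = sum(sample) / n

# Sample standard deviation: spread of individual data points around the mean
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# Standard error of the mean: SD / sqrt(n) -- expected variability of the
# sample mean itself; it shrinks as the sample size grows
se = sd / math.sqrt(n)
```

Note that `se` is always smaller than `sd` for n > 1, which matches the cards below: a bigger sample gives a smaller standard error and a more precise estimate of the population mean.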
If you have a bigger sample size, the standard error is
LESS and the estimate of the population mean is MORE precise
narrower curve
if you have a smaller sample size, the standard error is
MORE and the estimate of the population mean is LESS precise
wider curve
Prevalence
the number of EXISTING cases of a condition in a population at a MOMENT of time
expressed as a percent
Incidence
the number of NEW cases of a disease that develop in a population over a specified period of time
Incidence requires what 3 things
1) new events
2) population at risk
3) passage of time
What are the two ways to calculate incidence?
Cumulative incidence (risk)
incidence rate
Cumulative incidence (RISK)
= new cases of disease/ total population at risk
Biggest flaw of cumulative incidence
best for fixed populations… does not account for people moving away/ dying etc
Incidence rate
= new cases of disease / total person-time at risk
expressed as a rate…. e.g. 1.6 cases per 1000 person-years (usually multiply to express in terms of 1000 person-years)
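The two ways to calculate incidence can be contrasted with a quick sketch (all numbers hypothetical):

```python
# Hypothetical cohort: 8 new cases observed
new_cases = 8

# Cumulative incidence (risk): new cases / population at risk at baseline
population_at_risk = 500
cumulative_incidence = new_cases / population_at_risk  # 0.016, i.e. 1.6%

# Incidence rate: new cases / total person-time at risk; each person's time
# is summed individually, so people who move away or die contribute partial time
person_years = 1250
incidence_rate = new_cases / person_years
rate_per_1000 = incidence_rate * 1000  # 6.4 cases per 1000 person-years
```

The person-time denominator is what lets the incidence rate handle a non-fixed population, which is the flaw of cumulative incidence noted above.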
Prevalence of disease (entrances and exits)
incidence ENTERS
cure, death, moving away EXITS
What are three ways to compare the risk in the exposed and unexposed groups?
relative risk
absolute risk difference
number needed to treat
Relative risk
risk exposed/ risk unexposed
the probability an event will happen in an exposed group vs. probability an event will happen in a non-exposed group
absolute risk difference (ARD)
risk exposed - risk unexposed
Represents the change in the risk of an outcome, given a particular exposure
Means “there is a __ percentage-point increase in frequency of (outcome) with (intervention)”
number needed to treat (NNT)
1/ absolute risk difference
estimates the # of patients who are exposed to something who will need to receive a certain treatment in order to prevent ONE unfavorable outcome
Can you calculate NNT if you only have RR?
NO, need the ARD
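The three comparison measures above, with hypothetical risks, showing why NNT needs the ARD and cannot be recovered from the RR alone:

```python
# Hypothetical risks of the outcome in each group (illustrative numbers)
risk_exposed = 0.20    # 20 events per 100 exposed
risk_unexposed = 0.10  # 10 events per 100 unexposed

relative_risk = risk_exposed / risk_unexposed             # 2.0
absolute_risk_difference = risk_exposed - risk_unexposed  # 0.10

# NNT comes from the ARD; the same RR of 2.0 could arise from risks of
# 0.002 vs 0.001, which would give a very different NNT
number_needed_to_treat = 1 / abs(absolute_risk_difference)  # 10
```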
Accuracy
correct diagnoses/ total # of diagnoses
Prevalence Equation
diseased / total population at a specific POINT in time
False positive (FP)
positive test result when a patient does NOT have a disease
True positive (TP)
a positive test result when a patient does have the disease
False Negative (FN)
a negative test result when the patient has the disease
True Negative (TN)
a negative test result when a patient does NOT have a disease
Sensitivity definition (acronym too)
proportion of individuals with the disease that are TRUE POSITIVES
if a patient DOES have a disease, what are the chances they will have a positive result
SnOUT…. Sensitive test that is negative rules OUT a disease (good for screening)
Sensitivity equation / location on chart
= TP / (TP + FN )
Specificity (definition + acronym)
proportion of individuals without the disease that are TRUE NEGATIVES
SpIN… a specific test that is positive rules IN disease
aka if a patient does NOT have a disease, what are the chances they will have a negative test result
Chart for specificity, sensitivity, PPV, NPV… draw in head
Specificity equation
= TN/ (FP + TN)
Positive predictive Value (PPV) definition + equation
if a test is positive, the probability that a patient actually has the disease
Proportion of positive tests that are true positives
=TP / (TP + FP)
Negative predictive value (NPV) definition + equation
Proportion of negative tests that are true negatives
= TN / (FN + TN)
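All four test characteristics (plus accuracy, from the earlier card) follow from one 2x2 table. A sketch with hypothetical counts:

```python
# Hypothetical 2x2 table (disease status from the gold standard)
TP, FP = 90, 50   # test positive: with disease / without disease
FN, TN = 10, 850  # test negative: with disease / without disease

sensitivity = TP / (TP + FN)  # among the diseased, fraction testing positive
specificity = TN / (FP + TN)  # among the non-diseased, fraction testing negative
ppv = TP / (TP + FP)          # among positive tests, fraction truly diseased
npv = TN / (FN + TN)          # among negative tests, fraction truly disease-free
accuracy = (TP + TN) / (TP + FP + FN + TN)  # correct diagnoses / all diagnoses
```

With these numbers sensitivity is 0.90 and specificity about 0.94, but PPV is only about 0.64: even a good test yields many false positives when true positives are scarce, which is why PPV depends on prevalence (see below).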
Gold standard
the benchmark test that is considered the best available
ROC curve
plot of sensitivity (true positive rate) vs. 1 - specificity (false positive rate) across a range of cutoff values
Goal: choose a cutoff with a high true positive rate and a low false positive rate…. so the ideal spot is toward the upper left of the curve (farther left and up is GOOD)
T/F: Specificity and sensitivity are affected by prevalence
FALSE… they are characteristics of the test itself
What is affected by prevalence?
PPV and NPV
As prevalence increases
PPV increases
NPV decreases
As prevalence decreases
PPV decreases
NPV increases
When do you use a sensitive test?
first stage
SnOUT…. because a negative rules out a disease (b/c LOW false negative rate)
If the test is negative, then we are confident the patient does NOT have the disease
at first stage you want to be confident in who you are excluding
When do you use a specific test?
second stage
SpIN… a specific test that is positive rules in a disease because it has a low false positive rate
if the test is positive, we are confident the patient has the disease
In serious conditions (ie. patient could have a serious condition like a heart attack), then do you prefer a sensitive or specific test?
Sensitive… SnOUT… you want to rule out very serious disease
Pretest probability
patient’s likelihood of having illness BEFORE diagnostic testing is performed
Posttest probability
patient’s likelihood of having a disease AFTER those test results are considered
Likelihood ratios (LR) (definition + equation)
magnitude by which a + or - test result alters the post-test probability
do NOT change with disease prevalence
LR+ = sensitivity / (1 - specificity)
LR- = (1 - sensitivity) / specificity
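The two LR equations above, computed from hypothetical sensitivity and specificity values:

```python
# Hypothetical test characteristics (illustrative values)
sensitivity = 0.90
specificity = 0.95

# LR+ : how much a positive result raises the post-test probability
lr_positive = sensitivity / (1 - specificity)   # about 18

# LR- : how much a negative result lowers it (values near 0 are strong)
lr_negative = (1 - sensitivity) / specificity   # about 0.105
```

Because both ratios are built only from sensitivity and specificity, they inherit the same property: they do not change with disease prevalence.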
What is the relationship between relative risk and absolute risk difference in a study in which the null hypothesis is demonstrated to be true?
RR> ARD
because if null hypothesis = true, then risk 1 = risk 2…. so RR = 1
and ARD = risk 1- risk 2 = 0
1> 0
COhOrt Study
stratify based on expOsure first, then track to see if each group developed disease OR not
can be prospective or retrospective
Strengths of cohort studies
efficient for rare exposures
little exposure misclassification
calculate incidence rates and risk
can assess multiple outcomes
Weaknesses of cohort studies + what is the biggest one?
loss to follow up –> BIGGEST ONE
inefficient for rare outcomes or long latencies
expensive
How to recruit groups for a cohort study
1) recruit everyone from ONE pool and divide into groups after based on what you evaluate about them
2) recruit TWO separate groups using subjects with and without a risk factor (can be helpful for rare disease)(need exposed vs. non exposed group to be very similar in other ways ie. live in same area, same diet, SES)
Narrow inclusion criteria
you can be more certain the results work in the group you studied, but may not be generalizable to the other groups you didn’t study
Broad inclusion criteria
you can be certain the result works in the overall population BUT you may not be able to detect if the association differs in a specific subgroup
Reasons you may exclude people from study
already have the condition
inclusion may bias results
hard to follow over time
they may not complete parts of the study
Misclassification bias
systematic or random differences in the way data are obtained on exposure or outcome → distortion in the estimate of effect
can under/over estimate the effect you are looking for
can be easier to misclassify in long trials, because it is harder to keep up with people
Counterfactual
as you recruit cohorts, the comparison group needs to be as similar as possible with respect to all factors except the exposure
ie. think about if there are other players that might explain the relationship between exposure and outcome
Loss to follow up
if you cannot establish contact with a participant during a study
if people are lost to follow up, try to keep the number lost similar in both groups → non-differential loss to follow up
Methods to minimize loss
collect info at intake to track (address, email, phone #, relative’s contact info)
use subjects more likely to follow up
regular contact
multiple requests if they do not respond
contact info for friends/ families
In cohort studies you can use what measurement…..
relative risk (risk group exposed/ risk group unexposed)
In case control studies you can use what measurement…..
odds ratio
because you start with a sample of controls, you don’t have data on the entire population
Case control study
start with DISEASE (case vs. control), then stratify based on exposure
ALL of the disease cases and a FRACTION of the controls… so you start with cases, then select controls which may be matched to cases
Weaknesses of case control study
not good for RARE EXPOSURES
relative risk cannot be measured –> use odds ratio instead
validity can be affected by SELECTION bias and RECALL bias
How to get an appropriate source of controls in a case control study? Sources?
select from individuals who would have been the case group if they had developed the disease
sources: population controls ie. driver’s license, random numbers, voter registration (hard)
hospital/ clinic
family controls
odds ratio (definition + equation)
measure of association between an exposure and an outcome (how much more likely cases were to have been exposed than controls)
= ad/ bc
If your odds ratio >1 vs. <1
> 1: positive association… ie. Cases are 1.2 times more likely to have been exposed to x than the controls
<1: negative association… exposure is protective
Does odds ratio change with sample size?
No
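The ad/bc formula above, applied to a hypothetical case-control 2x2 table:

```python
# Hypothetical case-control table (illustrative counts):
#                exposed   unexposed
# cases:          a = 40     b = 60
# controls:       c = 20     d = 80
a, b = 40, 60
c, d = 20, 80

# Odds ratio = (odds of exposure among cases) / (odds of exposure among controls)
# (a/b) / (c/d) simplifies to ad / bc
odds_ratio = (a * d) / (b * c)  # (40*80)/(60*20) = 2.67 -- positive association
```

Doubling every cell leaves the ratio unchanged, which is the point of the card above: the odds ratio does not change with sample size.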
Recall bias
participants may not remember if they were exposed or not
the information is collected AFTER disease status is known, which may affect recall differently
Non differential recall bias
Non-differential: proportions misclassified are about equal among study groups → errors have no particular direction and blur the difference between groups; not always easy to detect. May occur due to “unacceptability bias.”
if the errors are the same in both groups…
biases you TOWARDS the null hypothesis = towards an odds ratio or RR of 1
ie. recording/ coding errors in databases
defective measurement devices
non-specific or broad definition of exposure or outcome
Differential recall bias
Differential: proportions misclassified differ between study groups; misclassification depends on exposure or disease status (information on exposure depends on disease status, or vice versa), primarily in one direction
if information is better in one group than the other → the association is over- or under-estimated
can move the RR or OR either towards or away from the null value
Between non-differential and differential bias, which would we prefer?
non-differential… because at least you know what direction it is going to pull you
Selection bias and what it mainly limits
error in choosing the individuals to take part in a study such that the sample obtained is NOT representative of the population you want to study (if too strict definition of cases, may miss mild/ non-classic cases)
LIMITS generalizability
Non-response bias
type of selection bias
think of the people who don’t pick up the phone if you randomly dial numbers for recruitment
also think about how controls do not get as much of a benefit as cases in terms of responding
the people who choose to respond are SPECIFIC types of people and may NOT be representative of the whole population
Random error
error inherent to a study that cannot be avoided
leads to imprecise results
can be QUANTIFIED not prevented
Bias
systematic error caused by investigator or subjects that causes incorrect estimate of association
systematic error can be prevented but HARD to quantify
Confounding
distortion of the true relationship between an exposure and outcome because the design or analysis fails to properly account for additional variables (confounders) that are associated with both exposure and outcome
ELIMINATE from the study
A confounder must be
unbalanced between the exposure groups (more or less common in exposure groups)
unbalanced between the outcome groups (more or less common in outcome groups)
Interaction
when the magnitude of a measure of an association between exposure and disease meaningfully differs according to the value of some 3rd variable….
more detailed description of the true relationship between the exposure and the disease
REPORT in a study because it matters
shows what may be affecting a relationship
Examples of selection bias
Non-response
survivorship
volunteer
healthy worker effect
Volunteer bias
those who join studies may be different from the non-participants from the get go
Healthy worker effect
happens in cohort studies
those who are employed are more likely to be healthy than the general population since general population has healthy and sick people
What in an epidemiological study can cause bias?
1) differing memory of subjects (recall bias)
2) choosing from a population (selection bias)
3) other biases… on other cards
3 ways to improve studies to minimize bias and confounding
1) restriction
2) matching
3) randomization
Restriction
limiting study enrollment to people who fall within a specific category of the confounder (ie. age, sex)
need to know confounders in advance
Matching
for every person enrolled in one group, a person in the other group is chosen who matches them on specific factors (ie. matched on potential confounders)
Randomization is the only way to….
deal with unidentified confounders
Finding confounding vs. interaction
In a clinical trial, what is made by the investigator?
the exposure of interest!!
(in cohort study, the exposure is determined by the subject)
What three factors are SO important to a clinical trial?
Randomization
double blinding
placebo control
Randomization definition
way of assigning each participant to treatment so that each participant has same chance of receiving any of the options
Benefits of randomization
minimizes bias
ONLY WAY TO CONTROL FOR UNKNOWN CONFOUNDERS
Blinding
study is blinded if subjects do not know what therapy they are receiving (prevents subjects from having expectations)
Double blind
NEITHER the subjects NOR the investigators know what therapy the subjects are receiving
Placebo Use
can cause side effects
critical that a patient does NOT know what intervention they are receiving
if you give one group a pill and the other nothing, even if the pill does not work, it will likely show an effect due to the placebo effect…. that’s why the control group gets a placebo (still taking a pill)
Intention to treat analysis
GOLD STANDARD
standard practice to analyze subjects in the group to which they were randomized…. even if they violate protocol, don’t take their meds, or drop out
Once randomized, subjects are always….
analyzed
benefits of ITT
reflects real world (includes non compliance and protocol deviation)
preserves sample size
maintains study power
cons of ITT
conservative estimate of treatment effect (dilution d/t noncompliance and false negatives)
does not assess efficacy accurately unless violations are negligible
Study power
likelihood of seeing a difference between two groups, assuming there is a difference to be seen (probability of detecting a REAL effect)
think telescope analogy: no matter how powerful a telescope I point at the sky, I will not see anything if it is cloudy
Power = 1 - beta (type II)
What is the effect of a large sample size on power?
more power! which means you are likely to see a difference between two arms, if there is a difference to be seen
Matching only works if you know what?
what the confounding variables are!!!
matching fails if you find out confounders later!
Cross over clinical trials
each study participant serves as OWN counterfactual, getting ALL possible study interventions in a random order
switch study arms
Confidence intervals
estimate the effect size and its precision (variability of the effect)
In a 95% CI… if we conducted this experiment 100 times, about 95 of the 100 computed intervals would contain the true value
does NOT include bias
If our 95% CI does not include the null value (e.g. RR = 1, or a difference of 0), we can reject the null hypothesis
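A minimal sketch of a 95% CI for a mean, using the normal approximation (mean ± 1.96 × SE; the data are hypothetical):

```python
import math

# Hypothetical measurements (illustrative values)
sample = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 4.7]
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
se = sd / math.sqrt(n)

# 1.96 is the two-sided 95% critical value of the standard normal distribution
lower, upper = mean - 1.96 * se, mean + 1.96 * se
```

Because the width is driven by the standard error, a larger sample narrows the interval: higher precision, same 95% coverage. If the resulting interval excludes the null value, the result is significant at the 5% level.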
statistical significance (p-value)
if p <0.05, then you reject the null hypothesis and there is statistical significance at the 5% level
does statistical significance = clinical significance?
NO
Type I error = alpha error
false positive detection of an association
you see something that is NOT real
if the null hypothesis is TRUE but you mistakenly reject it
Type II error = beta error
False negative
When there is something to see, but you do NOT see it
mistakenly think two things are the same
P-value
probability that a result as strong or stronger would be observed if the null hypothesis is true
Reject the null when p < α → statistically significant (results unlikely to be due to chance alone)
AKA assuming H0 is true, what is the probability of getting a result that is at least as extreme as the one we got?
What do we evaluate P-value at? What does above and below that value mean?
set alpha at 0.05
P<0.05… reject the null
P>0.05 then cannot reject the null
Define P-value
if the null hypothesis is true, then how likely is the result I got
Effect of variability (noise) on power
decreased noise = increased power
increased noise= decreased power
(Waldo example: the other people in the Waldo picture add noise, make it harder to find him)
Effect size (signal/ outcome trying to observe) impact on power
smaller effect size –> less power
(Waldo example: Waldo is the effect (signal))
Effect of alpha on power
decrease alpha = decrease power
Effect of sample size on Type I and Type II error
increasing sample size decreases Type II error (more power); Type I error is set by alpha and does not change with sample size
increasing sample size makes it EASIER not to miss a real difference between the two groups
Clinical research 2 x 2 table
Example relative risk calculation from LO doc
If you lower the cutoff for a test it….
increases false positives, decreases false negatives
= increased sens, increased NPV, decreased spec, decreased PPV
If you raise the cutoff for a test it…
decreases false positives, increases false negatives
= decreased sens, decreased NPV, increased spec, increased PPV
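The cutoff trade-off in the last two cards can be demonstrated with hypothetical marker values, where a positive test means the marker is at or above the cutoff:

```python
# Hypothetical marker levels (illustrative values)
diseased = [6, 7, 8, 9, 10]  # patients with the disease
healthy = [2, 3, 4, 5, 6]    # patients without the disease

def sens_spec(cutoff):
    """Sensitivity and specificity when 'positive' means marker >= cutoff."""
    tp = sum(x >= cutoff for x in diseased)
    fn = len(diseased) - tp
    tn = sum(x < cutoff for x in healthy)
    fp = len(healthy) - tn
    return tp / (tp + fn), tn / (tn + fp)

# Lower cutoff: more positives overall -> sensitivity up, specificity down
low = sens_spec(5)   # sens 1.0, spec 0.6
# Higher cutoff: fewer positives overall -> sensitivity down, specificity up
high = sens_spec(8)  # sens 0.6, spec 1.0
```

Sweeping `sens_spec` across all possible cutoffs and plotting sensitivity against 1 - specificity is exactly how the ROC curve from the earlier card is built.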