Epi Flashcards
Selection Bias
Error due to systematic differences in characteristics between those who do and do not participate in the study. This occurs because of the actions of researchers and/or participants.
- -Study population is not representative of source population
- -“Systematic” : associated with both exposure and outcome status.
- -“Participate”: enrollment or retention
Can lead to incorrect measure of association
Evaluating the Effects of Bias on the Validity of the Effect Estimate (OR, RR)
Potential alternative explanation for observed effect (need to consider the direction and magnitude)
Effects of bias are difficult to quantify and hard to remove once introduced. This is why we try to prevent it from the beginning and consider it as a possible explanation when analyzing data.
Five Sources of Selection Bias
In case control studies
- Selection of control group
- Self selection
- Differential surveillance/diagnosis/referral patterns
In cohort studies
- selection of external comparison group
- losses to follow up
Bias in Selection of Control Group
Bias can occur if the control group is not representative of the underlying source population with respect to exposure.
This can happen when different criteria are used to select cases and controls and those criteria are related to exposure, i.e., there is a difference between cases and controls that is related to exposure. This results in an invalid OR.
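Worked sketch (hypothetical counts, not from the source): the code below applies sampling fractions to a made-up source population to show how control selection that is related to exposure can distort the OR. The function names and every number are illustrative assumptions.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio from the four cells of a case-control table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def observed_counts(source, fractions):
    """Expected counts after sampling each cell with its own fraction."""
    return {cell: source[cell] * fractions[cell] for cell in source}


# Hypothetical source population (assumed numbers)
source = {"exp_cases": 200, "unexp_cases": 800,
          "exp_controls": 1000, "unexp_controls": 8000}
true_or = odds_ratio(**source)

# Controls sampled at the same rate regardless of exposure
fair = {"exp_cases": 1.0, "unexp_cases": 1.0,
        "exp_controls": 0.1, "unexp_controls": 0.1}
# Exposed controls twice as likely to be selected as unexposed controls
biased = {"exp_cases": 1.0, "unexp_cases": 1.0,
          "exp_controls": 0.2, "unexp_controls": 0.1}

fair_or = odds_ratio(**observed_counts(source, fair))
biased_or = odds_ratio(**observed_counts(source, biased))

print(f"true OR={true_or:.2f}, fair OR={fair_or:.2f}, biased OR={biased_or:.2f}")
```

In this example the OR is unchanged when controls are sampled independently of exposure, and it moves toward the null when exposed controls are over-selected.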
How can selection bias be minimized?
Ensure that the control condition does not share the exposure as a risk factor
–Based on external knowledge, e.g., knowing about ulcers and coffee
Use same selection criterion for cases and controls
–Do not impose explicit or implicit restrictions on one group that you do not impose on the other
Use controls from same catchment area
–E.g., restrict cases to diabetic patients who reside in the general catchment area of the hospital where the injury controls are enrolled
Self Selection Bias
AKA: non-response, non-participation, or refusal bias
Occurs when there is a systematic difference in who participates (volunteers) and who does not participate
Can result in invalid OR
How to minimize self selection bias
Ensure high participation rates in all groups
- -Motivation
- -Ease/feasibility
- -Incentives/rewarding
- -Study staff
- -Etc.
Differential diagnosis and/or referral (case control)
Participants (cases/controls) are made known to investigators, and thus enrolled, in a way that is differentially related to exposure.
- -Can be in how cases are detected
- -Can be in how controls are identified
- -Also called detection bias
Minimizing Differential detection bias
Use multiple control groups (i.e., population based and hospital based)
Selection of control condition with similar detection
Use a case definition that addresses detection
Selection of Comparison Group (cohort study)
May occur if external comparison group is used
This is a type of selection bias
Loss to follow up bias (cohort studies)
People who are lost to follow up may differ from those who remain in the study
A major threat to the internal validity in cohort and intervention studies
Losses to follow up may or may not result in biased estimates of relative risk depending on whether or not the difference is systematic.
Differential and Non-Differential
Differential (systematic) : Related to both exposure and disease resulting in biased estimates of relative risk (RR), either toward or away from the null.
Non-Differential: Related only to exposure or disease but not both. Generally does not create problems with RR
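Worked sketch (hypothetical cohort, not from the source): the code below applies retention fractions to a made-up cohort to contrast loss related only to exposure with loss related to both exposure and disease. All counts, fractions, and names are illustrative assumptions.

```python
def relative_risk(exp_d, exp_nd, unexp_d, unexp_nd):
    """RR = risk in exposed / risk in unexposed."""
    risk_exposed = exp_d / (exp_d + exp_nd)
    risk_unexposed = unexp_d / (unexp_d + unexp_nd)
    return risk_exposed / risk_unexposed


def retained(cohort, retention):
    """Expected counts remaining after loss to follow up."""
    return {cell: cohort[cell] * retention[cell] for cell in cohort}


# Hypothetical full cohort: d = diseased, nd = not diseased
cohort = {"exp_d": 100, "exp_nd": 900, "unexp_d": 50, "unexp_nd": 950}
true_rr = relative_risk(**cohort)

# Loss related only to exposure: 30% of all exposed lost, regardless of disease
exposure_only = {"exp_d": 0.7, "exp_nd": 0.7, "unexp_d": 1.0, "unexp_nd": 1.0}
# Loss related to exposure AND disease: half of exposed diseased are lost
differential = {"exp_d": 0.5, "exp_nd": 1.0, "unexp_d": 1.0, "unexp_nd": 1.0}

rr_exposure_only = relative_risk(**retained(cohort, exposure_only))
rr_differential = relative_risk(**retained(cohort, differential))

print(f"true RR={true_rr:.2f}")
print(f"loss related to exposure only: RR={rr_exposure_only:.2f}")
print(f"loss related to exposure and disease: RR={rr_differential:.2f}")
```

In this example the RR is preserved when loss depends only on exposure, and it is biased toward the null when exposed people who develop disease are preferentially lost.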
Sources of Information Bias
Non-Differential Misclassification
Differential Misclassification
- -Recall bias
- -Interviewer bias
Directionality of Bias
When working with relative risks that are <1.0 we can still think about the directionality of bias similarly
Is the biased/observed measure of association closer to or further from the null (RR = 1.0)?
Information Bias
Error in classification of exposure and/or outcome
Non-Differential Misclassification
- -Errors associated with exposure OR disease, but not both
- -For ex: error in exposure is not related to disease
- -Effect on OR/RR is biased towards the null (usually)
Differential (systematic) Misclassification
- -Errors related to exposure and disease
- -For ex: error in exposure is related to disease
- -Effect on OR/RR is biased away from or toward the null (this is unpredictable)
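Worked sketch (hypothetical counts, not from the source): the code below misclassifies exposure using assumed sensitivity and specificity values. The same error rates are applied to cases and controls for the non-differential case, and better recall among cases for the differential case. All numbers and names are illustrative assumptions.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected (classified exposed, classified unexposed) counts."""
    classified_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    classified_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return classified_exposed, classified_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true exposure status
cases = (400, 600)      # (exposed, unexposed)
controls = (200, 800)
true_or = odds_ratio(*cases, *controls)

# Non-differential: same sensitivity/specificity for cases and controls
nondiff_or = odds_ratio(*misclassify(*cases, 0.8, 0.9),
                        *misclassify(*controls, 0.8, 0.9))

# Differential (e.g., recall bias): cases report exposure more completely
diff_or = odds_ratio(*misclassify(*cases, 0.95, 0.9),
                     *misclassify(*controls, 0.7, 0.9))

print(f"true OR={true_or:.2f}, non-differential OR={nondiff_or:.2f}, "
      f"differential OR={diff_or:.2f}")
```

With these assumed values the non-differential error pulls the OR toward the null, while the differential error pushes it away from the null. Other differential patterns could push it toward the null instead.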
Case-Control Studies + Misclassification
- -Generally observe the same pattern
- -Case control studies
Confounding Variables
A third variable (measured or unmeasured) that influences both the supposed cause and the supposed effect.
The observed measure of association is distorted because the effect of the extraneous factor is mixed with the effect of the exposure on the outcome
– Get an incorrect effect estimate
– Either overestimation (positive confounding) or underestimation (negative confounding)
Association: smoking, asbestos/radon exposure
Predictors (risk factors): family history, alcohol use
Associations: carrying matches
Like Bias
- -Alternative explanation for observed effect
- -Distorts the true measure of association
- -Can overestimate, underestimate, or reverse direction of effect
Unlike Bias
- -Not systematic error committed by investigator or participants
- -Reflects the nature of what we are studying
- -Complex relationships between many variables
- -Can be controlled in design and analysis
Assessing Confounding (1): Exposure-Confounder Relationship
- -These two variables (exposure + Confounder) need to be associated in your data
- -Not necessarily a risk factor or causally related
- -Uneven distribution of confounder in exposed and unexposed groups for any reason (associated in your data or a true risk factor)
- -So what is one good method to control confounding? (Randomization helps control this.)
Assessing confounding (2): Confounder-Disease Relationship
- -Is the confounder a risk factor for the disease/outcome
- -Association between confounder and disease exists independent of exposure
- —Association is present among both the exposed and the unexposed
- —-Can check this in your data
- -This is a more stringent measure
Assessing confounding (3)
A confounder cannot be an intermediate step in the causal pathway between exposure and disease
Randomization
- -Random assignment to treatment condition in an intervention study
- -If done properly with large enough sample size this has a high likelihood of producing an even distribution of all other variables between study groups.
- -Comparison groups will have a similar distribution of all potential confounders (known and unknown)
- -> Exposure is not associated with potential confounders
Problems with Randomization
Not always feasible
Sometimes it does not work (can be checked in the analysis)
Controlling Confounders
To remove distortion in the observed measure of association between exposure and outcome
– For valid interpretation of effect estimates
• Design
– Randomization
– Restriction
– Matching
• Analysis
– Standardization
– Stratified analysis including adjustment
– (Matched analysis in case-control studies)
– (Multivariate regression)
If Randomization is not possible
–Must ID potentially confounding variables prior to conducting study
–Consider all known or suspected risk factors for disease under study that might also be associated with exposure (often consider age, sex, and race to be potential confounders of many associations)
–Should also be done if randomization does not work
Restriction (Confounders)
Limiting eligibility criteria to certain levels of suspected confounders
Advantages: Fairly simple approach
Disadvantages
- -Generalizability
- -Sample Size
- -Multiple potential confounders
Matching
Selecting study subjects so that potential confounders will be evenly distributed between study groups
–Can do this in case-control studies or cohort studies
In some situations, this is desirable
- -small # of cases (limited statistical power)
- -When confounders are complex to measure (e.g., SES )
Some potential disadvantages
- -Can be logistically difficult (esp. with multiple confounders)
- -Requires the use of different statistical analysis in case-control studies
- -Cannot analyze matching variables
Methods to Control Confounding in the Analysis
- -Standardization
- -Stratified analysis including adjustment
- -(Matched analysis)
- -(Multivariate regression)
General advantages
- -Can do post hoc (if you’ve collected data)
- -Data analysis is more informative
Stratified Analysis
Conducting analysis within levels (strata) of the potentially confounding variable
–Each stratum is restricted
Objectives
- -Evaluate the presence of confounding
- -Control for confounding
- -Learn more about complex relationships between variables of interest
Steps in Stratified Analysis
- Examine crude measure of association between exposure and outcome
- Stratify data by level of confounder, and examine stratum-specific measures of association between exposure and outcome
- If stratum-specific estimates are similar to each other and different from crude estimate then confounding is present
- Report the stratum-specific estimates, OR
- Calculate a summary measure of association (adjusted estimate) using formulas provided in table
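Worked sketch of the steps above (hypothetical data): the table of formulas referred to here is not included in these notes, so the code assumes the Mantel-Haenszel summary odds ratio as the adjusted estimate. The counts and names are illustrative assumptions.

```python
def odds_ratio(a, b, c, d):
    """a=exposed cases, b=exposed non-cases, c=unexposed cases, d=unexposed non-cases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of the potential confounder: (a, b, c, d)
strata = [
    (10, 90, 40, 360),    # confounder absent
    (120, 280, 30, 70),   # confounder present
]

# Step 1: crude measure of association (strata collapsed)
crude_cells = [sum(cells) for cells in zip(*strata)]
crude_or = odds_ratio(*crude_cells)

# Step 2: stratum-specific measures of association
stratum_ors = [odds_ratio(*s) for s in strata]

# Steps 3-5: compare with the crude estimate, then summarize
adjusted_or = mantel_haenszel_or(strata)

print(f"crude OR={crude_or:.2f}")
print("stratum ORs=" + ", ".join(f"{x:.2f}" for x in stratum_ors))
print(f"Mantel-Haenszel adjusted OR={adjusted_or:.2f}")
```

Here the stratum-specific ORs agree with each other and differ from the crude OR, which is the pattern the steps describe as confounding.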
Measurement Error
Errors in measurement that are random and occur in an unpredictable way.
Can minimize by taking repeated measurements
Precision
Lack of random error
Random Error
Error that occurs due to chance (bad luck)
Uncontrollable forces that have no explanation
Sampling Variability
A sample may be unrepresentative of the source population due to chance. In this way, it is unpredictable.
Reducing Sampling Error
Increase Sample Size
- -Small samples are more likely to produce erroneous results due to chance alone
Take Repeated samples
Random vs Systematic Error
Statistical Inference
Goal: To learn about the true exposure-outcome association
Reality
- -You do a study, the results of which are dependent on who is in your study sample (sampling variability)
- -You may select an unrepresentative sample that produce
Two Methods to assess random error
Hypothesis testing (p testing)
- -Quantifies the degree to which sampling variability may explain observed association
- -The probability of seeing an association at least as strong as the one observed if chance alone were operating (i.e., if the null hypothesis were true)
Hypothesis testing: 3 Steps
- Specify null and alternative hypothesis
- H(0) is assessed by a statistical test that gives you a p value after computation using known formulas
- Reject or fail to reject the null hypothesis based on p-value
Statistical significance
By convention: If the probability of observing a result given that the null hypothesis is true is p
P Values
Small P value can occur for small effects in large studies
- -It is possible to have a statistically significant trivial risk increase when the sample size is large
- -If you have a small p value, is it because of a strong association or a large study?
Large P values can occur for large effects in small studies
– it is possible when the sample is small that a large risk increase is not statistically significant.
P values are confounded statistics: They mix effect size and sample size
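Worked sketch (hypothetical counts, not from the source): the code below computes a two-sided p value for a difference in risk using a normal-approximation two-proportion z test, which is an assumed choice of test and is only rough for the small study. It contrasts a trivial effect in a very large study with a large effect in a small study.

```python
import math


def two_proportion_p_value(cases1, n1, cases2, n2):
    """Two-sided p value from a pooled two-proportion z test."""
    p1, p2 = cases1 / n1, cases2 / n2
    pooled = (cases1 + cases2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def relative_risk(cases1, n1, cases2, n2):
    return (cases1 / n1) / (cases2 / n2)


# Large study, trivial risk increase (assumed numbers)
large = (21000, 200000, 20000, 200000)
# Small study, large risk increase (assumed numbers)
small = (6, 20, 2, 20)

rr_large, p_large = relative_risk(*large), two_proportion_p_value(*large)
rr_small, p_small = relative_risk(*small), two_proportion_p_value(*small)

print(f"large study: RR={rr_large:.2f}, p={p_large:.2g}")
print(f"small study: RR={rr_small:.2f}, p={p_small:.2g}")
```

The large study's small RR comes out statistically significant at the conventional 0.05 level, while the small study's much larger RR does not.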
Confidence Intervals
Definition of a 95% CI: If you did a study 100 times, you would get 100 point estimates and 100 CIs. In 95 of the 100 results, the true value would lie within the given interval. In 5 instances, the true value would not lie within the given interval
Interpretation of single 95% CI: Range of plausible values within which the true
Confidence intervals
Width of CI indicates: Amount of sampling variability (random error) in the data and therefore a level of certainty
All else being equal, a larger sample size will produce a narrower CI
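Worked sketch (hypothetical counts, not from the source): the code below computes a 95% CI for an RR on the log scale, using the standard large-sample standard error of ln(RR) as an assumed method. It compares two studies with the same risks but a tenfold difference in size.

```python
import math


def rr_with_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Return (RR, lower, upper) using the log-scale normal approximation."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se_log_rr = math.sqrt(1 / cases_exp - 1 / n_exp
                          + 1 / cases_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Same risks (30% vs 15%), different sample sizes (assumed numbers)
small = rr_with_ci(30, 100, 15, 100)
large = rr_with_ci(300, 1000, 150, 1000)

small_width = small[2] - small[1]
large_width = large[2] - large[1]

print(f"small study: RR={small[0]:.2f}, 95% CI {small[1]:.2f} to {small[2]:.2f}")
print(f"large study: RR={large[0]:.2f}, 95% CI {large[1]:.2f} to {large[2]:.2f}")
```

The point estimate is the same in both studies; only the interval width changes with sample size.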
Why CI intervals are preferred over p values
–The p value and CI are calculated from the same information
But unlike p values…
–CIs separate magnitude of the effect from sample size
Hierarchy of Populations
External : Population to whom we hope our findings apply
——-> Generalizability
Target : Population of interest / feasible to study
——–> Selection Bias
Study : Population from whom we sample study participants
———> Sampling Error
Sample : Individuals who are enrolled in our study / provide study data
Generalizability
Considerations for generalizability
–Knowledge of study population; comparison to external
populations
–Understanding of disease process (biology)
–Effect Modification is an important consideration
Largely a qualitative exercise
Note: Internal validity is a prerequisite for generalizability
Multiplicative Effect Modification
RR (11) = RR when both factors are present compared to neither
RR (10) = RR when only one factor is present compared to neither
RR(01) = RR when only the other factor is present compared to neither
RR (00) = 1
Assessing Additive Effect Modification
R(11) = R(10) + R(01) - R(00) –> No additive effect modification
R(11) = Risk when both factors are present
R(10) = Risk when one factor is present alone
R(01) = Risk when the other factor is present alone
R(00) = Risk in the absence of both factors
OR
RR(11) = RR(10) + RR(01) - 1 (divide each term by R(00))
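Worked sketch (hypothetical RRs): the code below compares an observed joint RR with what each scale would predict in the absence of effect modification. The additive expectation is the formula above. The multiplicative expectation, RR(11) = RR(10) x RR(01), is the standard definition and is not written out in these notes. The example values and function names are assumptions.

```python
def expected_rr11_additive(rr10, rr01):
    """Joint RR expected with no additive effect modification."""
    return rr10 + rr01 - 1


def expected_rr11_multiplicative(rr10, rr01):
    """Joint RR expected with no multiplicative effect modification."""
    return rr10 * rr01


# Hypothetical relative risks, each compared with neither factor present
rr10, rr01, observed_rr11 = 3.0, 4.0, 12.0

additive_expected = expected_rr11_additive(rr10, rr01)
multiplicative_expected = expected_rr11_multiplicative(rr10, rr01)

departs_from_additive = observed_rr11 != additive_expected
departs_from_multiplicative = observed_rr11 != multiplicative_expected

print(f"expected RR(11), additive scale: {additive_expected}")
print(f"expected RR(11), multiplicative scale: {multiplicative_expected}")
print(f"observed RR(11): {observed_rr11}")
```

With these values the observed joint RR matches the multiplicative expectation but exceeds the additive one, so the same data show effect modification on one scale and not on the other. Real data would call for a comparison that allows for random error, not exact equality.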
Screening
The presumptive identification of unrecognized disease or condition by the application of tests, exams, or other procedures that can be applied rapidly to sort out apparently well persons who probably have a disease from those who probably do not
A screening test is performed to identify disease or precursors of disease at an earlier more treatable stage in asymptomatic individuals.
Primary Prevention
Activities that prevent the disease process from starting
Reduce incidence of disease
Secondary Prevention
Early Detection (Screening)
- -Activities that reduce the expression or severity of disease
- -IDs asymptomatic individuals with disease
- -Delay onset of clinical disease: Shorten duration; improve survival
- -May see an increase in incidence
Tertiary Prevention
Minimizing impairments and disabilities
- -Activities that slow or stop progression of clinical disease
- -Improve survival/quality of life
When is it appropriate to screen?
- “important” health problem (prevalence, severity, morbidity, mortality)
- Detectable pre-clinical phase that is “long” and “prevalent”
- Appropriate test (simple, rapid, convenient to conduct, inexpensive, accessible to the population, safe, and accurate (valid))
- “Acceptable” treatment for persons with condition that positively affects the natural history of disease
Goal with Screening
To correctly identify individuals with and without the disease
Measurements used in screening
Sensitivity: % of persons with the disease who test positive (probability that the test correctly classifies persons with disease)
Specificity: % of persons w/o disease who test negative (probability that the test correctly classifies persons w/o disease.)
Positive predictive value: % of people with positive test who have the disease
Negative predictive value: % of persons with negative test who do not have the disease
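Worked sketch (hypothetical counts, not from the source): the code below computes the four measures from a screening 2x2 table, using a = true positives, b = false positives, c = false negatives, d = true negatives. This matches the "b" and "c" cell labels used elsewhere in these notes; the counts and function name are illustrative assumptions.

```python
def screening_measures(a, b, c, d):
    """a=true positives, b=false positives, c=false negatives, d=true negatives."""
    return {
        "sensitivity": a / (a + c),   # diseased who test positive
        "specificity": d / (b + d),   # non-diseased who test negative
        "ppv": a / (a + b),           # test positives who have disease
        "npv": d / (c + d),           # test negatives who do not have disease
    }


# Hypothetical screening results for 1,000 people
measures = screening_measures(a=90, b=50, c=10, d=850)

for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are calculated down the disease columns, while PPV and NPV are calculated across the test-result rows.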
Why is PPV important for screening?
- -Yield
- -Detection of large number of people with disease among those who test positive
- -Minimize the number of false positives (“b”)
- —-Yield will be higher if false positives are lower
- —-False positives are costly and harmful
Determinants of PPV
- -Prevalence of disease: Population characteristic
i.e., screening a high-risk population will provide a higher PPV than screening a low-risk population
- -Test Characteristics
- -> Specificity (more so) –> determines the # of false positives, which should be relatively low for high PPV
- -> Sensitivity (less so)
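Worked sketch (hypothetical test characteristics): the code below computes PPV from prevalence, sensitivity, and specificity using Bayes' rule, then varies one input at a time. The specific values and the function name are illustrative assumptions.

```python
def ppv(sensitivity, specificity, prevalence):
    """PPV = true positives / all positives, per person screened."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same test, low-risk vs high-risk population
ppv_low_prev = ppv(0.90, 0.95, prevalence=0.01)
ppv_high_prev = ppv(0.90, 0.95, prevalence=0.20)

# Low-prevalence population: improve specificity vs improve sensitivity
ppv_better_spec = ppv(0.90, 0.99, prevalence=0.01)
ppv_better_sens = ppv(0.99, 0.95, prevalence=0.01)

print(f"prevalence 1%:  PPV={ppv_low_prev:.1%}")
print(f"prevalence 20%: PPV={ppv_high_prev:.1%}")
print(f"specificity 95% -> 99%: PPV={ppv_better_spec:.1%}")
print(f"sensitivity 90% -> 99%: PPV={ppv_better_sens:.1%}")
```

With these assumed values, raising prevalence or specificity changes PPV far more than raising sensitivity, because at low prevalence most positives come from the large non-diseased group.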
Consequences of False Positives (b cell)
- -Monetary costs and morbidity associated with diagnostic testing
- -Psychological costs of believing one may have a serious disease; stigma
- -Discomfort, cost to patient
- -Burden on health care system
Consequences of False Negatives (c cell)
- -Disease will be missed at an early, treatable stage
- -Seriousness of disease
Healthy worker effect
A type of selection bias in occupational epidemiology that generally occurs in cohort studies when an internal (worker) group is compared with an external comparison group
People who are working are generally healthier than those who are not…