Causality, Bias & Confounding Flashcards
What questions are asked in description?
- what happened?
- who was affected?
- people with X had Y
What questions are asked in prediction?
- what will happen?
- who will be affected?
- people with X are more likely to have Y?
What questions are asked in causal inference?
- what will happen if…?
- why were they affected?
- if we changed X, how would it change Y?
What questions would be asked if it was qualitative?
- what matters?
- why does it matter?
- how can we effectively change X?
- should we change X?
The headline “organic food lowers blood and breast cancer risk” is an example of what?
causal nonsense
it is implying that if you eat organic food, you will have a lower risk of contracting cancer
What type of approach to causal inference is shown here?

‘causation’ of infectious disease is fairly simple
this is deterministic
What is meant by a deterministic relationship?
a deterministic relationship involves an exact relationship between two variables
the deterministic model gives the same exact results for a particular set of inputs, no matter how many times you re-calculate

What is an example of a deterministic relationship on a molecular level?
molecular and cellular processes (e.g. laboratory studies) show a deterministic relationship
relaxed myometrial cell + prostaglandin E2 = contracted myometrial cell

Why can a deterministic model not be used for the vast majority of health outcomes?
for the vast majority of health outcomes, there are multiple causes

What is the difference between a deterministic model and a probabilistic model?
probabilistic models incorporate random variables and probability distributions into the model of an event
a deterministic model gives a single possible outcome for an event
a probabilistic model gives a probability distribution as a solution
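A minimal sketch of the contrast, using made-up functions (`deterministic_model`, `probabilistic_model`) and an assumed effect of 2 units per unit of dose. Nothing here comes from real data; it only shows that one model returns a single value and the other returns draws from a distribution.

```python
import random

def deterministic_model(dose):
    # Same input always gives the same single outcome.
    return 2.0 * dose

def probabilistic_model(dose, rng):
    # The outcome is drawn from a distribution whose mean depends on the input.
    return rng.gauss(2.0 * dose, 1.0)

rng = random.Random(0)  # seeded so the run is reproducible
det_runs = [deterministic_model(5) for _ in range(3)]
prob_runs = [probabilistic_model(5, rng) for _ in range(3)]

print(det_runs)   # three identical values
print(prob_runs)  # three different values scattered around 10
```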
What type of relationship is shown here?
What problem does this raise with causal inference?

this relationship is probabilistic, NOT deterministic
how do we identify causes and what works “best” when one thing doesn’t necessarily lead to another?
What is the fundamental problem of causal inference?
you can never know what would have happened if you had done things differently
i.e. we cannot observe the counterfactual
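The idea can be sketched with a small simulation in which, unlike in real life, both potential outcomes are known for every person. All numbers (`EFFECT`, the outcome distribution, allocation by coin flip) are hypothetical assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)
EFFECT = -5.0  # hypothetical true effect of treatment on the outcome

treated_obs, control_obs = [], []
for _ in range(10_000):
    y_untreated = rng.gauss(50, 10)      # potential outcome without treatment
    y_treated = y_untreated + EFFECT     # potential outcome with treatment
    if rng.random() < 0.5:
        treated_obs.append(y_treated)    # y_untreated is never observed
    else:
        control_obs.append(y_untreated)  # y_treated is never observed

# Each person contributes only one of their two outcomes, so the individual
# effect is unobservable; the group difference is only an estimate of it.
estimate = statistics.mean(treated_obs) - statistics.mean(control_obs)
print(round(estimate, 2))
```

In real data only the observed outcome exists for each person, so the comparison between groups stands in for the missing counterfactual.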
What do we need to do in order to study how most things work?

to study how most things work, we have to come up with an “estimate” of the counterfactual
i.e. a control

What is the problem with estimating the counterfactual?

individual people are very different and have lived very different lives

What is meant by exchangeability?
Why is it important?
because everyone is different we have to work with groups of people and find ways to ensure our groups are - on average - comparable

What is the best way to achieve exchangeability?
the easiest way to do this is through randomisation
this produces both the intervention group and the comparison group

Is randomisation always going to produce exchangeability?

NO because randomisation is a blunt tool
the sample needs to be large enough to account for differences
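A rough sketch of why sample size matters even with randomisation. It randomly splits simulated participants into two arms and measures how far apart the arms are on a baseline characteristic. The age distribution (mean 50, SD 15) and the helper `arm_imbalance` are assumptions for illustration only.

```python
import random
import statistics

def arm_imbalance(n, rng):
    """Absolute difference in mean age between two randomised arms."""
    ages = [rng.gauss(50, 15) for _ in range(n)]
    rng.shuffle(ages)  # random allocation to arms
    half = n // 2
    return abs(statistics.mean(ages[:half]) - statistics.mean(ages[half:]))

rng = random.Random(2)
# Average imbalance over 200 simulated trials of each size.
small_trials = statistics.mean(arm_imbalance(10, rng) for _ in range(200))
large_trials = statistics.mean(arm_imbalance(1000, rng) for _ in range(200))

print(round(small_trials, 2))  # arms of 5 people often differ by several years
print(round(large_trials, 2))  # arms of 500 people are much closer on average
```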

What is meant by random sampling error?
the random error in our population estimate(s) that results from chance fluctuations in the profile of our sample
e.g. want 50% blue and 50% green
the sample contains 83% blue and 17% green

Without randomisation, does a bigger sample size help to achieve exchangeability?
without randomisation, the exposure is assigned by the underlying bio-psycho-social determinants
a bigger sample won’t help to achieve exchangeability
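A sketch of this point under a made-up selection rule: healthier people are assumed to be more likely to end up exposed (80% vs 20%). The function `imbalance` and every number in it are hypothetical.

```python
import random
import statistics

def imbalance(n, randomised, rng):
    """Mean baseline health in the exposed minus mean in the unexposed."""
    exposed, unexposed = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)
        if randomised:
            is_exposed = rng.random() < 0.5
        else:
            # Assumption: exposure depends on the person's own health.
            is_exposed = rng.random() < (0.8 if health > 0 else 0.2)
        (exposed if is_exposed else unexposed).append(health)
    return statistics.mean(exposed) - statistics.mean(unexposed)

rng = random.Random(3)
randomised_large = imbalance(20_000, True, rng)
self_selected_small = imbalance(200, False, rng)
self_selected_large = imbalance(20_000, False, rng)

print(round(randomised_large, 3))     # close to 0
print(round(self_selected_small, 3))  # clearly above 0
print(round(self_selected_large, 3))  # still clearly above 0, just less noisy
```

The larger self-selected sample gives a more stable estimate of the imbalance, but the imbalance itself does not shrink.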

What is meant by confounding bias?

distortion of the causal association between two variables, due to a common shared cause (a confounder)
confounding does not just generate spurious associations, it can also exaggerate, suppress and entirely mask associations
confounding can result in a distortion in the measure of an association between an exposure and a health outcome

How do we reduce confounding?
we reduce confounding by examining like-for-like participants
this is known as conditioning

What is shown in this example?

the causal effect of obesity on cancer is confounded by exercise
To estimate the unconfounded effect of obesity on cancer, what needs to be done?
By which 3 methods can this be achieved?

we would need to condition on exercise levels
restriction:
- restrict the sample to a single value of the confounder
- e.g. look at the association in people who do zero exercise
stratification:
- calculate category-specific effects for different levels of the confounder
- i.e. stratify across exercise levels
covariate adjustment:
- adjust for exercise as a covariate in a regression of cancer on obesity
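Restriction and stratification can be sketched on simulated data. Every probability below is an invented assumption: exercise lowers both the chance of obesity and the risk of cancer, and obesity is given a true effect of +0.05 on cancer risk. Covariate adjustment is not shown because it would need a regression library.

```python
import random

rng = random.Random(4)
rows = []
for _ in range(200_000):
    exercises = rng.random() < 0.5
    obese = rng.random() < (0.2 if exercises else 0.6)
    # Hypothetical risks: baseline 0.10, +0.05 if obese, +0.10 if no exercise.
    risk = 0.10 + (0.05 if obese else 0.0) + (0.0 if exercises else 0.10)
    cancer = rng.random() < risk
    rows.append((exercises, obese, cancer))

def risk_difference(sample):
    """Cancer risk in obese people minus cancer risk in non-obese people."""
    obese = [c for _, o, c in sample if o]
    not_obese = [c for _, o, c in sample if not o]
    return sum(obese) / len(obese) - sum(not_obese) / len(not_obese)

crude = risk_difference(rows)  # ignores exercise, so it is confounded
# Stratification: one estimate per exercise level.
by_stratum = {
    level: risk_difference([r for r in rows if r[0] == level])
    for level in (True, False)
}
# Restriction: keep only people who do no exercise (same as one stratum).
restricted = by_stratum[False]

print(round(crude, 3))
print({k: round(v, 3) for k, v in by_stratum.items()})
```

Under these assumptions the crude difference is inflated above the built-in 0.05, while each stratum-specific estimate sits close to it.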
Why can conditioning not completely remove confounding?
unobserved confounding:
- because of other confounding variables that we did not measure
residual confounding:
- error in our measure of exercise - imperfect conditioning

How does sample size affect random error and bias?
a larger sample size REDUCES random error (or ‘error’)
but
it has NO EFFECT on systematic error (or ‘bias’)
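A small simulation of this, assuming a hypothetical set of scales that under-reports by 10% and adds random noise (SD 2 kg) to each reading. `TRUE_WEIGHT` and `mean_reading` are illustrative names, not from any dataset.

```python
import random
import statistics

TRUE_WEIGHT = 80.0  # kg, hypothetical true value

def mean_reading(n, rng):
    """Mean of n readings from scales that under-report by 10%."""
    readings = [0.9 * TRUE_WEIGHT + rng.gauss(0, 2.0) for _ in range(n)]
    return statistics.mean(readings)

rng = random.Random(5)
small_means = [mean_reading(10, rng) for _ in range(100)]
large_means = [mean_reading(2000, rng) for _ in range(100)]

# Random error: how much the estimate varies from sample to sample.
spread_small = statistics.stdev(small_means)
spread_large = statistics.stdev(large_means)
# Systematic error: how far the estimates sit from the truth on average.
bias_small = statistics.mean(small_means) - TRUE_WEIGHT
bias_large = statistics.mean(large_means) - TRUE_WEIGHT

print(round(spread_small, 3), round(spread_large, 3))  # spread shrinks
print(round(bias_small, 2), round(bias_large, 2))      # both stay near -8
```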

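How do sample size and study quality affect precision and accuracy?
- a larger sample size improves precision (less random error)
- better study quality improves accuracy (less systematic error)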

What is the difference between measurement error and measurement bias?
measurement error:
- error in your measurement due to random factors
- e.g. weighing scales vary according to climate (temperature, humidity, etc.)
measurement bias:
- error in your measurement due to non-random factors
- e.g. weighing scales are broken & under-report by 10%
When does misclassification error / misclassification bias occur?
when measurement error and measurement bias result in misclassification

Is this an example of error or bias?

BIAS
- the GP could see that John was anxious
- she knows that patients have a higher BP when it is measured in a clinical setting
- this is a form of response bias known as white coat hypertension
- she was therefore less concerned about the measurement error and knew the result was likely higher than typical
What are the 6 categories of bias?
- confounding bias - in reading up on the field
- selection bias - in specifying and selecting the study sample
- information bias - in executing the experimental manoeuvre (or exposure)
- experimenter bias - in measuring exposures and outcomes
- analytic bias - in analysing the data
- inferential bias - in interpreting the analysis
What is shown by this diagram?

different types of bias are not mutually exclusive
a study can be biased in many different ways
Why does selection bias occur?
it occurs due to a systematic difference between those selected into a study (or analysis) sample and those not selected
What are the 3 types of selection bias and why do they occur?
sampling bias:
- broadly due to faulty sampling by the investigator
participation bias:
- broadly due to behaviour of (potential) participants
attrition bias:
- due to loss of participants from the study
Why does sampling bias occur?
a failure to sample evenly across the population resulting in an unrepresentative sample

How is this an example of diagnostic bias?
“Oral contraceptive use is associated with endometrial cancer”
oral contraceptives can cause “breakthrough bleeding”
this is also a symptom of endometrial cancer
there is increased clinical suspicion, leading to referral and diagnosis
What is meant by survivorship bias?
What type of bias can exacerbate this?
- successful people often attribute their success to their actions and behaviours
- “take risks!” “be rebellious!”
- we do not know about occurrence of these actions and behaviours in non-survivors
- this is exacerbated by attribution bias
- people attributing their success to their actions and behaviours, not their good luck
What is meant by participation bias?
bias resulting from people having differential preferences (or opportunities) to participate in research
willingness and ability to participate in research varies with almost all possible bio-psycho-social factors
- health (physical or psychological)
- education (interest, curiosity, etc.)
- beliefs (religious, spiritual, political, etc.)
- psychology & personality (self-efficacy, openness, scepticism, etc.)
- economics (time and cost, although cost is usually reimbursed)
How is this an example of participation bias?
“Increasing levels of education are strongly protective of stillbirth”
- most cases consent to participate, but consent among controls is much lower in women with less education
- control group is disproportionately educated
- education appears protective
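The mechanism can be worked through with invented counts. Assume 1000 cases and 1000 controls, half of each highly educated, so the true odds ratio is 1. Then assume 90% of all cases consent, but only 80% of highly educated controls and 40% of less educated controls. All counts and consent rates are hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)

# Whole population: "exposed" means highly educated. No true association.
true_or = odds_ratio(500, 500, 500, 500)

# Participants only, after differential consent (rates are assumptions).
cases_high = 500 * 90 // 100     # 450 highly educated cases consent
cases_low = 500 * 90 // 100      # 450 less educated cases consent
controls_high = 500 * 80 // 100  # 400 highly educated controls consent
controls_low = 500 * 40 // 100   # 200 less educated controls consent

observed_or = odds_ratio(cases_high, cases_low, controls_high, controls_low)

print(true_or)      # 1.0
print(observed_or)  # 0.5
```

An odds ratio of 0.5 would read as education halving the odds of stillbirth, even though no association was built into the population.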
Why does information bias occur?
due to systematic error in reporting, measurement or recording of information
What are the 3 different types of information bias?
response bias:
- people responding in inaccurate or untruthful ways
recall bias:
- people having different abilities to remember past information
observer effect:
- people behaving and responding differently when they know they are being observed
What is meant by acquiescence bias?
people prefer yes-, true- or agree-type responses
What is meant by social desirability bias?
people downplay undesirable traits and exaggerate desirable ones
e.g. people often underestimate how much alcohol they drink
How is this an example of the observer effect?
“Physical activity in pregnant women with obesity measured by accelerometer.
Almost all participants were doing ≥ 30 mins moderate / vigorous activity per day”
- women increased their physical activity when observed
- some also juggled the accelerometer / attached it to their dog
What is meant by experimenter bias?
bias due to the behaviours and actions of the experimenter, whether conscious or unconscious
beware of conflicts of interest - always check who funded a study!
What are the 2 different types of experimenter bias?
confirmation bias:
- more likely to accept findings that we expect, and reject findings that we don’t
systemic bias:
- more likely to chase positive associations (p-hacking), seek novel results, or otherwise “find” or publish results that support our career progression and security
What are the two main types of error in population-level research?
- random error (‘error’)
- systematic error (‘bias’)
What is the difference between precision and accuracy?
precision:
- precision increases as random error decreases
- this can be achieved by increasing sample size
accuracy:
- accuracy increases as systematic error decreases
- the sample size has no effect on the degree of systematic error
What does causal inference in observational data require?
causal inference in observational data requires causal inference methods
- probability theory
- counterfactual reasoning
- graphical model theory
don’t infer causality without directed acyclic graphs
