Stats - Association, Causation, Confounding and Bias Flashcards by Roisin McAllister

Two variables are said to be associated when one is found more commonly in the presence of the other.
What are the three types of association?

Spurious
Indirect
Direct

What is a spurious association between variables?

Spurious – A spurious association occurs when the relationship between two variables appears to exist, but it is false or misleading due to the influence of a confounding variable or random chance. In this case, there is no true relationship between the two variables. The confounding factor creates an illusion of a link, but the association is not real. For example, the apparent relationship between ice cream sales and drowning deaths is spurious, as the real factor influencing both is hot weather, not a direct or indirect link between the two variables.

What is an indirect association between variables?

Indirect – An indirect association occurs when two variables are genuinely related, but the relationship is mediated or explained by a third variable (an intermediate variable, or mediator). In this case, the association is real, but the relationship is not direct. For example, socioeconomic status might be linked to health outcomes, but the true mechanism might be mediated through access to healthcare or lifestyle factors, making the relationship indirect. Unlike spurious associations, an indirect association reflects a real connection between the variables, just one that operates through the third variable.

What is a direct association between variables?

Direct – A direct association exists when one variable directly affects another without the involvement of any confounding variable. This means that the relationship is causal in nature, where changes in one variable directly bring about changes in the other. For example, smoking directly causes lung cancer without the need for any intermediary factor.

Once the association has been established, the next question is whether the association is causal. Not all associations imply causality, so additional evidence is needed to confirm a cause-and-effect relationship.

What 2 sets of criteria are used to assess causality?

The Bradford Hill causal criteria are commonly used to assess causality.

Susser’s criteria ensure that a basic causal framework exists before moving to more in-depth evaluation.

What are 5 of the Bradford Hill causal criteria? (Bradford Hill listed nine in total.)

Strength – A stronger association (e.g., higher relative risk) increases the likelihood that the relationship is causal rather than due to chance or bias. For example, the strong link between smoking and lung cancer supports a causal relationship.

Temporality – The cause must precede the effect. This is a fundamental criterion; if the outcome occurs before the exposure, the exposure cannot have caused it. (Best demonstrated by cohort studies rather than retrospective case-control studies.)

Specificity – If a cause leads to a specific effect or disease, the association is more likely to be causal. This criterion has limitations, as many causes can lead to multiple effects (e.g., smoking causes several diseases).

Coherence – The association should align with existing biological and epidemiological knowledge. For example, if an observed association fits with known biological mechanisms, this strengthens the argument for causality.

Consistency – If the same association is observed across different studies, populations, and settings, it strengthens the case for causality. For example, the consistent association between asbestos exposure and mesothelioma across various studies is a strong indicator of a causal relationship.

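Note: a dose-response relationship (i.e. biological gradient), where greater exposure produces a greater effect, is a separate Bradford Hill criterion that is often considered alongside strength and coherence.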
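
As a rough illustration of how strength is quantified, the sketch below computes a relative risk at several exposure levels and checks whether the risk rises with exposure (a dose-response gradient). All counts are hypothetical and chosen only for the arithmetic; the function and variable names are assumptions made for this example.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: (cases, people followed up) for each exposure level.
unexposed = (5, 1000)
exposure_levels = {
    "low": (10, 1000),
    "medium": (25, 1000),
    "high": (60, 1000),
}

# Relative risk of each exposure level compared with the unexposed group.
rr_by_level = {
    level: relative_risk(cases, total, *unexposed)
    for level, (cases, total) in exposure_levels.items()
}

# A biological gradient is suggested if the RR increases at every step.
values = list(rr_by_level.values())
has_gradient = all(a < b for a, b in zip(values, values[1:]))

for level, rr in rr_by_level.items():
    print(f"{level}: RR = {rr:.1f}")
print("Monotonic dose-response:", has_gradient)
```

A large relative risk that also climbs with exposure is harder to explain by chance or bias alone, which is why these features are taken to support causality.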

What are the 3 foundational principles of the Susser’s criteria?

(note: ONLY used after association has been established)

Susser’s Criteria: Time Order, Association, and Direction (TAD)

Susser’s criteria ensure that a basic causal framework exists before moving to more in-depth evaluation. These foundational principles include:

Time Order: The cause must precede the effect in time. Without this temporal relationship, causality cannot be established. For example, smoking must occur before lung cancer develops.

Association: The two variables must be found together more often than would be expected by chance. For example, individuals who smoke are more likely to develop lung cancer compared to non-smokers.

Direction: The relationship between the variables must follow a logical or plausible pathway. For instance, it is biologically plausible that smoking introduces carcinogens to the lungs, which can directly lead to cancer, establishing a one-way causal relationship.

How do Susser’s and Bradford Hill’s Criteria Work Together? (3 ways)

Time Order and Temporality:

Susser’s Time Order and Bradford Hill’s Temporality both emphasise that causality requires the cause to precede the effect. This is a non-negotiable aspect of causation.

Association and Strength:

Susser’s Association establishes the presence of a link, while Bradford Hill’s Strength evaluates the magnitude of this link (e.g., higher relative risks make causation more likely).

Direction and Coherence:

Susser’s Direction ensures the association follows a logical pathway. Bradford Hill’s Coherence builds on this by confirming that the relationship aligns with established scientific knowledge.

What is the name given to the situation in a trial where one outcome is systematically favoured?

Bias

Give 6 examples of selection bias:

1) Sampling bias: the subjects are not representative of the population. This may be due to volunteer bias.
2) Non-responder bias: e.g. if a survey on dietary habits were posted to random households, the people who did not respond would likely have poorer diets than those who did.
3) Loss to follow-up bias
4) Prevalence/incidence bias (Neyman bias): arises when a study investigates a condition characterised by early fatalities or silent cases. It results from missed cases being omitted from calculations (neyMan = Missed cases).
5) Admission bias (Berkson’s bias): cases and controls in a hospital case-control study are systematically different from one another because the combination of exposure to risk and occurrence of disease increases the likelihood of being admitted to hospital (Berkson was admitted to hospital).
6) Healthy worker effect

Give 9 examples of Information Bias:

Detection bias: This can occur when exposure can influence diagnosis. For example women taking an oral contraceptive will have more frequent cervical smears than women who are not on the pill and so are more likely to have cervical cancer diagnosed (if they actually have it). Thus, in a case-control study that compared women with cervical cancer and a control group, at least part of any higher pill consumption rates amongst the former group may be due to this effect.

Recall bias: In retrospective studies where participants are asked to remember their past exposure to risk factors, it is likely that cases will have thought more about what factors in their past may have caused a disease than controls will have. Controls are therefore less likely to remember an exposure because they don’t link it to any disease process, which may skew the results.

Lead Time bias: Lead time is the period between early detection of disease and the time of its usual clinical presentation. When evaluating the effectiveness of the early detection and treatment of a condition, the lead time must be subtracted from the overall survival time of screened patients to avoid lead time bias. Otherwise early detection merely increases the duration of the patients’ awareness of their disease without reducing their mortality or morbidity. Numerous cancer screening procedures were thought to improve survival until lead time bias was addressed.

Interviewer/Observer bias: The interviewer's or observer's knowledge of the hypothesis under investigation, and of a participant's disease and/or exposure status, can influence how data are collected and recorded.

Verification and work-up bias: This is a type of bias in which the results of a diagnostic test affect whether the gold standard procedure is used to verify the test result. It is more likely to occur when a preliminary diagnostic test is negative because many gold standard tests can be invasive, expensive, and carry a higher risk.

Hawthorne effect: This can occur when participants alter their usual behaviour due to their awareness that they are being studied.

Ecological fallacy: This can occur when conclusions about individuals are based only on analyses of group data.

Expectation bias (Pygmalion effect): Only a problem in non-blinded trials. Observers may subconsciously measure or report data in a way that favours the expected study outcome.

Late-look bias: Gathering information at an inappropriate time e.g. studying a fatal disease many years later when some of the patients may have died already.

What is Neyman bias?

Prevalence/incidence bias (Neyman bias): when a study is investigating a condition that is characterised by early fatalities or silent cases. It results from missed cases being omitted from calculations

What is Berkson’s bias?

Admission bias (Berkson’s bias): cases and controls in a hospital case control study are systematically different from one another because the combination of exposure to risk and occurrence of disease increases the likelihood of being admitted to the hospital

What type of bias?

Difference in the accuracy of the recollections retrieved by study participants, possibly depending on whether or not they have the disorder. E.g. a patient with lung cancer may search their memories more thoroughly for a history of asbestos exposure than someone in the control group. A particular problem in case-control studies.

Recall bias

In retrospective studies where participants are asked to remember their past exposure to risk factors, it is likely that cases will have thought more about what factors in their past may have caused a disease than controls will have. Controls are therefore less likely to remember an exposure because they don’t link it to any disease process, which may skew the results.

What type of bias?

Failure to publish results from valid studies, often as they showed a negative or uninteresting result. Important in meta-analyses where studies showing negative results may be excluded.

Publication bias

What type of bias?

In studies which compare new diagnostic tests with gold standard tests, this bias can be an issue. Sometimes clinicians may be reluctant to order the gold standard test unless the new test is positive, as the gold standard test may be invasive (e.g. tissue biopsy). This approach can seriously distort the results of a study, and alter values such as specificity and sensitivity. Sometimes work-up bias cannot be avoided; in these cases it must be adjusted for by the researchers.

Work-up (Verification) bias
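
To see how work-up bias can distort sensitivity and specificity, the sketch below compares a fully verified study with one in which only a fraction of test-negative patients receive the gold standard. The counts and the "1 in 10 negatives verified" rule are hypothetical assumptions made for illustration.

```python
def sens_spec(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from 2x2 diagnostic counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical truth if every patient received the gold standard test.
tp, fn, fp, tn = 80, 20, 90, 810
full_sens, full_spec = sens_spec(tp, fn, fp, tn)

# Assumed work-up pattern: all test-positives are verified,
# but only 1 in 10 test-negatives are sent for the gold standard.
fn_verified = fn // 10  # test-negative, disease present
tn_verified = tn // 10  # test-negative, disease absent
biased_sens, biased_spec = sens_spec(tp, fn_verified, fp, tn_verified)

print(f"Full verification:    sens={full_sens:.2f}, spec={full_spec:.2f}")
print(f"Partial verification: sens={biased_sens:.2f}, spec={biased_spec:.2f}")
```

Under these assumptions the partially verified study overstates sensitivity and understates specificity, because false negatives and true negatives are under-represented among the verified patients.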

What type of bias?

Only a problem in non-blinded trials. Observers may subconsciously measure or report data in a way that favours the expected study outcome.

Pygmalion effect - Expectation bias

What type of bias?

Describes a group changing its behaviour due to the knowledge that it is being studied

Hawthorne Effect

What type of bias?

Gathering information at an inappropriate time e.g. studying a fatal disease many years later when some of the patients may have died already

Late-Look bias

What type of bias?

Occurs when subjects in different groups receive different treatment

Procedure bias

What type of bias?

Occurs when two tests for a disease are compared, the new test diagnoses the disease earlier, but there is no effect on the outcome of the disease

Lead-time bias

Lead time is the period between early detection of disease and the time of its usual clinical presentation. When evaluating the effectiveness of the early detection and treatment of a condition, the lead time must be subtracted from the overall survival time of screened patients to avoid lead time bias. Otherwise early detection merely increases the duration of the patients’ awareness of their disease without reducing their mortality or morbidity. Numerous cancer screening procedures were thought to improve survival until lead time bias was addressed.
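
The subtraction described above can be sketched with a single hypothetical patient whose age at death is unchanged by screening. The ages are invented for the example and the function name is an assumption.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years survived after diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: screening moves diagnosis earlier but not death.
age_at_death = 65
age_at_clinical_diagnosis = 60  # when symptoms would normally appear
age_at_screen_diagnosis = 57    # when screening detects the disease

lead_time = age_at_clinical_diagnosis - age_at_screen_diagnosis

unscreened_survival = survival_from_diagnosis(age_at_clinical_diagnosis, age_at_death)
screened_survival = survival_from_diagnosis(age_at_screen_diagnosis, age_at_death)

# Subtracting the lead time removes the apparent survival benefit.
corrected_survival = screened_survival - lead_time

print("Unscreened survival:", unscreened_survival)
print("Screened survival (uncorrected):", screened_survival)
print("Screened survival (lead time removed):", corrected_survival)
```

In this sketch the screened patient appears to survive longer only because the clock started earlier; once the lead time is removed, survival is identical.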

What type of bias?

A form of bias that occurs when the measurement of information differs among study groups. Examples include recall bias, reporting bias, diagnostic bias, the Hawthorne effect and errors in measurement.

Information bias

What type of bias?

Distortion of the exposure-disease relationship by some other factor

Confounding bias

What type of bias?

This can occur when conclusions about individuals are based only on analyses of group data

Ecological Fallacy

What type of bias?

This can occur when exposure can influence diagnosis. For example women taking an oral contraceptive will have more frequent cervical smears than women who are not on the pill and so are more likely to have cervical cancer diagnosed (if they actually have it). Thus, in a case-control study that compared women with cervical cancer and a control group, at least part of any higher pill consumption rates amongst the former group may be due to this effect.

Detection bias

What type of bias?

Highly cited articles are easier to find and so have a higher chance of being included in a given study.

Citation bias

What type of bias?

This can occur when a treatment is studied in more severe forms of a disease. Such results may then not apply to mild forms of the disease.

Disease spectrum bias (aka case-mix bias)

What type of bias?

Where the subjects are not representative of the population. This may be due to volunteer bias (aka referral bias). An example of volunteer bias would be a study looking at the prevalence of Chlamydia in the student population. Students who are at risk of Chlamydia may be more, or less, likely to participate in the study.

Sampling bias

What is selection bias vs information bias? (Two main categories that the other subtypes fall under)

Selection bias - when the selected sample is not representative of the reference population
Information bias - when the information gathered about exposure, outcome or both is incorrect because of an error in measurement

What type of bias?

The interviewer's or observer's knowledge of the hypothesis under investigation, and of a participant's disease and/or exposure status, can influence how data are collected and recorded.

Interviewer/ observer bias

What name is given to a variable that is associated with both the outcome and the exposure but does not lie on the causal pathway between them?

Confounding factor

Confounding can be addressed at both the design and the analysis stage of a study. The main method of controlling confounding in the analysis phase is stratification analysis. NOTE: It must relate to both the cause AND effect

What occurs when there is a non-random distribution of risk factors in the populations? Age, sex and social class are common causes of this.

Confounding

What are the three main ways (in the design stage) to address confounding?

1) Matching (e.g. by age and gender) - an active form of control. It is commonly used in case-control studies but is not possible in ecological studies, which use group-level data, and it is most practical in small studies.
2) Randomisation (which aims to produce an even distribution of potential risk factors across the two populations)
3) Restriction of participants (e.g. if watching TV is a known confounder, restrict participants to those who don't watch TV)

What are the two main ways (in the analysis stage) to address confounding?

1) Stratification - a statistical technique that controls for confounding by creating two or more categories (strata) within which the confounding variable either does not vary or varies very little.
2) Multivariate models (e.g. logistic regression, linear regression, analysis of covariance (ANCOVA))
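
As a minimal sketch of stratification, the example below compares a crude odds ratio with the Mantel-Haenszel pooled odds ratio across two strata of a confounder. The 2x2 counts are hypothetical and were chosen so that the exposure has no effect within either stratum; the Mantel-Haenszel estimator is one common way of pooling strata, not the only one.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data stratified by the confounder (e.g. age group).
# The older stratum has both more exposure and more disease.
strata = [
    (10, 90, 40, 360),   # younger participants
    (160, 240, 40, 60),  # older participants
]

# Collapsing the strata into a single table ignores the confounder.
crude_table = tuple(sum(cells) for cells in zip(*strata))
crude_or = odds_ratio(*crude_table)
stratum_ors = [odds_ratio(*table) for table in strata]
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR: {crude_or:.2f}")
print("Stratum-specific ORs:", stratum_ors)
print(f"Mantel-Haenszel OR: {adjusted_or:.2f}")
```

Here the crude odds ratio suggests an association, while each stratum and the pooled estimate show none, which is the signature of confounding by the stratifying variable.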

What are 7 types of randomisation (i.e helping reduce confounding at the design phase)

1) Simple Randomisation: Each participant has an equal chance of being assigned to any of the study groups. This method is straightforward but may lead to unequal group sizes in smaller studies.
2) Block Randomisation: Participants are divided into blocks, and within each block, they are randomly assigned to different groups. This ensures that group sizes remain balanced throughout the study.
3) Stratified Randomisation: Participants are first stratified based on specific characteristics (e.g., age, gender, disease severity) that are potential confounders. Randomisation then occurs within each stratum, ensuring these confounding variables are evenly distributed across groups.
4) Cluster Randomisation: Entire groups or clusters (e.g., schools, hospitals) are randomised rather than individual participants. This is useful when interventions are applied at the group level.
5) Adaptive Randomisation: The probability of assignment to a particular group changes based on accumulated data during the trial. This method aims to assign more participants to the more effective treatment but requires complex statistical approaches.
6) Minimisation: An adaptive method that allocates participants to groups in a way that minimises imbalance in confounding variables. It considers the characteristics of participants already assigned when determining the group for a new participant.
7) Quasi: refers to ‘randomising’ using even/odd numbers of the date of birth, day of the week patient was seen etc. These are not reproducible methods, and the sequences cannot ensure equal distribution of variables. These must be avoided. NOT a proper method!
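
Block randomisation from the list above can be sketched as follows. This is a minimal illustration assuming two arms labelled "A" and "B" and a block size of 4; the function name, arm labels and seed are hypothetical, and a real trial would use a validated allocation system with concealment.

```python
import random


def block_randomise(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Allocate participants in shuffled blocks so arm sizes stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    allocations = []
    while len(allocations) < n_participants:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * per_arm
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_participants]


allocation = block_randomise(20, block_size=4, seed=1)
print(allocation)
print("A:", allocation.count("A"), "B:", allocation.count("B"))
```

Because every complete block holds equal numbers of each arm, group sizes can never differ by more than half a block at any point during recruitment, unlike simple randomisation.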

What are 4 graphical methods used to detect publication bias? (Mnemonic: FP, GP, OFP, NQP)

Funnel plot
Galbraith plot
Ordered forest plot
Normal quantile plot

What is a funnel plot?

A funnel plot is a graph used to check for publication bias in systematic reviews and meta-analyses. It is a form of scatter graph that offers an easy visual way of checking that the published literature is evenly weighted (drug companies have a habit of withholding data that doesn't support the product).

What do the x and y axes represent on a funnel plot? What do the dots represent?

The x-axis represents some measure of effect size (often a risk ratio) and the y-axis represents some measure of study size or precision (often the standard error). By convention, the y-axis is reversed, with a standard error of zero at the top and larger standard errors towards the bottom. Each dot on the funnel represents a different trial in the meta-analysis. Larger trials tend to have smaller standard errors and sit towards the top; smaller studies, with larger standard errors, sit towards the bottom.
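
The coordinates of each dot can be computed as in the sketch below, which assumes the effect size is a risk ratio plotted on the log scale and uses the usual large-sample standard error of a log risk ratio. The two trials are hypothetical and were given the same risk ratio so that only their precision differs.

```python
import math


def log_rr_and_se(events_treat, n_treat, events_ctrl, n_ctrl):
    """Return (log risk ratio, standard error) for one trial."""
    log_rr = math.log((events_treat / n_treat) / (events_ctrl / n_ctrl))
    se = math.sqrt(1 / events_treat - 1 / n_treat + 1 / events_ctrl - 1 / n_ctrl)
    return log_rr, se


# Hypothetical trials: (events, participants) in treatment and control arms.
trials = {
    "large trial": (100, 1000, 125, 1000),
    "small trial": (8, 80, 10, 80),
}

# x = log risk ratio, y = standard error (drawn with zero at the top).
points = {name: log_rr_and_se(*counts) for name, counts in trials.items()}

for name, (x, y) in points.items():
    print(f"{name}: x (log RR) = {x:.3f}, y (SE) = {y:.3f}")
```

Both trials land at the same horizontal position, but the small trial has the larger standard error and so sits lower on the plot, where more horizontal scatter is expected.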

What should a funnel plot look like if there is no publication bias?

A pyramid, or symmetrical inverted funnel

What does an asymmetrical funnel plot indicate?

An asymmetrical funnel indicates a relationship between treatment effect and study size. This indicates either publication bias or a systematic difference between smaller and larger studies ('small study effects')

What is an Effect Modifier?

A third factor that affects the MAGNITUDE of the cause-effect relationship, e.g. family history