Case Control Studies Flashcards
Definition of case control studies
People with disease (cases) are compared to people without the disease (controls) and past exposures are measured
Definition of odds ratio
Odds of exposure among cases : odds of exposure among controls
Definition of power
Probability of detecting a true effect, i.e. of avoiding a false negative (Type II error)
Definition of non differential information bias
Errors are distributed evenly between cases and controls
Definition of differential information bias
Errors are distributed unevenly between cases and controls, e.g. a difference in follow up completeness between groups
Definition of selection bias
Population used as control must be representative of the general population
Definition of admission bias
Exposed cases have a different chance of admission than controls. Exposed cases in the study are not representative of all exposed cases
Definition of diagnostic bias
Diagnostic approach related to knowing exposure status
Definition of survival bias
Only survivors of a study are analysed
Definition of non response bias
Controls don’t respond => large difference between those who responded and those who didn’t
Definition of recall bias
Cases remember exposure differently to controls
Definition of interviewer bias
Different questions/questioning styles used by interviewers
Definition of confounding
Alternative explanations for observed exposure outcome association due to another exposure
Definition of population stratification
Presence of systematic differences in allele frequencies between subpopulations in a population
Definition of statistical interaction/effect modification
Association between exposures and outcomes differ according to a 3rd factor
When are case control studies most often used
GWAS
Name the 9 Bradford Hill criteria for causation
What are they?
Strength of association
-Stronger the relationship between IV and DV => increased credibility and less likely to be due to confounding
Consistency (reproducibility)
-Consistency of results in different studies
Specificity
-Causation likely if there is no other explanation
Temporality
-Does cause always precede consequence?
Dose response
-Does increased IV => increased DV
Biological plausibility
-Does it make sense with existing biological knowledge
Coherence
-Compatibility with existing knowledge
Experimental evidence
-If IV altered => does it lead to the corresponding disease outcome
Analogy
-Similar exposures are already known to cause similar outcomes
Why are reports of associations between genotype and outcome so often inconsistent
5 reasons
Variation of underlying association between genotype and outcome between populations
Heterogeneous phenotypes
Confounding by population stratification
Failure to exclude chance as an explanation
Publication bias
What are case control studies
People with disease (cases)
People without disease (controls)
Measure past exposure (e.g. genotype) in both groups and compare the prevalence of exposure
What are the 3 advantages of case control studies
Inexpensive and quick
Good for rare outcomes and multiple risk factors
Can look at risk factors in detail
What are the 4 disadvantages of case control studies
Not good for rare exposures
Selection bias
Recall bias
No estimate of disease incidence
In what way must the cases and controls be similar
Describe the ratios between controls and cases
The sample population of controls must be similar to the cases
By increasing the ratio of controls
- increased power
- decreased p value
- narrower 95% CI
- no change in OR
By doing so, can compare whether controls and cases carry different risk factors
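A rough sketch of how the ratio of controls to cases affects the estimate, assuming hypothetical counts (100 cases, 40% of them exposed, and 20% of controls exposed). The function name `or_and_se` is illustrative only. The OR stays the same while the standard error of log(OR) shrinks, with smaller gains at each step.

```python
import math

def or_and_se(a, b, c, d):
    """Return the odds ratio and the standard error of log(OR).

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log_or

# Hypothetical study: 100 cases, fixed; controls are added in multiples of 100.
cases_exposed, cases_unexposed = 40, 60

results = {}
for ratio in (1, 2, 4):
    controls_exposed = 20 * ratio      # 20% of controls exposed (assumed)
    controls_unexposed = 80 * ratio
    results[ratio] = or_and_se(
        cases_exposed, controls_exposed, cases_unexposed, controls_unexposed
    )
    odds_ratio, se_log_or = results[ratio]
    print(f"{ratio}:1 controls to cases  OR={odds_ratio:.2f}  SE(log OR)={se_log_or:.3f}")
```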
What are the 6 key features of a case control study
Start with disease/outcome
Retrospective, info obtained from past/is lifelong (genotype)
Can be prospective but takes longer to complete
Observational
No follow ups needed
Suitable for rare diseases, all accessible cases can be located, controls can be found afterwards
How are cases selected
What are the 4 possible types of cases
Strict diagnostic criteria
- specificity of disease
- consider diagnostic bias and validity of diagnosis
Population based cases
-Include all patients/random sample of all subjects with disease at 1 point/during timeframe
Hospital based cases
-All patients in hospital dept at 1 point in time
Incident cases
Prevalent cases
Describe the importance of controls
2 factors
Study base
-characterise distribution of exposure that is representative and random
Comparable accuracy
-equal reliability in all info obtained => no systematic misclassification
What are the 6 sources of control
Which sources of control should you be careful with
Hospital patients
Population of defined geographical area
Probability sample of total population
Neighbors
Friends (watch out for similar exposure characteristics to cases)
Relatives (watch out for similar genetics to cases)
How would you calculate the odds ratio
How would you interpret the odds ratio
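OR = AD / BC, where A = exposed cases, B = exposed controls, C = unexposed cases, D = unexposed controls
i.e. odds of exposure in cases : odds of exposure in controls
OR differs from 1 with p < 0.05 => statistically significant association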
- greater than 1 => increased risk
- less than 1 => protective factor
95% CI contains 1 => can’t reject null
No difference between exposed and non exposed population
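A minimal worked sketch of the calculation and interpretation, assuming a hypothetical 2x2 table. The 95% CI uses the common log-based approximation (Woolf method), which is one of several options and is not specified in these notes. The function names are illustrative.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI from the standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts, for illustration only.
a, b, c, d = 40, 20, 60, 80
or_estimate = odds_ratio(a, b, c, d)
low, high = odds_ratio_ci(a, b, c, d)

print(f"OR = {or_estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
if low <= 1 <= high:
    print("CI contains 1 => can't reject null")
elif or_estimate > 1:
    print("OR > 1 => exposure associated with increased risk")
else:
    print("OR < 1 => exposure looks protective")
```

With these counts the OR is about 2.67 and the interval (roughly 1.4 to 5.0) excludes 1.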
What is the power
Probability of detecting a true effect, i.e. of avoiding a false negative (Type II error)
Describe the use of p values in GWAS
Includes 1000s of comparisons
Genome-wide significance threshold is generally p < 5 x 10^-8
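One common way to arrive at this threshold is a Bonferroni correction of 0.05. The sketch below assumes roughly one million independent tests, which is a conventional figure and not stated in these notes. The function names are illustrative.

```python
ALPHA = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000  # assumption, not taken from the notes

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def is_genome_wide_significant(p_value, threshold):
    """True if a single variant's p value passes the corrected threshold."""
    return p_value < threshold

threshold = bonferroni_threshold(ALPHA, ASSUMED_INDEPENDENT_TESTS)
print(f"threshold = {threshold:.0e}")
print(is_genome_wide_significant(1e-9, threshold))   # passes
print(is_genome_wide_significant(1e-5, threshold))   # fails despite being < 0.05
```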
Describe analysis of a rare disease
-what would you calculate
Risk ratio, rate ratio, odds ratio are numerically similar
Can be used interchangeably
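A small numeric sketch of why this holds only when the disease is rare, using two hypothetical populations (all counts are invented for illustration). When few people are diseased, the odds and the risk are almost the same number, so the OR approximates the risk ratio.

```python
def risk_ratio(a, b, c, d):
    """Risk in exposed divided by risk in unexposed.

    a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Rare disease: 0.2% risk in exposed, 0.1% in unexposed (hypothetical).
rare = (20, 9980, 10, 9990)
# Common disease: 40% risk in exposed, 20% in unexposed (hypothetical).
common = (400, 600, 200, 800)

rr_rare, or_rare = risk_ratio(*rare), odds_ratio(*rare)
rr_common, or_common = risk_ratio(*common), odds_ratio(*common)

print(f"rare:   RR={rr_rare:.3f}  OR={or_rare:.3f}")
print(f"common: RR={rr_common:.3f}  OR={or_common:.3f}")
```

In the rare scenario the two measures agree to within about 0.002. In the common scenario the OR (about 2.67) clearly overstates the risk ratio (2.0).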
Describe 2 methods of data collection for the event and exposure
Event
External data sources
-Disease registries
-Death certificates
-Hospital records
Internal data sources
- Questionnaires
- Physical exams
- Blood and diagnostic tests
Exposure
External data sources
-Hospital records
-Employers
Internal data sources
- Questionnaires
- Physical exams
- Blood and diagnostic tests
What are the 3 main types of bias
Selection bias
Information bias
Confounders
What are 3 sources of misclassification bias
What are the 2 types of misclassification bias
Sources
- disease status
- determining exposure status
- confounders
Types
- Non differential
- Differential
What is selection bias and the effect on the odds ratio
The population used as the control must be representative of the general population
If not => underestimation of OR
What is admission bias and the effect on the odds ratio
Exposed cases have a different chance of admission than controls
Not representative of all cases => overestimation of OR
What is diagnostic bias and the effect on the odds ratio
Diagnostic approach related to knowing exposure status
More likely to be diagnosed => not representative of all cases => overestimation of OR
What is survival bias and the effects on the odds ratio
Only survivors of a study are analysed
Contact with risk factor => rapid death
Leads to underestimation of OR
What is non response bias and the effects on the odds ratio
Controls don’t respond => large difference between those who responded and those who didn’t
Leads to underestimation of OR
What is recall bias and the effects on the odds ratio
Cases remember exposures differently to controls
Leads to overestimation of OR
What is interviewer bias and the effects on the odds ratio
Different questions and questioning styles used by interviewers
May ask more leading questions to cases
Leads to overestimation of OR
What are the 4 characteristics of both non differential and differential misclassification
What are examples of both types of misclassification
Non differential
- Random error
- Unrelated to exposure or outcome
- Not a bias
- Weakens measure of association
Use of technology (calibration)
Poor quality control in DNA processing
Differential
- Systematic error
- Related to exposure/outcome
- Results in bias
- Measure of association distorted in any direction
Collecting diff types of DNA between cases and controls
Diff technologies used to sequence case and control DNA
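A sketch of both behaviours using expected counts, assuming a hypothetical true table and invented sensitivity and specificity values for the exposure measurement. With the same error rates in cases and controls the OR moves toward 1. With different error rates it is distorted, here upward, although other values could push it in either direction.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts after misclassification."""
    seen_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    return seen_exposed, (exposed + unexposed) - seen_exposed

def odds_ratio(a, b, c, d):
    # a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
    return (a * d) / (b * c)

# Hypothetical true exposure status.
cases = (40, 60)      # exposed, unexposed
controls = (20, 80)
or_true = odds_ratio(cases[0], controls[0], cases[1], controls[1])

# Non differential: same sensitivity and specificity in both groups (assumed values).
a, c = observed_counts(*cases, sensitivity=0.8, specificity=0.9)
b, d = observed_counts(*controls, sensitivity=0.8, specificity=0.9)
or_nondiff = odds_ratio(a, b, c, d)

# Differential: exposure is detected better in cases than in controls (assumed values).
a, c = observed_counts(*cases, sensitivity=0.95, specificity=0.9)
b, d = observed_counts(*controls, sensitivity=0.7, specificity=0.9)
or_diff = odds_ratio(a, b, c, d)

print(f"true OR={or_true:.2f}  non differential OR={or_nondiff:.2f}  differential OR={or_diff:.2f}")
```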
How would increasing the study size affect the size of random and systematic error
Random
-As study size increases => decrease in error
Systematic
-As study size increases => no change in error
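A brief numeric sketch of the contrast, assuming a hypothetical exposure prevalence of 20% and a hypothetical measurement offset of 5 percentage points. The standard error of a proportion stands in for random error here. It falls with the square root of the study size, while the offset does not depend on size at all.

```python
import math

def standard_error(proportion, n):
    """Standard error of an estimated proportion (random error)."""
    return math.sqrt(proportion * (1 - proportion) / n)

TRUE_PREVALENCE = 0.20      # hypothetical
SYSTEMATIC_OFFSET = 0.05    # hypothetical bias, e.g. a miscalibrated instrument

for n in (100, 1_000, 10_000):
    random_error = standard_error(TRUE_PREVALENCE, n)
    print(f"n={n:>6}  random error={random_error:.4f}  systematic error={SYSTEMATIC_OFFSET:.4f}")
```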
What are 4 ways to reduce bias present in a case control study
Carefully consider your
- choice of study population
- methods of data collection
- sources of exposure and disease info
- assess extent and direction of bias
What are confounders
What is the effect of confounding
Alternative explanations for observed exposure outcome association due to another exposure
Causes bias in estimate
What are the 2 methods of addressing confounders
Design
Collect sufficient data and either
-restrict study to specific populations
-matching controls and cases on confounders
However, it's not always possible to match/restrict, there will always be some residual confounding
Analysis
Population stratification could be present in genetic studies
-adjust for confounders with modeling of logistic regression
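As a simpler illustration of adjustment at the analysis stage, the sketch below uses a Mantel-Haenszel pooled odds ratio across strata of the confounder. This is a different technique from the logistic regression named above, chosen only because it can be shown with plain arithmetic. All counts are hypothetical, and the strata labels are invented.

```python
def odds_ratio(a, b, c, d):
    # a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled OR: sum(a*d/n) / sum(b*c/n) over strata of the confounder."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical strata of a confounder (e.g. two age groups).
strata = [
    (80, 40, 20, 10),   # older: mostly exposed, more cases than controls
    (10, 20, 40, 80),   # younger: mostly unexposed, fewer cases than controls
]

# Crude table ignores the confounder by summing the strata cell by cell.
crude = odds_ratio(*[sum(cells) for cells in zip(*strata)])
adjusted = mantel_haenszel_or(strata)

print(f"crude OR={crude:.2f}  adjusted OR={adjusted:.2f}")
for a, b, c, d in strata:
    print(f"stratum OR={odds_ratio(a, b, c, d):.2f}")
```

Here the crude OR is 2.25, but each stratum has an OR of 1 and the pooled estimate is 1, so the crude association is entirely explained by the confounder.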
What are the 3 criteria for confounding
Must be causally/non causally associated with exposure in source population in study
Must be a causal risk factor for disease in unexposed
Must not be on causal pathway between exposure and disease
What are the 3 common confounders
Age
Sex
Socioeconomic status
What is population stratification
Presence of systematic differences in allele frequencies between subpopulations in a population
Is not always obvious, but can lead to FP and false associations
Dealt with by analytics programs
What is statistical interaction/effect modification
Association between exposures and outcomes differ according to a third factor
Variation in groups/strata not due to chance
Normally need to report stratum specific rate ratios
How would you interpret statistical interaction
Is this common in statistical studies
Rate ratios vary per stratum
If rate ratio varies according to a factor, there is an interaction
Can test for this
True interaction rare in genetic studies
How can you deal with bias in a study that focuses on genetic risk factors
Bias in exposure (genotype) eliminated in correctly designed studies
Selection bias not likely to be an issue unless population stratification present
Confounders
- population stratification
- linkage disequilibrium
How can you measure risk factors in a study that focuses on genetic risk factors
Can measure all risk factors by comparing all loci in a genome
How can you use data gained from studies that focus on genetic risk factors
Data collected can be banked and shared indefinitely
Records must be anonymized so consent only needs to be gained once
How do you measure exposure in studies that focus on genetic risk factors
Genotype can be measured retrospectively via case control
How can you deal with bias in a study that focuses on environmental risk factors
Bias in different exposures can cause problems
Selection bias is an issue mainly in case controls
Confounders can’t be adequately controlled
How can you measure risk factors in a study that focuses on environmental risk factors
Unlimited risk factors
How can you use data gained from studies that focus on environmental risk factors
Studies limited to baseline exposures, sharing data => active collaboration
Not always necessary to break link between records and ID after data collected
How do you measure exposure in studies that focus on environmental risk factors
Exposure can be affected by disease onset so prospective studies needed