1A Epidemiology Flashcards
Right censoring
When people leave an at-risk population before an event of interest has occurred. Ie a cohort study where someone dies or is lost to follow up before the end of the study period but, had they remained, they MAY have developed the disease
Period prevalence
Proportion of population with an illness during a specified period of time
As a proportion it is a number between 0 and 1. If you multiply by 100 you get a percentage
Frequently quoted in epidemiology as a number per population for the time period
(remember Incidence is a measure of the number of NEW cases of a characteristic that develop in a population in a specified time period; whereas prevalence is the proportion of a population who have a specific characteristic in a given time period, regardless of when they first developed the characteristic)
Cumulative incidence
Number of occurrences of interest over a time period (ie a year) NEW CASES
Note population is disease free at the start!
ie 4 cases of malaria in 1000 people over 1 year = 4/1000 = 0.4% over 1 year
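The arithmetic on this card can be sketched in Python (figures are the malaria example above):

```python
# Cumulative incidence: new cases / population at risk over a stated period.
# The population is disease free at the start.

def cumulative_incidence(new_cases: int, population_at_risk: int) -> float:
    """Proportion of an initially disease-free population developing
    the disease over the period."""
    return new_cases / population_at_risk

ci = cumulative_incidence(4, 1000)                  # 4 cases in 1000 people
print(f"{ci:.4f} = {ci * 100:.1f}% over 1 year")    # 0.0040 = 0.4% over 1 year
```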
Incidence rate (AKA incidence density)
More detailed than cumulative incidence: it takes into account when a person became ill, died or was lost to follow up, ie each person's time at risk of developing the illness. Once they develop the illness, die or are lost to follow up they are no longer 'at risk'. Reported as a rate per X person-years
Note: as it is per person-years it IS NOT A PROPORTION, so the formula for confidence intervals is different
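A minimal sketch of the person-time calculation; the follow-up times below are invented (each person contributes only until they fall ill, die or are lost to follow-up):

```python
# Incidence rate (incidence density) = new cases / total person-time at risk.

def incidence_rate(new_cases, person_years):
    """person_years: one entry per person, their time at risk."""
    return new_cases / sum(person_years)

# hypothetical cohort: 2 full 5-year follow-ups, 3 people censored or ill early
follow_up_years = [5.0, 5.0, 2.5, 1.0, 4.0]   # total 17.5 person-years
rate = incidence_rate(2, follow_up_years)
print(f"{rate * 1000:.0f} per 1000 person-years")  # 114 per 1000 person-years
```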
Ratio data
Quantitative data with a true 0, where numbers can be compared by ratios, ie height and weight: 0cm is truly an absence of height and 1m is double 50cm
Ordinal data
Data with an order but the interval between options is not always consistent, ie Likert scales or social class I, II, III, IV, V
Interval data
Quantitative data with a consistent interval between categories but no true 0, so ratios between the data are meaningless (ie degrees Celsius: 0 degrees does not mean there is no temperature (you can have minus numbers!) and 20 degrees is not twice as hot as 10 degrees)
Nominal data
Categorical data with no order ie blood group
Dimensions of descriptive epidemiology
Time, person, place
Total period fertility rate
Sum of the age specific fertility rates. Indicates the average number of babies that would be born to a woman during her lifetime if she had average fertility and survived to the end of her reproductive life.
Age specific fertility rate
(Number of births to women aged x/1000 women aged x) per year
General fertility rate
(Number of livebirths/ 1000 women aged 15-44years) per year
Crude birth rate
(Number of live births/1000 population) per year
Perinatal mortality rate
(Number of still births or deaths <7days/1000 births) per year
Post-neonatal mortality rate
(Number of deaths 4-52 weeks/1000 live births) per year
Neonatal mortality rate
(Number of deaths <28 days/1000 livebirths) per year
Infant mortality rate
(Number of deaths <1years/1000 live births) per year
Child mortality rate
(Number of deaths of children <5 years/1000 children <5 years) per year
Age specific mortality rate
(Number of deaths for age x/1000 population aged x) per year
Crude mortality rate
(Number of deaths/ 1000 population) per year
Standardisation
Process by which data is transformed to allow comparison between populations with different demographics (ie different age structures)
Direct standardisation
Requires stratum specific data (ie age specific mortality rates). These rates are then applied to a standard population (ie the European Standard Population): ie how many people in town A would have died if there were 60,000 40–45 year olds rather than 40,000. The 'expected deaths' for each age group are totalled and divided by the total number of people in the standard population. This is the age-standardised mortality rate. This has little meaning on its own but can be compared with another population standardised using the same reference population.
Needs data from large numbers to have accurate information for all strata
Comparative mortality ratio
age-standardised mortality rate town A/ age standardised mortality rate town B.
if ratio is for example 1.114 then after standardisation mortality is 11.4% higher in town A.
calculated for direct standardisation
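The direct standardisation and comparative mortality ratio calculations above can be sketched in Python; all rates and population counts here are invented for illustration:

```python
# Direct standardisation: apply each town's stratum-specific (age-specific)
# rates to a shared standard population, then compare the resulting rates.

def direct_standardised_rate(stratum_rates, standard_pop):
    """stratum_rates: deaths per person in each age band for the study town.
    standard_pop: people in each age band of the reference population."""
    expected_deaths = sum(r * n for r, n in zip(stratum_rates, standard_pop))
    return expected_deaths / sum(standard_pop)

town_a_rates = [0.001, 0.004, 0.020]      # young, middle, old (illustrative)
town_b_rates = [0.002, 0.005, 0.015]
standard_pop = [40000, 35000, 25000]      # stand-in for a standard population

asr_a = direct_standardised_rate(town_a_rates, standard_pop)
asr_b = direct_standardised_rate(town_b_rates, standard_pop)
print(f"CMR (A vs B) = {asr_a / asr_b:.3f}")  # comparative mortality ratio
```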
weighted average
A weighted average is a method of computing an average where some data points contribute more than others. If all the weights of the data point are equal then the weighted average is the same as the mean.
ie when combining module assessment marks, coursework is worth 60% and exam 40%. Your overall mark is a weighted average.
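A small sketch of the module-mark example (marks of 70 for coursework and 55 for the exam are assumed):

```python
# Weighted average: each value contributes in proportion to its weight.

def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

overall = weighted_average([70, 55], [0.6, 0.4])  # coursework 60%, exam 40%
print(overall)                                    # 64.0

# With equal weights it reduces to the ordinary mean:
print(weighted_average([1, 2, 3], [1, 1, 1]))     # 2.0
```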
indirect standardisation
(when to use (3), what is the output and how to do)
Used when:
- strata specific rates are unknown
- study population is of a small size
- stratum specific rates are 0
Gives the STANDARDISED MORTALITY RATIO
Involves taking the rates for a standard population and calculating what would occur in the population of interest should the rates be the same (ie how many deaths you would expect in the population you are looking at). The expected deaths can then be compared with the observed deaths, using the standardised mortality ratio.
standardised mortality ratio (SMR) AKA standardised incidence ratio
Calculated when using indirect standardisation.
ratio of 2 counts
(observed deaths / expected deaths) x100
SMRs should be compared with caution as social class, ethnicity and sex composition will all have an impact. Furthermore, the different age distributions of the populations mean comparisons are likely not valid.
Two SMRs are not directly comparable even if they use the same mortality rates from the reference population:
- the two totals for expected deaths are constructed using the different age distributions of the two study populations
- hence they are not referring to the same denominator
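A sketch of indirect standardisation producing an SMR; the study population, standard rates and observed deaths below are all invented:

```python
# Indirect standardisation: apply the STANDARD population's age-specific
# rates to the study population's age structure to get expected deaths,
# then SMR = (observed / expected) x 100.

def smr(observed_deaths, study_pop_by_age, standard_rates_by_age):
    expected = sum(n * r for n, r in zip(study_pop_by_age, standard_rates_by_age))
    return 100 * observed_deaths / expected

study_pop = [5000, 3000, 1000]            # small town, by age band (invented)
standard_rates = [0.001, 0.004, 0.020]    # reference population rates (invented)
print(f"SMR = {smr(45, study_pop, standard_rates):.0f}")  # SMR = 122
```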
Standardised mortality ratio and occupation studies
In occupational exposures the SMR often underestimates the strength of an association, as the general population contains both exposed and unexposed people.
When doing occupational studies comparisons are made against 2 groups:
- an unexposed population from the same occupation
- the general population
years of life lost
measure of premature mortality
death of young contributes more than death of older people. Can be calculated in 2 ways:
simple: an upper age is chosen (ie 75 years), any deaths before that age are counted and the number of years lost is summed.
Complex: life expectancy for each individual is calculated using life tables and age specific mortality rates.
Underestimates the burden of chronic disease as young people live with this for a long time.
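The simple method can be sketched as follows (ages at death are hypothetical):

```python
# Simple years of life lost: choose an upper age (here 75) and sum the
# years below it for each death; deaths at or above it contribute nothing.

def years_of_life_lost(ages_at_death, upper_age=75):
    return sum(upper_age - age for age in ages_at_death if age < upper_age)

# deaths at 60, 70, 80 and 25 -> 15 + 5 + 0 + 50 years lost
print(years_of_life_lost([60, 70, 80, 25]))  # 70
```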
HALE
health adjusted life expectancy.
Can be used instead of years of life lost; better for estimating the impact of chronic diseases, where years of life lost may underestimate the disease burden (as it only counts death, not years lived in poor health)
HALE is a measure of population health that takes into account mortality and morbidity. It adjusts overall life expectancy by the amount of time lived in less than perfect health. This is calculated by subtracting from the life expectancy a figure which is the number of years lived with disability multiplied by a weighting to represent the effect of the disability.
If:
A = years lived healthily
B = years lived with disability
A+B = life expectancy
A+fB = healthy life expectancy, where f is a weighting to reflect disability level.
N.B. This raises many moral questions about who defines and measures disability level and how they do it.
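The A + fB formula above, with illustrative values for A, B and f:

```python
# HALE as defined on this card: healthy years count fully, disabled years
# are down-weighted by f (a weighting reflecting disability level, 0..1).

def hale(healthy_years, disabled_years, f):
    return healthy_years + f * disabled_years

A, B, f = 65.0, 10.0, 0.5    # invented: 65 healthy years, 10 with disability
print(hale(A, B, f))         # 70.0 healthy-equivalent years
print(A + B)                 # 75.0 = overall life expectancy
```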
event based measures of disease burden
Rely on routinely collected data on INCIDENCE
ie death certs, hospital admission data, disease registers, statutory notifications.
Any health service data will likely underestimate disease burden due to large proportion of self care.
Time based measures of disease burden
Where no routinely collected incidence data is available, a cross sectional PREVALENCE survey may be used.
causes of variation in epidemiological studies
Chance (random error)
Bias
Confounding
True causal association
Reverse causation
Only after all other causes of variation have been considered should the possibility of a causal relationship be considered.
Bias
A systematic error that leads to a difference between the comparison groups with regard to how they are selected, treated, measured or interpreted.
Unlike confounding the role of bias cannot be measured.
Confounding
Where an apparent association between exposure and outcome is in fact due to a third factor
Reverse causation
when the outcome of interest leads to variation in the exposure
Sampling error
Sampling error is chance variation (as long as the study is unbiased) between the values obtained for the study sample and the values which would be obtained if measuring the whole population.
It is reflected in the standard error.
Standard error
Reveals how accurately a sample represents the whole population.
random measurement error (who does it affect and what impact does it have on the findings?)
affects both the exposed and non-exposed groups
findings tend towards the null hypothesis
What does systematic error lead to?
leads to bias
bias can occur in either direction
ways to prevent/deal with measurement bias (6)
Measure reliability using correlation coefficients (continuous data) or Cohen's kappa (categorical data)
blind accurately
Use validated measuring tools and protocols
use a range of measures (direct measurements, questionnaires etc)
Conduct a sensitivity analysis
report potential errors both random and systematic
sensitivity analysis
Sensitivity analyses are used to determine the extent to which the results of a trial are affected by changes in method, models, values of unmeasured variables or assumptions
ie could analyse results with/without outliers, intention to treat or as treated
if changing things does not impact the results, the results are likely more robust
risk
Same as cumulative incidence
Measures can be absolute or relative
Attributable risk
On formula sheet.
The difference between the rate of disease in the exposed and unexposed
AKA risk difference or excess risk
Attributable fraction
AKA the aetiological fraction
This is the proportion of the disease in the exposed which can be considered to be due to the exposure, after accounting for risk of disease that would have occurred anyway.
it is a measure that combines the risk difference and the prevalence
Population attributable fraction
The PROPORTION of the incidence of a disease in the population (exposed and nonexposed) that is due to exposure.
It is the proportion of a disease in the population that would be eliminated if exposure were eliminated
Population attributable risk
The excess RATE of disease in the whole population which is attributable to the exposure.
ie smoking and lung cancer mortality
mortality in whole pop = 55 per 100 000
mortality in non-smokers = 16 per 100 000
PAR = rate in pop − rate in unexposed = 55 − 16 = 39 deaths/100 000 per year
AKA preventable fraction
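The PAR calculation with the smoking figures from this card:

```python
# Population attributable risk: excess rate of disease in the WHOLE
# population (exposed + unexposed) attributable to the exposure.

def population_attributable_risk(rate_whole_pop, rate_unexposed):
    return rate_whole_pop - rate_unexposed

# lung cancer mortality per 100,000 per year (figures from the card)
print(population_attributable_risk(55, 16))  # 39
```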
Effect measures
Effects measures are RELATIVE RISKS.
They do not give any idea about the absolute risk of an event but rather are a ratio of the probability of disease between the exposed and unexposed.
Give an indication of ‘strength of association’
They include the risk ratio (same as relative risk), odds ratio and rate ratio.
risk ratio aka relative risk
risk of the disease in the exposed / risk of the disease in the unexposed
(a/(a+b)) / (c/(c+d))
rate ratio
rate of the disease in the exposed/ rate of the disease in the unexposed
Odds ratio
used in case control studies, where participants have been selected based on presence of disease so you cannot calculate risks.
You can instead calculate the odds of the diseased/not diseased having the exposure.
If the disease is rare the OR approximates the RR.
odds of the disease in the exposed/odds of disease in the unexposed
(a/b)/(c/d)
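Both ratios from the usual 2x2 table, with invented counts:

```python
# 2x2 table:              diseased   not diseased
#            exposed         a            b
#            unexposed       c            d

def risk_ratio(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a / b) / (c / d)    # equivalently (a*d)/(b*c)

a, b, c, d = 30, 70, 10, 90     # illustrative counts
print(f"RR = {risk_ratio(a, b, c, d):.1f}")   # RR = 3.0
print(f"OR = {odds_ratio(a, b, c, d):.2f}")   # OR = 3.86
```

Note how, with a fairly common disease like this, the OR (3.86) overstates the RR (3.0); the rare-disease assumption matters.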
Bradford hill Criteria (9)
Studies try revealing breakthroughs, check scientists are credible dorks
- Strength of association
- Temporal relationship
- Reversibility
- Biological plausibility
- Coherence (like biologic plausibility, the relationship should not conflict with the natural history of the disease)
- Specificity (exposure only causes one disease)
- Analogy (analogies to other cause and effect relationships)
- Consistency of findings
- Dose-gradient
Studies Try Revealing Breakthroughs, check scientists are credible dorks
Strength of association
Temporal relationship
Reversibility
Biological plausibility
Coherence
Specificity
Analogy
Consistency of findings
Dose response
2 broad types of bias
selection bias
Measurement bias
What is selection bias?
When there is a systematic difference between:
-study participants and non-participants
-those in one study group (ie intervention) and those in another group (ie control)
4 types of selection bias
-Healthy worker bias
-volunteer bias
- follow up bias
- control bias
Healthy worker bias
Problem in occupational cohort studies. People working tend to be healthier than the general population. For this reason cohort studies may use workers from the same workplace but different job role as controls
volunteer bias
people who volunteer for studies tend to be healthier and more compliant than the general population
Control bias
Particularly a problem in case-control studies. In these, convenience sampling is often used, ie cases are obtained from one hospital clinic list and controls from a different hospital clinic list. This may mean neither cases nor controls are truly representative of the gen. population. This is improved by using nested case control studies: the case control study is nested in a cohort study, exposure data is collected at baseline, and if a case develops then controls are selected from the cohort. Data is only analysed from the cases and controls rather than the whole cohort.
Follow up bias
When those lost to follow up differ systematically from those who remain in the study
Measurement bias
When there are errors in the way outcomes or exposures are measured
Non differential (random) measurement bias
The error in assessing exposure/outcome occurs equally in both study and control groups. The misclassification is not related to outcome or exposure.
Serves to make the groups appear more similar than they really are.
3 main types of measurement bias
-Instrument bias
-Respondent bias
-Observer bias
Differential (systematic) measurement bias
classification error occurs differently depending on a person's outcome or exposure status.
Can serve to reduce or exaggerate an association between exposure and outcome.
Instrument bias
inaccuracies in equipment or test used to measure outcome/exposure
Responder bias (3 examples and ways to minimise)
Occurs when:
- exposure information given by respondents differs depending on their outcome
- outcome information given by respondents differs depending on their exposure
ie RECALL BIAS- a particular problem in case-control studies
PLACEBO EFFECT- if an intervention has been received participants will report outcomes more favourably
can be minimised by- blinding, giving placebos, collecting exposure info from historical health records, using objective outcome/exposure measures
Observer bias (1 example and how to minimise)
systematic differences in the way exposure/outcome data is recorded between study groups
ie INTERVIEWER BIAS- an interviewer may ask different questions if they know the participant has had an intervention.
Minimise through- blinding, standardised data collection protocol
General measures to reduce bias (10) mnemonic
BIRTH CREW D
BIRTH CREW D
general measures to reduce bias
B-Blinding
I-Irrelevant factors- collect irrelevant factors to measure bias and to mask the hypothesis under investigation
R-Repeated measurements to reduce instrument bias
T-Training
H-High risk cohort- select cohorts at high risk of disease to reduce follow up time and therefore follow up losses
C-Choice of controls- use hospitalised controls to increase comparability (they will have a similar level of recall of events prior to admission)
R-Randomisation
E-Ease of follow up- choose cohorts who are easy to follow up
W-Written protocol
D-Duplicate measures- get information on exposure/outcome from multiple sources
3 measures to reduce bias in questionnaires
- check for known associations
- seek information in different ways
- check characteristics of data collection (ie time to complete survey)
4 ways to measure compliance (a source of bias) in intervention studies
- self report
- pill counts
- measuring biochemical parameters
- incorporating safe biochemical marker in placebo that can be measured in urine
Mediating factors
confounding factors have to be independently associated with both exposure and outcome.
Mediating factors are a step along the causal chain.
ie poor diet and CHD are associated. High cholesterol is a mediating factor
Poor diet –> high chol –> CHD
positive confounding
makes an association more pronounced
negative confounding
makes an association less pronounced
Residual confounding
When unknown confounding factors have not been accounted for or when confounders have been inaccurately measured.
Essentially eliminated by randomisation as confounders are randomly distributed between the 2 groups.
effect modification
An effect modifier is a third variable with effects the strength of association between the exposure and the outcome.
ie smoking and asbestos exposure –> lung cancer. Smokers with chronic asbestos exposure have far greater risk of lung cancer than the 2 risks added together; asbestos exposure is an effect modifier.
Assessing effect modification
Stratified analysis provides a way to identify effect modification (ie look at strength of association for each level of the effect modifier)
A chi-squared test for interaction can be used to assess whether differences between the strata-specific estimates are likely due to effect modification or chance; however the test has low power, so estimates should also be checked visually.
2 stages of a study that confounding can be addressed
Design and analysis
3 strategies for dealing with confounding at the design stage
Randomisation- deals with both known and unknown confounders if large enough sample but not always possible
Restriction- ie if sex and race are known to confound, just use black women. Cheap but restricts the pool of participants; residual confounding remains if restriction is insufficiently narrow
Matching- really only used in case-control studies as it is difficult and expensive. Cannot assess the impact of factors that have been matched. No control over factors which have not been matched.
3 strategies for dealing with confounding at analysis stage
- stratification- divide the study population into groups according to the confounder so that, within groups, the confounder cannot confound as it does not vary. After stratification the Mantel-Haenszel estimator can be employed to provide an adjusted result per stratum and a combined weighted average. If this differs from the crude estimate of effect strength, confounding is at play. Can only deal with a small number of confounders, as the number of strata increases exponentially and therefore the number in each group decreases.
Standardisation - ie direct and indirect
Multivariate analysis - ie multiple regression or logistic regression. Can deal with multiple confounders
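The Mantel-Haenszel pooled odds ratio mentioned under stratification can be sketched as follows; the two strata are invented for illustration:

```python
# Mantel-Haenszel pooled OR across confounder strata.
# Each stratum is a 2x2 table (a, b, c, d): exposed/unexposed vs diseased/not.

def mantel_haenszel_or(strata):
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

strata = [(20, 80, 10, 90),    # eg confounder absent (invented counts)
          (30, 20, 25, 25)]    # eg confounder present
print(f"MH-adjusted OR = {mantel_haenszel_or(strata):.2f}")  # 1.83
```

Comparing this adjusted OR with the crude OR from the collapsed table is the check described above: a difference suggests confounding.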
2 examples of descriptive studies
case reports and case series
strengths (3) and weaknesses (4) of descriptive studies
Strengths: Cheap, rapid, can support hypothesis generation
Weaknesses: No control group, cannot test for valid statistical association, may not be generalisable, cannot assess for disease burden
Ecological studies what are they and the 2 main types
characterised by the unit of observation being a group.
Describe a pattern of disease for an entire population with regards to another parameter.
Measures correlation coefficient
2 main types:
geographical studies
time series study
Strengths (4) and weaknesses of ecological studies (6)
Strengths: Rapid, cheap, can use routine collected data, support hypothesis generation
Weaknesses: Ecological fallacy, no individual level data, spatial autocorrelation (2 places close together are likely to be more similar than 2 places far apart- analysis assumes places are independent but they may not be), leakage of exposures through migration, assesses average exposure (would not be able to detect a J shaped curve), unable to control for unknown confounders
Ecological fallacy
When inferences about individuals are drawn from population level data from ecological studies.
ie a study showed that USA states with high levels of immigrants had higher literacy levels. People deduced that migrants had high levels of literacy. Actually migrants were more likely to move to states with high literacy levels but their literacy levels were low.
Cross sectional studies (design, sampling, application, analysis, strengths, weaknesses)
Information on exposure and outcome are collected at a single timepoint.
Can be descriptive (ie data collected on exposure or outcome), analytical (assess association between exposure and outcome) or ecological (no individual level data)
DESIGN- data collected on exposure and outcome at a single time point
SAMPLING- needs to be representative of population under study, should be random and sufficiently large
APPLICATION- hypothesis formulation or hypothesis testing if analytic
ANALYSIS: disease frequency: prevalence or odds
measure of effect: OR, prevalence ratio or prevalence difference
STRENGTHS- rapid and cheap, useful for rare diseases, can study multiple exposures and outcomes, useful for assessing disease burden
WEAKNESSES- as it only assesses prevalence not incidence it cannot distinguish between determinants of aetiology and survival, hard to establish temporality (risk of reverse causation), risk of recall bias
case-control (design, sampling, application, analysis, strengths, weaknesses)
Design: cases identified and matched with controls. Ideally cases should be 1:1 with controls but can be 1:4 if cases are limited. More controls than this add little to study power.
Sampling: can be population based or hospital based but as hospital population is not always representative of the general population, population based is better
Application: can be used to test hypothesis. Can be retrospective (all cases are identified before study starts) or prospective (new cases are identified during the study period).
Analysis: calculate OR. Cannot calculate disease prevalence
Strengths: cheap, can be rapid, good for rare diseases, useful for diseases with long latent periods, can examine many exposures simultaneously
Weaknesses: selection bias (control bias) since exposure and disease have already occurred, recall bias, temporal relationships may be difficult to establish, poor for rare exposures.
nested case control study
Nested within a cohort study. Cases and controls are selected from the cohort and the data collected utilised.
Limits selection (control) bias as cases and controls are drawn from the cohort. Cost effective and can avoid recall bias by using previously obtained information.
Cohort study (Design, sampling, application, analysis, strengths, weaknesess)
Design: participants identified based on exposure. Can be retrospective (exposure and outcome assessed from case notes) or prospective (normal)
Sampling: Population based sampling generally better, especially for a common exposure. If the exposure is rare the cohort may be chosen from a specific group (ie builders for asbestos exposure; but note if using a workplace the risk of the healthy worker effect)
Application: able to measure incidence in both groups
Analysis: Relative- risk/rate/odds ratio
Absolute: risk/rate/odds difference. Must assess group similarity to assess for confounding. Can assess loss to follow up by considering 2 extreme scenarios: all those lost develop disease, or all those lost do not develop disease.
Strengths: Can establish temporal relationship, good for rare exposures, can look for multiple outcomes from one exposure, minimises selection bias, retrospective cohort studies are useful for diseases with long latent periods
weaknesses: Expensive, time consuming, healthy worker effect, bad for rare diseases, risk of loss to follow up, records may be incomplete for retrospective cohort studies
intervention studies (design, sampling, application, analysis, strengths, weaknesses)
Design: investigator determines which participants receive an exposure. Can be affected by non-compliance (can be improved by having a run-in period pre-randomisation to assess and improve acceptability of treatment/placebo); if non-compliance is an issue results will tend towards the null hypothesis.
SAMPLING: Sample needs to represent the reference population
APPLICATION: can investigate therapeutic or preventative interventions at individual or group level
STRENGTHS: Can provide high quality evidence, if sample large enough validity largely guaranteed, blinding can minimise observation bias, randomisation can eliminate residual confounding
WEAKNESSES: expensive, ethics (need to have clinical equipoise), does not test treatments in real world scenario, can be difficult to generalise to general population
Crossover RCT
each participant acts as their own control, they receive 2 or more treatments during the study period
Factorial RCT
Compares 2 or more interventions alone and in combination (ie drug A, drug B, drug A+B or placebo). Needs a lot of participants
Cluster RCT
when groups are randomised, not individuals.
Challenges with small area analysis
- there may be little variation in exposure between areas making analytical studies difficult
- chance/incorrect data may have a greater effect on results
- there may be a lack of data
why do small area analysis and an example
Some diseases may be significantly high in some small areas and this will be lost in larger area averages. Having high quality local data can therefore be beneficial.
Dartmouth atlas of healthcare looks at medical supply and utilisation across areas of the US and examines variation
definition: Validity
How well an instrument measures what it intends to measure
Ways to assess/describe validity (4)
- criterion validity
- Face validity
- Content validity
- Construct validity
What is criterion validity
there are 2 types of criterion validity
1. Concurrent validity- how well an instrument compares to a gold standard
2. Predictive validity: how well an instrument predicts what it aims to ie risk of developing disease
What is face validity
How well an instrument compares to expert opinion