Basic Epi Flashcards
Descriptive study definition including study types
Measure the occurrence of outcomes.
Can be split into either populations or individuals.
Individuals - case reports, case studies, case series, surveillance and prevalence cross-sectional studies
Populations - Ecological studies
Analytical study definition
Test the association between exposure and outcome
How can you measure the distribution of disease?
Time - year, season, day, hour
Place - country, region, district
Person - age, sex, social class, lifestyle
Four commonly used sources of data
Routine statistics
Population censuses
Surveys
Special studies
5 routine statistic sources
Death certificates
Birth records
Special disease registers - cancer registries
Communicable disease reports
GP records
What is an ecological study? (at least 4)
An observational design study (no treatment)
Use routinely collected data
Based on groups, not individuals (the group is the unit of observation), so it is not possible to link an individual's exposure to their outcome
Uses correlation coefficient (r)
Useful for generating hypotheses, not useful for true exposure risk at individual level.
Can be useful at looking at group level disease e.g. schools
An example could be air pollution and mean BMI of an area
Why use an ecological study?
to investigate aetiology and risk factors for disease or evaluate changes in health care policy
generate hypotheses
estimate prevalence
What is the definition of ecological fallacy?
Associations at population level do not imply association at an individual level
What are the strengths of ecological studies? 8 marks
Quick and cheap
Use available data e.g. routine stats
Some factors operate at population level e.g. air pollution
Some exposure data only available at population level
Differences in exposure between areas may be larger than those between individuals in one area
Ability to map ecological data
Can generate hypotheses
Random errors may be smaller for populations than individual exposures
What are the weaknesses of ecological studies?
Data may be collected or recorded differently in different places
Surrogate measures based on the average of the population
Spatial boundaries are artificial
Confounding (lack of data)
Could use proxy measures
Classification challenges
Ecological fallacy
Uncertainty in temporal relationship
Collinearity in variables (i.e. your variables are too similar)
What is a cross-sectional study? (8)
An observational study
Carried out at a single point in time
Snapshot of population health
Collects individual data
Can measure prevalence, not incidence
Cannot prove cause and effect
Good at generating hypotheses
Can be descriptive or analytical (descriptive will describe the data; analytical will investigate risk factors and outcomes, collecting data on outcomes and exposures at the same time)
Typically use surveys to gain data
Examples of national CS studies
Surveys - census, national survey for Wales, national attitudes and lifestyle survey
CS design
Population has a representative sample (if not using whole pop).
Descriptive:
Then you need a number with disease or exposure and then a number without disease or exposure.
This will allow calculation of prevalence of disease
Analytical:
Will need numbers with/without disease and with/without exposure, so 4 sample groups vs 2 groups.
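A minimal sketch of the prevalence calculation for the descriptive and analytical designs above. All counts are made up for illustration, and the helper name `prevalence` is an assumption, not something from the course notes.

```python
def prevalence(cases: int, total: int) -> float:
    """Proportion of people who have the disease at the time of the survey."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total


# Hypothetical analytical cross-sectional sample: 4 groups (2x2 table).
exposed_diseased, exposed_healthy = 30, 70
unexposed_diseased, unexposed_healthy = 10, 90

n_exposed = exposed_diseased + exposed_healthy
n_unexposed = unexposed_diseased + unexposed_healthy

# Descriptive use: overall prevalence in the whole sample.
prev_overall = prevalence(exposed_diseased + unexposed_diseased,
                          n_exposed + n_unexposed)

# Analytical use: compare prevalence between exposure groups.
prev_exposed = prevalence(exposed_diseased, n_exposed)
prev_unexposed = prevalence(unexposed_diseased, n_unexposed)
prevalence_ratio = prev_exposed / prev_unexposed

print(prev_overall, prev_exposed, prev_unexposed, round(prevalence_ratio, 2))
```

Because exposure and disease are measured at the same time, the ratio describes an association only, not cause and effect.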
CS Study and temporality
CS study - difficult to measure temporality, chicken and egg scenario. This temporality issue depends on the exposure e.g. a genetic factor does not vary over time whereas exercise levels will change
Strengths of CS studies
Quick
Cheap and simple
Good for chronic diseases
Data on individuals e.g. questionnaires
Can estimate prevalence
Can assess many outcomes and risk factors
Can generate hypotheses
Weaknesses of CS studies
No good for acute diseases
No good for rare diseases
Prone to bias
Need high participation rates to be valid
Only a snapshot
Cannot infer temporal or causal relationships
Bias definition
Bias is a consequence of defects in the study design or execution of a study and cannot be controlled for by statistical measures and often cannot be mitigated by increasing sample size.
It is any systematic error in an epidemiological study that results in an incorrect estimate of the association between exposure and outcome
Name two types of bias
Selection bias - differences between groups in how study subjects are chosen or respond
Information bias - differences between groups in the accuracy of data on exposure/outcome
CS studies key methodological concerns
Validity and repeatability
Response rates (non-response)
Sampling - how representative of the true population is the sample? (sample size calculations done?)
Association is NOT the same as causation
Definition of confounding
A spurious relationship between exposure and outcome due to the presence of another variable which is associated with both the exposure and the outcome
How to control confounding?
By study design methods:
- randomisation (only in intervention designs)
- restriction (limit entrance to those within specific categories, e.g. age group)
- matching e.g. age or sex
By analysis methods
- stratified analysis
- multivariate analysis (can control simultaneously for several factors)
The goal of statistical analysis in the context of sampling is…
If we wish to say something about an attribute of the population, we take a sample so we can make inferences back to the population
Sampling in observational vs intervention
In observational you sample participants to observe
In intervention you allocate participants to observe
Why do we sample?
Not possible to collect info from all subjects
Sample can provide reliable info on the population by:
- estimating important population parameters (means, proportions etc.)
- to infer toward the populations (using valid distributional assumptions)
- present inferences using estimates, confidence intervals, hypothesis tests (p values)
We do this to potentially minimise bias
What is non-random sampling?
Non-probabilistic
Judgement (purposive) - selection based on personal (expert) belief about representativeness
Accessibility (convenience) - select the most easily obtained participants
Quota - judgement and accessibility used to achieve specified group sizes
It does not involve random selection
May or may not represent the population well
Used when researcher lacks a sampling frame for the population
These methods may introduce bias
What is judgement or purposive sampling?
Sample selected based on (subjective) judgement of researcher
This is often used
It is a type of non-probability sampling
You are not attempting to be generalisable
Potential for bias is high
Examples of non-random sampling
Quota sampling - people on the street, widely used in market research, strata corresponding to different characteristics e.g. age, sex, race
No sampling frame
Problem -> stratum may not be random
Pro and con of non-random sampling
Con - selection units up to investigator - high bias
Pro - convenient, less expensive than probability sampling
Random sampling is…
Selection of participants from the population is random. It is any method that utilises some form of random selection.
Different units in the population should have a known probability of being chosen.
AKA probability sampling
Reduces the possibility of selection bias (it does not remove bias from non-response or measurement)
Sampling unit definition
The elements being sampled (e.g. the postcode rather than the individual)
Sampling frame definition
the list of all units able to be sampled (the list of all postcodes included)
Sample definition
Collection of sampling units drawn from the sampling frame
Sources of error in sampling
Estimate = true value + random error + systematic error (bias)
Sample size (n)
Sampling design
Non-response (unit or item)
Measurement:
- observer or interviewer
- participant or respondent
- instrument e.g. questionnaire
- mode of administration
Does bias affect accuracy or precision?
Accuracy - accuracy is crudely defined as the difference between the population value and sample estimate
Precision definition
How closely repeated measurements or observations come to duplicating measured or observed values
Ways of sampling (replacement)
With replacement - the chance of selecting each unit does not change from selection to selection. The same unit could be selected more than once.
Without replacement - the chance of selecting each unit does change from selection to selection. Under this strategy the same unit cannot be selected more than once.
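A short sketch of the two strategies using Python's standard `random` module. The five-unit frame and the fixed seed are assumptions made only so the example is small and repeatable.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
units = ["A", "B", "C", "D", "E"]  # hypothetical sampling frame

# With replacement: every draw is made from the full frame,
# so the same unit can turn up more than once.
with_replacement = rng.choices(units, k=5)

# Without replacement: a unit leaves the pool once drawn,
# so no unit can appear twice.
without_replacement = rng.sample(units, k=5)

print(with_replacement)
print(without_replacement)
```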
Simple random sampling
gold standard
everyone has an equal probability of being selected
n/N could be 100/100,000 therefore 1/1000 probability to be picked.
This is sampling without replacement
True SRS can be expensive but has low bias, might be inefficient to do as well
How to do simple random sampling
- obtain a sampling frame that includes all units
- number each unit 1-N
- generate n random numbers between 1 and N
- if any random number selected is greater than N or already selected ignore it
- select each unit corresponding to the random numbers
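The steps above can be sketched in a few lines. This is an illustration under assumptions: the frame is simply the numbers 1 to N, and `simple_random_sample` is a hypothetical helper name. `random.sample` draws without replacement, so it already covers the "ignore numbers already selected" step.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame, each with equal probability."""
    rng = random.Random(seed)
    # random.sample never returns the same position twice,
    # so this is sampling without replacement.
    return rng.sample(frame, n)


N = 100_000
frame = list(range(1, N + 1))  # units numbered 1..N
sample = simple_random_sample(frame, 100, seed=1)

print(len(sample), len(sample) / N)  # sample size and sampling fraction n/N
```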
Systematic sampling process
How does it compare to simple? (1)
may be more convenient than simple
process
1. put numbers in a sequence
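2. divide the study population size (N) by the required sample size (n) to determine the interval size (k)
3. choose a random starting number between 1 and k
4. start at the random number and add the interval k each time
e.g. N = 10,000 and n = 200, so k = 10,000/200 = 50
pick a random starting number between 1 and k (here 1 to 50)
if the random start = 27, select 27, then 27+50 = 77, 77+50 = 127, and so on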
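A minimal sketch of the process, assuming a population of 10,000, a required sample of 200 (so an interval of 50) and a random start of 27. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the unit numbers picked by systematic sampling."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        # random start between 1 and k (inclusive)
        start = random.Random(seed).randint(1, k)
    # take the start, then every k-th unit after it
    return [start + i * k for i in range(sample_size)]


units = systematic_sample(10_000, 200, start=27)
print(units[:3])   # [27, 77, 127]
print(units[-1])   # 9977
```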
Stratified random sampling
We may want independent results for different sub-population e.g. sex, age or location
A simple random sample is taken from each stratum
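A sketch of proportionate stratified sampling, assuming a made-up population of 600 women and 400 men and a 10% sampling fraction in every stratum. `stratified_sample` and its arguments are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Take a simple random sample of the same fraction within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }


# Hypothetical population: (sex, id) pairs
people = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
chosen = stratified_sample(people, lambda person: person[0], 0.1, seed=0)

print({name: len(members) for name, members in chosen.items()})
```

Using the same fraction in each stratum keeps the sample proportionate; unequal fractions would need a weighted analysis.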
Advantages of stratified sampling
increases the chance that sample mean is a precise estimate (e.g. less random error) of the population mean because the sample resembles the study population better
useful if interested in sub-pops
strat samples useful when interested in comparing groups
can use different sample designs in different strata
administrative convenience
cons of stratified sampling
size of the strata in the pop must be known
information on strata characteristics must be known in advance
a weighted analysis is required if not proportionate to the population distribution of strata
cluster sampling
populations may consist of large number of groups e.g. households, schools or hospitals, these can be referred to as clusters
we first take a simple random sample of n clusters and include all sample members of the chosen clusters.
the sampling unit is the cluster, not the individual; essentially SRS applied to groups
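A small sketch of one-stage cluster sampling, assuming made-up schools as clusters. `cluster_sample` is a hypothetical helper: it randomly picks whole clusters and keeps every member of each chosen cluster.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select clusters and include all members of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)  # SRS of clusters
    return {name: clusters[name] for name in chosen_names}


# Hypothetical clusters: school -> pupils
schools = {
    "School A": ["a1", "a2"],
    "School B": ["b1"],
    "School C": ["c1", "c2", "c3"],
    "School D": ["d1"],
}
chosen = cluster_sample(schools, 2, seed=3)
print(chosen)
```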
cons of cluster sampling
hard to define cluster
less efficient than SRS
members of the same cluster tend to be more similar
Samples might not be independent:
- tends to be less precise
- requires a larger sample than SRS
pros of cluster sampling
often cheap and convenient to draw sample
only for admin convenience
often we do not have info of the individuals only a group e.g. address of household
two-stage sampling process
- draw up list of all first stage units (called primary sampling units)
- select random sample of PSUs
- for each PSU selected draw up a sampling frame of second-stage units
- select a random sample of the second stage units (SSUs)
Pros and cons of two-stage sampling (4)
can reduce cost if collecting data in person
usually less precise, have larger SEs
can be extended to 3 or more stages
req. specially weighted analysis
postal questionnaire as a mode of data collection pros and cons
relatively cheap and easy, able to cover wide geography, larger
poor response rate, may be completed by someone other than the intended respondent, not possible in some settings, may appear probabilistic - but actually not
by enumerator mode of data collection - pro and con
high response rate, often greater details given, able to take other non-recorded data e.g. samples
cons - very costly, enumerators can influence and bias responses, hard to overcome geographical differences
by the internet, mode of data collection - pro and con
very cheap and easy to collect
usually biased, non-probabilistic, difficult to analyse
summary of sampling design
Good sampling design is important to ensure good estimates of population parameters, as it reduces bias (systematic error) and allows precision to be measured (standard errors). A random sample reduces the chance of bias. It is not always possible to do SRS, so stratified, cluster and two-stage sampling may be required. More complex designs tend to increase the sample size needed.
are standardised death rates adjusted or unadjusted?
Adjusted by age
What methods of standardising are there?
Direct and indirect
What is indirect standardisation?
Age-specific death rates from a standard population are applied to the index population - provides a standardised mortality ratio and indirectly standardised rates. It does not require age specific mortality rates of the index population.
what is direct standardisation?
age specific rates are taken from the population that we are standardising and applied to a standard population.
Provides directly standardised rates. Age specific-rates are a necessity.
SMR equation
Total number of observed deaths / total number of expected deaths x 100 = SMR
Steps in indirect standardisation
- apply age-specific rates from the standard population and calculate the expected number of deaths
- sum up number of expected deaths
- divide the observed deaths in the index population by the number of expected deaths
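The steps above can be sketched as follows. The age bands, population sizes, standard rates and observed deaths are all made-up numbers, and the function name `smr` is an assumption.

```python
def smr(observed_deaths, index_population, standard_rates):
    """Indirect standardisation: return (SMR, expected deaths)."""
    # Apply the standard population's age-specific rates to the index
    # population and sum the expected deaths across age bands.
    expected = sum(
        index_population[age] * standard_rates[age] for age in index_population
    )
    return 100 * observed_deaths / expected, expected


# Hypothetical index population sizes by age band
index_population = {"0-44": 10_000, "45-64": 5_000, "65+": 2_000}
# Hypothetical age-specific death rates in the standard population
standard_rates = {"0-44": 0.001, "45-64": 0.005, "65+": 0.04}

ratio, expected = smr(138, index_population, standard_rates)
print(round(expected, 1), round(ratio, 1))  # 115.0 expected deaths, SMR 120.0
```

An SMR of 120 here would mean 20% more deaths than expected if the index population had the standard population's age-specific rates.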
How to interpret SMR
can only compare the index population with the standard population.
An SMR of 100 = observed deaths equal expected deaths (same mortality as the standard population)
>100 = larger mortality rate
<100 = lower mortality rate
Should include a 95% CI
Direct standardisation method
- Calculate local population death rates for each category
- apply the age specific mortality rates from the index population to the standard population to calculate expected number of deaths
- sum up expected number of deaths for the index population using the standard population age groups
- calculate overall age standardised mortality rates for the index
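A minimal sketch of the direct method, assuming made-up age-specific rates for the index population and a made-up standard population. `directly_standardised_rate` is a hypothetical name.

```python
def directly_standardised_rate(index_rates, standard_population):
    """Direct standardisation: apply index age-specific rates to a standard population."""
    # Expected deaths if the standard population experienced the index rates.
    expected = sum(
        index_rates[age] * standard_population[age] for age in standard_population
    )
    return expected / sum(standard_population.values())


# Hypothetical age-specific death rates in the index (local) population
index_rates = {"0-44": 0.002, "45-64": 0.006, "65+": 0.05}
# Hypothetical standard population by age band
standard_population = {"0-44": 60_000, "45-64": 30_000, "65+": 10_000}

dsr = directly_standardised_rate(index_rates, standard_population)
print(round(dsr * 1000, 2), "per 1,000")  # 8.0 per 1,000
```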
Which standardisation method is best?
- the direct method requires availability of age-specific rates for the study population of interest
- the indirect method requires the total number of cases in each population
- if numbers are small the indirect method is advisable
demography definition
the study of characteristics of human populations
Infant mortality definitions
- Stillbirth – born after 24 or more weeks completed gestation and which did not, at any time, breathe or show signs of life
- Perinatal – stillbirths plus early neonatal deaths
- Early neonatal: deaths under seven days
- Neonatal – deaths at under 28 days
- Postneonatal – deaths between 28 days & one year
- Infant – deaths under one year
- Rates – neonatal, postneonatal and infant mortality rates are reported per 1,000 live births
What are case control studies useful for?
Outbreak investigations
Studying uncommon diseases
Studying diseases with long latency
Generating and testing hypothesis
Quick and cheap to conduct
Do case control studies start with people with the disease or without the disease?
With the disease
How do you identify a case for a case control study? (6)
Need a precise case definition - must be objective with a validated test e.g. blood test, histology, X-rays or sonograms
It can be classified by diagnostic certainty e.g. confirmed, probable or possible
Whilst precise definitions can limit generalisability, they increase validity
Often req. inclusion/exclusion criteria (who, what, when, where)
Must report on numbers that meet exclusion/inclusion
Must decide whether to use incident or prevalent cases (i.e. all new cases within a fixed time period, or all cases, new and old)
Why would you use incident cases for a case-control study? (3)
Useful if exposure is associated with recovery or survival
Greater representativeness as recruitment is timely
Disease behaviour change less likely
What are the pros and cons of using existing cases for a case control study? 1 and 3
Pro - can be used when it is difficult to establish date of onset e.g. study of H. pylori infection
Con - may not be representative of all cases, patients with long course disease tend to be over-represented, and recall bias
Case control studies - population vs hospital cases
Population based studies try and recruit all from a defined population over a fixed period of time. It requires tracing subjects and has issues with completeness and refusal to participate.
Hospital based studies require a clear case definition and adherence to protocol to minimise bias, as they are more prone to bias (note this won’t score a star in some systems)
How would you recruit controls to a case control study? What characteristics must they have? (5)
Controls should be free of disease and representative of the population.
They must have the potential to become diseased.
They must be from the:
- same source population
- same inclusion/exclusion criteria
- identified as cases if they had the disease while under investigation
What options are there for selecting controls? i.e. what types of groups of controls can be used
Population controls (random from registry/list/directory)
Neighbourhood controls
Friends/family controls
Hospital controls
Pros and cons of using population controls
Can gain a random sample
Limitations: can be difficult and time consuming, healthy people may not participate, may be low response rate, selection bias (e.g. if using a register with phone numbers some may not have phone)
Neighbourhood controls pros and cons
Pro
No need for population register
Controls for social factors e.g. deprivation
Con
Poor cooperation
Can be time inefficient
Friend and family control group pro and con
Pro - no need for pop registers, quick and efficient, easy to control for social factors so may reduce confounding
Con - overmatching, selection bias (i.e. family members may be different from gen pop)
Hospital controls pro and con
Pro - easy to identify, relatively cooperative
Cons - overmatching (more likely to be sicker and have higher risk factor exposures), selection bias (catchment population differs for different diseases)
What may affect your case:control ratio?
If cases are ample, 1:1 gives sufficient power
If the outcome is rare, can go from 1:1 up to 1:4; >1:4 is a waste of resources as there is minimal increase in statistical power
How can we control for confounding in case control studies?
Study design methods
- randomisation (can’t do this one)
- restriction (can be done)
- matching (can be done, risk of over-matching)
Analysis methods
- stratification
- multivariable analysis
What is overmatching?
Matching is too close or elaborate. Controls become difficult to find and may fail to find a true causal association or may underestimate the association.
Odds ratio vs relative risk
Odds ratio is simply a measure of association, with no indication of temporality; it is used when the incidence of disease is unknown. Relative risk is usually interpreted as the risk of having the outcome of interest if you have the exposure, i.e. when temporality is indicated. The OR tends to approximate the RR when the disease is rare: if the prevalence of the disease is <10%, the relative risk and OR approximate each other.
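A sketch comparing the two measures on made-up 2x2 tables, to show that they are close when the disease is rare and drift apart when it is common. The cell labels (a = exposed with disease, b = exposed without, c = unexposed with disease, d = unexposed without) and the function names are assumptions; the relative risk is only meaningful here because the hypothetical counts are treated as full cohort data.

```python
def odds_ratio(a, b, c, d):
    """Odds of disease in exposed divided by odds of disease in unexposed."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk of disease in exposed divided by risk of disease in unexposed."""
    return (a / (a + b)) / (c / (c + d))


# Rare disease: risks of 2% and 1%
rare = (20, 980, 10, 990)
# Common disease: risks of 40% and 20%
common = (400, 600, 200, 800)

for label, table in (("rare", rare), ("common", common)):
    print(label, round(relative_risk(*table), 2), round(odds_ratio(*table), 2))
# rare 2.0 2.02
# common 2.0 2.67
```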
What types of biases are case control studies known for?
measurement bias (observer, responder and limitations of instruments) and selection bias (selection of cases and controls)
Case control strengths
Good for study for rare diseases
Can use small sample size
Makes use of available data
Rapid
Low cost
Suitable for diseases with long latency
Can examine multiple exposures for a single disease
Case control studies limitations
Cannot directly measure relative risk
Not suitable for rare exposure
Temporal relationship exposure-disease difficult to establish
Prone to several biases - selection of controls, recall when collecting data
Loss of precision due to sampling
what is a cohort study?
an observational study which follows up two or more groups of people from exposure to outcome. A simple cohort study has an exposed and an unexposed group. Participants that develop the outcome are recorded and rates in the two groups compared.
members of a cohort study must be…
free of disease at the start of the study and at risk of the outcome being studied.
for common exposures, cohort may be drawn from gen pop. for rare exposures, a specific group may be more suitable
Name two types of cohort studies and what they are
prospective cohort studies - start before outcome has occurred. cohort is disease free at the start of follow up. data is captured prospectively
retrospective (historical) cohort studies - start after outcome has occurred. relies on medical records or routinely collected data. data on measurements, exposure and outcome are collected retrospectively after the events have happened
risk definition
the probability that an event will occur
when is risk usually used as a method of association?
in RCTs and cohort studies
relative risk definition
the risk of a particular outcome (disease) when a particular exposure (risk factor) is present, relative to the risk when it is absent (risk in exposed / risk in unexposed)
what is easier for people to understand, relative risk? or absolute risk?
absolute risk
what is absolute risk denoted as?
Ie (I with subscript e)
how is attributable risk measured?
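(Incidence in the whole population - incidence in unexposed) / incidence in the whole population
This is the population-level form, expressed as a proportion; among the exposed, attributable risk is incidence in exposed - incidence in unexposed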
What are population measures?
To assess the extra disease incidence in the whole study population that can be attributed to the exposure we can use measures of population impact
Relative risk measures summary 3 things
measures strength of association between exposure and outcome
can be generalised to other populations
also known as risk ratio and rate ratio
absolute risk measures summary 3 things
measure the impact of an association i.e. number of cases that could be prevented by eliminating the exposure
relies on baseline incidence in the unexposed which vary in different populations, therefore cannot be generalised
also known as excess risk, risk difference, rate difference, attributable risk
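A worked sketch of the relative and absolute measures on a made-up cohort. The counts and the function name `risk_measures` are assumptions; the formulas are the standard risk ratio, risk difference and population attributable fraction.

```python
def risk_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return relative and absolute measures of effect from cohort counts."""
    i_e = exposed_cases / exposed_total        # incidence in exposed
    i_u = unexposed_cases / unexposed_total    # incidence in unexposed
    i_p = (exposed_cases + unexposed_cases) / (exposed_total + unexposed_total)
    return {
        "relative_risk": i_e / i_u,                           # strength of association
        "risk_difference": i_e - i_u,                         # excess risk in the exposed
        "population_attributable_fraction": (i_p - i_u) / i_p,
    }


# Hypothetical cohort: 30/1,000 exposed and 10/1,000 unexposed develop disease
measures = risk_measures(30, 1000, 10, 1000)
for name, value in measures.items():
    print(name, round(value, 3))
```

In this made-up cohort the exposed have three times the risk, an excess of 20 cases per 1,000, and half of all cases in the study population would be attributed to the exposure.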
why use a cohort study?
to measure incidence
find aetiology
quantify risk factors
describe prognosis
evaluate treatment outcome
strengths of cohort study
can directly measure: incidence in exposed and unexposed groups, true relative risk
temporal relationship between exposure and disease is clear
can examine multiple effects for a single exposure
less prone to selection biases (outcome not known - prospective)
weaknesses of cohort studies
require large sample size
not suitable for rare disease
not suitable for disease with long latency
problems with losses to follow up
can be difficult to measure multiple exposure
exposures may change over time
time-consuming and costly
when is hazard ratio used?
in survival analysis, usually cancer trials
outcomes can be negative or positive (e.g. relapse vs remission, disability vs disease-free, death vs cure)
what is a hazard?
an instantaneous event rate
the probability that a person will experience an event at a specific point in time (rather than cumulatively)
what is a hazard ratio?
the hazard in one group divided by the hazard in another, i.e. the effect of a particular intervention on a particular outcome (negative or positive) per unit time
what type of curve is a Kaplan Meier curve?
time to event curve
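A minimal sketch of how the points on a Kaplan-Meier curve can be calculated with the product-limit estimate, assuming five made-up participants, some of whom are censored (left the study without the event). The function name `kaplan_meier` is hypothetical; real analyses would normally use a statistics package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurs.

    times  - follow-up time for each participant
    events - True if the event was observed, False if censored
    """
    survival = 1.0
    at_risk = len(times)
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            # multiply by the proportion of those at risk who survive time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators
curve = kaplan_meier([1, 2, 3, 4, 5], [True, False, True, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

Censored participants never cause a step down in the curve, but they do reduce the number at risk for later steps.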
how can hazard ratios be used?
to see whether a treatment shortens an illness duration
relative risk of a complication in treatment vs control
types of individuals more likely to experience an event first
what are the five steps of evidence based medicine?
- ask
- acquire
- appraise
- apply
- evaluate
Four things to consider when reading papers?
Does this study address a clearly focused question? (PICO)
Did the study use valid methods to address the question?
Are the valid results of this study important?
Are these valid and important results applicable to my patient or population?
what tools can be used for critical appraisal?
Critical appraisal checklists; Critical Appraisal Skills Programme (CASP) checklists are one widely used example
what is a clinical trial?
a planned experiment on human beings which is designed to evaluate the effectiveness of two or more forms of treatment
how many phases are there in a trial?
Four (phases I to IV). We move from "can we give the therapy?" to "should we give the therapy?"
what happens in phase 1 of a clinical trial?
testing in small group of people (20-80) to determine a safe dosage range and pharmacological effects of a drug. generally first time testing in humans
what happens in phase II of a clinical trial?
initial study of efficacy, dose or technique based on phase 1, collects adverse events data. some are randomised, others aren’t.
what happens in phase III clinical trials?
full scale evaluation of treatment, studying efficacy in large groups (100-1000s) comparing the new intervention to a standard intervention. designed to detect a clinically meaningful difference. adverse events are monitored. this is only done if phases 1 and 2 are okay though.
what happens in phase iv of clinical trials?
post marketing surveillance, to assess longer term risks assoc. with intervention
trials can be controlled or randomised. what does this mean?
controlled - the responses of a group of patients on the new treatment are compared with a control group of similar patients receiving a standard treatment / placebo
randomised - each patient should be randomly assigned to a new treatment group or control group, for unbiased evaluation. there has to be a known, often equal chance of being assigned to each group
what should be included in a protocol for an RCT
a blueprint for the study, who, what, when, where, why, how, how much
what randomisation methods can be used for RCTs? give a small explanation of each method
simple randomisation (tossing a coin)
blocked (restricted) randomisation - used to keep number in each group close at all times
stratified randomisation - classed into subgroups (strata), random allocation in each subgroup
minimisation - adaptive design to ensure best possible balance at all times
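A sketch of blocked (restricted) randomisation, to show how it keeps group sizes close at all times. The block size of 4, the arm names and the function name `blocked_randomisation` are assumptions; real trials use concealed, centrally generated lists.

```python
import random


def blocked_randomisation(n_participants, block_size=4,
                          arms=("treatment", "control"), seed=None):
    """Allocate participants in shuffled blocks with equal numbers per arm."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        # each block holds every arm the same number of times
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        allocations.extend(block)
    return allocations[:n_participants]


allocations = blocked_randomisation(20, block_size=4, seed=7)
print(allocations.count("treatment"), allocations.count("control"))  # 10 10
```

Small fixed blocks make the last allocation in each block predictable, which is one way allocation can be gamed; varying the block size is a common safeguard.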
when randomisation fails in an RCT, what could have happened?
accidental bias, where a study fails to balance groups, which is more prone in small sample sizes
‘gaming’ the system when allocation is predictable e.g. using DOB for randomisation, small block sizes, non-opaque envelopes
how might you conceal allocation in an RCT?
sealed, opaque envelopes
tables of random numbers
computer generated randomisation lists
central allocation (inc. telephone, web-based and pharmacy controlled randomisation)
sequentially numbered drug containers of identical appearance
blinding definitions. follow up why do we blind?
keeps randomisation status secret
from patients (single blinded)
patients and clinicians (double blinded)
patients, clinicians and researchers (triple blinded)
psychological effects in patients, recording bias by clinicians
in an RCT, what is primary and secondary outcomes
the primary outcome needs a clearly defined, measurable and objective measurement. this is used to determine sample size.
secondary outcomes need to be defined a priori to avoid fishing expeditions
what type of analysis is done in RCTs? why?
intention to treat analysis.
this is designed to manage non-compliance, non-adherence and losses to follow up.
all randomised participants are analysed whether or not they completed/received the treatment they were randomised to. failure to include these will lead to biased results.
what are reasons for non-adherence and loss to follow up in an RCT?
side effects
forgetting to take meds
withdrawing consent
choosing alternative treatment
can lead to reduced statistical power and bias
what are common sources of bias in RCTs
selection bias due to differences in baseline characteristics with respect to diagnosis
performance bias due to differences in care (other than treatment)
attrition bias (due to differences in withdrawal from trial)
detection (ascertainment) bias due to differences in outcome assessments
what is a crossover design?
randomised to a sequence of treatments
each person serves as their own control.
washout period between the treatments. switching from placebo to treatment for example.
what is factorial design?
multiple treatments occurring at once. could be placebo + some type of treatment or vice versa or any combination
what are the ethical and methodological requirements for an RCT?
clinical equipoise (uncertainty in the expert community)
informed consent
methodological rigour (statistical and operational)
registration and reporting via registries inc. CONSORT (inc. statement, checklist and flow diagram)
what ethical considerations are there for an RCT?
is the proposed treatment safe?
can the treatment be ethically withheld?
are all potential participants suitable for randomisation?
is it ethical to use a placebo or use blinding?
may need to stop a trial early
what data monitoring standards and agency is there
independent data monitoring committee which can recommend ending a trial
strengths of RCTs
most reliable scientific evidence
high internal validity as exposure to treatment is random, potentially can eradicate bias and isolate the treatment effect
provide true measure of efficacy and allow for meta-analysis
weaknesses of RCTs
expensive and time consuming
limited external validity (generalisability) due to strict eligibility criteria or insufficient suitable participants
limited scope (difficult for rare events/distant outcomes)
conflicts of interest
ethical considerations
why do we need systematic reviews?
>2 million articles a year and >20k biomedical journals
need for consistency and precision
informed decision making
inform research agenda
what are systematic reviews?
review of published/unpublished studies
to identify, select, appraise and synthesise all relevant evidence to address a specific question. it is designed to be a systematic and reproducible approach that minimises bias. It takes into account quality of evidence and may or may not include a meta-analysis
What acronyms can be used to formulate a question for a systematic review?
PICO (population, intervention, comparison, outcome)
SPICE (setting, population, intervention, comparison, evaluation)
SPIDER (sample, phenomenon of interest, design, evaluation, research type)
what should a search strategy include?
databases, study registers, references, key journals, contact with expert in the field need to be looked at.
- search general databases e.g. medline
- search specialised databases e.g. cochrane
- check reference list of key articles
- hand search key journals
- contact experts in the field
how are studies selected for a systematic review?
eligibility of studies is checked independently by two reviewers and disputes req. a third reviewer. This is based on exclusion/inclusion criteria. A PRISMA flow chart should be used to show excluded papers.
how is the quality of individual studies assessed?
critically appraised and scored to assess quality and bias. a standard tool to assess quality is used e.g. Jadad scale, Cochrane risk of bias tool or a standard tool to assess the quality of evidence overall e.g. GRADE
what sort of things might be included in a quality checking scale for RCTs for a systematic review?
whether randomisation was used AND described
was the study double blind AND described how it was double blind
are withdrawals / dropouts acknowledged
how is data analysed for a systematic review?
are results consistent? how are outcomes defined and measured. is there heterogeneity or subgroup analysis?
can a meta analysis be used?
the data analysed should be defined a priori
What statistical measures would you use in a meta-analysis for the following types of data? Continuous, dichotomous, time till event or other
Continuous - change from baseline, standardised mean differences
Dichotomous data - OR, RR
Time till event - HR
Others - interrupted time series; incidence or prevalence
If not combinable, needs a descriptive or narrative synthesis
How do you measure heterogeneity in systematic reviews?
Variability between studies can be due to participant characteristics, outcome measures, interventions or methodologies.
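Chi2 test = assesses whether the variability between studies is greater than expected by chance
I2 = describes the amount of variability (in magnitude and direction of effects) that is due to heterogeneity rather than chance, as a percentage
What % for I2 statistic for heterogeneity is substantial? What might you do if it is high?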
> 60%
avoid pooled analysis, narrative synthesis only and use sub-group analysis
what assumptions are made for a sensitivity analysis? (systematic review)
fixed (low heterogeneity) or random effects (high heterogeneity) model (fixed assumes the effect of the intervention is constant across all study populations)
drug dosages vary between studies
missing data
do the results still stand if you use only the best quality studies (lowest risk of bias)
how is publication bias assessed in systematic reviews?
funnel plots
a plot of each trial’s effect size against some measure of sample size
statistically significant results tend to get published
you will see a gap on the non-statistically significant side of a funnel plot
how would you assess if results apply to your population?
precision of treatment effect - how wide are 95% CIs
what are implication for clinical/public health
implications for research
how would you assess if an association is real? what 4 themes would you look at
whether true association i.e. causal vs non-causal
if result is due to either chance, confounding or bias
definition of bias (again!)
any systematic error in an epidemiological study that results in an incorrect estimate of the association between exposure and outcome
what is selection bias?
a difference in how participants get into the study; this could be an error in identification or preferential selection of participants based on case/control status or exposure status
name 3 types of selection bias
ascertainment bias is when some members of a population are more likely to be included than others due to surveillance systems, diagnostics or referral/admission to hospital. This can skew the outcome e.g. wealthier countries might have higher recorded rates of breast cancer because of better detection through screening rather than truly higher rates of breast cancer
sampling bias means there is an inappropriate comparison group, i.e. the control group isn’t appropriately selected or representative of the general population
participation bias is when people in the study differ from the general population; this could be due to volunteerism, the worried well, non-response, refusal or survival (healthy worker effect)
case-control studies are most associated with what type of bias
selection bias
what is information bias?
difference in how data on participants are collected
differences in accuracy of exposure data/outcome data for cases/controls or study subjects wrongly classified
what types of information bias are there?
reporting bias (recall bias)
observer bias (interviewer or instrument bias)
misclassification bias (participants wrongly categorised)
what is the definition of misclassification?
measurement error leads to assigning wrong exposure or outcome category
how can misclassification be sub-divided?
non-differential - when all individuals have some probability of being wrongly classified. Random error, unrelated to exposure or outcome status, not a bias but weakens measure of association.
differential when errors in exposure or outcome status depends on the outcome or exposure. This is a systematic error, related to the exposure or outcome status, this results in bias and affects the measure of association in any direction
how do you minimise information bias?
standardise methods of data collection with objective questions/measures and training/blinding of interviewers.
use multiple sources of information
and use prompts to aid recall
common bias in case-control and retrospective cohort studies
ascertainment bias, participation bias and interviewer bias (exposure and disease have already occurred; differential selection or data gathering in cases and non-cases)
recall bias - cases may remember exposures differently from controls
common bias in prospective cohort studies
loss to follow up (main concern) as major source of bias, do we assume they do or do not develop outcome?
ascertainment and interviewer bias (knowing exposure may influence how outcome determined)
non-response or refusals are of little concern; bias only arises if they are related to both exposure and outcome
recall bias is not a problem as exposure is determined at time of enrolment
name two types of prevalence
point and period prevalence
name two types of incidence
incidence proportion (cumulative incidence) and incidence rate (incidence density)
what is the point prevalence equation?
number of cases of disease at a specific time divided by population at risk at the specific time
equation for period prevalence
number of cases of disease during a time period divided by population at risk midway through the time period
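A minimal sketch of the two prevalence equations above. The function names and the counts for a town of 10,000 are made up for illustration.

```python
def point_prevalence(cases_at_time, population_at_time):
    """Cases at a specific time / population at risk at that time."""
    return cases_at_time / population_at_time


def period_prevalence(cases_during_period, midperiod_population):
    """Cases at any time during the period / population at risk mid-period."""
    return cases_during_period / midperiod_population


# Hypothetical town of 10,000 people
point = point_prevalence(150, 10_000)     # cases on the survey day
period = period_prevalence(400, 10_000)   # cases at any time in the year
print(f"point = {point:.1%}, period = {period:.1%}")
```

Period prevalence is at least as large as point prevalence for the same population, because it counts everyone who was a case at any moment in the period.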
what would increase prevalence?
increase in new cases
longer duration of the disease
increase in survival (without cure)
in-migration of cases
out-migration of healthy people
improved diagnosis or better reporting
what would decrease prevalence?
decrease in new cases
shorter duration of disease
high case-fatality rate from the disease
in-migration of healthy people
out-migrations of cases
improved cure rate
incidence proportion equation
number of new cases of disease during a period divided by the population at risk at the start of the period
incidence rate equation
number of new cases of disease during a period / total person time at risk
the denominator can be expressed as person-years or any other unit that combines persons and time. The incidence rate is also known as incidence density or the force of morbidity (or mortality).
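A small worked sketch contrasting the two incidence measures, assuming a made-up cohort (the counts and person-years are hypothetical):

```python
# Hypothetical cohort followed for three years
new_cases = 30
at_risk_at_start = 1_000
person_years_at_risk = 2_850.0  # less than 3,000: cases and drop-outs stop contributing time

# Incidence proportion (cumulative incidence): new cases / population at risk at the start
incidence_proportion = new_cases / at_risk_at_start

# Incidence rate (incidence density): new cases / total person-time at risk
incidence_rate = new_cases / person_years_at_risk
rate_per_1000_py = incidence_rate * 1_000

print(f"incidence proportion = {incidence_proportion:.1%} over 3 years")
print(f"incidence rate = {rate_per_1000_py:.1f} per 1,000 person-years")
```

The proportion has no time unit and must be quoted with its follow-up period; the rate carries time in its denominator.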
risk definition
the probability that an event will occur
odds definition
probability that an event will happen / probability that an event will not happen
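A short sketch of how risk and odds relate to each other, using illustrative function names and a made-up risk of 0.2:

```python
def odds_from_risk(risk):
    """Odds = probability of the event / probability of no event."""
    return risk / (1 - risk)


def risk_from_odds(odds):
    """Inverse conversion: risk = odds / (1 + odds)."""
    return odds / (1 + odds)


risk = 0.2                           # hypothetical: 1 in 5 develop the outcome
odds = odds_from_risk(risk)          # 0.2 / 0.8, i.e. 1 to 4
print(f"risk = {risk}, odds = {odds:.2f}")

# For rare events the two are close, which is why OR approximates RR for rare diseases
rare_odds = odds_from_risk(0.01)
print(f"risk = 0.01, odds = {rare_odds:.4f}")
```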
mortality rate definition
incidence of death in a population
mortality rate equation
( total no. of deaths from all causes in 1 year / no. of persons in the population at mid year ) x 1000
type of mortality rates
crude mortality rates - total number of deaths per year per 1,000 (or 100,000) people
age-specific mortality rate - total number of deaths per year per 1,000 people of a given age
cause-specific rate - number of deaths due to a particular cause per year per 1,000 (or 100,000) population
case fatality rate equation
( no. of people dying during a specified time after disease onset or diagnosis / no. of individuals with the specified disease ) x 100
mortality rates denominator = entire population at risk of dying
CFR denominator = those who already have the disease
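A minimal sketch of the crude mortality rate and case fatality rate equations, showing the different denominators. Function names and counts are hypothetical.

```python
def crude_mortality_rate(deaths, midyear_population, per=1_000):
    """Deaths from all causes in a year / mid-year population, scaled per 1,000."""
    return deaths / midyear_population * per


def case_fatality_rate(deaths_among_cases, cases):
    """Deaths among people with the disease / people with the disease, as a %."""
    return deaths_among_cases / cases * 100


# Hypothetical district
cmr = crude_mortality_rate(deaths=850, midyear_population=100_000)
cfr = case_fatality_rate(deaths_among_cases=12, cases=240)
print(f"crude mortality rate = {cmr:.1f} per 1,000 per year")
print(f"case fatality rate = {cfr:.0f}%")
```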
what is proportional mortality ratio (PMR) and what does it show
number of deaths from a specific cause during a specified time period / all deaths during that time period. it shows the relative importance of certain cause of deaths in relation to all deaths in that population.
hard to compare different populations due to varying denominators
maternal mortality rate equation
no. of pregnancy related deaths by place and time / average number of women of reproductive age in the same population or time frame x 100,000
what is the rate for infant mortality rate
deaths at age under one year per 1,000 live births
stillbirth rate
foetal deaths from 28 (24) weeks of gestation per 1000 total live births and stillbirths
perinatal mortality rates
stillbirths and deaths in the first week of life per 1000 live births and stillbirths
neonatal mortality rate
deaths at age under 28 days per 1000 live births
child mortality rate
deaths age under 5 years per 1000 live births
How do ratios help in comparing disease between populations?
Ratios allow comparison by considering the number of cases in relation to the size of different populations, making it easier to make meaningful comparisons regardless of population size.
How is proportion different from ratio?
A proportion is a fraction where the numerator is a part of the denominator, representing a part of a whole, while a ratio compares two separate quantities, and can be used to compare two different population groups.
Compare and contrast point prevalence, period prevalence, and lifetime prevalence with examples.
Point Prevalence: Measures the proportion of the population with a specific condition at a single point in time (e.g., the number of people experiencing an asthma attack on a particular day).
Period Prevalence: Measures the proportion of the population that experienced the condition over a defined period (e.g., the number of people who had an asthma attack in the month of January).
Lifetime Prevalence: Measures the proportion of the population that has ever had the condition at any point in their lives (e.g., the percentage of people who have ever had an asthma attack).
Risk definition and equation
The risk is the probability that a subject within a population will develop a given disease, or other health outcome, over a specified follow-up period.
Risk = number of subjects developing the disease over a time period / total number of subjects followed over that time period
Incidence rate, definition and equation
It can be calculated by dividing the number of subjects developing a disease by the total time at risk for all people to get the disease. The denominator of this formula includes a measure of time instead of just a number of subjects. The incidence rate should therefore be interpreted as an instantaneous concept, like speed.
Incidence rate = number of subjects developing the disease / total time at risk for the disease of all subjects followed
How would you calculate an average waiting time from an incidence rate?
Under conditions in which rates do not change with time (a steady state), the incidence rate can be interpreted as the reciprocal of the average time until an event occurs, also called the waiting time. For example, in the calculation of the incidence rate of vascular access infections in HD patients, the average waiting time for such an episode to occur would be 1/0.54 = 1.85 years.
When calculated over a short period of time, the risk and the incidence rate will be rather similar, because the influence of loss to follow-up and competing risks, which may flaw the risk estimate, will only be small.
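The arithmetic above can be sketched as follows. The rate of 0.54 per person-year is taken from the example; the conversion risk = 1 - exp(-rate x time) is an added assumption (constant rate, no competing risks) that is not stated in these notes.

```python
import math

rate = 0.54  # episodes per person-year, from the example above

# Under a steady state the average waiting time is the reciprocal of the rate
waiting_time_years = 1 / rate
print(round(waiting_time_years, 2))  # 1.85


def risk_from_rate(rate, years):
    """Risk over a period, assuming a constant rate and no competing risks."""
    return 1 - math.exp(-rate * years)


# Over a short period the risk is close to rate x time
short_period = 0.1  # years
print(risk_from_rate(rate, short_period), rate * short_period)
```

Over longer periods the two diverge, because the risk can never exceed 1 while rate x time can.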
Key features of ecological and cross-sectional studies (revision slides)
Population level information
Snapshot in time
Examine association between amount of risk factor in an area and amount of disease in the area
Uses aggregated data at the population level
Does not use data on individual subjects and therefore subject to ecological fallacy
Often use scatterplots to demonstrate associations
- Think fish consumption and heart disease.
Ecological studies are useful for generating hypotheses
CS studies from revision slides
Surveys - Census, Welsh Health Survey, National Attitudes and Lifestyle Survey
Information not available from routine data to answer specific questions:
What is the caesarean section rate in hospitals in England and Wales?
What is the prevalence of psychological distress following the summer 2012 floods in South Yorkshire?
Usually descriptive but can be analytical
Definition of population attributable risk (PAR) or population excess risk
Population attributable risk (PAR) is the excess incidence of disease in the total population (exposed and non exposed) that is due to the exposure.
PAR = Incidence rate in total population – incidence rate unexposed.
It is the incidence of a disease in the population that would be eliminated if exposure was eliminated.
Population attributable fraction (PAF) definition
The proportion of the disease in the population that is due to the exposure, assuming causality
= (Ipop - Iu)/Ipop
also = [Pe x (RR-1)] / [Pe x (RR-1) + 1]
Ipop = incidence rate in the population
Iu = incidence rate in unexposed
Pe = proportion of population exposed aka = prevalence of exposure
RR = relative risk
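A worked sketch showing that the two PAF formulas above give the same answer. The values of Pe, RR and Iu are made up for illustration.

```python
# Hypothetical inputs
pe = 0.30    # proportion of the population exposed
rr = 2.0     # relative risk
iu = 0.010   # incidence in the unexposed (10 per 1,000)

ie = rr * iu                       # incidence in the exposed
ipop = pe * ie + (1 - pe) * iu     # incidence in the total population

par = ipop - iu                                        # population attributable risk
paf_from_incidence = (ipop - iu) / ipop                # (Ipop - Iu) / Ipop
paf_from_rr = pe * (rr - 1) / (pe * (rr - 1) + 1)      # [Pe x (RR-1)] / [Pe x (RR-1) + 1]

print(f"PAR = {par * 1000:.1f} per 1,000")
print(f"PAF = {paf_from_incidence:.1%} (incidence form), {paf_from_rr:.1%} (RR form)")
```

The second form is handy when only the relative risk and the prevalence of exposure are known, for example from published studies.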
What does PAF (%) answer?
What proportion of the new cases of disease observed in the study is attributable to a risk factor?
What do PAR and PAF depend on?
PAR & PAF depend on both:
the strength of the association (RR)
the prevalence of exposure in the population (Pe)
to have a large impact on the population the exposure must be common
PAF provides important information about the potential impact of prevention programmes and interventions in public health
Compare and contrast case-control vs cohort
Feature: case-control vs cohort
Sample size: small vs large
Cost/time: less/short vs more/long
Rare disease: good vs bad
Rare exposure: bad vs good
Multiple exposures: good vs bad
Multiple outcomes: bad vs good
Rates of disease: no vs yes
Recall/selection bias: yes vs no
Loss to follow up: no vs yes
what is a generalised structure for an RCT?
- Identify population of interest, select sample i.e. using sampling frame
- record baseline measurements on entire sample
- randomly allocate subjects to group A (intervention) and group B (control).
- ITT analysis of results
What 9 questions would you ask for an RCT?
- Appropriateness of the strategy
- Representativeness of sample - can result be generalised
- has random allocation worked?
- has bias been introduced through losses to the trial?
- is the trial clinically relevant? is the outcome a disease, or simply a mechanism in a disease?
- is the power of the trial adequate? is the trial large enough and sensitive enough? are the results statistically significant but clinically unimportant?
- are the results consistent with evidence from other sources? Hence the Bradford Hill criteria for testing an association
- is the trial a test of management or of disease i.e. efficiency and effectiveness in Cochrane terms
- cost - other strategies and risk/benefit balance
Domains of cochrane risk of bias tool used to assess the study methods for:
random sequence generation
allocation concealment
blinding
selective reporting
other bias
What 9 things are Bradford Hill’s conditions
- strength of association - the stronger the more causal (OR/RR)
- consistency - if relationship is causal we would expect the finding to be consistent with other data
- specificity - this suggests one exposure = one disease, harder to use now
- temporality - does exposure occur before disease development
- biological gradient (dose response) - as dose of exposure increases risk of disease also increases
- plausibility - sits within current body of evidence
- coherence - replication of finding in different situations, consistent with sub-groups and different population unless plausible cause
- analogy - similar findings between observed association and other associations
- experimental evidence - similar finding between lab run and observational studies
Absolute risk definition
The incidence of an event in exposed people
Risk difference definition
The incidence in exposed people minus the incidence in
unexposed people (Ie-Iu).
Relative risk definition
Incidence of event in exposed group/incidence of event in the unexposed group (Ie/Iu)
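A minimal sketch of absolute risk, risk difference and relative risk from cohort counts. The counts are hypothetical.

```python
def incidence(events, total):
    """Incidence of the event in a group (events / people followed)."""
    return events / total


# Hypothetical cohort
ie = incidence(events=40, total=200)   # exposed: absolute risk in the exposed
iu = incidence(events=20, total=400)   # unexposed

risk_difference = ie - iu   # Ie - Iu
relative_risk = ie / iu     # Ie / Iu

print(f"Ie = {ie:.2f}, Iu = {iu:.2f}")
print(f"risk difference = {risk_difference:.2f}, relative risk = {relative_risk:.1f}")
```

The risk difference describes the absolute excess of disease among the exposed; the relative risk describes the strength of the association.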
Cumulative incidence definition
Total number (or proportion) of a group of people who experience a new event during a
specified time period
Cross-sectional study definition
Observe a population at a point in time. They are descriptive studies. Useful
in assessing prevalence rates etc. They can show association but not
direction of causality due to lack of longitudinal data.
Meta-analysis definition
The statistical analysis of the data/results from homogeneous studies with
the same outcome of interest to produce an overall, pooled result of the
treatment/interventional effect. Most commonly used for RCTs but can be
used for observational studies too. Useful when studies are too small to
show a significant difference, combining the results of several studies
increases the sample size and power enabling a more precise measure of
effect
If you found an I2 score of 79% what would you do?
This implies severe/strong heterogeneity (1 mark)
* Refit the analysis with a random effects model (1 mark)
* Investigate the heterogeneity using subgroups (1 mark)
* Carry out a sensitivity analysis (1 mark)
* If you can’t explain the heterogeneity - do not pool the data (1 mark)
Define primary outcome
The leading outcomes that determine the success or effectiveness of an intervention
Forest plot definition
The graphical/visual result of a pooled meta-analysis, it presents the pooled effect
size and confidence interval
Funnel plot definition
A comparison of study size and effect size, it is a tool to assess publication bias
Publication bias definition
Publication bias is the term used to describe the tendency for positive and negative
results to be unequally reported in published literature. Negative results are more
likely to not be reported. Reviewers should aim to minimise publication bias. Funnel
plot can be used to assess bias.
Internal and external validity definitions
Internal validity means that the study measured what it set out to; external validity is the ability to generalise from the study to the reader’s patients.
What might cause variability between studies? aka heterogeneity
- participant characteristics e.g. age group
- outcome measures e.g. pain-free vs disability
- interventions e.g. drug dosage, Rx duration
- methodology e.g. randomisation vs none
Temporal ecological studies characteristics
Temporal ecological studies measure exposure and the outcome in the same population at different points in time
Or between 2 different populations
They can also be referred to as time-trend analysis or longitudinal ecological studies
This data enables a dynamic view of a population’s health
These studies can be used to examine
rates of disease
mortality
changes in health behaviours
Comparisons can be geographical, different time periods or patterns of change over time
Generate hypotheses, identify trends
Do NOT prove causality
What is a nested case control study?
An alternative approach is to use a design in which a case-control study is nested within the cohort. In this approach, the cohort is identified and followed up until a sufficient number of cases develop. All these cases, and a random sample of controls among those who have not developed disease in the cohort, become the “case-control” study. Additional information is usually collected and the analysis is carried out.
When can controls be selected in a nested case-control study?
This design is a type of population based case-control study as both cases and controls are drawn from the same population (the cohort).
In this study design, the controls can be selected in three different ways:
- Controls are sampled from the population still at risk at the end of the follow up period i.e. those still disease-free at the end of the study. In this circumstance, a cohort subject can only be a case or a control but not both.
- Controls are sampled from those who are still at risk at the time each case is diagnosed, so that controls are time-matched with cases. In this circumstance, a cohort subject originally selected as a control can later become a case.
- Controls are sampled from those who were at risk at the start of the cohort. This design is called a “case-cohort” study. In this circumstance, a cohort subject can be a control and then become a case and vice versa.
What measures are used for nested case control studies?
It can be mathematically shown that the odds (of exposure) ratio from a nested (or a population based) case-control study can provide an unbiased estimate of the risk ratio, the rate ratio or the odds (of disease) ratio, depending on how controls were selected.
Controls sampled from those who are disease free at the end of follow up = estimate of odds of disease ratio
Controls selected from those at risk at the time each case develops = estimate of rate ratio (time-matched analysis required).
Controls selected from all those initially at risk = estimate risk ratio
selection bias definition
how subjects get into the study or errors in the process of identifying the study population
ascertainment bias is
differences in surveillance (detection), hospitalisation (referral, admission or diagnostic).
If exposed cases have a different chance of admission from controls, this can lead to an overestimate of the size of the effect.
sampling bias is…
inappropriate comparison group. How representative are the controls of the population giving rise to the cases. This can lead to over/under estimation.
participation bias is…
self-selection or volunteerism, non-response or refusal.
It is how the responders differ from non-responders which lead to an over/under estimation of effect.
how might you limit selection bias?
you need to improve your choice of study population.
this can be done via clear definition of study population; using a consistent case definition; trying to enrol all cases; ensuring cases/controls are from same population
what is information bias?
biases in how data is collected
if study subjects are wrongly classified this becomes misclassification bias
misclassification bias is…
when study subjects are wrongly categorised due to inaccurate diagnosis or admin error etc.
reporting bias is…
recall bias, where cases may more clearly recall exposures than controls
observer bias is…
interviewer bias - the interviewer knows case/control status and probes cases differently as a result
data collection instruments - calibration errors or measurement in interviews
what is misclassification? what the types and how are they different (4 things)
measurement error leads to assigning wrong exposure or outcome category
this can be non-differential i.e: random error; unrelated to exposure or outcome status; not a bias but weakens measure of association.
differential is a systematic error, related to exposure or outcome status; it introduces bias and can distort measure of association in any direction
type I error and type II error defined in terms of rejecting null or not.
how do we reduce type I and II error
I - reject null when it is actually true
II - fail to reject null when it is actually false
increase power of study and select appropriate sample size
what is a systematic review?
A review of published and unpublished studies focussed on a single question
Aims to identify, appraise, select and synthesise all evidence relevant to the question
Take into account quality of the evidence
steps in a systematic review
- Ask a focussed question (PICO)
- Define inclusion and exclusion criteria
- Locate studies – Database search, grey literature
- Select studies
- Data extraction – assess study quality – NOS: observational studies/ Jadad – RCTS.
- Analyse results:
Individual study results
Meta-analysis if appropriate/ forest plot
Heterogeneity – fixed and random effects model
Sensitivity analysis
- Interpret results
Tell me about heterogeneity, the tests, the models and how to interpret them
Chi2 test: null hyp = no heterogeneity. Therefore a small p-value = heterogeneity. If there are only a few studies the test is under powered so a cut-off of 0.10 is used to assess heterogeneity.
I2 test: Depends on magnitude and direction of effects. Should be used in combo with Chi2 statistic i.e. evidence for heterogeneity. Rough guide – 0-40% little, 40 – 75% Moderate, 75%+ Considerable.
Fixed effects model: Use if heterogeneity is not deemed to be a problem. Underlying assumption that there is a ‘fixed’ or common feature that underlies all of the studies in the analysis. It treats the studies as if there was no heterogeneity.
Random effects model: Use if there is considerable heterogeneity between studies. Underlying assumption is that the studies are estimating different treatment effects. Looks at the distribution of effects across different studies.
If you try both models and they give similar results, it is unlikely that heterogeneity is a problem and you can use either.
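A minimal sketch of trying both models on the same data, assuming inverse-variance pooling and the DerSimonian-Laird estimate of between-study variance (tau2) for the random effects model. The choice of estimator, the function names and the four log risk ratios are assumptions for illustration, not something specified in these notes.

```python
import math


def pool(effects, variances, tau2=0.0):
    """Inverse-variance pooled estimate and its standard error.
    tau2 = 0 gives the fixed effects model; tau2 > 0 gives random effects."""
    weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def dersimonian_laird_tau2(effects, variances):
    """Between-study variance estimated from Cochran's Q (floored at 0)."""
    weights = [1.0 / v for v in variances]
    fixed, _ = pool(effects, variances)
    q = sum(w * (e - fixed) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)


# Hypothetical log risk ratios and variances from four trials
log_rr = [-0.10, -0.35, 0.05, -0.60]
var = [0.04, 0.02, 0.05, 0.03]

tau2 = dersimonian_laird_tau2(log_rr, var)
fixed_est, fixed_se = pool(log_rr, var)
random_est, random_se = pool(log_rr, var, tau2)

for name, est, se in [("fixed", fixed_est, fixed_se), ("random", random_est, random_se)]:
    lo, hi = est - 1.96 * se, est + 1.96 * se
    print(f"{name}: RR = {math.exp(est):.2f} (95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

When tau2 is estimated as 0 the two models coincide; when heterogeneity is present the random effects interval is wider because the between-study variance is added to every study's variance.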