Epidemiology Flashcards
Descriptive study definition including study types
Measure the occurrence of outcomes.
Can be split into either populations or individuals.
Individuals - case reports, case studies, case series, surveillance and prevalence cross-sectional studies
Populations - Ecological studies
Analytical study definition
Test the association between exposure and outcome
How can you measure the distribution of disease?
Time - year, season, day, hour
Place - country, region, district
Person - age, sex, social class, lifestyle
Four commonly used sources of data
Routine statistics
Population censuses
Surveys
Special studies
5 routine statistic sources
Death certificates
Birth records
Special disease registers - cancer registries
Communicable disease reports
GP records
What is an ecological study? (at least 4)
An observational design study (no treatment)
Use routinely collected data
Based on groups - not individuals (the group is the unit of observation); not possible to link an individual's exposure to his/her outcome
Uses correlation coefficient (r)
Useful for generating hypotheses, not useful for true exposure risk at individual level.
Can be useful at looking at group level disease e.g. schools
An example could be air pollution and mean BMI of an area
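As a rough illustration of the correlation coefficient (r) used in ecological studies, the sketch below computes Pearson's r from group-level data. The area values are invented for illustration only, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical area-level data: one value per area, not per person.
air_pollution = [10.0, 12.0, 15.0, 18.0, 20.0]  # e.g. mean pollutant level per area
mean_bmi = [24.0, 24.5, 25.5, 26.0, 27.0]       # mean BMI per area

r = pearson_r(air_pollution, mean_bmi)
print(round(r, 3))
```

A high r here only describes the areas; it says nothing about whether the individuals with high exposure are the ones with high BMI (the ecological fallacy).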
Why use an ecological study?
to investigate aetiology and risk factors for disease or evaluate changes in health care policy
generate hypotheses
estimate prevalence
What is the definition of ecological fallacy?
Associations at population level do not imply association at an individual level
What are the strengths of ecological studies? 8 marks
Quick and cheap
Use available data e.g. routine stats
Some factors operate at population level e.g. air pollution
Some exposure data only available at population level
Differences in exposure between areas may be larger than those between individuals in one area
Ability to map ecological data
Can generate hypotheses
Random errors may be smaller for populations than individual exposures
What are the weaknesses of ecological studies?
Data may be collected or recorded differently in different places
Surrogate measures based on the average of the population
Spatial boundaries are artificial
Confounding (lack of data)
Could use proxy measures
Classification challenges
Ecological fallacy
Uncertainty in temporal relationship
Collinearity in variables (i.e. your variables are too similar)
What is a cross-sectional study? (8)
An observational study
Carried out at a single point in time
Snapshot of population health
Collects individual data
Can measure prevalence, not incidence
Cannot prove cause and effect
Good at generating hypotheses
Can be descriptive or analytical (descriptive will describe the data; analytical will investigate risk factors and outcomes, collecting data on outcomes and exposures at the same time)
Typically use surveys to gain data
Examples of national CS studies
Surveys - census, national survey for Wales, national attitudes and lifestyle survey
CS design
Population has a representative sample (if not using whole pop).
Descriptive:
Then you need a number with disease or exposure and then a number without disease or exposure.
This will allow calculation of prevalence of disease
Analytical:
Will need number with/without exposure and then with/without outcome (disease), so 4 sample groups vs 2 groups.
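A minimal sketch of the two designs, assuming invented survey counts: the descriptive version needs only the numbers with and without disease, while the analytical version splits them by exposure. The function name `prevalence` and all counts are hypothetical.

```python
def prevalence(with_disease, total):
    """Proportion of people who have the disease at the time of the survey."""
    return with_disease / total

# Hypothetical counts from a single survey (invented for illustration).
exposed_with, exposed_without = 30, 70
unexposed_with, unexposed_without = 20, 180

# Descriptive: two groups (with and without disease).
total_with = exposed_with + unexposed_with
total = total_with + exposed_without + unexposed_without
overall = prevalence(total_with, total)

# Analytical: four groups (exposure crossed with disease).
prev_exposed = prevalence(exposed_with, exposed_with + exposed_without)
prev_unexposed = prevalence(unexposed_with, unexposed_with + unexposed_without)

print(f"overall {overall:.3f}, exposed {prev_exposed:.2f}, unexposed {prev_unexposed:.2f}")
```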
CS Study and temporality
CS study - difficult to measure temporality, chicken and egg scenario. This temporality issue depends on the exposure e.g. a genetic factor does not vary over time whereas exercise levels will change
Strengths of CS studies
Quick
Cheap and simple
Good for chronic diseases
Data on individuals e.g. questionnaires
Can estimate prevalence
Can assess many outcomes and risk factors
Can generate hypotheses
Weaknesses of CS studies
No good for acute diseases
No good for rare diseases
Prone to bias
Need high participation rates to be valid
Only a snapshot
Cannot infer temporal or causal relationships
Bias definition
Bias is a consequence of defects in the study design or execution of a study and cannot be controlled for by statistical measures and often cannot be mitigated by increasing sample size.
It is any systematic error in an epidemiological study that results in an incorrect estimate of the association between exposure and outcome
Name two types of bias
Selection bias - differences between groups in how study subjects are chosen or respond
Information bias - differences between groups in the accuracy of data on exposure/outcome
CS studies key methodological concerns
Validity and repeatability
Response rates (non-response)
Sampling - how representative of the true population is the sample? (sample size calculations done?)
Association is NOT the same as causation
Definition of confounding
A spurious relationship between exposure and outcome due to the presence of another variable which is associated with both the exposure and the outcome
How to control confounding?
By study design methods:
- randomisation (only in intervention designs)
- restriction (limit entrance to those within specific categories, e.g. age group)
- matching e.g. age or sex
By analysis methods
- stratified analysis
- multivariate analysis (can control simultaneously for several factors)
The goal of statistical analysis in the context of sampling is…
If we wish to say something about an attribute of the population, we take a sample so we can make inferences back to the population
Sampling in observational vs intervention
In observational you sample participants to observe
In intervention you allocate participants to observe
Why do we sample?
Not possible to collect info from all subjects
Sample can provide reliable info on the population by:
- estimating important population parameters (means, proportions etc.)
- making inferences about the population (using valid distributional assumptions)
- presenting inferences using estimates, confidence intervals, hypothesis tests (p values)
We do this to potentially minimise bias
What is non-random sampling?
Non-probabilistic
Judgement (purposive) - selection based on personal (expert) belief about representativeness
Accessibility (convenience) - select the most easily obtained participants
Quota - judgement and accessibility used to achieve specified group sizes
It does not involve random selection
May or may not represent the population well
Used when researcher lacks a sampling frame for the population
These methods may introduce bias
What is judgement or purposive sampling?
Sample selected based on (subjective) judgement of researcher
This is often used
It is a type of non-probability sampling
You are not attempting to be generalisable
Potential for bias is high
Examples of non-random sampling
Quota sampling - people on the street, widely used in market research, strata corresponding to different characteristics e.g. age, sex, race
No sampling frame
Problem -> selection within each stratum may not be random
Pro and con of non-random sampling
Con - selection units up to investigator - high bias
Pro - convenient, less expensive than probability sampling
Random sampling is…
Selection of participants from the population is random. It is any method that utilises some form of random selection.
Different units in the population should have a known probability of being chosen.
AKA probability sampling
Reduces the possibility of selection bias
Sampling unit definition
The elements being sampled (e.g. the postcode rather than the individual)
Sampling frame definition
the list of all units able to be sampled (the list of all postcodes included)
Sample definition
Collection of sampling units drawn from the sampling frame
Sources of error in sampling
Estimate = true value + random error + systematic error (bias)
Sample size (n)
Sampling design
Non-response (unit or item)
Measurement:
- observer or interviewer
- participant or respondent
- instrument e.g. questionnaire
- mode of administration
Does bias affect accuracy or precision?
Accuracy - accuracy is crudely defined as the difference between the population value and sample estimate
Precision definition
How closely repeated measurements or observations come to duplicating measured or observed values
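To make the distinction between accuracy and precision concrete, here is a small simulation sketch of "estimate = true value + random error + systematic error". The true value, the size of the bias and the spread of the random error are all invented assumptions.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is reproducible
TRUE_VALUE = 50.0        # hypothetical population value
BIAS = 5.0               # hypothetical systematic error

# Each estimate = true value + random error (+ systematic error if biased).
unbiased = [TRUE_VALUE + rng.gauss(0, 2) for _ in range(1000)]
biased = [TRUE_VALUE + BIAS + rng.gauss(0, 2) for _ in range(1000)]

for label, estimates in (("unbiased", unbiased), ("biased", biased)):
    error = mean(estimates) - TRUE_VALUE  # accuracy: distance from the truth
    spread = stdev(estimates)             # precision: repeatability
    print(f"{label}: mean error {error:.2f}, spread {spread:.2f}")
```

Both sets of estimates are about equally precise, but only the biased set is inaccurate, and taking more estimates does not remove that offset.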
Ways of sampling (replacement)
With replacement - the chance of selecting each unit does not change from selection to selection. The same unit could be selected more than once.
Without replacement - the chance of selecting each unit does change from selection to selection. Under this strategy the same unit cannot be selected more than once.
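A quick sketch of the two approaches using Python's standard `random` module; the ten numbered units are hypothetical.

```python
import random

rng = random.Random(42)      # fixed seed so the sketch is reproducible
units = list(range(1, 11))   # hypothetical units numbered 1-10

# With replacement: every draw is from the full list, so repeats are possible.
with_replacement = rng.choices(units, k=5)

# Without replacement: a selected unit is removed from later draws.
without_replacement = rng.sample(units, k=5)

print(with_replacement, without_replacement)
```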
Simple random sampling
gold standard
every unit has an equal probability of being selected
n/N could be 100/100,000 therefore 1/1000 probability to be picked.
This is sampling without replacement
True SRS has low bias but can be expensive and inefficient to carry out
How to do simple random sampling
- obtain a sampling frame that includes all units
- number each unit 1-N
- generate n random numbers between 1 and N
- if any random number selected is greater than N or already selected ignore it
- select each unit corresponding to the random numbers
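The steps above can be sketched as follows. This is only an illustration: `simple_random_sample` and the frame contents are hypothetical, and in practice `random.sample` does the same job in one call.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a sampling frame, following the listed steps."""
    N = len(frame)
    if n > N:
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n:
        number = rng.randint(1, N)  # random number between 1 and N inclusive
        chosen.add(number)          # a repeated number is ignored by the set
    # Units are numbered 1-N, so number i is at list index i - 1.
    return [frame[i - 1] for i in sorted(chosen)]

frame = [f"unit_{i}" for i in range(1, 1001)]  # hypothetical sampling frame
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Because `randint(1, N)` never returns a value above N, only the "already selected" check is needed here.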
Systematic sampling process
How does it compare to simple? (1)
may be more convenient than simple
process
1. put numbers in a sequence
2. divide the study population size (N) by the required sample size (n) to determine the interval size (k)
3. choose a random starting number between 1 and k
4. start at the random number and add the interval (k) each time
e.g. N = 10,000 and n = 200
interval k = 10,000/200 = 50; pick a random start between 1 and 50
if the start is 27, select 27, 27+50 = 77, 77+50 = 127, and so on
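A minimal sketch of the systematic sampling process, reproducing the worked example with a population of 10,000, a sample of 200 and a random start of 27. The function name is hypothetical, and the sketch assumes units are already numbered in sequence.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers selected by systematic sampling."""
    k = N // n  # interval size
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and k
    return [start + i * k for i in range(n)]

selected = systematic_sample(10_000, 200, start=27)
print(selected[:3])  # first three selected unit numbers
```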
Stratified random sampling
We may want independent results for different sub-population e.g. sex, age or location
A simple random sample is taken from each stratum
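As a sketch only, stratified sampling can be written as a simple random sample drawn separately within each stratum. The strata, their sizes and the function name are hypothetical; here each stratum is sampled at 10%, i.e. proportionate to its size.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample (without replacement) from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical strata defined by age group.
strata = {
    "under_50": [f"u{i}" for i in range(600)],
    "50_plus": [f"o{i}" for i in range(400)],
}
sizes = {"under_50": 60, "50_plus": 40}  # 10% of each stratum

sample = stratified_sample(strata, sizes, seed=1)
print({name: len(units) for name, units in sample.items()})
```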
Advantages of stratified sampling
increases the chance that sample mean is a precise estimate (e.g. less random error) of the population mean because the sample resembles the study population better
useful if interested in sub-pops
strat samples useful when interested in comparing groups
can use different sample designs in different strata
administrative convenience
cons of stratified sampling
size of the strata in the pop must be known
information on strata characteristics must be known in advance
a weighted analysis is required if not proportionate to the population distribution of strata
cluster sampling
populations may consist of large number of groups e.g. households, schools or hospitals, these can be referred to as clusters
we first take a simple random sample of n clusters and include all sample members of the chosen clusters.
the sampling unit is the cluster, not the individual; essentially SRS applied to groups
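A rough sketch of cluster sampling, assuming schools as the clusters: a simple random sample of clusters is drawn and every member of each chosen cluster is included. All names and sizes are invented.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS applied to clusters
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical population: 20 schools with 30 pupils each.
schools = {f"school_{i}": [f"pupil_{i}_{j}" for j in range(30)] for i in range(20)}

chosen, members = cluster_sample(schools, 4, seed=1)
print(len(chosen), len(members))
```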
cons of cluster sampling
hard to define cluster
less efficient than SRS
members of the same cluster tend to be more similar
Samples might not be independent:
- tends to be less precise
- requires a larger sample than SRS
pros of cluster sampling
often cheap and convenient to draw sample
only for admin convenience
often we do not have info of the individuals only a group e.g. address of household
two-stage sampling process
- draw up list of all first stage units (called primary sampling units)
- select random sample of PSUs
- for each PSU selected draw up a sampling frame of second-stage units
- select a random sample of the second stage units (SSUs)
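The two-stage process above might be sketched like this, assuming hospitals as primary sampling units and patients as second-stage units. The data and the function name are hypothetical, and the sketch assumes every PSU has at least `n_per_psu` members.

```python
import random

def two_stage_sample(psus, n_psus, n_per_psu, seed=None):
    """Stage 1: sample PSUs. Stage 2: sample units within each selected PSU."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(psus), n_psus)
    return {psu: rng.sample(psus[psu], n_per_psu) for psu in selected}

# Hypothetical frame: 12 hospitals with 50 patients each.
hospitals = {f"hospital_{i}": [f"patient_{i}_{j}" for j in range(50)] for i in range(12)}

sample = two_stage_sample(hospitals, n_psus=3, n_per_psu=10, seed=1)
print({psu: len(units) for psu, units in sample.items()})
```

Unlike cluster sampling, only a sample of each selected group is taken rather than all of its members.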
Pros and cons of two-stage sampling (4)
can reduce cost if collecting data in person
usually less precise, have larger SEs
can be extended to 3 or more stages
req. specially weighted analysis
postal questionnaire as a mode of data collection pros and cons
relatively cheap and easy, able to cover wide geography, larger
poor response rate, may be completed by someone other than the intended respondent, not possible in some settings, may appear probabilistic but actually is not
by enumerator mode of data collection - pro and con
high response rate, often greater details given, able to take other non-recorded data e.g. samples
cons - very costly, enumerators can influence and bias responses, hard to overcome geographical differences
by the internet, mode of data collection - pro and con
very cheap and easy to collect
usually biased, non-probabilistic, difficult to analyse
summary of sampling design
Good sampling design is important to ensure good estimates of population parameters: it reduces bias (systematic error) and allows precision to be measured (standard errors). Random sampling reduces the chance of bias. It is not always possible to do SRS, so stratified, cluster and two-stage designs may be required. More complex designs tend to require a larger sample size.
are standardised death rates adjusted or unadjusted?
Adjusted by age
What methods of standardising are there?
Direct and indirect
What is indirect standardisation?
Age-specific death rates from a standard population are applied to the index population - provides a standardised mortality ratio and indirectly standardised rates. It does not require age specific mortality rates of the index population.
what is direct standardisation?
age specific rates are taken from the population that we are standardising and applied to a standard population.
Provides directly standardised rates. Age specific-rates are a necessity.
SMR equation
Total number of observed deaths / total number of expected deaths x 100 = SMR
Steps in indirect standardisation
- apply age-specific rates from the standard population and calculate the expected number of deaths
- sum up number of expected deaths
- divide the observed deaths in the index population by the number of expected deaths
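A worked sketch of these steps, using invented numbers. Rates are expressed per 1,000 population, and the age bands, populations and observed deaths are hypothetical.

```python
def indirect_standardisation(observed_deaths, index_population, standard_rates_per_1000):
    """Return (expected deaths, SMR) for an index population."""
    # Apply each standard age-specific rate to the index population in that age band.
    expected = sum(
        index_population[age] * standard_rates_per_1000[age] / 1000
        for age in index_population
    )
    smr = observed_deaths / expected * 100
    return expected, smr

# Hypothetical index population and standard population rates.
index_population = {"0-44": 10_000, "45-64": 5_000, "65+": 2_000}
standard_rates_per_1000 = {"0-44": 1, "45-64": 5, "65+": 40}

expected, smr = indirect_standardisation(138, index_population, standard_rates_per_1000)
print(f"expected deaths {expected:.0f}, SMR {smr:.0f}")
```

With 138 observed deaths against 115 expected, the SMR is about 120, i.e. mortality roughly 20% higher than expected from the standard population's rates. Note that only the total observed deaths in the index population are needed, not its age-specific rates.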
How to interpret SMR
can only compare the index population with the standard population.
An SMR of 100 = observed mortality is the same as expected from the standard population's rates
>100 = larger mortality rate
<100 = lower mortality rate
Should include a 95% CI
Direct standardisation method
- Calculate local population death rates for each category
- apply the age specific mortality rates from the index population to the standard population to calculate expected number of deaths
- sum up expected number of deaths for the index population using the standard population age groups
- calculate the overall age-standardised mortality rate for the index population (total expected deaths divided by the total standard population)
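A minimal sketch of the direct method with invented figures; the age bands, death counts and standard population are all hypothetical.

```python
def directly_standardised_rate(index_deaths, index_population, standard_population, per=1000):
    """Age-standardised rate of the index population, per `per` people."""
    expected = 0.0
    for age in standard_population:
        # Age-specific rate observed in the index (local) population.
        age_specific_rate = index_deaths[age] / index_population[age]
        # Deaths expected if the standard population experienced that rate.
        expected += age_specific_rate * standard_population[age]
    return expected / sum(standard_population.values()) * per

# Hypothetical index population data and standard population.
index_deaths = {"0-44": 10, "45-64": 30, "65+": 100}
index_population = {"0-44": 10_000, "45-64": 5_000, "65+": 2_000}
standard_population = {"0-44": 60_000, "45-64": 25_000, "65+": 15_000}

dsr = directly_standardised_rate(index_deaths, index_population, standard_population)
print(f"{dsr:.1f} deaths per 1,000")
```

Here the expected deaths in the standard population are 60 + 150 + 750 = 960 out of 100,000, giving about 9.6 per 1,000.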
Which standardisation method is best?
- the direct method requires availability of age-specific rates for the study population of interest
- the indirect method requires only the total number of cases in each population, together with its age structure
- if numbers are small the indirect method is advisable
demography definition
the study of characteristics of human populations
Infant mortality definitions
- Stillbirth - born after 24 or more completed weeks of gestation and which did not, at any time, breathe or show signs of life
- Perinatal - stillbirths plus early neonatal deaths
- Early neonatal - deaths under seven days
- Neonatal - deaths under 28 days
- Postneonatal - deaths between 28 days and one year
- Infant - deaths under one year
- Rates - neonatal, postneonatal and infant mortality rates are reported per 1,000 live births
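As a small arithmetic sketch, the rates can be calculated like this. The counts of deaths and live births are invented, and the function name is hypothetical.

```python
def rate_per_1000_live_births(deaths, live_births):
    """Mortality rate expressed per 1,000 live births."""
    return deaths / live_births * 1000

# Hypothetical counts for one year in one area.
live_births = 5_000
neonatal_deaths = 10      # deaths under 28 days
postneonatal_deaths = 5   # deaths between 28 days and one year
infant_deaths = neonatal_deaths + postneonatal_deaths  # deaths under one year

for label, deaths in (
    ("neonatal", neonatal_deaths),
    ("postneonatal", postneonatal_deaths),
    ("infant", infant_deaths),
):
    print(f"{label}: {rate_per_1000_live_births(deaths, live_births):.1f} per 1,000 live births")
```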
What are case control studies useful for?
Outbreak investigations
Studying uncommon diseases
Studying diseases with long latency
Generating and testing hypotheses
Quick and cheap to conduct
Do case control studies start with people with the disease or without the disease?
With the disease
How do you identify a case for a case control study? (6)
Need a precise case definition - must be objective with a validated test e.g. blood test, histology, X-rays or sonograms
It can be classified by diagnostic certainty e.g. confirmed, probable or possible
Whilst precise definitions can limit generalisability, they increase validity
Often req. inclusion/exclusion criteria (who, what, when, where)
Must report on numbers that meet exclusion/inclusion
Must decide whether incident or prevalent cases (i.e. all new cases within fixed time period, or taking all new and old ones)
Why would you use incident cases for a case-control study? (3)
Useful if exposure is associated with recovery or survival
Greater representativeness as recruitment is timely
Change in behaviour because of the disease is less likely
What are the pros and cons of using existing cases for a case control study? 1 and 3
Pro - can be used when it's difficult to establish date of onset e.g. study of H. pylori infection
Con - may not be representative of all cases, pts with long course disease tend to be over represented and recall bias
Case control studies - population vs hospital cases
Population based studies try and recruit all from a defined population over a fixed period of time. It requires tracing subjects and has issues with completeness and refusal to participate.
Hospital based studies require a clear case definition and adherence to protocol to minimise bias, as they are more prone to bias (note this won’t score a star in some systems)
How would you recruit controls to a case control study? What characteristics must they have? (5)
Controls should be free of disease and representative of the population.
They must have the potential to become diseased.
They must be from the:
- same source population
- same inclusion/exclusion criteria
- would have been identified as cases if they had developed the disease while under investigation
What options are there for selecting controls? i.e. what types of groups of controls can be used
Population controls (random from registry/list/directory)
Neighbourhood controls
Friends/family controls
Hospital controls
Pros and cons of using population controls
Can gain a random sample
Limitations: can be difficult and time consuming, healthy people may not participate, may be low response rate, selection bias (e.g. if using a register with phone numbers some may not have phone)
Neighbourhood controls pros and cons
Pro
No need for population register
Controls for social factors e.g. deprivation
Con
Poor cooperation
Can be time inefficient
Friend and family control group pro and con
Pro - no need for pop registers, quick and efficient, easy to control for social factors so may reduce confounding
Con - overmatching, selection bias (i.e. family members may be different from gen pop)
Hospital controls pro and con
Pro - easy to identify, relatively cooperative
Cons - overmatching (more likely to be sicker and have higher risk factor exposures), selection bias (catchment population differs for different diseases)
What may affect your case:control ratio?
If cases are ample, 1:1 gives sufficient power
If the outcome is rare, can go from 1:1 up to 1:4; more than 1:4 is a waste of resources as there is minimal increase in statistical power
How can we control for confounding in case control studies?
Study design methods
- randomisation (can’t do this one)
- restriction (can be done)
- matching (can be done, risk of over-matching)
Analysis methods
- stratification
- multivariable analysis
What is overmatching?
Matching is too close or elaborate. Controls become difficult to find, and the study may fail to detect a true causal association or may underestimate it.
Odds ratio vs relative risk
Odds ratio is simply a measure of association, with no indication of temporality; it is used when the incidence of disease is unknown. Relative risk is usually interpreted as the risk of having the outcome of interest if you have the exposure compared with if you do not, i.e. when temporality is indicated. If the prevalence of the disease is less than 10%, the OR approximates the relative risk.
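To illustrate why the two measures are close only when the disease is rare, here is a sketch using two invented 2x2 tables. The cell labels a, b, c, d (exposed with disease, exposed without, unexposed with, unexposed without) are a common convention but an assumption here, as are all the counts.

```python
def odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Risk of disease in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical tables: (exposed with, exposed without, unexposed with, unexposed without)
rare_disease = (20, 980, 10, 990)      # risks of 2% and 1%
common_disease = (400, 600, 200, 800)  # risks of 40% and 20%

for label, table in (("rare", rare_disease), ("common", common_disease)):
    print(f"{label}: RR {relative_risk(*table):.2f}, OR {odds_ratio(*table):.2f}")
```

In both tables the relative risk is 2, but the odds ratio is about 2.02 for the rare disease and about 2.67 for the common one.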
What types of biases are case control studies known for?
measurement bias (observer, responder and limitations of instruments) and selection bias (selection of cases and controls)
Case control strengths
Good for studying rare diseases
Can use small sample size
Makes use of available data
Rapid
Low cost
Suitable for diseases with long latency
Can examine multiple exposures for a single disease
Case control studies limitations
Cannot directly measure relative risk
Not suitable for rare exposure
Temporal relationship exposure-disease difficult to establish
Prone to several biases - selection of controls, recall when collecting data
Loss of precision due to sampling
what is a cohort study?
an observational study which follows up two or more groups of people from exposure to outcome. A simple cohort study has an exposed and an unexposed group. Participants who develop the outcome are recorded and the rates in the two groups are compared.
members of a cohort study must be…
free of disease at the start of the study and at risk of the outcome being studied.
for common exposures, cohort may be drawn from gen pop. for rare exposures, a specific group may be more suitable
Name two types of cohort studies and what they are
prospective cohort studies - start before outcome has occurred. cohort is disease free at the start of follow up. data is captured prospectively
retrospective (historical) cohort studies - start after outcome has occurred. relies on medical records or routinely collected data. data on measurements, exposure and outcome are collected retrospectively after the events have happened
risk definition
the probability that an event will occur
when is risk usually used as a method of association?
in RCTs and cohort studies
relative risk definition
the risk of a particular outcome (disease) when a particular exposure (risk factor) is present, relative to the risk when the exposure is absent
what is easier for people to understand, relative risk? or absolute risk?
absolute risk
what is absolute risk denoted as?
I_e (I with a subscript e)