POPH progress test - study designs Flashcards
Prevalence
The proportion of a population who have the disease at a point in time
Allows us to look at the burden of disease, which leads to better resource allocation
Equation for finding prevalence = number of people with the disease at a given point in time/ total number of people in the population at that point in time
When reporting prevalence you must state: the measure of occurrence (e.g. The prevalence), the exposure or outcome (e.g. of asthma), the population (e.g. in the POPH192 class), the time point (e.g. on August 10th 2020) and the value (e.g. was 10%)
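As a rough illustration (the counts are invented to match the reporting example above), the calculation in Python:

```python
# Minimal sketch: calculating and reporting prevalence (made-up numbers).
cases = 12              # people with asthma on the given date
population = 120        # total people in the class on that date

prevalence = cases / population
print(f"The prevalence of asthma in the POPH192 class on August 10th 2020 "
      f"was {prevalence:.0%}")   # -> was 10%
```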
Limitations of prevalence
Difficult to assess the development of disease
Is influenced by the duration of the disease - diseases with a longer duration have a larger prevalence by default
Incidence proportion
The occurrence of new cases of an outcome in a population during a specific period of follow-up // The proportion of an outcome-free population that develops the outcome of interest in a specified time period
Incidence proportion (IP) and incidence rate (IR) - key difference is what you use as the denominator
Equation for finding incidence proportion = number of people who develop the disease in a specified period/ number of people at risk of developing the disease at the start of the period
Why might people not be considered ‘at risk’ at the start of a study?
They already have the condition
The condition is something they cannot develop, e.g. women cannot develop prostate cancer
Reporting incidence proportion: the measure of occurrence (e.g. The incidence proportion), the outcome (e.g. of low back pain), the population (e.g. in nurses), the time period (e.g. in 12 months) and the value (e.g. was 35%)
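A small sketch of the same calculation (numbers invented to match the reporting example):

```python
# Minimal sketch: incidence proportion over a 12-month follow-up (made-up numbers).
new_cases = 35          # nurses who developed low back pain during follow-up
at_risk_at_start = 100  # nurses free of low back pain at the start of the period

incidence_proportion = new_cases / at_risk_at_start
print(f"The incidence proportion of low back pain in nurses in 12 months "
      f"was {incidence_proportion:.0%}")   # -> was 35%
```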
We use the incidence proportion because it gives the average risk of developing the outcome over the period
Limitations of incidence proportion
Assumes a ‘closed’ population - Does not account for people coming and going and it assumes that everyone is exposed to the risk over the same time period
Highly dependent on the time period - Longer time period = higher incidence proportion
Incidence rate
The rate at which the new cases of the outcome of interest occur in a population
Equation for finding incidence rate = number of people who develop the disease in a specified period / number of person-years at risk of developing the disease
Person-years at risk = sum of everyone in the population’s time at risk of becoming a case
48 person-months = 4 person-years (divide by 12)
cases / person-years = ___ cases per person-year (multiply by 100 to express the rate per 100 person-years and avoid small decimals)
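A tiny worked sketch of the person-time arithmetic described above (follow-up times are invented):

```python
# Minimal sketch: incidence rate from person-time.
# Three people followed for 24, 12 and 12 months -> 48 person-months in total.
follow_up_months = [24, 12, 12]
person_years = sum(follow_up_months) / 12     # 48 person-months = 4 person-years
new_cases = 2

incidence_rate = new_cases / person_years     # 0.5 cases per person-year
print(f"{incidence_rate * 100:.0f} cases per 100 person-years")  # -> 50 cases per 100 person-years
```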
Why might someone stop being ‘at risk’?
They become a case
They are lost to follow-up e.g. die, move away, no longer take part
Follow-up time ends
Incidence rate reporting: the measure of occurrence (e.g. The incidence rate), the outcome (e.g. of glandular fever), the population (e.g. in the class) and the value (e.g. was 50 per 100 person-years)
Limitations of incidence rate
Person-time data may not be available
Complex to calculate
Epidemiology
the study of the distribution (descriptive epidemiology) and determinants (analytical epidemiology) of health related states or events in specified populations
Cross sectional study
Measures exposures and/or outcomes at one point in time (snapshot in time)
Point in time describes a particular date (12th August 2020), a specific event (retirement), a specific period of time (last 12 months)
Descriptive and observational study
Often survey/census
Cross-sectional studies measure prevalence - the proportion of a defined population who have a disease at a given point in time
Prevalence = number of people with disease at a given point in time/total number of people in the population at that point in time
Remember that prevalence is affected by incidence (onset of disease) AND duration (how long the disease lasts)
We can use cross-sectional studies to describe and compare prevalences, to generate hypotheses about potential risk factors, and to plan (e.g. health service delivery)
Limitations of cross-sectional studies
Temporal sequencing - since exposure and outcome are measured at the same time you are unable to distinguish which came first between the exposure and the outcome
Measures prevalence not incidence, it cannot tell us about the onset of disease
Not good for studying rare outcomes or exposures
Not good for assessing variable and transient (short-lasting) exposures or outcomes
Strengths of cross sectional studies
Can assess multiple exposures and outcomes
Can be used to calculate prevalence, distribution of prevalence in the population and hypothesis generation
Can be less expensive than some other study designs and relatively quick, because data are only collected at one point in time
Ecological studies
Compare exposures and outcomes across GROUPS (e.g. states, countries, regions) not individuals - Cannot link back to the individual
Ecological studies are descriptive and observational
We use ecological studies to compare between populations, assess population level factors in disease (such as air pollution that cannot be examined at an individual level) and hypothesis generation
Limitations of ecological studies
Ecological fallacy = making assumptions about individuals based on data from the group they belong to
Cannot control for confounding
Cannot show causation
Strengths of ecological studies
Assesses population level exposures such as air pollution and legislation
Good to use for considering hypotheses - to see whether ecological data is consistent with your idea
Can be used for hypothesis generation
Inexpensive and easy, as the data are often routinely collected
PECOT
P = Population → The group of people in the study
E = Exposure → what the potential determinant is
C= Comparison → what the potential determinant is being compared to
O = Outcome → the health outcome being assessed
T = Time → How long people are being followed-up
- Source = population the sample is recruited from
- Sample = population included in the study
Three measures of association
Relative risk = incidence in exposed/incidence in unexposed
Risk difference = incidence in exposed - incidence in unexposed
Odds ratio = odds of exposure in cases/ odds of exposure in controls
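A hedged sketch in Python of all three measures from one 2x2 table; the layout (a = exposed cases, b = exposed non-cases/controls, c = unexposed cases, d = unexposed non-cases/controls) and the counts are assumed for illustration:

```python
# Hypothetical 2x2 table counts.
a, b, c, d = 30, 70, 10, 90

relative_risk   = (a / (a + b)) / (c / (c + d))   # incidence in exposed / incidence in unexposed
risk_difference = (a / (a + b)) - (c / (c + d))   # incidence in exposed - incidence in unexposed
odds_ratio      = (a / c) / (b / d)               # odds of exposure in cases / odds of exposure in controls

print(relative_risk, risk_difference, odds_ratio)  # -> 3.0, 0.2, ~3.86
```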
Relative risk
Ratios of incidences
Tells us the strength of association - how linked the exposure is to the outcome
Relative risk = incidence of exposed group/incidence of unexposed group(comparison group)
How many times as likely is the exposed group to develop the outcome than the comparison?
When the incidence of the outcome in the exposed group is the same as the incidence in the comparison group, the likelihood of the outcome is equal in both groups.
The exposure does not change the likelihood of the outcome, so there is no association between exposure and outcome - this is the null value = 1
Number above 1 means that there is …
Greater incidence of the outcome in the exposed group
Greater likelihood of outcome in exposed group
If the outcome is bad, exposure is potentially a risk factor for the outcome
A number less than 1 but above 0 (ratios of incidences can never be 0 or below) means that there is …
Greater incidence of outcome in the comparison group
Greater likelihood of outcome in the comparison group
If outcome is bad, exposure is potentially a protective factor for the outcome
Interpreting relative risk
The EXPOSED GROUP were VALUE times as likely to develop OUTCOME compared to COMPARISON GROUP
The same 'times as likely' interpretation is used for both incidence proportion and incidence rate
Risk difference
Differences in the incidences
Tells us the impact of the exposure - how much of the outcome is due to the exposure, and how much disease we would prevent by removing the exposure
Risk difference = incidence in exposed - incidence in unexposed
When RD is greater than 0 …
Meaning there is a greater likelihood of the outcome in the exposed group.
Risk factor if the outcome is bad
When RD = 0 ….
The RD null value
No association between exposure and outcome
When RD is less than 0 …
Meaning there is a greater likelihood of the outcome in the unexposed group
Protective factor if the outcome is bad
Interpreting risk difference
There were VALUE extra/fewer cases of OUTCOME in EXPOSED GROUP compared to COMPARISON GROUP
But report differently for incidence proportion and incidence rate - the way you write the value changes…
For incidence proportions e.g. 15 extra cases per 100 people over 10 years
For incidence rate e.g. 15 extra cases per 100 person-years
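A tiny sketch (made-up incidences) of how the value is scaled per 100 people or per 100 person-years:

```python
# Risk difference expressed as extra cases per 100 (hypothetical numbers).
ip_exposed, ip_unexposed = 0.40, 0.25            # incidence proportions over 10 years
rd = ip_exposed - ip_unexposed
print(f"{rd * 100:.0f} extra cases per 100 people over 10 years")  # -> 15 extra cases

ir_exposed, ir_unexposed = 0.35, 0.20            # incidence rates per person-year
rate_diff = ir_exposed - ir_unexposed
print(f"{rate_diff * 100:.0f} extra cases per 100 person-years")   # -> 15 extra cases
```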
RR vs RD
RR
Clues to causes
Strength of association
Null value = 1
RD
Impact of exposure
Impact of removing exposure
Null value = 0
Cohort studies
Analytical and observational
Analytical epidemiology = exposures and outcomes
Observational = observe people’s exposures and what happens to them
Individuals are defined on the basis of presence or absence of exposure to a suspected risk factor
Process of a cohort study …
1- Identify a source population
2- Recruit the sample population who don’t have the outcome of interest
3-Assess their exposure level and categorise participants into appropriate groups i.e. exposed and not exposed
4-Follow up over time and see who develops the outcome
5-Calculate the measures of occurrence and association
So in a cohort study, we know the exposure comes before the outcome
What can we measure using cohort studies?
Incidence proportion (IP) and incidence rate (IR) - these are the measures of occurrence
Incidence proportion = number of people who develop the disease in a specified period / number of people at risk of developing the disease at the start of the period
Incidence rate = number of people who develop the disease in a specified period / number of person-years at risk of developing the disease
Relative risk (RR) and risk difference (RD) - these are the measures of association
Relative risk (a ratio): rate ratio = IR exposed / IR comparison; risk ratio = IP exposed / IP comparison
Interpretation: people in the exposed group were ____ times as likely to have ________ compared to those ________
Risk difference (a subtraction): rate difference = IR exposed - IR comparison; proportion difference = IP exposed - IP comparison
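A hypothetical mini-cohort (invented participants and follow-up times) showing how these measures fit together:

```python
# Each tuple: (exposed?, developed outcome?, years of follow-up at risk).
cohort = [
    (True,  True,  2.0), (True,  False, 5.0), (True,  True,  1.5), (True,  False, 5.0),
    (False, False, 5.0), (False, True,  4.0), (False, False, 5.0), (False, False, 5.0),
]

def measures(group):
    cases = sum(1 for _, case, _ in group if case)
    person_years = sum(t for _, _, t in group)
    return cases / len(group), cases / person_years   # incidence proportion, incidence rate

ip_exp, ir_exp = measures([p for p in cohort if p[0]])
ip_unexp, ir_unexp = measures([p for p in cohort if not p[0]])

print("risk ratio:", ip_exp / ip_unexp, "rate ratio:", ir_exp / ir_unexp)
print("risk difference:", ip_exp - ip_unexp, "rate difference:", ir_exp - ir_unexp)
```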
What other considerations do we need to make in a cohort study?
The healthy worker effect → when a study sample is drawn only from a population of adult workers, the results cannot be generalised to the population at large, because people who are working are on average healthier than those who are not. For example, this could apply to a study of hospital workers, who are likely to be healthier than the general population
We need to be sure that the sample population does not have the outcome
For some conditions there might be a preclinical stage, where the disease has started but cannot yet be detected - in this case the exposure might not actually precede the outcome as you might think
Ensure that participants have been correctly classified into exposure groups (and haven’t changed exposure groups during the study period)
Loss to follow up - did any participants leave the study
Has the outcome been classified correctly?
Strengths of cohort studies
Temporal sequence (the exposure comes before the outcome)
Can examine multiple outcomes from an exposure
Can calculate incidence (and therefore relative risk and risk difference)
Good for studying rare exposures (because researchers define people based on their exposure, they can ensure that there are enough people with and without the exposure in the study)
Limitations of cohort studies
Loss to follow up can lead to bias if related to the exposure and outcome
Potential for exposure/outcome misclassification
Time consuming
Expensive because of the long follow-up required
Generally not good for studying rare outcomes
Two types of cohort studies
Prospective cohort studies
Prospective cohort study starts with the exposure…researcher classifies the exposure and follows up over time and observes the outcome
Everyone is outcome free and you are following them up
Historical cohort studies (sometimes called retrospective cohort studies)
Historical cohort study starts after the outcome ….exposure and outcome have already occurred so the researcher is starting here
Use existing data
Reconstruct follow-up period in the past
Strengths
Less expensive
Less time consuming in comparison with prospective cohort studies
Good for outcomes that take a long time to develop or are rare
Limitations
Use existing data (collected for other reasons) - quality?
Researchers don’t have control over the quality of the data and the data may not include exactly what the researchers want to know
May not know about all the relevant factors
Selection bias - a strength of a prospective cohort study is that, because the outcome has not yet happened, classifying people as exposed or unexposed cannot be affected by the outcome. In a historical cohort study the outcome has already occurred, so it could influence how people are classified by exposure and introduce selection bias
Case-control studies
Addresses some of the issues with doing a cohort study
Analytical, observational
They work backwards - we start with the participants with a known outcome status
Cohort you start with exposure status and then find outcomes
Case-control you start with outcome status and then find exposures
Designed for rare/slow to develop outcomes
Can efficiently examine acute or transient exposures
Process
1- Identify a source population
2- Identify the cases (people with the outcome) and controls (people without the outcome)
3- Assess their prior exposure level in cases and controls
4- Calculate your odds ratio (measure of association)
Limitations of cohort studies
Can be inefficient with rare/slow to develop outcome
Can be inefficient with transient/acute exposures
Alternatives to combat these limitations
Historical cohort studies
Case-control studies
Logic of case-control studies
If cases and controls are a good representation of the source population
Ratio of odds of exposure quantifies association between exposure and outcome
Odds of exposure (cases) / Odds of exposure (controls) = odds ratio
If the odds of exposure in cases are greater, exposure is more likely in cases, so the exposure is a potential risk factor
If the odds of exposure in controls are greater, exposure is more likely in controls, so the exposure is a potential protective factor
Odds are used in which study type, and why?
Case-control studies
Can't calculate the prevalence or incidence of the outcome - the numbers of people with and without the outcome in the study were selected by the researchers
Odds of exposure in cases = a/c (a = exposed cases, c = unexposed cases)
Odds of exposure in controls = b/d (b = exposed controls, d = unexposed controls)
Odds are not a measure of occurrence
Ratio of odds = how many times as likely cases are to have the exposure compared to the controls
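A minimal sketch (invented counts) using the a/b/c/d layout above:

```python
# a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
a, b, c, d = 40, 20, 60, 80

odds_cases = a / c           # odds of exposure among cases
odds_controls = b / d        # odds of exposure among controls
odds_ratio = odds_cases / odds_controls

print(odds_ratio)            # (40/60) / (20/80) ≈ 2.7 -> cases ~2.7 times as likely to have been exposed
```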
Interpreting odds ratio
Different interpretation to the RR
RR = people with exposure are X times as likely to develop the outcome as people with comparison
OR = people with the outcome are X times as likely to have had the exposure compared to people without the outcome
Fortunately when disease is rare, OR approximates the RR (rare disease assumption) and so you can interpret OR just like RR
Use RR interpretation in this course
Obvious problem with case-controls
Controls do not have the outcome - so when do you measure exposure for controls?
Index date = pretend control had event on same date as case (or close to it)
Can be used for transient exposures in controls who didn’t have an associated event - the exposure is measured on the same date as the case
E.g. texting while driving (exposure) and having a car accident - see whether controls used their phone on the date the case had the car accident (index date)
Case selection for case control
Cases are defined by the outcome, so a case-control study can only study one outcome
Clear outcome definition and identification
The purpose of having a control group is to estimate the prevalence of exposure in the population from which the cases come from
Comprehensive case finding
Prevalent cases
Usually try to identify incident cases - sometimes just recruit people with the outcome (prevalent cases)
As people come along and develop the outcome they are recruited into the study
Controls must also be capable of becoming a case, and often we select multiple controls per one case - for better statistical power
If we use an inappropriate group of controls, we may fail to find an association that actually exists - this is why it is important to select controls correctly
Exposure measurement for case control
Need to measure exposure in the period before the outcome occurred
Differential recall
Cases trying to work out what made them sick
Outcome may affect recall ability
Exposure measurement must be comparable
Dead cases vs alive controls
Interviewers may act differently for cases and controls
Using hospital based controls can cause problems as they are not necessarily representative of the populations which the cases came from (they may have diseases which are also related to the exposure being studied)
When measuring exposure level in a case-control study, there is the potential for information bias (recall and interviewer bias)
Strengths for case-control studies
Good for rare outcomes (specifically choose the cases with the rare outcome) and transient exposures (can ask study participants about this exposure and their history)
Can assess multiple exposures
Temporal sequencing - we know that exposures will have come before the outcome in a good case control study
Quick and inexpensive in comparison with other studies such as cohort studies
Limitations for case-control studies
Can only study one outcome
Difficult to select an appropriate control group
Prone to selection and recall bias
Randomised control trials
Randomised - Participants randomly allocated to groups
Controlled - Always have a comparison (control) group
Trial - Testing effect of treatments/intervention
Trial involves assigning participants to exposed or comparison group
Analytic, interventional
They are similar to a cohort study however instead of measuring an exposure, we randomise an intervention such as a medication
Process
1- Identify a source population
2- Randomly select the sample population who don’t have the outcome of interest
3- Randomise the sample to either the intervention or control group
4- Follow up over time and see who develops the outcome
5- Calculate the measures of association and occurrence
Randomisation
Random allocation
This is when we randomly allocate the participants into either the control group or the intervention group
It is not the same as random selection
Random selection is about how you recruit people into the study in the first place, whereas randomisation/random allocation is about how the people who are recruited into the study are randomly allocated to receive the treatment or be a control
The purpose of randomisation is to control for confounding (when another variable like age or sex distorts the relationship between exposure and outcome)
Randomisation does this because, if done correctly, there should be the same proportion of known and unknown confounders in each group, meaning that the groups are comparable
Successful randomisation means that confounding is an unlikely reason for difference in outcomes between groups
Randomisation is more likely to be successful if applied to large numbers
Protection of randomisation and its benefits
Large numbers
Concealment of allocation
It is a different concept to blinding
It means that the person randomising the participants does not know what the next treatment allocation will be - it is concealed and unpredictable
Concealment of allocation could be achieved by a computer generated randomisation code
As well as preserving the benefits of randomisation, concealment of allocation prevents selection bias by stopping participants or their doctors from selecting the treatment they want
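A minimal sketch (hypothetical details) of a computer-generated allocation sequence produced in advance, so the next assignment cannot be predicted by whoever is recruiting:

```python
import random

# Generate the allocation sequence before recruitment starts (made-up trial size).
n_participants = 8
sequence = ["intervention", "control"] * (n_participants // 2)
random.shuffle(sequence)                 # random allocation, fixed in advance

for participant_id, group in enumerate(sequence, start=1):
    print(participant_id, group)         # in practice this stays concealed until each person is enrolled
```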
Intention-to-treat analysis
This is when we analyse as we randomise - it means we do not swap people between groups (therefore maintaining the randomisation)
Intention to Treat analysis also gives us a real world effect - people do not take interventions perfectly in real life
The other type of analysis for an RCT is a 'per protocol' analysis - where we analyse only the participants who fully complied with the study protocol
Per protocol analysis can show the efficacy of the treatment, but we lose the benefits of randomisation
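A small hypothetical sketch contrasting the two analyses; the trial records and numbers are invented:

```python
# Each record: (randomised group, adhered to protocol?, developed outcome?).
trial = [
    ("intervention", True,  False), ("intervention", False, True),
    ("intervention", True,  False), ("intervention", True,  True),
    ("control",      True,  True),  ("control",      True,  True),
    ("control",      False, False), ("control",      True,  True),
]

def risk(records):
    return sum(outcome for _, _, outcome in records) / len(records)

# Intention-to-treat: analyse everyone in the group they were randomised to.
itt_int = risk([r for r in trial if r[0] == "intervention"])
itt_ctl = risk([r for r in trial if r[0] == "control"])

# Per protocol: analyse only those who complied, losing the benefit of randomisation.
pp_int = risk([r for r in trial if r[0] == "intervention" and r[1]])
pp_ctl = risk([r for r in trial if r[0] == "control" and r[1]])

print("ITT relative risk:", itt_int / itt_ctl)
print("Per-protocol relative risk:", pp_int / pp_ctl)
```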
Potential sources of bias for randomised control trial
Lack of Blinding
If people involved in the study (i.e. the patient and the doctor) know whether they were in the intervention or control group, this may influence them …
E.g. if someone in the control group knew they were taking sugar pills they may be less likely to report a benefit or side effects
Studies can be single blinded (participants) or double blinded (participants and researchers) however it is better just to be specific who is blinded in the study
We can use certain methods like matching placebo pills to blind but sometimes it is tricky e.g. if things like physiotherapy or surgery are an intervention
Blinding is an important way to protect against bias
Challenging to achieve in practice …
Safety concerns for participants e.g. if someone has to go to hospital and it is not known whether they are receiving the treatment or not
Can be obvious which group a participant is in
Loss to follow up
If people leave the study or are lost to follow-up this can cause confounding and bias - the people who left may be different to those who stayed
Similar problem as in cohort studies
Non-adherence
Participants do not do what they are supposed to do
Can include doing what the other group is doing
Strengths of RCTs
Best way to evaluate an intervention
Can calculate incidence so can directly calculate relative risk and risk difference
Strongest design for demonstrating a causal association
Eliminate confounding and bias if done well
Limitations of RCTs
Many exposures cannot be randomised, and we need to have clinical equipoise (genuine uncertainty about benefit/harm of intervention)
- It is unethical to give known harmful interventions to people
- It is unethical to give interventions known to be less effective than current treatments
- It is unethical to waste resources and risk people’s well-being if already know the answer
Expensive and resource intensive
Sometimes the participants aren’t representative of the general population, therefore are not generalisable (highly selective)
Not good for rare outcomes
Exposure needs to be modifiable and something the researcher can assign - if it is not, you need to do an observational study instead
External validity
The extent to which the findings of the study can be applied to the broader population
Also known as generalisability
Judgement call is made depending on what is being studied and who it is being applied to
Internal validity
The extent to which the findings of the study are free of chance, bias and confounding
Sampling
When we do a study, we take a sample (a subset of the source population)
The study sample gives us an 'estimate' of the population parameter (e.g. the population relative risk) - the parameter is the unknown, true value of the measure that the study is trying to estimate
However we don’t always get our sampling right…
Parameter = The true value of the measure in the population that the study is trying to discover
Estimate = The measure found in the study sample. Sometimes referred to as the point estimate
Sampling error
Sampling is unlikely to be perfect due to chance
If we took lots of samples from the same source population, each sample would be different simply because of chance
This is called sampling error and is a form of random error which is commonly just called chance
Because of chance our study might not be an accurate representation of the population
Chance is random sampling error - the most common reason for it is a small sample size
To reduce the likelihood of chance, we can increase the sample size, which:
- Reduces the sample variability
- Increases the likelihood of having a representative sample
- Increases the precision of the parameter estimate
Cannot eliminate sampling error but can reduce with large sample sizes
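A simulation sketch (invented population prevalence) of sampling error - repeated samples give different estimates just by chance, and larger samples vary less:

```python
import random

true_prevalence = 0.20          # the (in practice unknown) population parameter

def sample_estimate(n):
    sample = [random.random() < true_prevalence for _ in range(n)]
    return sum(sample) / n

for n in (20, 200, 2000):
    estimates = [sample_estimate(n) for _ in range(5)]
    print(n, [round(e, 2) for e in estimates])
# Larger n -> estimates cluster more tightly around 0.20 (greater precision).
```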
95% confidence intervals
If you repeated a study 100 times with a different random sample each time, you would get 100 different estimates and 100 different confidence intervals….In 95 of the 100 studies (95% of the time) the population parameter (e.g. RR) would lie within that study’s 95% confidence interval. 5% of the time the population parameter would lie outside of the confidence interval
In other words - we are 95% confident that our TRUE population parameter lies between the bounds of the confidence interval
STATEMENT = We are 95% confident that the parameter lies between (lower bound) and (upper bound) // We are 95% confident that the true population value lies between the limits of the confidence level
Width of the confidence interval indicates precision
Narrow confidence interval is more precise (smaller range of values that the parameter could be)
Wide confidence interval is less precise (greater range of values of what the parameter could be)
Precision is important because it helps us decide how useful a finding is
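The notes do not give a confidence interval formula; one standard approach (not stated here) for a relative risk from a 2x2 cohort table uses the log scale, sketched with made-up counts:

```python
import math

a, b = 30, 70     # exposed: cases, non-cases (hypothetical)
c, d = 10, 90     # unexposed: cases, non-cases

rr = (a / (a + b)) / (c / (c + d))
se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))   # standard error of ln(RR)
lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
upper = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# A narrower interval (e.g. from a bigger study) would mean a more precise estimate.
```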
Clinical importance
RCTs often state what they consider to be a clinically important result - it is set to decide if the intervention has a real, genuine, and noticeable effect on daily life
Confidence intervals can help us decide whether the study findings are clinically important
For something to be considered clinically important, the entire confidence interval has to lie beyond the level of clinical importance; if the entire confidence interval falls short of that level, the result is not clinically important; if the interval crosses the level, we cannot be sure either way
P-values
Probability of getting study estimate (or a study estimate further from the null), when there is really no association, because of sampling error (chance)
If probability really low, then it is unlikely that the estimate is due to sampling error
Loosely, the p-value is the probability that the association we found is just due to chance/sampling error (when there is really no association)
The higher the p value, the more likely the association is due to chance (e.g. a p value of 1 - certainly due to chance)
A small P value (<0.05) - means a SMALL chance the association we have found is incorrect (due to chance) and there is no true association
A large P value (>0.05) - a LARGER chance the association we have found is incorrect due to chance and there is actually no association
If the 95% confidence interval includes the null value, then p>0.05
If the 95% confidence interval does not include the null value, then p<0.05
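A rough sketch of this link, assuming a normal-approximation two-proportion z-test (not specified in these notes) and invented counts:

```python
from statistics import NormalDist
import math

x1, n1 = 30, 100    # cases in the exposed group (hypothetical)
x2, n2 = 10, 100    # cases in the unexposed group

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1/n1 + 1/n2))
z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"risk difference = {p1 - p2:.2f}, p = {p_value:.4f}")
# Here p < 0.05, so the 95% CI for the risk difference would not include the null value of 0.
```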
Statistical significance
P values tell us whether a result is statistically significant or not
P > 0.05 - not statistically significant
P< 0.05 - statistically significant
Statistical significance …. P>0.05 it is highly likely that our results are due to chance ….. P<0.05 there is a less than 5% chance that our results are due to chance
Null hypothesis
There is no true association between exposure and outcome (and our population parameter equals the null value)
Null value for RR and OR is 1 and the null value for RD is 0
Small studies usually produce larger p-values, while larger studies produce smaller p-values (for the same underlying association)
The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. A p-value less than 0.05 (typically ≤ 0.05) is considered statistically significant - it indicates strong evidence against the null hypothesis, because there is less than a 5% probability of getting results at least this extreme by chance alone if the null hypothesis were true.
Alternative hypothesis
There truly is an association between exposure and outcome (population parameter does not equal the null value)
Parameter does not equal to 1 for RR and OR and for RD the parameter does not equal to 0
Type I errors
When the study shows a statistically significant result when there truly isn’t one (null hypothesis is true)
The risk of a type I error is reduced when a smaller p-value threshold is used
Type II errors
Don’t find a statistically significant result when there is truly one (alternative hypothesis is true and there is a statistically significant association)
More likely if our sample size is small
Incorrectly fail to reject H0 when we should have (the p-value should have been <0.05 but we got >0.05)
Typically due to having too few people in the study
Bigger sample size = more likely to get small p
Smaller sample size = less likely to get small p
Statisticians can calculate power to find out how many participants are needed to minimise chance of Type-II error
The power of a study is the probability that it will detect an association of a particular size if it truly exists in the general population
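A simulation-style sketch (hypothetical true risks, normal-approximation test) estimating power at different sample sizes - the chance of getting p < 0.05 when a real difference exists rises with sample size, so type II errors become less likely:

```python
import math, random
from statistics import NormalDist

def p_value(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1/n1 + 1/n2)) or 1e-9
    return 2 * (1 - NormalDist().cdf(abs((p1 - p2) / se)))

def simulate_power(n, true_p_exposed=0.30, true_p_unexposed=0.20, trials=1000):
    hits = 0
    for _ in range(trials):
        x1 = sum(random.random() < true_p_exposed for _ in range(n))
        x2 = sum(random.random() < true_p_unexposed for _ in range(n))
        hits += p_value(x1, n, x2, n) < 0.05
    return hits / trials

for n in (50, 200, 800):
    print(n, simulate_power(n))   # estimated power increases with sample size
```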
Why are P-values problematic?
Arbitrary threshold
Is p = 0.06 that different to p=0.04?
Always useful to report P values rather than just saying if it is statistically significant or not
Only about H0
P-values only give evidence about consistency with the null hypothesis
Don’t say anything about precision
Best presented with confidence intervals
Nothing about importance
Statistical significance is not clinical significance
Statistical significance depends on whether the CI crosses the null value and clinical importance depends on where the CI lies based on the clinical importance threshold
Don’t say anything about whether the results are valid, useful or correct
Absence of a statistically significant association is not evidence of absence of a real association
Statistical significance and clinical importance
Statistical significance depends on whether the CI crosses the null value and clinical importance depends on where the CI lies based on the clinical importance threshold