1a Epidemiology Flashcards
What is routine epidemiological data?
Non-targeted information that is obtained in a standardised and consistent manner
What are some examples of routine epidemiological data?
Demographic data from census and population registers
Death certificates
Cancer registrations
Birth registrations
Congenital malformations registrations
Infectious disease notifications
Hospital episode data
Health surveys
Royal College of General Practitioners weekly returns
Where are most health statistics in England published?
The Office for National Statistics (ONS)
What is Demography?
The scientific study of population statistics, including their size, structure, dispersal and development
Why is demographic data important?
They form the baseline count of the total population being studied. Reliable denominators are necessary for the calculation of the various measures of disease frequency, including incidence and prevalence
What is the primary source of demographic data in the UK?
The national census
How often is the national census and when was the last one?
Every 10 years - 2021
What data is collected in the national census?
Population counts by age and sex
Ethnicity
Country of birth
Accommodation
Education
Employment
Long-term illness
What are the problems with the national census?
Data is collected by each of the government census agencies (England and Wales, Scotland and Northern Ireland) separately.
Infrequent data collection means data can be outdated
Data may be incomplete for some population sub-groups, for example, those in hard-to-reach communities
What are annual mid-year population estimates and how are they created?
Annual mid-year population estimates are estimated using the most recent census data but then accounting for births, deaths, net migration, and ageing of the population.
They are produced by the Office for National Statistics.
What are the disadvantages of annual mid-year population estimates?
In areas with high migration rates (e.g. urban areas) there may be potential inaccuracies in the estimate.
How is mortality data collected in England?
Mortality information is derived from the registration of deaths certified by an attending medical practitioner or coroner (Procurator Fiscal in Scotland).
Death certificates include information on the immediate and underlying causes of death, age at death, sex, address and occupation.
Death certificates are sent to the Office for National Statistics (ONS), where the underlying causes of death are classified according to the Tenth Revision of the International Classification of Diseases (ICD-10), resulting in a complete and continuous set of mortality data.
What mortality data is published in the UK and where is it published?
Annual mortality statistics for England and Wales are published by the Office for National Statistics (ONS). Published reports include:
Mortality Statistics - Deaths Registered in England and Wales
Presents statistics on deaths occurring annually in England and Wales. Data includes death counts and rates by sex, age-group, and underlying cause.
Child Mortality Statistics - Childhood, Infant and Perinatal Deaths in England and Wales
Presents detailed analyses of all stillbirths, infant and perinatal deaths and data on deaths of children < 16 years by cause of death, sex and age-group.
What are the disadvantages of mortality data in the UK?
Risk of poor diagnostic accuracy
Varied certifying experience of the attending medical practitioner
Possible incorrect classification and coding of the death certificate
What are some examples of morbidity data in the UK?
Cancer statistics registrations
National congenital anomaly and rare disease registrations
Statutory notifications of infectious disease
Laboratory reporting of microbiological data
General practitioner clinical codes (SNOMED CT)
Hospital episode statistics
Data from health surveys (for example, Health Survey for England)
Royal College of General Practitioner Research and Surveillance Centre weekly reports
How is cancer data in the UK collected and published?
Cancer registrations in England are conducted by eight independent regional registries that collect data on cancer cases in their regions. Regional registries supply a standard dataset (Cancer Outcomes and Services Dataset) monthly to the National Cancer Registration Service run by Public Health England for the provision of cancer statistics.
These data are published annually by the ONS (2 years after the year in which the cancer was diagnosed) in:
Cancer Statistics: Registration Series
Cancer Registrations Wales
Cancer Registrations Scotland
International Agency for Research on Cancer (IARC)
What are the issues with cancer data in the UK?
Cancer registries may differ with respect to methods of data collection, completeness of registrations or recording of data items.
Submission of data to the registries is voluntary.
Possible misclassification of cancer cases or changes in coding systems over time may affect the reliability of the data particularly when examining trends over time.
What are Statutory Notifiable Diseases and how are they collected and published in the UK?
Certain infectious diseases must be notified to the proper officer of the relevant local authority.
All diagnostic laboratories in England must notify Public Health England (PHE) when a notifiable organism is confirmed.
Reports of notifications of infectious diseases (NOIDs) are published weekly and annually by PHE.
What are vital statistics and how are they collected and published in the UK?
Vital statistics are a set of data collected about “vital” life events, including data on live/stillbirth rates, fertility rates, maternity statistics, death registrations and causes of death.
Tables for England and Wales are produced by the ONS and are available for local authorities, health authorities and wards, with raw data being held by NHS Digital.
What is The Health Survey for England?
The Health Survey for England (HSE) was established in 1991 by the Department of Health and Social Care, and is now carried out in conjunction with NatCen Social Research (an independent social research agency). It comprises a series of annual surveys about the nation’s health. The HSE is designed to be representative of the general population in England and aims to provide a measure of the health status of the population. Annual surveys cover the adult population aged 16 and older living in private households. Children have been included in the survey since 1995.
Each year the survey focuses on different demographic groups or diseases and their risk factors and looks at health indicators including cardiovascular disease, physical activity and eating habits. In addition to completing a health questionnaire, those surveyed are followed up by a nurse visit during which various physical measurements including blood pressure, lung function tests, blood and saliva are collected.
What are the strengths of routine epidemiological data?
Readily available
Low cost
Useful for establishing baseline characteristics
Identify cases in a case-control study
Generating aetiological hypotheses
Derive expected numbers in a cohort study or as a source for ascertaining outcomes in a cohort study
Useful for examining trends of disease over time and by place
What are the weaknesses of routine epidemiological data?
Lack of completeness, with potential bias
Often poorly presented and analysed
Where there are small numbers of cases, it may be possible to identify individuals, threatening confidentiality
Data may not be collected in a uniform way across the entire population
Techniques of data collection may vary geographically, e.g. recording data, coding
Equivalent data not always available for all countries
Delay between collection and publication
What are the common patterns of disease incidence in relation to geographic place?
Variations in disease incidence by place fall under three main headings:
Broad geographical differences – sometimes related to factors such as climate, or social and cultural habits. Some cancers show marked geographical differences in incidence.
Local differences – distribution of a disease may be limited by the localisation of the cases, for example a contaminated water supply.
Variations within a single institution – variations in attack rates by hospital ward, for example, may help identify possible sources or routes of spread of a gastrointestinal infection.
What are the common patterns of disease incidence with time?
There are three broad patterns of variation in disease incidence with time:
Secular (long-term) trends – changes in disease incidence over a number of years that do not conform to an identifiable cyclical pattern. For example, the secular trend in mortality from TB in England shows a steady decline over many years. However, this does not give any indication of the cause of the decline.
Periodic changes including seasonality – regular or cyclical changes in incidence, for example in infectious diseases. Cases of influenza typically reach a peak in the winter months.
Epidemics – strictly speaking, an epidemic is a temporary increase in the incidence of a disease in a population
In what ways can individual factors impact disease incidence?
Modifiable Risk Factors
Occupation
Marital status
Behavioural habits
Lifestyle
Non-modifiable risk factors
Age
Gender
Ethnic group
What is prevalence?
Prevalence is the proportion of a population who have a disease/condition in a given time period.
There are two types of prevalence:
Point prevalence
Period prevalence
What are the types of prevalence?
Point prevalence
Period prevalence
What is point prevalence?
The proportion of existing people with a disease in a defined population at a single point in time
Point prevalence = Number of cases at a single point in time/Number of persons in a defined population at that point in time
The point in time that point prevalence refers to should always be clearly stated. Prevalence is a proportion, so has no units.
In a town with 10,000 female residents, 1,000 have hypertension on January 1st 2016. What is the point prevalence of hypertension?
Point prevalence = Number of cases at a single point in time/Number of persons in a defined population at that point in time
The prevalence of hypertension among women in town A on this date is calculated as:
1,000/10,000 = 0.1 or 10%
The point in time that point prevalence refers to should always be clearly stated. Prevalence is a proportion, so has no units.
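The arithmetic is a simple proportion; a minimal Python sketch using the numbers from the example above:

```python
# Point prevalence is a proportion: cases / population at one point in time.
cases = 1_000        # women with hypertension on January 1st 2016
population = 10_000  # female residents of the town on that date

point_prevalence = cases / population
print(point_prevalence)  # 0.1, i.e. 10%
```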
What is period prevalence?
Period prevalence is the number of individuals identified as cases during a specified period of time, divided by the total number of people in that population.
What is the difference between point prevalence and period prevalence?
Point prevalence is the proportion of existing people with a disease in a defined population at a single point in time
Period prevalence is the number of individuals identified as cases during a specified period of time, divided by the total number of people in that population.
What is incidence?
Incidence is a measure of the number of new cases of a disease (or another health outcome) that develop in a population of individuals at risk, during a specified time period.
There are three main types of incidence:
Risk (or cumulative incidence)
Incidence (or incidence rate)
Odds
What are the types of incidence?
There are three main types of incidence:
Risk (or cumulative incidence)
Incidence (or incidence rate)
Odds
What is Risk (or cumulative incidence) and how is it calculated?
Risk (also known as cumulative incidence) refers to the occurrence of risk events, such as disease or death, in a group studied over time.
It is the proportion of individuals in a population initially free of disease who develop the disease within a specified time interval. Incidence risk is expressed as a percentage (or, if small, as “per 1000 persons”).
Risk = Number of new cases of a disease in a specified time period/Population at risk
Population at risk = The number of persons at risk but without the disease at the beginning of the time period
What assumptions are made about a risk (or cumulative incidence) population?
Cumulative incidence assumes that the population at risk is a closed population. This means that the entire population at risk is followed up for the entire specified time period for the development of the outcome under investigation.
What is a closed population in an epidemiological study?
A population where every individual is followed up for the entire specified time period and no further participants join the study population i.e. there are no dropouts or new participants
What is a dynamic population in an epidemiological study?
A population where individuals can leave or join the study population. i.e. there are dropouts or where extra individuals join.
Causes of dropouts:
Some may develop the outcome of interest
Lost during follow-up
Refusal to continue to participate in the study
Migration
Death
What is Incidence (or incidence rate or rate) and how is it calculated?
The incidence rate is one of 3 measures of incidence. It measures the frequency of new cases of a disease in a population but considers the sum of the time that each participant remained under observation and at risk of developing the outcome under investigation. This helps to account for the varying time periods of follow-up.
Incidence rate = Number of new cases in a given time period/Total person-time at risk
Total person-time at risk = The sum of each individual’s time at risk (i.e. the length of time they were followed up in the study). It is commonly expressed as person-years at risk.
The incidence rate is the rate of contracting the disease among those still at risk. When a study subject develops the disease, dies or leaves the study, they are no longer at risk and will no longer contribute to person-time units at risk.
What is person-time at risk, and when is it used?
The cumulative time spent “at risk” by individuals taking part in the study, expressed in person-years.
In a dynamic epidemiological study population, individuals in the group may have been at risk for different lengths of time, so instead of counting the total number of individuals in the population at the start of the study, the time each individual spends in the study before developing the outcome of interest needs to be calculated.
This is used to calculate the incidence rate.
What is the incidence rate of hypertension according to the results of this 5-year study?
Participant 1
Time spent in the study: 5 years
Status: Still in study
Participant 2
Time spent in the study: 4.5 years
Status: Developed hypertension
Participant 3
Time spent in the study: 3.5 years
Status: Developed hypertension
Participant 4
Time spent in the study: 1.5 years
Status: Developed hypertension
Participant 5
Time spent in the study: 3.5 years
Status: Lost to follow up
Incidence rate = Number of new cases of hypertension in the 5-year period/ Total person-time at risk during the 5 year period
Incidence rate = 3/18
Incidence rate = 0.167 per person-year (or 16.7 per 100 person-years)
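The person-time arithmetic above can be sketched in Python, with the follow-up times and outcomes copied from the worked example:

```python
# Each tuple: (years of follow-up, developed hypertension?)
participants = [
    (5.0, False),  # participant 1: still in study at 5 years
    (4.5, True),   # participant 2: developed hypertension
    (3.5, True),   # participant 3: developed hypertension
    (1.5, True),   # participant 4: developed hypertension
    (3.5, False),  # participant 5: lost to follow-up
]

new_cases = sum(developed for _, developed in participants)
person_years = sum(years for years, _ in participants)
incidence_rate = new_cases / person_years
print(new_cases, person_years, round(incidence_rate, 3))  # 3 18.0 0.167
```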
What is odds and how is it calculated?
Odds is a measure of incidence.
Odds = Number of new cases of a disease in a specified time period/Number of people still disease free at the end of that time period
Instead of using the number of individuals who are disease-free at the start of the study (as is the case in incidence rate and risk rate), odds are calculated using the number disease-free at the end of the time period.
What is the difference between risk (cumulative incidence) and incidence rate (rate) in the following example?
Risk = number of (new) observed cases/number at risk (disease free) at the start
Rate = number of observed cases/person time (years) at risk
In the example, there are two deaths and a sample size of 7.
The total person time of follow-up is 2 years for individuals 1, 4 and 7; one-and-a-half years for persons 2 and 6 (person 6 was lost to follow-up after one-and-a-half years); and half a year for persons 3 and 5 (person 5 was lost to follow-up after half a year). In total this equates to 10 person years.
Risk = 2 / 7 = 0.29
Rate = 2 / (2 + 1.5 + 0.5 + 2 + 0.5 + 1.5 + 2) = 2/10 = 0.2 deaths/person-year
Note that the rate has units (cases per person per year), whereas risk does not, as it is a simple probability or proportion.
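The contrast between the two measures can be sketched in Python using the follow-up times from the example:

```python
# Person-time at risk (years) for individuals 1-7, as described above
follow_up_years = [2, 1.5, 0.5, 2, 0.5, 1.5, 2]
deaths = 2
n_at_start = 7

risk = deaths / n_at_start            # a unitless proportion
rate = deaths / sum(follow_up_years)  # deaths per person-year
print(round(risk, 2), rate)  # 0.29 0.2
```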
What is the difference between risk (cumulative incidence), incidence (incidence rate) and odds?
All three are measures of incidence.
Risk - The proportion of individuals in a population initially free of disease who develop the disease within a specified time interval.
Incidence Rate - The number of new cases of disease in a specified time period divided by the total person-time at risk, i.e. taking into account the sum of the time that each participant remained under observation and at risk of developing the outcome under investigation.
Odds - The odds of disease. Instead of using the number of individuals who are disease-free at the start of the study, odds are calculated using the number who are disease-free at the end of the time period.
What is the relationship between incidence and prevalence?
The relationship between incidence and prevalence can be expressed as;
P = ID
(P = Prevalence, I = Incidence Rate, D = Average duration of the disease)
Explanation:
If the incidence of a disease is low but the duration of the disease (i.e. the time until recovery or death) is long, the prevalence will be high relative to the incidence. An example of this would be diabetes.
Conversely, if the incidence of a disease is high and the duration of the disease is short, the prevalence will be low relative to the incidence. An example of this would be influenza.
A change in the duration of a disease, for example, the development of a new treatment that prevents death but does not result in a cure, will lead to an increase in prevalence without affecting incidence.
Fatal diseases, or diseases from which a rapid recovery is common, have a low prevalence, whereas diseases with a low incidence may have a high prevalence if they are incurable but rarely fatal and have a long duration.
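A quick numerical sketch of P = ID, using hypothetical values for a low-incidence, long-duration disease:

```python
# P = I x D holds approximately for a rare disease in a steady-state population.
incidence_rate = 2 / 1000   # hypothetical: 2 new cases per 1000 person-years
avg_duration_years = 10     # hypothetical: average disease duration

prevalence = incidence_rate * avg_duration_years
print(round(prevalence, 3))  # 0.02, i.e. roughly 2% of the population are cases
```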
How do you calculate a sex-specific mortality rate?
(The number of deaths in a specific sex group in a 1 year period/mid-year population of that sex group)*1000
How do you calculate a birth rate?
(Number of births per year/mid-year population)*1000
How do you calculate an age-specific mortality rate?
(The number of deaths in a specific age group in a 1 year period/mid-year population of that age group)*1000
How do you calculate a fertility rate?
(The number of live births in a year/Mid-year population of women aged 15-44)*1000
How do you calculate an infant mortality rate?
(The number of deaths in those < 1 year of age/number of live births in a year)*1000
How do you calculate a perinatal mortality rate?
(The number of deaths in the first week of life (< 7 days) plus the number of stillbirths/number of live births and stillbirths in a year)*1000
How do you calculate a neonatal mortality rate?
(The number of deaths in those under 28 days in a year/number of live births in a year)*1000
How do you calculate a case fatality rate?
(The number of deaths from a specific disease in a given period/number of diagnosed cases of that disease in that period)*100
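Most of the rates above share the same shape, (events / population) × 1000, so they can be captured with one hypothetical helper function:

```python
# Generic per-1000 rate; multiply first so integer inputs stay exact.
def rate_per_1000(events, population):
    return events * 1000 / population

# Hypothetical illustrative numbers
birth_rate = rate_per_1000(12_000, 1_000_000)  # births / mid-year population
infant_mortality = rate_per_1000(48, 12_000)   # deaths < 1 year / live births
print(birth_rate, infant_mortality)  # 12.0 4.0
```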
What are measures of effect size?
Measures of effect are used in epidemiological studies to assess the strength of an association between a risk factor and the subsequent occurrence of disease. This is done by comparing the incidence of disease in a group of persons exposed to a potential risk factor with the incidence in a group who have not been exposed.
Measures of effect size can be relative or absolute.
What are the different measures of effect size?
Relative measures (also called measures of relative risk):
Risk ratio
Rate Ratio
Odds ratio
Absolute measures:
Attributable Risk (Risk Difference)
Attributable Risk Percentage
What are the measures of incidence, prevalence, effect size and population impact?
Incidence:
Risk (cumulative incidence)
Incidence (incidence rate)
Odds
Prevalence:
Point prevalence
Period prevalence
Effect Size:
Relative measures (also called relative risk measures):
Risk ratio
Rate Ratio
Odds ratio
Absolute measures:
Attributable Risk (Risk Difference)
Attributable Risk Percentage
Population impact:
Population-attributable risk/rate
Population attributable risk fraction
What is the difference between relative and absolute measures of effect size?
Relative measures reflect the increase in the frequency of a disease in one population (e.g. exposed) versus another (e.g. not exposed), which is treated as the baseline.
Absolute measures indicate exactly what impact a disease will have on a population, in terms of numbers or proportion affected by being exposed.
For example, a study finds that having several CT head scans in childhood results in a three-fold increase of your risk of developing brain cancer as an adult. This sounds like a large increase, but because the absolute risk increase would be small (say, an increase of 0.5 cases per 10,000 children), the increased risk means one additional case of brain cancer per 20,000 children scanned.
What are measures of relative risk?
Measures of relative risk is the collective name given to the measures of relative effect size.
This includes risk ratio, rate ratio and odds ratio.
How do you interpret relative measures of effect size (also called relative risk)?
Measures of effect, such as the risk ratio, provide assessments of aetiological strength, or the strength of association between a risk factor and an outcome.
Relative risk of 1: The incidence of disease in the exposed and unexposed groups is identical. I.e. there is no association observed between the disease and risk factor/exposure.
Relative risk >1: The risk of disease is greater among those exposed and indicates an increased risk among those exposed to the risk factor compared with those unexposed (also called positive association).
Relative risk <1: The risk of disease is lower among those exposed and indicates a decreased risk among those exposed to the risk factor compared with those unexposed (also called negative association).
What is the difference between risk ratio, rate ratio and odds ratio?
They are all relative measures of effect size, however, each uses a different measure of incidence to measure the difference between groups (risk, incidence rate and odds)
Risk Ratio (aka relative risk): The risk of developing disease in the exposed group divided by risk in the unexposed group
Rate Ratio: The ratio of the rate of an event in one group (exposure or intervention) to that in another group (control).
Odds ratio: The odds of an event (e.g. disease) occurring given a certain exposure vs. the odds of an event in the absence of that exposure.
How to calculate a risk ratio?
Risk = Number of new cases of a disease in a specified time period/Population at risk.
To calculate a risk ratio you simply calculate the ratio of risks between the exposed group and the unexposed group.
This can be done using a 2x2 contingency table:
             Outcome   No outcome   Total
Exposure     a         b            a+b
No exposure  c         d            c+d
Total        a+c       b+d          a+b+c+d
Risk Ratio = (a/(a+b)) / (c/(c+d))
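With hypothetical counts in the four cells of the 2x2 table, the risk ratio can be computed as:

```python
# 2x2 table cells (hypothetical cohort counts)
a, b = 30, 70  # exposed: outcome / no outcome
c, d = 10, 90  # unexposed: outcome / no outcome

risk_exposed = a / (a + b)    # 0.3
risk_unexposed = c / (c + d)  # 0.1
risk_ratio = risk_exposed / risk_unexposed
print(round(risk_ratio, 2))  # 3.0
```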
How do you calculate an odds ratio?
Odds = Number of new cases of a disease in a specified time period/Number of people still disease-free at the end of that time period
To calculate an odds ratio you simply calculate the ratio of the odds of the outcome in the exposed group to the odds of the outcome in the unexposed group.
This can be done using a 2x2 contingency table:
             Outcome   No outcome   Total
Exposure     a         b            a+b
No exposure  c         d            c+d
Total        a+c       b+d          a+b+c+d
Odds Ratio = (ad) / (bc)
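The cross-product shortcut ad/bc can be computed directly, again with hypothetical counts:

```python
# 2x2 table cells (hypothetical case-control counts)
a, b = 30, 70  # exposed: cases / controls
c, d = 10, 90  # unexposed: cases / controls

odds_ratio = (a * d) / (b * c)  # equivalent to (a/b) / (c/d)
print(round(odds_ratio, 2))  # 3.86
```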
What is the difference between attributable risk (risk difference) and attributable rate (rate difference), and how do you know which to use?
The attributable risk and attributable rate are both measures of absolute effect.
They tell us exactly how many more people are affected in the exposed group than in the unexposed, giving the result in terms of the excess risk (or rate) caused by the exposure in the exposed group.
Attributable risk = incidence risk in exposed - incidence risk in unexposed
Attributable rate = incidence rate in exposed - incidence rate in unexposed
Which you pick will depend on the study design used, as well as whether the person-time at risk is known (as this is needed to calculate the rate).
What is attributable risk and how do you calculate it?
The attributable risk (AR) is a measure of absolute effect.
It tells us exactly how many more people are affected in the exposed group than in the unexposed, giving the excess risk caused by exposure in the exposed group.
Attributable risk = incidence risk in exposed - incidence risk in unexposed
For example, in a cohort study, the AR is calculated as the difference of incidence risks.
An AR indicates the number of cases of the disease among the exposed that can be attributed to the exposure.
What is an attributable rate and how do you calculate it?
The attributable rate is a measure of absolute effect.
It tells us exactly how many more people are affected in the exposed group than in the unexposed, giving the excess rate caused by exposure in the exposed group.
Attributable rate = incidence rate in exposed - incidence rate in unexposed
For example, in a cohort study, the AR is calculated as the difference in incidence rates.
An AR indicates the number of cases of the disease among the exposed that can be attributed to the exposure.
What is the attributable risk percentage (aka attributable fraction)?
The attributable risk percentage expresses the attributable risk as the proportion of disease cases in the exposed group that is attributable to the exposure. This can be given as a fraction (attributable fraction) or as a percentage.
I.e. the proportion of additional cases in the exposed group.
Attributable risk percentage = ((Risk in exposed group - Risk in unexposed group)/Risk in exposed group)*100
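A short Python sketch of the attributable risk and attributable risk percentage, using hypothetical incidence risks:

```python
# Hypothetical incidence risks from a cohort study
risk_exposed = 0.30
risk_unexposed = 0.10

attributable_risk = risk_exposed - risk_unexposed    # excess risk in the exposed
ar_percent = attributable_risk / risk_exposed * 100  # % of exposed cases due to exposure
print(round(attributable_risk, 2), round(ar_percent, 1))  # 0.2 66.7
```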
What are measures of population impact and what are the different types?
Measures of population impact estimate the expected impact (i.e. extra disease) in a population that can be attributed to the exposure.
There are two main measures of population impact:
The population attributable risk
The population attributable risk fraction
What are the uses of measures of population impact?
Measures of population impact can:
Estimate how much of the disease in the population is caused by the risk factor
Estimate the expected impact on a population of removing or changing the distribution of risk factors in that population
Compare the population and unexposed (in comparison to measures of effect size which compare the exposed and unexposed)
What are the types of measures of population impact?
Population attributable risk/rate
Population attributable risk fraction
What is Population attributable risk/rate and how is it calculated?
The Population attributable risk/rate (PAR) is a measure of population impact.
The population attributable risk (PAR) is the absolute difference between the risk (or rate) in the whole population and the risk (or rate) in the unexposed group.
It is used to estimate the excess rate of disease in the total study population that is attributable to the exposure. It provides a measure of the public health impact of the exposure in the population (assuming that the association is causal).
PAR = Risk (or rate) in the total population - Risk (or rate) in the unexposed
What is the population attributable risk fraction and how is it calculated?
The population attributable risk fraction (PAF) is a measure of population impact.
The PAF is the proportion of all cases in the whole study population (exposed and unexposed) that may be attributed to the exposure, as follows:
PAF = Population attributable risk/overall rate in the total population
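The PAR and PAF formulas can be sketched together in Python, with hypothetical whole-population and unexposed rates:

```python
# Hypothetical rates per 1000 person-years
rate_total = 6.0      # whole study population
rate_unexposed = 4.0  # unexposed group only

par = rate_total - rate_unexposed  # excess rate attributable to the exposure
paf = par / rate_total             # fraction of all cases attributable to it
print(par, round(paf, 3))  # 2.0 0.333
```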
What are the problems with measures of population impact?
They assume that all of the association between the risk factor and disease is causal.
The results can vary according to how common exposure to the risk factor is in the population.
What is standardisation of data and why is it done?
The comparison of crude mortality or morbidity rates is often misleading because the populations being compared may differ significantly with respect to certain underlying characteristics, such as age, sex, race or socio-economic status.
For example, an older population will have a higher overall mortality rate when compared to a younger population.
In reality, crude overall rates are simply a weighted average of the individual category-specific rates within a population. As such, where a locality has a large elderly population, the older age categories will carry greater weight than the younger age categories, giving the impression that the death rate in this area is unacceptably high, particularly in comparison with a youthful town (e.g. a university town).
Standardisation adjusts the crude overall rates to allow for a direct comparison.
What are the methods of data standardisation?
There are three main methods of standardisation commonly used in epidemiological studies. These include:
Present and compare the age-specific rates (or whichever variable you want to standardise)
Direct Standardisation
Indirect Standardisation
What is the category-specific rate method of standardising data?
The category-specific rate method of standardising data is a way of standardizing data and results so that they are directly comparable.
This simply involves presenting and comparing the age-specific rates.
What are the strengths and weaknesses of the category-specific rate method of standardising data and what are the alternative options?
Strengths:
Simple
Quick to do
Allows for a more comprehensive comparison of mortality or morbidity rates between two or more populations
Weaknesses:
As the number of stratum-specific rates being compared increases, the volume of data being examined may become unmanageable.
Alternatives:
It may therefore be more useful to combine category-specific rates into a single summary rate that has been adjusted to take into account the population’s age structure or another confounding factor. This is achieved by using direct or indirect methods of standardisation.
What is the difference between direct and indirect standardisation and how do you know which to use?
Direct and indirect standardisation are the two main methods of standardisation.
Direct standardisation uses the category-specific rates (for example age-specific mortality) from both populations and applies these to a standard reference population. This allows you to work out what the mortality rate for this reference population would be, based on each population’s mortality rates, and you can then compare these numbers. The ratio of two directly standardised rates is called the Comparative Incidence Ratio or Comparative Mortality Ratio.
In indirect standardisation, you do the reverse. You find a reference standard of category-specific rates (for example age-specific mortality rates), and calculate what their expected mortality rate should be by applying these standard values to the populations in question. This calculated expected rate can then be compared with the overall observed rates. The ratio of two indirectly standardised rates is called the Standardised Incidence Ratio or the Standardised Mortality Ratio.
In general, direct standardisation is used when category-specific rates for both sets of data (e.g. age-specific rates) are available, and the indirect method is used when category-specific rates are unavailable. Indirect standardisation is also more appropriate for use in studies with small numbers or when the rates are unstable.
What is direct standardisation?
Direct standardisation is one of the two main types of standardisation.
The direct method of standardisation produces ‘age-adjusted rates’ that are derived by applying the category-specific mortality rates of each population to a single standard population. This ‘standard population’ may be the distribution of one of the populations being compared or may be an outside standard population such as the European Standard Population or the WHO’s World Standard Population.
What is indirect standardisation?
Indirect standardisation is one of the two main types of standardisation.
In indirect standardisation, you take a known set of category-specific rates (from either one of the populations being compared, or from a standard population) and apply these to the structure of each of the populations being compared.
This calculated expected rate can be compared with the overall observed rates to give a standardised morbidity/mortality ratio (SMR). Note that the SMR is always expressed as a percentage.
What are the steps of indirect standardisation?
1) Identify a standard reference for category-specific death rates, either from a reference or from one of the populations if you have this available.
2) Calculate the expected numbers of stratum-specific expected deaths.
3) Calculate the total number of expected deaths by summing the number of expected deaths in each stratum.
4) Calculate the standardised mortality ratio (SMR) – the ratio between the observed and expected number of deaths (always expressed as a percentage)
Use indirect standardisation to compare the age-standardised mortality rate for both countries, how do you interpret the results?
Country A:
Age group: (0-29) Number of deaths: (7000) Population: (6,000,000) Rate per 1000: (1.2)
Age group: (30-59) Number of deaths: (20,000) Population: (5,500,000) Rate per 1000: (3.6)
Age group: (60+) Number of deaths: (120,000) Population: (2,500,000) Rate per 1000: (48)
Total: Number of deaths: (147,000) Population: (14,000,000) Rate per 1000: (10.5)
Country B:
Age group: (0-29) Number of deaths: (6300) Population: (1,500,000) Rate per 1000: (4.2)
Age group: (30-59) Number of deaths: (3000) Population: (550,000) Rate per 1000: (5.5)
Age group: (60+) Number of deaths: (6000) Population: (120,000) Rate per 1000: (50)
Total: Number of deaths: (15,300) Population: (2,170,000) Rate per 1000: (7)
Hypothetical Standard Population:
0-29 - 100,000
30-59 - 65,000
60+ - 20,000
Total - 185,000
1) Identify a standard reference for category-specific death rates, either from a reference or from one of the populations if you have this available.
While a reference is not given in the question, you are able to use one of the countries as your reference (in this case we use country A)
0-29: 0.0012
30-59: 0.0036
60+: 0.048
2) Calculate the expected numbers of stratum-specific expected deaths.
Country A:
0-29: 0.0012 x 6,000,000 = 7,200
30-59: 0.0036 x 5,500,000 = 19,800
60+: 0.048 x 2,500,000 = 120,000
Country B:
0-29: 0.0012 x 1,500,000 = 1,800
30-59: 0.0036 x 550,000 = 1,980
60+: 0.048 x 120,000 = 5,760
3) Calculate the total number of expected deaths by summing the number of expected deaths in each stratum.
Country A = 7,200 + 19,800 + 120,000 = 147,000
Country B = 1,800 + 1,980 + 5,760 = 9540
4) Calculate the SMR – the ratio between the observed and expected number of deaths. This needs to be in percentage form.
Country A:
(Observed (147,000) / Expected (147,000)) x 100 = 100%
Country B:
(Observed (15,300) / Expected (9,540)) x 100 = 160%
Interpretation:
The number of observed deaths in Country B is 60% higher than what we would expect if Country B had the same mortality experience as Country A.
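The four indirect standardisation steps can be sketched in Python, using the figures from this example (the dictionary layout and function name are illustrative, not a standard API):

```python
# Indirect standardisation: apply reference (Country A) age-specific rates
# to each country's population, then compare observed with expected deaths.
ref_rates = {"0-29": 0.0012, "30-59": 0.0036, "60+": 0.048}

countries = {
    "A": {"pop": {"0-29": 6_000_000, "30-59": 5_500_000, "60+": 2_500_000},
          "observed": 147_000},
    "B": {"pop": {"0-29": 1_500_000, "30-59": 550_000, "60+": 120_000},
          "observed": 15_300},
}

def smr(observed, population, reference_rates):
    """Standardised mortality ratio = observed / expected deaths, as a percentage."""
    expected = sum(reference_rates[group] * n for group, n in population.items())
    return 100 * observed / expected

for name, c in countries.items():
    print(name, round(smr(c["observed"], c["pop"], ref_rates)))  # A: 100, B: 160
```

The expected deaths (147,000 for A; 9,540 for B) and the resulting SMRs (100% and 160%) match the hand calculation above.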
What are the steps of direct standardisation?
1) Identify a standard population for which relevant stratum-specific data are available
2) Calculate the number of stratum-specific expected deaths for each data set.
3) Calculate the total number of expected deaths by summing all the values from the stratum-specific calculations.
4) Calculate the age-standardised rate by dividing the total number of expected deaths by the total standard population size.
5) Calculate the Comparative Mortality Ratio (CMR).
Use direct standardisation to compare the age-standardised mortality rate for both countries, how do you interpret the results?
Country A:
Age group: (0-29) Number of deaths: (7000) Population: (6,000,000) Rate per 1000: (1.2)
Age group: (30-59) Number of deaths: (20,000) Population: (5,500,000) Rate per 1000: (3.6)
Age group: (60+) Number of deaths: (120,000) Population: (2,500,000) Rate per 1000: (48)
Total: Number of deaths: (147,000) Population: (14,000,000) Rate per 1000: (10.5)
Country B:
Age group: (0-29) Number of deaths: (6300) Population: (1,500,000) Rate per 1000: (4.2)
Age group: (30-59) Number of deaths: (3000) Population: (550,000) Rate per 1000: (5.5)
Age group: (60+) Number of deaths: (6000) Population: (120,000) Rate per 1000: (50)
Total: Number of deaths: (15,300) Population: (2,170,000) Rate per 1000: (7)
Hypothetical Standard Population:
0-29 - 100,000
30-59 - 65,000
60+ - 20,000
Total - 185,000
Step 1) Identify a standard population for which relevant stratum-specific data are available
This is given to you in the question
Step 2) Calculate the number of stratum-specific expected deaths for each data set.
For each age stratum of each population being compared, multiply the age-specific mortality rate by the size of the standard population for that stratum. This gives you the number of deaths one would expect in the standard population if it had the same mortality rates as your study population.
Country A:
0.0012 x 100,000 = 120
0.0036 x 65,000 = 234
0.048 x 20,000 = 960
Country B
0.0042 x 100,000 = 420
0.0055 x 65,000 = 357.5
0.05 x 20,000 = 1,000
Step 3) Calculate the total number of expected deaths by summing all the values from the stratum-specific calculations, above. This gives the total number of deaths that would be expected in the standard population if it had the same mortality rate as your study population.
Country A:
120 + 234 + 960 = 1314
Country B
420 + 357.5 + 1,000 = 1,777.5
Step 4) Calculate the age-standardised rate by dividing the total number of expected deaths by the total standard population size.
Country A:
1,314/185,000 = 7.1 per 1,000 person-years
Country B:
1,777.5/185,000 = 9.6 per 1,000 person-years
Interpretation:
After controlling for the confounding effects of age, the mortality rate in Country B is 35% higher than in Country A.
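The direct standardisation steps can be sketched in Python using this example's figures (data layout and function name are illustrative only):

```python
# Direct standardisation: apply each country's age-specific rates to a
# single standard population and compare the resulting rates.
standard_pop = {"0-29": 100_000, "30-59": 65_000, "60+": 20_000}

rates = {
    "A": {"0-29": 0.0012, "30-59": 0.0036, "60+": 0.048},
    "B": {"0-29": 0.0042, "30-59": 0.0055, "60+": 0.050},
}

def standardised_rate(age_rates, std_pop):
    """Age-standardised rate per 1,000 person-years in the standard population."""
    expected_deaths = sum(age_rates[group] * std_pop[group] for group in std_pop)
    return 1000 * expected_deaths / sum(std_pop.values())

rate_a = standardised_rate(rates["A"], standard_pop)  # ~7.1 per 1,000
rate_b = standardised_rate(rates["B"], standard_pop)  # ~9.6 per 1,000
cmr = rate_b / rate_a                                 # Comparative Mortality Ratio, ~1.35
```

The two standardised rates (7.1 and 9.6 per 1,000 person-years) and the CMR of 1.35 reproduce the hand calculation above.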
What is the Comparative Mortality Ratio (CMR)?
The Comparative Mortality Ratio (CMR) is the ratio of two directly standardised rates. It gives a single summary measure that reflects the difference in mortality between the two populations.
It is calculated by dividing the overall age-standardised rate in, say, country B by the rate in country A.
What is the standardised mortality ratio (SMR)?
The ratio of the observed to the expected number of deaths, obtained through indirect standardisation. It gives a single summary measure that reflects the difference in mortality between the study population and the reference population. It is always expressed as a percentage.
What is the comparative mortality ratio between country B (with an age-standardised mortality rate of 9.6 per 1,000 person-years) and country A (with an age-standardised mortality rate of 7.1 per 1,000 person-years)?
Comparative Mortality Ratio = 9.6/7.1 = 1.35, i.e. the age-standardised mortality rate in country B is 35% higher than in country A.
Use the data below to explain the role that standardisation plays in data comparison.
Country A:
Age group: (0-29) Number of deaths: (7000) Population: (6,000,000) Rate per 1000 person-years: (1.2)
Age group: (30-59) Number of deaths: (20,000) Population: (5,500,000) Rate per 1000 person-years : (3.6)
Age group: (60+) Number of deaths: (120,000) Population: (2,500,000) Rate per 1000 person-years: (48)
Total: Number of deaths: (147,000) Population: (14,000,000) Rate per 1000 person-years: (10.5)
Country B:
Age group: (0-29) Number of deaths: (6300) Population: (1,500,000) Rate per 1000 person-years: (4.2)
Age group: (30-59) Number of deaths: (3000) Population: (550,000) Rate per 1000 person-years: (5.5)
Age group: (60+) Number of deaths: (6000) Population: (120,000) Rate per 1000 person-years: (50)
Total: Number of deaths: (15,300) Population: (2,170,000) Rate per 1000 person-years: (7)
The overall crude mortality rate is higher for country A (10.5 deaths / 1,000 person-years) compared with country B (7 deaths / 1,000 person-years), despite the age-specific mortality rates being higher among all age groups in country B.
The reason for the differences is that these two populations have markedly different age structures. Country A has a much older population than Country B. For example, 18% of the population in country A are aged over 60 years compared with just 5.5% of the population in country B.
Standardisation allows us to compare these populations and see what the adjusted mortality rate, taking into account these population differences, will be.
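The crude-rate paradox above can be verified with a few lines of arithmetic (all numbers come straight from the table):

```python
# Crude rates: total deaths / total population, expressed per 1,000 person-years.
crude_a = 147_000 / 14_000_000 * 1000   # ~10.5
crude_b = 15_300 / 2_170_000 * 1000     # ~7.05

# The paradox is driven by age structure: the proportion aged 60+ differs widely.
over_60_a = 2_500_000 / 14_000_000      # ~18%
over_60_b = 120_000 / 2_170_000         # ~5.5%
```

Country A's crude rate is higher solely because far more of its population sits in the high-mortality 60+ stratum.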
What are the issues of data standardisation?
Standardised rates are used for the comparison of two or more populations; they represent a weighted average of the age-specific rates taken from a ‘standard population’ and are not actual rates.
Certain data is required to perform standardisation. For example, the direct method of standardisation requires that the age-specific rates for all populations are available and the indirect method of standardisation requires the total population size for each category.
As the choice of a standard population will affect the comparison between populations, it should always be stated clearly which standard population has been applied.
What are the different types of data?
Categorical:
Nominal
Ordinal
Binary
Numeric:
Discrete
Continuous
What is nominal data?
A type of categorical data without an order.
Examples include blood groups (O, A, B, AB), eye colour and marital status.
What is ordinal data?
A type of categorical data, where categories have an innate order in which they can be ranked. The “distances” between the different groups can be variable.
Examples include stages of breast cancer.
What is binary data?
Binary, or dichotomous, data is a type of categorical data where there are only two possible outcomes.
Examples include Yes/No or True/False or “survived” and “not survived”.
What is discrete data?
Discrete data is a type of numerical data.
It can only take fixed values. Examples include shoe size or number of people.
What is continuous data?
Continuous data is a type of numerical data.
It can take any value, frequently within a given range. Examples include weight and length (where the range would be from zero to, theoretically, infinity).
What are the different types of data scale and what do they mean?
Nominal - Naming variables in no particular order e.g. Eye colour
Ordinal - Ranking variables with an inherent order e.g. Breast Cancer Staging
Interval - Ranking variables with a set distance between each group, e.g. temperature measured in degrees Celsius. The difference between 10°C and 20°C is the same as the difference between 30°C and 40°C, so the differences are meaningful. However, 20°C is not twice as hot as 10°C, so the ratios are not meaningful.
Ratio - Ranking variables on a scale with measurable intervals. Ratio data have a true zero and both differences and ratios are meaningful. An example is weight. The difference between 1kg and 2kg is the same as the difference between 3kg and 4kg. In addition, 2kg is twice as much as 1kg, and 10kg is twice as much as 5kg – so ratios are meaningful.
What is years of life lost (YLL)?
A summary measure of premature mortality.
It estimates the years of potential life lost due to premature deaths taking into account the age at which deaths occur, giving greater weight to deaths at a younger age and lower weight to deaths at an older age.
What are the uses of “Years of life lost”?
You can calculate the YLL of a specific cause of death as a proportion of the total YLL lost in the population due to premature mortality.
This can be used in public health planning to:
Compare the relative importance of different causes of premature deaths within a given population
Set priorities for prevention
Compare the premature mortality experience between populations.
How is “Years of life lost” calculated?
By summing, for each age from 1 to 74 years, the number of deaths at that age multiplied by the number of years of life remaining up to the age of 75 years.
The upper age limit of 75 years approximates life expectancy in a given population, though any upper age limit could potentially be used.
Deaths at age <1 year are excluded as they are often related to causes originating in the perinatal period, such as congenital anomalies or prematurity.
How would you calculate the “Years of life lost” contribution for 10 children who died at the age of 1 year?
Number of deaths at the age of 1 year x The number of years lost had each individual lived to the age of 75
= 10 x 74 years
= 740 years
What is the “Crude years of life lost rate” and how is it calculated?
An expression of the years of life lost value given relative to the total population aged under 75 years.
Effectively, this converts YLL into a value per X persons.
How do you calculate the crude years of life lost rate?
Crude Years of life lost rate = (Years of life lost/population under 75 years) x 10,000
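The YLL and crude YLL rate calculations can be sketched as follows (the upper age limit of 75 years and the per-10,000 multiplier follow the cards above; the function names are illustrative):

```python
# YLL sketch, assuming the conventional upper age limit of 75 years.
UPPER_AGE = 75

def years_of_life_lost(deaths_by_age):
    """Sum, over ages 1-74, of deaths at each age times years remaining to 75."""
    return sum(n * (UPPER_AGE - age)
               for age, n in deaths_by_age.items()
               if 1 <= age < UPPER_AGE)   # deaths at age <1 are excluded

def crude_yll_rate(yll, population_under_75):
    """Crude years of life lost rate per 10,000 persons aged under 75."""
    return yll / population_under_75 * 10_000

# Flashcard example: 10 deaths at age 1 contribute 10 x 74 = 740 years.
yll = years_of_life_lost({1: 10})   # 740
```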
What is disease burden?
Disease burden is the impact of a health problem on a given population.
It can be measured using a variety of indicators such as mortality, morbidity or financial cost.
Measuring this allows the burden of disease to be compared between different areas, for example, regions, towns or electoral wards.
What are the different measures of disease burden?
Multiple different measures can be used to monitor disease burden, such as mortality, morbidity or “years of life lost”.
However, the two best measures are:
Quality-Adjusted Life-Years (QALY)
Disability-Adjusted Life-Years (DALY)
These two are best as they allow direct comparison of the burden of different diseases and take into account both death and morbidity in a single measure.
What are Quality-Adjusted-Life Years (QALY)?
Quality-Adjusted Life-Years (QALY) are a measure of the life expectancy corrected for the loss of quality of that life caused by diseases and disabilities.
QALY take into account both quantity and the quality of life generated by a healthcare intervention.
A year of life in perfect health is given a QALY value of 1, whilst a year in a health state considered equivalent to death has a QALY value of 0.
What are Disability-Adjusted Life-Years (DALY)?
Disability-Adjusted Life-Years (DALY) reflect the potential years of life lost due to premature death (YLL) and equivalent years of ‘healthy’ life lost by virtue of being in states of poor health or disability. These disabilities can be physical or mental.
One DALY can be thought of as one lost year of a ‘healthy’ life.
What is the Global Burden of Disease Study?
The most well-known assessment of disease burden is the Global Burden of Disease (GBD) Study, coordinated by the Institute for Health Metrics and Evaluation (IHME) in collaboration with the World Health Organisation.
This regularly updated study provides age- and sex-stratified estimates of the burden of 333 leading causes of death and disability globally and for 195 countries and regions.
The study started in 1990; the figures above are from the 2016 update.
What is the purpose of measuring disease burden?
Prioritising actions in health and the environment
Planning for preventive action
Assessing performance of healthcare systems
Comparing action and health gain
Identifying high-risk populations
Planning for future needs
Setting priorities in health research
What are the causes of variation in an epidemiological study?
Measurement error
Random error (chance)
Systematic error (bias)
Misclassification (information bias)
Sampling error
What is measurement error?
One of the causes of variation within a study.
Measurement error is the variability in a study caused by a lack of validity or reliability in the method of measurement of your inputs or outputs in a study.
For example, this could be due to a broken blood pressure cuff, or a tester who does not know how to use one properly.
What is validity?
The degree to which an instrument is capable of accurately measuring what it intends to measure. For example, how well a questionnaire measures the exposure or outcome in a prospective cohort study, or the accuracy of a diagnostic test.
There are 4 main types of validity:
Construct validity
Content validity
Face validity
Criterion validity
Assessing validity requires that an error-free reference test or ‘gold standard’ is available to which the measure can be compared.
What are the types of validity?
There are 4 main types of validity:
Construct validity
Content validity
Face validity
Criterion validity
What is construct validity?
The extent to which the instrument specifically measures what it is intended to measure, and avoids measuring other things.
For example, a measure of intelligence should only assess factors relevant to intelligence and not, for instance, whether someone is a hard worker. Construct validity subsumes the other types of validity.
What is content validity?
Content validity describes whether an instrument is systematically and comprehensively representative of the trait it is measuring. For example, a questionnaire aiming to score anxiety should include questions aimed at a broad range of features of anxiety.
What is face validity?
Face validity is the degree to which a test is subjectively thought to measure what it intends to measure. In other words, does it “look like” it will measure what it should do. The subjective opinion for face validity can come from experts, from those administering the instrument, or from those using the instrument.
What is criterion validity?
Criterion validity involves comparing the instrument in question with another criterion which is taken to be representative of the measure. This can take the form of concurrent validity (where the instrument results are correlated with those of an established, or gold standard, instrument), or predictive validity (where the instrument results are correlated with future outcomes, whether they be measured by the same instrument or a different one).
How do you assess validity?
Validity is measured by sensitivity and specificity.
These can be calculated by two main methods
Comparing the test with the best available clinical assessment. For example, a self-administered psychiatric questionnaire may be compared with the majority opinion of an expert psychiatric panel.
Test its ability to predict some other relevant finding or event, such as the ability of glycosuria (glucose in the urine) to predict an abnormal glucose tolerance test, or of a questionnaire to predict future illness.
The above methods can then be plotted in a 2x2 contingency table, classifying positive or negative for the outcome, first on the basis of the survey or new instrument, and then according to the reference test.
                 Ref Test Pos   Ref Test Neg   Total
New Test Pos          a               b         a+b
New Test Neg          c               d         c+d
Total                a+c             b+d
Sensitivity = a/(a+c) - a sensitive test detects a high proportion of the true cases
Specificity = d/(b+d) - a specific test has few false positives
Systematic error = (a+b)/(a+c) - the ratio of the total number of positives from the new test to the total number of positives from the reference test; a value of 1 indicates no net over- or under-counting
Positive predictive value = a/(a+b) - the proportion of test positives that are truly positive
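These metrics drop straight out of the four cell counts; a minimal sketch with hypothetical values for a, b, c and d:

```python
# Validity metrics from a 2x2 contingency table (counts are hypothetical).
a, b, c, d = 90, 30, 10, 870   # new test result vs reference test result

sensitivity = a / (a + c)            # proportion of true cases the test detects
specificity = d / (b + d)            # proportion of true non-cases correctly negative
ppv = a / (a + b)                    # proportion of test positives truly positive
net_count_ratio = (a + b) / (a + c)  # >1 means the new test over-counts positives
```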
How can the validity of a test be improved?
Training observers and considering the setting of observation
Ensure an appropriate and representative sample, and consider the effect of reflexivity (the effect of observation and the observer on participants)
Ensure the results of observations are accurately recorded, for example by having two observers, or by recording spoken responses
Triangulate responses by repeating observations, or by assessing the outcome of interest with additional instruments
What is Reliability?
Reliability, also known as reproducibility, refers to the consistency of the performance of an instrument over time and among different observers.
A highly reliable measure produces similar results under similar conditions so, all things being equal, repeated testing should produce similar results.
There are 4 main methods of testing the reliability of an instrument:
Inter-rater (or inter-observer) reliability
Intra-rater (or intra-observer) reliability
Inter-method reliability
Internal consistency reliability
What is Inter-rater (or inter-observer) reliability?
The degree of agreement between the results when two or more observers administer the instrument on the same subject under the same conditions.
Inter-rater reliability can be measured using Cohen’s kappa (k) statistic. Kappa indicates how well two sets of (categorical) measurements compare.
What is Intra-rater (or intra-observer) reliability and how is it measured?
Also called repeatability or test-retest reliability
This describes the agreement between results when the instrument is used by the same observer on two or more occasions (under the same conditions and in the same test population).
What is Cohen’s kappa (k) and how is it interpreted?
Cohen’s kappa (k) is a measure of inter-rater reliability.
It is more robust than simple percentage agreement as it accounts for the possibility that a repeated measure agrees by chance.
Kappa values range from -1 to 1, where values ≤0 indicate no agreement other than that which would be expected by chance, and 1 is perfect agreement.
Values above 0.6 are generally deemed to represent moderate agreement.
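For binary ratings, kappa can be computed directly from the two raters' agreement table (a minimal sketch with hypothetical counts):

```python
# Cohen's kappa for two raters making yes/no judgements (counts are hypothetical).
# Rows = rater 1 (yes, no); columns = rater 2 (yes, no).
table = [[20, 5],
         [10, 15]]

n = sum(sum(row) for row in table)              # 50 subjects in total
p_observed = (table[0][0] + table[1][1]) / n    # agreement actually observed

# Chance agreement expected from each rater's marginal "yes"/"no" proportions:
r1_yes = (table[0][0] + table[0][1]) / n
r2_yes = (table[0][0] + table[1][0]) / n
p_chance = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)

kappa = (p_observed - p_chance) / (1 - p_chance)   # 0.4 for these counts
```

Here the raters agree 70% of the time, but 50% agreement would be expected by chance alone, so kappa credits only the agreement beyond chance.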
What are the disadvantages of Cohen’s Kappa (k)?
It can underestimate agreement for rare outcomes
It requires the two raters to be independent
What is Inter-method reliability?
Also known as equivalence.
This is the degree to which two or more instruments, that are used to measure the same thing, agree on the result.
How can a test’s reliability be improved?
Training of observers
Clear definitions of terminology, criteria and protocols
Regular observation and review of techniques
Identifying causes of discrepancies and acting on them
How are reliability and validity related?
What may be valid for a group or a population may not be so for an individual in a clinical setting. When the reliability or repeatability of the test is poor, the validity of the test for a given individual may also be poor.
What is Internal consistency reliability and how is it measured?
This is the degree of agreement, or consistency, between different parts of a single instrument.
Internal consistency is measured using Cronbach’s alpha (α) – a statistic derived from pairwise correlations between items that should produce similar results.
What is generalisability?
Also known as external validity.
The extent to which the findings of a study can be applicable to other settings.
What makes a result generalisable?
To be generalisable, the results must first have a suitable level of internal validity; beyond that, generalisability is a judgement based on:
The characteristics of the participants (including the demographic and clinical characteristics, as affected by the source population, response rate, inclusion criteria, etc.)
The setting of the study
The interventions or exposures studied
What elements make a study less generalisable?
Restrictions within the original study (eligibility criteria),
Pre-test/post-test effects (where cause-effect relationships within a study are only found when pre-tests or post-tests are also carried out).
What is Cronbach’s alpha (α) and how is it interpreted?
Cronbach’s alpha (α) is a measure of internal consistency.
The usual range for the alpha will be zero to one, with values above 0.7 generally deemed acceptable, and a figure of one indicating perfect internal consistency.
A negative value will occur if the choice of items is poor and there is an inconsistency between them, or the sampling method is faulty.
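The standard formula is alpha = (k/(k-1)) x (1 - sum of item variances / variance of total scores). A minimal sketch with hypothetical item scores:

```python
# Cronbach's alpha for a 3-item instrument (scores are hypothetical).
scores = [   # rows = respondents, columns = items
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 5, 5],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)   # sample variance

k = len(scores[0])                                          # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)    # ~0.91 here
```

When the items move together, the variance of the total scores is large relative to the summed item variances, pushing alpha towards 1.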
What is random error?
One of the causes of variation within a study.
Random error (also called chance) is the variation in the study caused by chance differences between the recorded and true values.
These variations may arise from unbiased measurement errors (e.g. weight of an individual can vary between measurements due to limited precision of scales) or biological variation within an individual (e.g. blood pressure or body temperature, which are likely to vary between measurements).
What is systematic error?
One of the causes of variation within a study.
Systematic error (also called bias) is the variation in a study caused by a consistent difference between the recorded value and the true value in a series of observations, which results in some individuals being systematically misclassified.
For example, if the height of an individual is always measured when the person is wearing the same shoes, the measurement will be consistent, but the results will have a systematic bias.
What is misclassification in relation to variation within a study?
One of the causes of variation within a study.
Misclassification, also called information bias, refers to variation in a study caused by the misclassification of an individual, value or attribute into a category other than that to which it should be assigned.
This misclassification can be either differential or non-differential.
What is non-differential misclassification and what does it do to your results?
Non-differential (random) misclassification is a type of misclassification.
This involves the misclassification of variables with equal probability in all study participants, regardless of the groups being compared. That is, the probability of exposure being misclassified is independent of disease status and the probability of disease status being misclassified is independent of exposure status.
Non-differential misclassification increases the similarity between the exposed and non-exposed groups, and may result in an underestimate (dilution) of the true strength of an association between exposure and disease.
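The dilution effect can be demonstrated numerically (all counts here are hypothetical, chosen only to illustrate the bias towards the null):

```python
# Non-differential exposure misclassification dilutes a true association.
true_table = {"exp": (100, 900), "unexp": (50, 950)}   # (cases, non-cases)

def risk_ratio(table):
    e_cases, e_non = table["exp"]
    u_cases, u_non = table["unexp"]
    return (e_cases / (e_cases + e_non)) / (u_cases / (u_cases + u_non))

def misclassify(table, p=0.2):
    """Swap a fraction p of each exposure group, independent of disease status."""
    e_cases, e_non = table["exp"]
    u_cases, u_non = table["unexp"]
    return {
        "exp": (e_cases * (1 - p) + u_cases * p, e_non * (1 - p) + u_non * p),
        "unexp": (u_cases * (1 - p) + e_cases * p, u_non * (1 - p) + e_non * p),
    }

rr_true = risk_ratio(true_table)                    # 2.0
rr_observed = risk_ratio(misclassify(true_table))   # 1.5 - pulled towards 1
```

Because the same 20% of each group is misclassified regardless of disease status, the observed risk ratio lands between the true value and the null value of 1.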
What is differential misclassification and what does it do to your results?
Differential (non-random) misclassification occurs when the proportion of subjects being misclassified differs between the study groups.
That is, the probability of exposure being misclassified is dependent on disease status, or the probability of disease status being misclassified is dependent on exposure status.
The direction of bias arising from differential misclassification may be unpredictable but, where it is known and quantifiable, differential misclassification may be compensated for in the statistical analysis.
Differential misclassification may be introduced in a study as a result of:
Recall bias (differences in the accuracy of recollections by study participants)
Observer/interviewer bias
What is the difference between differential and non-differential misclassification?
In non-differential misclassification, the misclassification is the same between the study groups, whereas in differential misclassification the two groups are misclassified differently.
Non-differential misclassification only results in an underestimation of the study results whereas differential misclassification may result in an under- or overestimation of the true association.
Differential misclassification is considered to be worse than Non-differential misclassification.