Epidemiological data collection and analysis Flashcards

1
Q

Why is research needed in public health?

A

Public health is of high societal value, is a collective responsibility and is at the heart of all human activities and progress

Health research relies on processes such as:

  • Systematic and timely assessment of disease
  • Developing novel, efficient and targeted strategies in order to protect and improve public health
  • Evaluating decisions and priorities in promoting public health, that is, to:
    • Prevent
    • Control
    • Eradicate disease

Continuous monitoring of population health allows the key policy makers involved in decision-making to consider health impacts in relation to other societal factors, including economics.

2
Q

What are the questions addressed in public health?

A

What is the disease and its frequency in the population?

Why is this happening - what is the source/cause of the disease?

  • Emerging due to an infectious agent
  • Consequence of long term exposures

Where is it happening – locally or widely spread?

What are the dynamics of the disease?

Which groups in the population are most affected?

What are the modifiable factors which could lead to interventions to prevent disease?

Disease occurrence – an intertwined result of:

  • Inherited predisposition
  • Environmental exposures
  • Lifestyle and socio-economic circumstances

3
Q

Which data types are essential in epidemiology?

A
  • Understanding population health is based on information collected on individuals or on groups
  • Informative individual level data
    • Demographics (age, gender, ethnicity)
    • Socio-economic circumstances (measures of poverty etc.)
    • Presence of disease
    • Access to education and health care
    • Health facility records
  • Aggregated level data (ecological data) – surveillance processes
    • Counts on groups in the population
4
Q

How are data processed?

A

Data are processed using

  • epidemiological concepts, which provide measures of disease and its associations with various risk factors
  • statistical techniques, which generate numbers for these measures and associations as well as the uncertainty around these numbers

Both depend on rigorously collected data and carefully thought-out designs: statistics uses samples from the population to provide results that generalise to the population level

5
Q

What is epidemiology?

A

The study of the distribution and determinants of health-related events in specified populations, which serves to control, prevent and eradicate disease

  • Quantitative - not a clinical science, unified umbrella for measuring disease
  • Object of study – population not the individual
  • Fundamental concepts: disease and exposure (harmful or protective)
  • Absolute measures = the disease in the population
  • Relative measures = associations of disease with exposures or with groups in the population
6
Q

What are the differences between absolute measures of the disease and relative measures of the disease?

A

Absolute measures: the disease in the population

  • Prevalence, risk, odds – static measures of the disease
  • Incidence, rates – dynamic measures of the disease
  • Transmission – specific to infectious diseases

Relative measures: associations of disease with exposures or with groups in the population

  • Relative risk/rates/odds of disease
    • exposed vs. non-exposed
    • one group vs. another defined by demographic factors such as ethnicity
7
Q

What is an example of an epidemiological study?

A

The 1854 London cholera outbreak – Dr John Snow

  • The spatial clustering of cholera deaths around the Broad Street well provided strong evidence in support of his theory that cholera was a water-borne disease.
  • Echoes modern GIS analysis

Location: intersection of Broadwick and Lexington Streets – W1

8
Q

What are the essential concepts of epidemiological studies?

A
  • Individual = unit of interest in clinical medicine
  • Population = concept used in epidemiology in reference to large groups which share similarities – people, animals or even plants

E.g.:

  • Children, infants, neonatal population
  • Population of a county/country/continent/world
  • HIV population, TB population, cancer population
  • Tree population, fish population

Exposure = concept used in reference to a harmful or protective element which can potentially influence health status.

  • environmental (air pollution, sun’s rays, clean water/air)
  • chemical (food, water or treatment)
  • biological (genetic inheritance)
  • social (poverty or wealth)
  • lifestyle (e.g. moderate exercise)
9
Q

Scope of epidemiology

A

•Deriving averages in the population which serve public health policy makers

•Questions addressed at population level

–What is the problem and its frequency

–Who is affected – what is the population?

–Which groups in the population are most affected

–When/Where does the problem manifest

–Why does it occur?

•Health status in the population is subjected to dynamic changes in various circumstances

–Hence the need for constant monitoring

•Epidemiology = obtain, interpret and use data to promote health

–Prevent (e.g. screening/vaccination)

–Reduce (interventions/therapies)

–Control, contain (contingency measures) and eventually eradicate

•Understanding disease-exposure patterns is essential for decision-making bodies to draw up efficient and targeted guidelines at population level. In simple words, this means allocating money where it is most needed

10
Q

How are statistics used in epidemiology?

A

•Statistics in population health research translate epidemiological measures into numbers and associated uncertainties. They can be results of

–Hypothesis testing – for various types of data

–Group comparisons relative to health outcomes and exposures

–Advanced statistical modelling tailored to the nature of the outcomes to investigate patterns of disease in the population

•From a sample in the population (data at hand) to generalizable results and predictions to the population the sample is selected from

•Epidemiology uses data and statistics to provide answers to health related questions

11
Q

What is statistics?

A
  • A branch of the mathematical sciences with applications in many fields including health, economics, biological science, business and finance
  • Statistics in population health research translate epidemiological measures into numbers and associated uncertainties. They can be results of

–Hypothesis testing – for various types of data

–Group comparisons relative to health outcomes and exposures

–Advanced statistical modelling tailored to the nature of the outcomes to investigate patterns of disease in the population

12
Q

What are summary statistics?

A

•Summary statistics

–A succinct and relevant assessment of the variables at hand (the sample we work on)

–Recognising variable types is crucial as they are summarized and processed differently

–A variety of statistical software can analyse data; users need to pay attention to data types and layers of dependency

13
Q

What are the different data types in statistics?

A

•Data types

–Variables of interest are often called outcomes, response variables or dependent variables in statistics (in medicine/clinical sciences every measurement is usually called an outcome)

•Quantitative

–Continuous (weight, height, age, blood pressure)

–Discrete (number of accidents/week, monthly number of deaths)

•Qualitative

–Ordinal (severity of a disease, Likert scale of agreement)

–Nominal (gender, ethnicity, presence/absence of disease, blood group)

14
Q

What are the differences between quantitative and qualitative variables?

A

•Quantitative variables

–Variables’ summary

  • Means, Medians (Q2) – measures of central tendency, location
  • Standard deviation – measure of the spread of the data at hand; purely descriptive
  • Q1 (first quartile) and Q3 (third quartile), with interquartile range IQR = Q3 - Q1 – further information on location and spread of the data
  • Min, Max, Range(Max-Min)

–Visualization: histogram, box plots

•Qualitative variables

–Variables’ summary: proportions

–Visualization: bar charts of frequencies or proportions, pie charts
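
As a rough illustration of these summaries, the sketch below uses Python's standard `statistics` module. The blood pressure readings and blood groups are made-up values, not data from any study, and `statistics.quantiles` is used with its default interpolation method, so the quartiles may differ slightly from those reported by other software.

```python
import statistics
from collections import Counter

# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
blood_pressure = [118, 122, 125, 130, 131, 135, 140, 142, 150, 165]

mean = statistics.mean(blood_pressure)        # central tendency
median = statistics.median(blood_pressure)    # Q2
sd = statistics.stdev(blood_pressure)         # sample standard deviation (spread)
q1, q2, q3 = statistics.quantiles(blood_pressure, n=4)  # quartile cut points
iqr = q3 - q1                                 # interquartile range
data_range = max(blood_pressure) - min(blood_pressure)

# Hypothetical nominal variable: summarised by proportions, not by means.
blood_groups = ["O", "A", "A", "B", "O", "AB", "O", "A"]
counts = Counter(blood_groups)
proportions = {group: n / len(blood_groups) for group, n in counts.items()}

print(f"mean={mean:.1f} median={median} sd={sd:.1f} IQR={iqr:.2f} range={data_range}")
print(proportions)
```

The quantitative variable is summarised by location and spread, while the qualitative one is reduced to the share of observations in each category.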

15
Q

What are different concepts associated with statistical inference?

A

–Hypothesis testing

–P-values

–Parameter Estimates

–Standard Errors

–95% Confidence Intervals

All intrinsically linked

16
Q

What are the null hypothesis, the p-value, standard errors and 95% CIs?

A

Null hypothesis: usually a statement declaring no difference

  • Between a collected sample and the population (no difference, similarity)
  • Between two or more samples (no difference, similarity)
  • Between a disease occurrence and a prior exposure (no association)

P-value

  • A probability (of observing data at least as extreme as the sample, if the null were true) which measures the strength of the evidence against the null
  • It reflects how far the sample data are from what would be expected under the null hypothesis
  • <0.05: evidence AGAINST the null (evidence of dissimilarity)
  • >=0.05: insufficient evidence against the null (this is NOT evidence FOR the null)

Standard errors and 95% confidence intervals - inferential concepts

  • associated with precision and uncertainty of the estimates (such as means, proportions, risk, odds, risk ratios, odds ratios)
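
To make the link between these concepts concrete, here is a minimal sketch of a two-sided one-sample test for a proportion, using the normal approximation. The function name `one_sample_proportion_test`, the counts and the hypothesised value `p0` are all illustrative assumptions; other tests would be appropriate for other data types.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_test(cases, n, p0):
    """Two-sided z-test of H0: true proportion == p0 (normal approximation)."""
    p_hat = cases / n                              # parameter estimate
    se_null = sqrt(p0 * (1 - p0) / n)              # standard error under the null
    z = (p_hat - p0) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # evidence against the null
    se_hat = sqrt(p_hat * (1 - p_hat) / n)         # standard error of the estimate
    z_crit = NormalDist().inv_cdf(0.975)           # about 1.96 for a 95% CI
    ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)
    return p_hat, p_value, ci

# Hypothetical sample: 30 diseased out of 200, tested against a population value of 10%.
p_hat, p_value, ci = one_sample_proportion_test(cases=30, n=200, p0=0.10)
print(f"estimate={p_hat:.3f} p-value={p_value:.4f} 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The estimate, its standard error, the p-value and the confidence interval all come from the same sample, which is why they are described as intrinsically linked.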
17
Q

Difference between standard deviation and standard error of the mean

A

Standard deviation: indicates the spread of the data at hand

Standard error of the mean: indicates the precision of the sample mean
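
A small sketch of the distinction, assuming the usual formula SEM = SD / sqrt(n). The heights are invented values and `standard_error` is an illustrative helper name.

```python
import statistics
from math import sqrt

# Hypothetical heights (cm) from a small sample, for illustration only.
heights = [160, 165, 170, 172, 175, 178, 180, 185]

sd = statistics.stdev(heights)      # spread of the data at hand
sem = sd / sqrt(len(heights))       # precision of the sample mean

def standard_error(sd_value, n):
    """Standard error of the mean for a given SD and sample size."""
    return sd_value / sqrt(n)

# With the same spread, a larger sample gives a more precise mean.
print(standard_error(10.0, 100), standard_error(10.0, 400))
```

The standard deviation describes the data and does not shrink systematically as more individuals are sampled, whereas the standard error falls as the sample size grows.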

18
Q

What is the prevalence of a particular disease in a population?

A
  • Prevalence is a risk – measures an existing part/share/proportion of a condition relative to a referenced population at a time point or within a defined time period.
    • For a fraction a/b, a is the numerator, b is the denominator
    • Numerator IS always part of the denominator.
  • Always refer to it as the “prevalence/risk of a condition in a population”
  • Unless the context is crystal clear, using solely “prevalence/risk of a disease” is highly objectionable

Infectious diseases examples:

  • Prevalence/risk of HIV in UK population in 2017 (0.16% - gender differences)
  • Prevalence/risk of HIV in UK gay population in 2017 (5% - regional differences)
19
Q

How would you calculate the prevalence/ risk of a particular disease in a population?

A

At its simplest, the researcher/investigator

  • Defines the population
  • Takes a random sample from that population
  • Counts the individuals with the disease (D) and the total (N)
  • Calculates the ratio: D/N
  • This ratio is also called prevalence (point prevalence or period prevalence)

How many individuals should the researcher sample?

  • That depends on the desirable precision – i.e. how precise the researcher wants/needs to be
  • Generally the more the better
  • But not always straightforward - resources, recruiting, etc
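
The steps above, and the link between sample size and precision, can be sketched as follows. This assumes a simple random sample and uses the normal-approximation (Wald) interval, which is only one of several possible interval methods. The counts, the function names `prevalence` and `sample_size_for_margin`, and the target margin are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)   # about 1.96

def prevalence(d, n):
    """Prevalence D/N with a 95% Wald confidence interval."""
    p = d / n
    se = sqrt(p * (1 - p) / n)
    return p, (max(0.0, p - Z95 * se), min(1.0, p + Z95 * se))

def sample_size_for_margin(expected_p, margin):
    """Sample size so that the 95% CI half-width is about `margin`."""
    return ceil(Z95 ** 2 * expected_p * (1 - expected_p) / margin ** 2)

# Hypothetical sample: 50 diseased among 1,000 randomly sampled individuals.
p, ci = prevalence(50, 1000)
print(f"prevalence={p:.3f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f})")

# Individuals needed to estimate a prevalence near 5% to within +/- 1 percentage point.
n_needed = sample_size_for_margin(expected_p=0.05, margin=0.01)
print(n_needed)
```

Halving the desired margin roughly quadruples the required sample, which is one reason why "the more the better" quickly runs into resource limits.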
20
Q

What are rates?

A

An absolute measure

Rate – measures the occurrence of an event in a given population during a given period (e.g. one year). It always includes a time dimension.

•E.g. annual birth rate, monthly growth rate, annual death rate (can be disease specific), case fatality rates

  • Numerator (part of the denominator)
  • Denominator
  • Time specification
  • Multiplier – expressed per 100 or per 1000 population
  • Time span specification – depends on the epidemiological question and disease dynamics
    • Flu, Coronavirus – days/weeks
    • HIV, TB – years or months
21
Q

What are incidence rates?

A
  • More often than not it is the occurrence of new cases during a specified period of time which is of interest
  • Prevalence measures existing cases of the disease – dimensionless proportion
  • Incidence measures new cases – person-time units to reflect different individual time at risk

•Measures the speed at which disease develops in the population

  • Example: number of individuals developing cancer per 100,000 population in one year.
  • In a dynamic population, individuals may have been at risk (or exposed) for different lengths of time, so instead of counting the total number of individuals in the population at the start of the study, the time each individual spends in the study before developing the outcome of interest needs to be calculated.
  • The denominator in an incidence rate is the sum of each individual’s time at risk (i.e. the length of time they were followed up in the study) and is commonly expressed as person years at risk.
22
Q

What is the calculation of the incidence rate?

A

Incidence rate = (number of new cases during the observation period) / (total person-time at risk)
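
A minimal worked example of this calculation, with person-time summed over individuals. The follow-up records are hypothetical, and the per-1,000 person-years multiplier is just one common way of expressing the result.

```python
# Hypothetical follow-up records: (years at risk, developed the disease?)
follow_up = [
    (5.0, False),
    (2.5, True),    # contributes time at risk only until the event
    (4.0, False),
    (1.0, True),
    (5.0, False),
    (3.5, False),   # e.g. lost to follow-up after 3.5 years
]

new_cases = sum(1 for _, event in follow_up if event)
person_years = sum(years for years, _ in follow_up)

incidence_rate = new_cases / person_years   # cases per person-year
per_1000_py = incidence_rate * 1000         # cases per 1,000 person-years

print(f"{new_cases} cases / {person_years} person-years = {per_1000_py:.1f} per 1,000 person-years")
```

Each individual contributes only the time actually spent at risk, so people followed for different lengths of time are handled naturally.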

23
Q

What are some common rates in the general population?

A
24
Q

What are the odds of disease?

A

This is an absolute measure of disease

RISK/PREVALENCE = π = (number diseased)/(number in population)

Always between 0 and 1, also expressed as a %

ODDS = π/(1-π) (always ≥ 0, and can exceed 1)

•Rare diseases (<5% in the population):

–The ODDS approximates the RISK
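
The conversion between risk and odds, and the rare-disease approximation, can be checked numerically. The helper names `odds_from_risk` and `risk_from_odds` and the example risks are illustrative.

```python
def odds_from_risk(risk):
    """Convert a risk (proportion in [0, 1)) to odds."""
    if not 0 <= risk < 1:
        raise ValueError("risk must be in [0, 1)")
    return risk / (1 - risk)

def risk_from_odds(odds):
    """Convert odds (>= 0) back to a risk."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

# Rare disease: odds and risk are nearly identical.
print(odds_from_risk(0.02))
# Common disease: odds and risk diverge, and odds can exceed 1.
print(odds_from_risk(0.50), odds_from_risk(0.80))
```

For a risk of 2% the odds differ from the risk only in the fourth decimal place, whereas for a risk of 80% the odds are about 4.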

25
Q

What is the difference between prevalence and incidence?

A
  • Prevalence – existing cases – a condition believed to be common (proportion)
  • Incidence – new cases – the frequency with which they occur (rate)
  • Time span depending on disease dynamics
26
Q

What is a ratio?

“Risk/Rate/Odds amongst exposed” / “Risk/Rate/Odds amongst non-exposed”

A

A ratio measures the relative size of two similar quantities measured in different circumstances (groups in the population)

Quantifies the effect of an exposure

  • No time dimension
  • No measurable units
  • Numerator IS NOT part of the denominator

Similar quantities

Common measure

Absolute measures involve 2 numbers (e.g. diseased in the population)

Relative measures involve 4 numbers (e.g. diseased/exposed, diseased/non exposed)
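
The "four numbers" are the cells of a 2x2 table of exposure by disease. The sketch below computes a risk ratio and an odds ratio from such a table. The cell counts and the function name `relative_measures` are hypothetical.

```python
def relative_measures(a, b, c, d):
    """Risk ratio and odds ratio from a 2x2 table.

    a: exposed, diseased        b: exposed, not diseased
    c: non-exposed, diseased    d: non-exposed, not diseased
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_ratio = risk_exposed / risk_unexposed
    odds_ratio = (a / b) / (c / d)
    return risk_ratio, odds_ratio

# Hypothetical counts: 30/100 exposed and 10/100 non-exposed develop the disease.
rr, odds_r = relative_measures(a=30, b=70, c=10, d=90)
print(f"risk ratio={rr:.2f}, odds ratio={odds_r:.2f}")
```

Both ratios are unitless, and a value of 1 would indicate no difference between exposed and non-exposed groups.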

27
Q

How would you decide between incidence and prevalence?

A
  • No general firm rule, no constraints
  • Risk (prevalence) and rate (incidence) are both valid measurements; they answer different questions and can coexist in disease assessment
  • Observational, comparative, surveillance or ecological studies
  • Slow dynamics diseases – both appropriate
    • HIV, tuberculosis
    • Cancer, cardiovascular disease
  • Fast dynamics diseases such as flu – incidence/rates preferred

28
Q

What is a transmission parameter?

A
  • Defined as the probability of a “successful contact”
  • Infectious disease data – always some degree of dependency, usually partially observed and usually difficult to summarize
  • Transmission routes
    • Airborne – flu, TB
    • Bodily fluids – Ebola
    • Sexual contacts – HIV
    • Vector-borne – malaria
  • Associated concepts – population level:
    • Epidemic – widespread occurrence of an infectious disease in a community (flu, TB, HIV)
    • Outbreak – an epidemic in a geographically restricted area (Legionnaires' disease, hospital outbreaks)
    • Pandemic – an epidemic of extraordinary proportions (1918 influenza, 2019 coronavirus)
    • Endemic – constant presence of an infection in the community (malaria in tropical areas)
29
Q

What is the basic reproduction number (R0)?

A
  • The basic reproduction number R0 is typically defined as the expected number of secondary infections that result from a single infected individual in an entirely susceptible (non-immune) population.
  • The key property of R0 is its use as a threshold parameter:
    • R0 > 1: a major epidemic can occur unless interventions stop it.
    • R0 < 1: the infection will die out in the long term.
  • R0 also gives the threshold level of vaccination needed in the population to reach herd immunity: 1-1/R0
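
The threshold formula 1-1/R0 is easy to evaluate. In the sketch below the R0 values are arbitrary illustrations rather than estimates for any particular disease, and returning 0 when R0 is at or below 1 is a convention assumed here (no vaccination is needed for the infection to die out).

```python
def herd_immunity_threshold(r0):
    """Fraction of the population that must be immune: 1 - 1/R0."""
    if r0 <= 0:
        raise ValueError("R0 must be positive")
    if r0 <= 1:
        return 0.0   # assumed convention: the infection dies out without vaccination
    return 1 - 1 / r0

for r0 in (0.8, 2, 4, 10):
    print(f"R0={r0}: threshold={herd_immunity_threshold(r0):.0%}")
```

The higher the R0, the larger the share of the population that needs to be immune, which is why highly transmissible infections demand very high vaccination coverage.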
30
Q

What are disease specific and dynamics determinants?

A
  • Disease-specific determinants of dynamics:
    • Incubation period
    • Latent period
    • Infectious period
    • Generation time – the time that has elapsed between one person being infected and that person infecting someone else
31
Q

What are two different types of research studies at the population level?

A

Observational

  • Descriptive
    • Aggregated level
    • Individual level
  • Analytical
    • Cross-sectional
    • Case-control studies
    • Cohort (longitudinal)
  • Ecological – surveillance

Experimental

  • Randomised controlled trials

Two central challenges: bias and confounding

•Global epidemiology - what is the global disease burden?

  • aggregated (group based) measures
  • suggestive, exploratory rather than providing compelling evidence
32
Q

Planning research or studies

A
  • Research runs from exploratory/discovery phases all the way to confirmatory studies
  • Ideally, research in public health starts with a pilot study
  • A pilot study is a version of the main study that is run in miniature to test whether the components of the main study can work together.
  • It is focused on ensuring that the processes of the main study run smoothly:
    • Recruitment (retention rates, refusal rates, failure/success rates, (non)compliance or adherence rates)
    • Randomisation
    • Treatment
    • Follow-up assessments

•Therefore, it resembles the main study in many respects, including an assessment of the primary outcome – BUT IT IS NOT A HYPOTHESIS TEST

33
Q

What are different types of errors?

A
  • P(Type 1 error)=α=the probability of incorrect rejection of a true null hypothesis.
  • Type 1 error = FALSE POSITIVE RESULT
  • P(Type 2 error)= β =the probability of the failure to reject a false null hypothesis
  • Type 2 error = FALSE NEGATIVE RESULT
  • Power = 1-β=the probability that the null hypothesis is rejected, if a specific alternative hypothesis is true
  • A priori set values: usually α=0.05 and power=1-β=0.80 (or 0.90), i.e. β=0.20 (or 0.10)
  • A sufficiently large sample size is necessary, as determined by the effect size (the difference we wish to detect) and the two a priori set values above
  • The sample size, type 1 error and power cannot be disentangled and should be discussed together
  • The graph illustrates how sample size varies with the effect size for different values of the desired power while holding alpha constant, and conversely. Namely, the sample size increases with:
    • smaller effect size (x-y axes)
    • smaller alpha (different colours)
    • larger power (different lines and symbols)
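
These relationships can be reproduced with a standard textbook approximation. The sketch below assumes a two-sided comparison of two group means with equal group sizes, a standardised effect size (difference divided by the standard deviation) and a normal approximation. That formula is an assumption chosen for illustration, not necessarily the one behind the card's graph, and `n_per_group` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    Uses n = 2 * ((z_(1-alpha/2) + z_(power)) / effect_size)^2,
    where effect_size is the standardised difference between groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(n_per_group(0.5))               # moderate effect, alpha=0.05, power=0.80
print(n_per_group(0.2))               # smaller effect -> larger sample
print(n_per_group(0.5, alpha=0.01))   # smaller alpha -> larger sample
print(n_per_group(0.5, power=0.90))   # larger power -> larger sample
```

Because the effect size enters squared in the denominator, halving the difference to be detected roughly quadruples the required sample.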
34
Q

α-level of significance vs. p-values

A

α-level of significance

–set prior to the experiment and/or conducting the test, when there are no data or analyses yet

–usually α=0.05 (but it can also be 0.01, 0.001, etc.): we accept a 5 in 100 (or 1 in 100) chance of declaring an effect when the variation we are seeing is due to chance alone

–probability of incorrectly rejecting the null hypothesis in favour of the alternative when the null is in fact true (Type 1 error)

–an accepted margin of error (usually 5%)- the researcher accepts a small probability that rejecting the null hypothesis might be a mistake.

•P-values

–Calculated from the data, post data analyses – also called observed significance level

–Measure the strength of the evidence against the null hypothesis

–Inferential concept (you need data)

•Data summary - concepts referring to the data at hand: means (central tendency), standard deviation (spread of the data)

•Inferential concepts – result of a statistical process which assumes a long run repetition of a study or experiment and include true mean/proportion, standard error (of the mean/proportion/OR but not limited to), p-values and 95% CIs

35
Q

What are sources of bias in disease epidemiology?

A

•Selection bias: Systematic differences between characteristics of people selected for a study and those who were not (Population misrepresentation)

•Measurement bias: observer bias, responder bias, instrument bias, recall bias

•Confounding – occurs when an estimate of the association between an exposure and the disease is mixed up with the real effect of another exposure on the same disease. E.g. smoking may confound the association between alcohol and lung cancer

•Infectious disease epidemiology

  • Underreporting (e.g. STIs – behavioural aspects)
  • Lack of, or delayed, clinical signs
  • Infection severity
  • Secondary transmission
36
Q

What are the advantages and disadvantages of cross-sectional studies for measuring prevalence?

A

Advantages

  • Relatively quick, cheap and easy to run
  • Data on all variables are collected only once
  • Measure prevalence for all factors under investigation.
  • Multiple outcomes and exposures can be studied.

Disadvantages

  • Difficult to determine whether the exposure or the outcome came first
  • Not suitable for studying rare diseases or diseases with a short duration.
  • Only prevalence, not incidence
  • Susceptible to biases such as responder bias, recall bias, interviewer bias and social acceptability bias.
37
Q

When would a cross sectional study be used?

A

Example: studying the prevalence of asthma among 12- to 14-year-olds.

  • Data are collected on the whole study population at a single point in time or during a relevant period of time to examine the relationship between disease (or other health-related outcomes) and other potential risk factors (exposures)
  • Used to assess the burden of disease or health needs of a population and potentially informing the planning and allocation of health resources
  • Bias in cross-sectional surveys depends on:
    • Sample representativeness
    • Sufficiently large sample size
  • Potential bias – non-response
38
Q

When would you use a case-control (odds ratio) study?

A
  • The study group is defined by the outcome (e.g. presence/absence of a disease)
  • Controls selected from the same population – potentially matched
  • Special methodology for selecting cases and controls
  • Example: case-control study of smoking and pancreatic cancer among 100 cases and 400 controls, the results of which are shown below
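
The original results table is not available in this text, so the sketch below uses HYPOTHETICAL smoker counts for the 100 cases and 400 controls, purely to show how an odds ratio and its 95% confidence interval could be computed. The interval uses a normal approximation on the log odds ratio (often called the Woolf method), and `odds_ratio_with_ci` is an illustrative name.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, level=0.95):
    """Odds ratio with a confidence interval based on the log odds ratio."""
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    se_log_or = sqrt(1 / exposed_cases + 1 / unexposed_cases
                     + 1 / exposed_controls + 1 / unexposed_controls)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# HYPOTHETICAL counts: 60 of 100 cases and 100 of 400 controls are smokers.
estimate, lower, upper = odds_ratio_with_ci(
    exposed_cases=60, unexposed_cases=40,
    exposed_controls=100, unexposed_controls=300,
)
print(f"OR={estimate:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

Because the numbers of cases and controls are fixed by design, the risk of disease cannot be estimated from such a table, which is why the odds ratio is the natural measure here.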
39
Q

Advantages and disadvantages of the case-control (odds ratio) study

A

Advantages:

  • Cost-effective when correctly designed
  • Can be retrospective, but not necessarily
  • Suitable for diseases with long latency periods (cancer)
  • Suitable for rare diseases (genetic diseases)

Disadvantages

  • Prone to bias: selection, recall and observer bias.
  • Limited to examining one outcome.
  • Poor choice for the study of rare exposures.
  • No temporal inference
  • Only relative measures (OR) – rarely absolute measures
40
Q

When would you use a cohort study?

A
  • Cohort studies evaluate a possible association between exposure and outcome by following a group of individuals who share a common characteristic over a period of time
  • Example: Birth cohorts – data collected from children born during one year and followed-up often life-long for their physical and emotional development
  • Modern settings collect information on genetics, environmental exposures, life style and social aspects of individuals
41
Q

Advantages and disadvantages of cohort studies

A

Advantages

  • Multiple outcomes
  • Multiple exposures
  • Temporally meaningful associations
  • Rare exposures, for example among different occupations.
  • Identify modifiable risk factors
  • Measures incidence and prevalence

Disadvantages

  • Costly and time consuming
  • Prone to bias due to loss to follow-up
  • Long follow-up may alter participant behaviour
  • Inefficient for rare disease outcomes
  • Classification of individuals (exposure or outcome status) can be affected by changes in diagnostic procedures.
  • Potentially missing information about exposure or confounders that were overlooked at the start of the study.
42
Q

What are methods to deal with bias and confounding?

A

•Design

–Randomisation

  • Only experimental studies
  • Ensures that known and unknown confounders are similarly distributed across groups
  • Not always possible
  • Very powerful – causal inference

–Restriction if appropriate (inclusion/exclusion criteria)

  • Young population
  • Non-smokers
  • Elderly

–Matching

  • Suitable for case-control studies
  • Expensive

•Statistical Analysis

–Stratification

–Statistical/Mathematical modelling techniques
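
To illustrate stratification, the sketch below uses HYPOTHETICAL counts in which alcohol appears associated with lung cancer in the crude table, yet shows no association once the data are split by smoking status (the confounder in this made-up example). The pooled estimate uses the Mantel-Haenszel formula, a common technique that is named here as an assumption since the cards do not specify a method.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases."""
    return (a * d) / (b * c)

# HYPOTHETICAL counts of (a, b, c, d) for alcohol exposure and lung cancer, by stratum.
strata = {
    "smokers":     (40, 60, 20, 30),
    "non-smokers": (5, 45, 20, 180),
}

# Crude table: add the strata together cell by cell.
crude = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude)

# Stratum-specific odds ratios.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Mantel-Haenszel pooled odds ratio across strata.
mh_num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
mh_den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = mh_num / mh_den

print(f"crude OR={crude_or:.2f}, stratum ORs={stratum_ors}, pooled OR={mh_or:.2f}")
```

In these invented data the crude odds ratio of 2.25 arises entirely because drinkers are more often smokers; within each smoking stratum the odds ratio is 1.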
43
Q

What are ecological studies?

A
  • Routinely collected health information
  • Highlighting patterns of disease and associated factors
  • Useful when individual-level data would be difficult or impossible to collect, such as for the effects of air pollution or of legislation.
  • Exploratory nature, discovery phase: can be used to generate hypotheses of possible causes or determinants of disease
    • Spatial clusters of disease
    • Temporal trends

•Examples:

  • The development of a rare benign liver tumour in a woman taking oral contraceptives.
  • Case series are useful in identifying epidemics: the presence of AIDS in North America was identified by the report of a cluster of homosexual men in Los Angeles with a similar clinical syndrome
44
Q

What is the ecological fallacy?

A

•The ecological fallacy is an error in the interpretation of the results of an ecological study, where conclusions are inappropriately inferred about individuals from the results of aggregate data. The fallacy assumes that individual members of a group all have the average characteristics of the group as a whole.

45
Q

Routine sources of ecological data

A
  • They are not data collected specifically to answer a question
  • All countries have a system of collecting ongoing data
  • Demographic characteristics
    • UK: Census, General Household Survey
  • Health status
    • UK: Morbidity statistics from General practice, Communicable disease surveillance
    • Mortality data
  • Health services utilization
    • UK: immunization/vaccination level, screening (cervical/colon/breast cancer)
    • Outpatient primary care data
    • Hospital-based/specialist care data: admissions, discharges, length of stay
  • Validity and completeness – hugely variable across countries, depending on logistics and historical collection
  • The most valuable feature of routine data – availability
46
Q

What is heterogeneity and why is it important?

A
  • Heterogeneity = a mixture of diverse groups leading to differences in estimates across population groups
  • Heterogeneity sources in global epidemiology
    • Population density, population composition
    • Geographical aspects – latitude, longitude, altitude
    • Seasonal differences
    • Economic characteristics
    • Cultural features – behavioural differences

•Heterogeneities within a population:

  • Age groups – e.g. flu transmission different between kids vs adults
  • Gender – e.g. HIV transmission

•Understanding sources of heterogeneity makes it possible to implement targeted, efficient control measures

47
Q

Summary

A

•Learn about your background setting and population - research

•Geography, economy, demography, politics, public health issues

•Define a clear-cut question

  • Define the population you address the question to
  • Plan the collection of the needed data

•Start with a feasibility or pilot study – you learn about time, logistics, etc.

•It is a long way to changing clinical practice or policies at population level