Stats and epi definitions Flashcards
statistical heterogeneity
Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone.
clinical heterogeneity
differences in study population characteristics and type of intervention
methodological heterogeneity
differences in study design - blinding, sources of bias, the way the outcomes are defined and measured
conditional probability
probability of A occurring given that B has already occurred
P(A|B) = P(A and B) / P(B)
Bayes' theorem is based on conditional probability
Bayes' theorem
It answers the question: “Given some new information, how should I update what I already believe?” e.g. in diagnostic tests, it combines the test result with other information about the patient, such as the prevalence of disease, to give the probability that the diagnosis is correct.
P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) = posterior probability
P(B|A) = likelihood - probability of seeing B if A is true
P(A) = prior probability
P(B) = total probability of B
A = having disease
B = positive test result
posterior probability = probability of having the disease given a positive test result (B)
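As an illustration (not part of the original card), a minimal Python sketch of Bayes' theorem applied to a diagnostic test, using hypothetical sensitivity, specificity and prevalence values:

```python
def posterior_prob_disease(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' theorem."""
    # P(B): total probability of a positive test (law of total probability)
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # P(A|B) = P(B|A) * P(A) / P(B)
    return sensitivity * prevalence / p_positive

# Hypothetical test: sensitivity 90%, specificity 95%, prevalence 1%
print(posterior_prob_disease(0.90, 0.95, 0.01))  # ~0.15 despite an 'accurate' test
```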
Frequency polygon =
line added to a histogram joining the centre of each bar to show the shape of the distribution
probability multiplication rule
P(A and B) = P(A) * P(B) (for independent events)
Used when you want to know the probability of both events occurring simultaneously.
Dependent events:
When events are not independent, the multiplication rule becomes more complex, requiring conditional probability calculations: P(A and B) = P(A) * P(B|A)
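A tiny worked example (hypothetical, added for illustration) of the dependent-events form, drawing two aces from a deck without replacement:

```python
# P(A and B) = P(A) * P(B|A): two aces drawn without replacement
p_first_ace = 4 / 52
p_second_ace_given_first = 3 / 51   # one ace and one card already removed from the deck
print(p_first_ace * p_second_ace_given_first)  # ~0.0045
```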
probability addition rule
P(A or B) = P(A) + P(B) - P(A and B)
for mutually exclusive events = P(A) + P(B)
Used when you want to know the probability of either one of two events happening.
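A short illustrative example (not from the original card) for non-mutually-exclusive events:

```python
# P(king or heart) = P(king) + P(heart) - P(king and heart)
p_king, p_heart, p_king_of_hearts = 4 / 52, 13 / 52, 1 / 52
print(p_king + p_heart - p_king_of_hearts)  # 16/52 ≈ 0.31
```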
Poisson distribution
= a probability distribution that describes how many times an event is likely to occur over a specified period. It is a count distribution,
the parameter of which is lambda (λ), the mean number of events in the specified interval. (discrete quantitative data – incidence rates) e.g. number of radioactive emissions detected by a Geiger counter in 5 minutes.
mean = variance
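A minimal sketch, assuming scipy is available, with a hypothetical rate of λ = 3 events per interval:

```python
from scipy.stats import poisson

lam = 3  # mean number of events per interval
print(poisson.pmf(2, lam))                   # P(exactly 2 events)
print(poisson.mean(lam), poisson.var(lam))   # both equal lambda: mean = variance
```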
Binomial distribution
probability distribution for data with two outcomes - success or failure
summarises the probability that a value will take one of two possible outcomes under a given set of parameters or assumptions.
The underlying assumptions of the binomial distribution are that each trial has only two possible outcomes, each trial has the same probability of success, and the trials are independent of one another.
Defined by n (number of trials / sample size) and π (true probability of success or proportion)
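A minimal sketch, assuming scipy is available, with hypothetical values n = 10 and π = 0.2:

```python
from scipy.stats import binom

n, p = 10, 0.2
print(binom.pmf(3, n, p))   # P(exactly 3 successes in 10 trials)
print(binom.cdf(3, n, p))   # P(3 or fewer successes)
```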
normal distribution
probability distribution for continuous data
The normal distribution describes a symmetrical plot of data around its mean value, where the width of the curve is defined by the standard deviation
95% of values are within 1.96 SDs of the mean.
Many other distributions (e.g. binomial, Poisson) approximate towards the normal distribution as sample size increases.
Standard normal distribution
has a mean of 0 and SD of 1
used to convert values from another data set to the standard normal scale to give a z score, which shows how many SDs the result is from the mean
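A minimal sketch of the z score and the 1.96-SD rule, assuming scipy is available and using made-up values:

```python
from scipy.stats import norm

mean, sd, value = 100, 15, 130
z = (value - mean) / sd                     # how many SDs the value is from the mean
print(z)                                    # 2.0
print(norm.cdf(1.96) - norm.cdf(-1.96))     # ~0.95: 95% of values lie within 1.96 SDs
```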
central limit theorem
sampling distributions (of any statistic) approximate towards the normal distribution as sample size increases.
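A quick simulation sketch (illustrative only): means of repeated samples drawn from a skewed exponential distribution are approximately normal:

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 samples of size 50 from a right-skewed exponential distribution (true mean 1)
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)
print(sample_means.mean())   # close to the true mean of 1
print(sample_means.std())    # close to 1 / sqrt(50); a histogram of the means looks normal
```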
p value
probability of getting a result at least as extreme as the one observed if the null hypothesis were true
Sample size calculations - what are they for and what do you need
Sample size calculations = ensure the study has a sufficient number of participants to answer the study question, i.e. detect an association if one truly exists. Depends on:
- the null and alternative hypotheses.
- The type of outcome variable (e.g. difference in mean, risk ratio)
- Effect size for a clinically significant result (a smaller effect size needs a larger sample)
- The variability in the outcome data – mean, SD, prevalence (from local data)
- Significance level
- Power
- Population proportion / prevalence of outcome (cohort studies) or exposure (case control) – smaller prevalence needs larger sample size
- Also consider dropout rates, design (clustered, multiple arms), ethics, budget
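A rough sketch of one common version of the calculation - sample size per arm for comparing two proportions by normal approximation; the 20% vs 30% proportions, 5% two-sided significance and 80% power are hypothetical inputs:

```python
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)   # significance level (two-sided)
    z_beta = norm.ppf(power)            # power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

print(round(n_per_arm(0.20, 0.30)))  # roughly 290-300 per arm, before allowing for dropout
```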
regression
a method that models the relationship between a dependent variable (target/outcome) and one or more independent variables (predictors).
It helps in predicting outcomes, identifying trends, and understanding the strength and nature of relationships between variables. It can be used to assess if there is an association between variables and to predict the value of one variable based on the value of another within the dataset
Linear regression and assumptions
models relationship between a continuous dependent variable and one or more (multiple linear regression) independent variables using a linear equation
additive scale
assumptions :
Linear relationship between dependent and independent variables
the residuals (the differences between observed and predicted values) are normally distributed
No Multicollinearity: It is essential that the independent variables are not too highly correlated with each other, a condition known as multicollinearity.
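A minimal sketch using simulated data and statsmodels (assumed available); the variable names and true coefficients are made up:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 + 0.5 * x + rng.normal(scale=0.3, size=100)   # true intercept 2, slope 0.5

X = sm.add_constant(x)            # add the intercept term
fit = sm.OLS(y, X).fit()
print(fit.params)                 # estimated intercept and slope (additive scale)
print(fit.resid[:5])              # residuals, which should be roughly normal
```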
logistic regression and assumption
models the probability of a binary outcome (dependent variable - binary data) based on predictors (independent variables - can be any type)
log scale - output is log of odds
assumptions:
Independent observations (the observations should not come from repeated measurements or matched data).
no multicollinearity among the independent variables, meaning the independent variables should not be too highly correlated with each other.
linearity of independent variables and log odds of the dependent variable.
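A minimal sketch with simulated data, assuming statsmodels is available; coefficients come out on the log-odds scale and are exponentiated to give odds ratios:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(-1 + 1.2 * x)))     # true log odds = -1 + 1.2 * x
y = rng.binomial(1, p)                    # binary outcome

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(fit.params)            # log odds ratios
print(np.exp(fit.params))    # exponentiate to get odds ratios
```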
Poisson regression and assumptions
models an outcome that is count data (counts or rates)
output on a log scale - rate ratio
assumes:
- outcome follows a Poisson distribution, so the variance equals the mean (variance greater than the mean indicates overdispersion)
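A minimal sketch with simulated person-time data, assuming statsmodels is available; exponentiating the coefficient gives the rate ratio:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
exposed = rng.binomial(1, 0.5, size=300)
person_time = rng.uniform(1, 5, size=300)
rate = 0.2 * np.exp(0.7 * exposed)              # true rate ratio = exp(0.7) ≈ 2
counts = rng.poisson(rate * person_time)        # count outcome

fit = sm.GLM(counts, sm.add_constant(exposed),
             family=sm.families.Poisson(),
             offset=np.log(person_time)).fit()  # offset accounts for unequal follow-up time
print(np.exp(fit.params))                       # baseline rate and rate ratio
```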
cox regression
models the relationship between an outcome which is time-to-event data and one or more independent variables (predictors)
output is on a log scale - hazard ratio
assumptions:
proportional hazards
censored data do not differ systematically / non informative censoring
independent observations
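A minimal sketch assuming the lifelines package and its bundled Rossi recidivism dataset are available:

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                 # columns include 'week' (time) and 'arrest' (event indicator)
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()               # the exp(coef) column gives the hazard ratios
cph.check_assumptions(df)         # checks the proportional hazards assumption
```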
Cluster randomised trials - why, approaches, pros and cons
why: feasibility - some interventions are implemented at the group level (media campaigns, policy, group counselling or education)
Some interventions require structural change in the delivery of care such that it is
not possible to randomise individuals to receive different types of care.
reduces risk of contamination between groups
cons:
harder to interpret results - needs additional skill to design, implement and analyse
requires larger sample size - more expensive
may be more complex to generalise
analysis of clustered data
clustering must be accounted for in regression models
still analyse all individuals but need to account for the ICC (intra-cluster correlation)
or can do aggregate analysis using clusters as experimental unit
sample size for cluster trials
Calculate the intra-cluster correlation coefficient (ICC) - quantifies the homogeneity within clusters and informs how much the sample size needs to be inflated
the factor by which the sample size must be increased is the design effect
generally around 30% larger sample size, depending on the ICC and cluster size
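A small sketch of how the design effect inflates a sample size (cluster size and ICC are hypothetical):

```python
def design_effect(avg_cluster_size, icc):
    # DEFF = 1 + (m - 1) * ICC, where m is the average cluster size
    return 1 + (avg_cluster_size - 1) * icc

n_individual = 300                                   # from an individually randomised calculation
deff = design_effect(avg_cluster_size=20, icc=0.05)
print(deff, round(n_individual * deff))              # 1.95, so ~585 participants needed
```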
explanations for study findings
- True association
- Chance finding (eg small numbers, sampling error)
- Confounding. For example social deprivation associated with higher crime rates
- Bias. Information bias due to inconsistent recording of results. Selection bias
stages of evaluation
- Plan the evaluation before the intervention is implemented and conduct the evaluation as an integral part of implementation of the intervention
- Define the scope of the evaluation and the evaluation questions - what type of evaluation
- Follow a theoretical model for evaluation (e.g. ‘logic model’ or Donabedian model)
- Define key outcome measures before starting – SMART or similar principles
- Include a ‘control’ group for comparison where possible
- Agree the data to be collected and methods for collection and analysis before starting the intervention
- Agree who will use the results of the evaluation
- Disseminate results to decision makers and other interested parties
- Follow all relevant ethical, governance and legal principles
systematic review
A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a given research question
repeatable and robust process
+
useful for decision making - reliable findings, reduces bias
identifies the studies for a meta-analysis
-
time consuming
publication bias
older studies often not indexed on databases
bias towards English-language publications
heterogeneity
variation across studies - clinical and methodological heterogeneity can be assessed qualitatively; statistical heterogeneity is assessed using tests (Cochran's Q (lacks power), I^2)
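A one-line sketch of how I^2 is derived from Cochran's Q (numbers are made up):

```python
def i_squared(q, n_studies):
    # I^2 = max(0, (Q - df) / Q) * 100%, where df = number of studies - 1
    df = n_studies - 1
    return max(0.0, (q - df) / q) * 100

print(i_squared(q=20.0, n_studies=10))  # 55.0: moderate-to-substantial heterogeneity
```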
qual research pros and cons
strengths
answers the how and the why
complements quantitative data
can focus on specific groups or settings
Rich, detailed data
Power of story telling
Generate hypotheses to test
weaknesses
no evidence of causality or association
findings cannot be generalised
reflexivity - findings are shaped by the researcher's own position and assumptions
depends on skills of researcher / interviewer
time intensive
types of sampling methods from a population
Probability sampling:
simple random
systematic - every nth person is picked from a random starting point
stratified random sampling
cluster sampling
non probability sampling :
convenience sampling
quota sampling
purposive sampling (based on researcher knowledge)
what is time series analysis and what are the pros and cons
research design in which measurements are made at several different times, thereby allowing trends to be detected e.g. ecological studies or descriptive studies of disease patterns.
requires baseline measurement and multiple points over time
can be done at population level (ecological - repeated cross sectional studies over time) or individual level with repeated measures
can be used for interrupted time series analyses or multiple-group analyses, e.g. with a control group
Need to be mindful of seasonal changes, autocorrelation, latency periods, secular trends, concurrent interventions/exposures and underlying changes in population structure (can control for confounding where data available e.g. seasonality)
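An illustrative sketch of an interrupted time series analysed by segmented regression, with simulated monthly data and statsmodels (all values hypothetical):

```python
import numpy as np
import statsmodels.api as sm

months = np.arange(48)
post = (months >= 24).astype(int)                 # intervention introduced at month 24
time_since = np.where(post == 1, months - 24, 0)  # time since the intervention

rng = np.random.default_rng(0)
# baseline level 50, secular trend +0.2/month, level drop of 5, slope change of -0.5
y = 50 + 0.2 * months - 5 * post - 0.5 * time_since + rng.normal(0, 2, size=48)

X = sm.add_constant(np.column_stack([months, post, time_since]))
print(sm.OLS(y, X).fit().params)   # intercept, baseline trend, level change, slope change
```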