Stats and epi definitions Flashcards
statistical heterogeneity
Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone.
clinical heterogeneity
differences in study population characteristics and type of intervention
methodological heterogeneity
differences in study design - blinding, sources of bias, the way the outcomes are defined and measured
conditional probability
probability of A occurring given that B has already occurred
P(A|B) = P(A and B) / P(B)
Bayes' theorem is based on conditional probability
Bayes' theorem
It answers the question: “Given some new information, how should I update what I already believe?” e.g. in diagnostic tests, it combines the test result with other information about the patient, such as the prevalence of disease, to give the probability that the diagnosis is correct.
P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) = posterior probability
P(B|A) = likelihood - probability of seeing B if A is true
P(A) = prior probability
P(B) = total probability of B
A = having disease
B = positive test result
posterior probability = probability of having the disease given a positive test result (B)
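As an illustration (not part of the original card), a minimal Python sketch of Bayes' theorem applied to a diagnostic test, using hypothetical sensitivity, specificity and prevalence values:

```python
def posterior_prob_disease(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' theorem."""
    # P(B): total probability of a positive test (law of total probability)
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # P(A|B) = P(B|A) * P(A) / P(B)
    return sensitivity * prevalence / p_positive

# Hypothetical test: sensitivity 90%, specificity 95%, prevalence 1%
print(posterior_prob_disease(0.90, 0.95, 0.01))  # ~0.15 despite an 'accurate' test
```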
Frequency polygon =
line added to a histogram joining the centre of each bar to show the shape of the distribution
probability multiplication rule
P(A and B) = P(A) * P(B) (for independent events)
Used when you want to know the probability of both events occurring simultaneously.
Dependent events:
When events are not independent, the multiplication rule becomes more complex, requiring conditional probability calculations: P(A and B) = P(A) * P(B|A)
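A tiny worked example (hypothetical, added for illustration) of the dependent-events form, drawing two aces from a deck without replacement:

```python
# P(A and B) = P(A) * P(B|A): two aces drawn without replacement
p_first_ace = 4 / 52
p_second_ace_given_first = 3 / 51   # one ace and one card already removed from the deck
print(p_first_ace * p_second_ace_given_first)  # ~0.0045
```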
probability addition rule
P(A or B) = P(A) + P(B) - P(A and B)
for mutually exclusive events = P(A) + P(B)
Used when you want to know the probability of either one of two events happening.
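A short illustrative example (not from the original card) for non-mutually-exclusive events:

```python
# P(king or heart) = P(king) + P(heart) - P(king and heart)
p_king, p_heart, p_king_of_hearts = 4 / 52, 13 / 52, 1 / 52
print(p_king + p_heart - p_king_of_hearts)  # 16/52 ≈ 0.31
```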
Poisson distribution
= a probability distribution that describes how many times an event is likely to occur over a specified period. It is a count distribution,
the parameter of which is lambda (λ), the mean number of events in the specified interval. (discrete quantitative data – incidence rates) e.g. number of radioactive emissions detected by a Geiger counter in 5 minutes.
mean = variance
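A minimal sketch, assuming scipy is available, with a hypothetical rate of λ = 3 events per interval:

```python
from scipy.stats import poisson

lam = 3  # mean number of events per interval
print(poisson.pmf(2, lam))                   # P(exactly 2 events)
print(poisson.mean(lam), poisson.var(lam))   # both equal lambda: mean = variance
```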
Binomial distribution
probability distribution for data with two outcomes - success or failure
summarises the probability that a value will take one of two possible outcomes under a given set of parameters or assumptions.
The underlying assumptions of the binomial distribution are that each trial has only two possible outcomes, each trial has the same probability of success, and the trials are independent of one another.
Defined by n (number of trials / sample size) and π (true probability of success or proportion)
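A minimal sketch, assuming scipy is available, with hypothetical values n = 10 and π = 0.2:

```python
from scipy.stats import binom

n, p = 10, 0.2
print(binom.pmf(3, n, p))   # P(exactly 3 successes in 10 trials)
print(binom.cdf(3, n, p))   # P(3 or fewer successes)
```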
normal distribution
probability distribution for continuous data
The normal distribution describes a symmetrical plot of data around its mean value, where the width of the curve is defined by the standard deviation
95% of values are within 1.96 SDs of the mean.
Many other distributions (e.g. binomial, Poisson) approximate towards the normal distribution as sample size increases.
Standard normal distribution
has a mean of 0 and SD of 1
used to convert values from another data set to the standard normal scale to give a z score, which shows how many SDs the result is from the mean
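A minimal sketch of the z score and the 1.96-SD rule, assuming scipy is available and using made-up values:

```python
from scipy.stats import norm

mean, sd, value = 100, 15, 130
z = (value - mean) / sd                     # how many SDs the value is from the mean
print(z)                                    # 2.0
print(norm.cdf(1.96) - norm.cdf(-1.96))     # ~0.95: 95% of values lie within 1.96 SDs
```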
central limit theorem
sampling distributions (of any statistic) approximate towards the normal distribution as sample size increases.
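A quick simulation sketch (illustrative only): means of repeated samples drawn from a skewed exponential distribution are approximately normal:

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 samples of size 50 from a right-skewed exponential distribution (true mean 1)
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)
print(sample_means.mean())   # close to the true mean of 1
print(sample_means.std())    # close to 1 / sqrt(50); a histogram of the means looks normal
```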
p value
probability of getting a result at least as extreme as the one observed if the null hypothesis were true
Sample size calculations - what are they for and what do you need
Sample size calculations = ensure the study has a sufficient number of participants to answer the study question, i.e. detect an association if one truly exists. Depends on:
- the null and alternative hypotheses.
- The type of outcome variable (e.g. difference in mean, risk ratio)
- Effect size for a clinically significant result (a smaller effect size needs a larger sample)
- The variability in the outcome data – mean, SD, prevalence (from local data)
- Significance level
- Power
- Population proportion / prevalence of outcome (cohort studies) or exposure (case control) – smaller prevalence needs larger sample size
- Also consider dropout rates, design (clustered, multiple arms), ethics, budget
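A rough sketch of one common version of the calculation - sample size per arm for comparing two proportions by normal approximation; the 20% vs 30% proportions, 5% two-sided significance and 80% power are hypothetical inputs:

```python
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)   # significance level (two-sided)
    z_beta = norm.ppf(power)            # power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

print(round(n_per_arm(0.20, 0.30)))  # roughly 290-300 per arm, before allowing for dropout
```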
regression
a method that models the relationship between a dependent variable (target/outcome) and one or more independent variables (predictors).
It helps in predicting outcomes, identifying trends, and understanding the strength and nature of relationships between variables. It can be used to assess if there is an association between variables and to predict the value of one variable based on the value of another within the dataset
Linear regression and assumptions
models relationship between a continuous dependent variable and one or more (multiple linear regression) independent variables using a linear equation
additive scale
assumptions :
Linear relationship between dependent and independent variables
the residuals (the differences between observed and predicted values) are normally distributed
No Multicollinearity: It is essential that the independent variables are not too highly correlated with each other, a condition known as multicollinearity.
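A minimal sketch using simulated data and statsmodels (assumed available); the variable names and true coefficients are made up:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 + 0.5 * x + rng.normal(scale=0.3, size=100)   # true intercept 2, slope 0.5

X = sm.add_constant(x)            # add the intercept term
fit = sm.OLS(y, X).fit()
print(fit.params)                 # estimated intercept and slope (additive scale)
print(fit.resid[:5])              # residuals, which should be roughly normal
```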
logistic regression and assumption
models the probability of a binary outcome (dependent variable - binary data) based on predictors (independent variables - can be any type)
log scale - output is log of odds
assumptions:
Independent observations (the observations should not come from repeated measurements or matched data).
no multicollinearity among the independent variables, meaning the independent variables should not be too highly correlated with each other.
linearity of independent variables and log odds of the dependent variable.
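A minimal sketch with simulated data, assuming statsmodels is available; coefficients come out on the log-odds scale and are exponentiated to give odds ratios:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(-1 + 1.2 * x)))     # true log odds = -1 + 1.2 * x
y = rng.binomial(1, p)                    # binary outcome

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(fit.params)            # log odds ratios
print(np.exp(fit.params))    # exponentiate to get odds ratios
```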
Poisson regression and assumptions
models an outcome that is count data (counts or rates)
output on a log scale - rate ratio
assumes:
- outcome follows a Poisson distribution, so the variance equals the mean (variance greater than the mean indicates overdispersion)
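A minimal sketch with simulated person-time data, assuming statsmodels is available; exponentiating the coefficient gives the rate ratio:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
exposed = rng.binomial(1, 0.5, size=300)
person_time = rng.uniform(1, 5, size=300)
rate = 0.2 * np.exp(0.7 * exposed)              # true rate ratio = exp(0.7) ≈ 2
counts = rng.poisson(rate * person_time)        # count outcome

fit = sm.GLM(counts, sm.add_constant(exposed),
             family=sm.families.Poisson(),
             offset=np.log(person_time)).fit()  # offset accounts for unequal follow-up time
print(np.exp(fit.params))                       # baseline rate and rate ratio
```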
cox regression
models the relationship between an outcome which is time-to-event data and one or more independent variables (predictors)
output is on a log scale - hazard ratio
assumptions:
proportional hazards
censored data do not differ systematically / non informative censoring
independent observations
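A minimal sketch assuming the lifelines package and its bundled Rossi recidivism dataset are available:

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                 # columns include 'week' (time) and 'arrest' (event indicator)
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()               # the exp(coef) column gives the hazard ratios
cph.check_assumptions(df)         # checks the proportional hazards assumption
```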
Cluster randomised trials - why, approaches, pros and cons
why: feasibility - some interventions are implemented at the group level (media campaigns, policy, group counselling or education)
Some interventions require structural change in the delivery of care such that it is
not possible to randomise individuals to receive different types of care.
reduces risk of contamination between groups
cons:
harder to interpret results - needs additional skill to design, implement and analyse
requires larger sample size - more expensive
may be more complex to generalise
analysis of clustered data
clustering must be accounted for in regression models
still analyse all individuals but need to account for the ICC (intra-cluster correlation)
or can do aggregate analysis using clusters as experimental unit
sample size for cluster trials
Calculate the intra-cluster correlation coefficient (ICC) - quantifies the homogeneity within clusters and informs how much the sample size needs to be inflated
the factor by which the sample size must be increased is the design effect
generally around 30% larger sample size, depending on the ICC and cluster size
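A small sketch of how the design effect inflates a sample size (cluster size and ICC are hypothetical):

```python
def design_effect(avg_cluster_size, icc):
    # DEFF = 1 + (m - 1) * ICC, where m is the average cluster size
    return 1 + (avg_cluster_size - 1) * icc

n_individual = 300                                   # from an individually randomised calculation
deff = design_effect(avg_cluster_size=20, icc=0.05)
print(deff, round(n_individual * deff))              # 1.95, so ~585 participants needed
```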
explanations for study findings
- True association
- Chance finding (eg small numbers, sampling error)
- Confounding. For example social deprivation associated with higher crime rates
- Bias. Information bias due to inconsistent recording of results. Selection bias
stages of evaluation
- Plan the evaluation before the intervention is implemented and conduct the evaluation as an integral part of implementation of the intervention
- Define the scope of the evaluation and the evaluation questions - what type of evaluation
- Follow a theoretical model for evaluation (e.g. ‘logic model’ or Donabedian model)
- Define key outcome measures before starting – SMART or similar principles
- Include a ‘control’ group for comparison where possible
- Agree the data to be collected and methods for collection and analysis before starting the intervention
- Agree who will use the results of the evaluation
- Disseminate results to decision makers and other interested parties
- Follow all relevant ethical, governance and legal principles
systematic review
A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a given research question
repeatable and robust process
+
useful for decision making - reliable findings, reduces bias
identifies the studies for a meta-analysis
-
time consuming
publication bias
older studies often not indexed on databases
bias towards English-language publications
heterogeneity
variation across studies - clinical and methodological heterogeneity can be assessed qualitatively; statistical heterogeneity is assessed using tests (Cochran's Q (lacks power), I^2)
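A one-line sketch of how I^2 is derived from Cochran's Q (numbers are made up):

```python
def i_squared(q, n_studies):
    # I^2 = max(0, (Q - df) / Q) * 100%, where df = number of studies - 1
    df = n_studies - 1
    return max(0.0, (q - df) / q) * 100

print(i_squared(q=20.0, n_studies=10))  # 55.0: moderate-to-substantial heterogeneity
```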
qual research pros and cons
strengths
answers the how and the why
complements quantitative data
can focus on specific groups or settings
Rich, detailed data
Power of story telling
Generate hypotheses to test
weaknesses
no evidence of causality or association
findings cannot be generalised
reflexivity - findings are shaped by the researcher's own position and assumptions
depends on skills of researcher / interviewer
time intensive
types of sampling methods from a population
Probability sampling:
simple random
systematic - every nth person is picked from a random starting point
stratified random sampling
cluster sampling
non probability sampling :
convenience sampling
quota sampling
purposive sampling (based on researcher knowledge)
what is time series analysis and what are the pros and cons
research design in which measurements are made at several different times, thereby allowing trends to be detected e.g. ecological studies or descriptive studies of disease patterns.
requires baseline measurement and multiple points over time
can be done at population level (ecological - repeated cross sectional studies over time) or individual level with repeated measures
can be used for interrupted time series analyses or multiple-group analyses, e.g. with a control group
Need to be mindful of seasonal changes, autocorrelation, latency periods, secular trends, concurrent interventions/exposures and underlying changes in population structure (can control for confounding where data available e.g. seasonality)
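An illustrative sketch of an interrupted time series analysed by segmented regression, with simulated monthly data and statsmodels (all values hypothetical):

```python
import numpy as np
import statsmodels.api as sm

months = np.arange(48)
post = (months >= 24).astype(int)                 # intervention introduced at month 24
time_since = np.where(post == 1, months - 24, 0)  # time since the intervention

rng = np.random.default_rng(0)
# baseline level 50, secular trend +0.2/month, level drop of 5, slope change of -0.5
y = 50 + 0.2 * months - 5 * post - 0.5 * time_since + rng.normal(0, 2, size=48)

X = sm.add_constant(np.column_stack([months, post, time_since]))
print(sm.OLS(y, X).fit().params)   # intercept, baseline trend, level change, slope change
```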