Stats and epi definitions Flashcards

1
Q

statistical heterogeneity

A

Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone.

2
Q

clinical heterogeneity

A

differences in study population characteristics and in the type of intervention

3
Q

methodological heterogeneity

A

differences in study design - blinding, sources of bias, the way the outcomes are defined and measured

4
Q

conditional probability

A

probability of A occurring given that B has already occurred
P(A|B) = P(A and B) / P(B)
Bayes' theorem is based on conditional probability
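A minimal sketch of the definition, using made-up counts:

```python
# Conditional probability from counts (hypothetical numbers):
# of 100 people, 30 have B; 12 of those 30 also have A.
n_b = 30         # number of people with B
n_a_and_b = 12   # number with both A and B

# P(A|B) = P(A and B) / P(B) = (12/100) / (30/100) = 12/30
p_a_given_b = n_a_and_b / n_b
print(p_a_given_b)  # 0.4
```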

5
Q

Bayes' theorem

A

It answers the question: “Given some new information, how should I update what I already believe?”. e.g. in diagnostic tests, it brings in other information about a patient or the prevalence of disease to the probability of a diagnostic test being correct.

P(A|B) = P(B|A) × P(A) / P(B)

P(A|B) = posterior probability
P(B|A) = likelihood - probability of seeing B if A is true
P(A) = prior probability
P(B) = total probability of B

A = having disease
B = positive test result

posterior probability = probability of having disease given the result of the test (B)
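The diagnostic-test reading of the theorem can be sketched numerically; the prevalence, sensitivity and specificity below are invented for illustration:

```python
# Bayes' theorem for a diagnostic test: P(A|B) = P(B|A) * P(A) / P(B)
# A = disease, B = positive test. All inputs are hypothetical.
prevalence = 0.01    # P(A), prior probability of disease
sensitivity = 0.90   # P(B|A), P(positive | disease)
specificity = 0.95   # P(test negative | no disease)

# total probability of a positive test, P(B):
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# posterior probability of disease given a positive test:
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))  # 0.154
```

Even with a sensitive and specific test, the posterior is only about 15% here because the prior (prevalence) is low.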

6
Q

Frequency polygon =

A

line added to a histogram joining the centre of each bar, to show the shape of the distribution

7
Q

probability multiplication rule

A

P(A and B) = P(A) * P(B) (for independent events)
Used when you want to know the probability of both events occurring simultaneously.
Dependent events:
When events are not independent, the multiplication rule becomes more complex, requiring conditional probability calculations: P(A and B) = P(A) * P(B|A)
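Both cases can be checked with everyday examples (coin, die and a 52-card deck):

```python
# Multiplication rule, illustrated with simple counting examples.
# Independent events: P(A and B) = P(A) * P(B)
p_heads = 1 / 2
p_six = 1 / 6
p_heads_and_six = p_heads * p_six
print(round(p_heads_and_six, 4))  # 0.0833

# Dependent events: P(A and B) = P(A) * P(B|A)
# e.g. drawing two aces from a 52-card deck without replacement:
p_two_aces = (4 / 52) * (3 / 51)
print(round(p_two_aces, 4))  # 0.0045
```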

8
Q

probability addition rule

A

P(A or B) = P(A) + P(B) - P(A and B)
for mutually exclusive events = P(A) + P(B)
Used when you want to know the probability of either one of two events happening.
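The rule can be verified by counting outcomes of a single die roll:

```python
# Addition rule checked by counting outcomes of one die roll.
# A = even roll {2, 4, 6}; B = roll > 3 {4, 5, 6}; overlap {4, 6}.
p_a = 3 / 6
p_b = 3 / 6
p_a_and_b = 2 / 6

p_a_or_b = p_a + p_b - p_a_and_b   # = P({2, 4, 5, 6}) = 4/6
print(round(p_a_or_b, 4))  # 0.6667

# if A and B were mutually exclusive, P(A and B) = 0 and the rule
# reduces to P(A) + P(B)
```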

9
Q

Poisson distribution

A

a probability distribution that describes how many times an event is likely to occur over a specified period. It is a count distribution,
the parameter of which is lambda (λ): the mean number of events in the specified interval. (discrete quantitative data – incidence rates) e.g. number of radioactive emissions detected by a Geiger counter in 5 minutes.
mean = variance
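The probability mass function is P(X = k) = λ^k e^(−λ) / k!, and the mean = variance property can be checked numerically (λ = 4 is a made-up rate):

```python
import math

# Poisson probability mass function: P(X = k) = lam**k * exp(-lam) / k!
# Hypothetical rate: lam = 4 Geiger-counter clicks per 5-minute interval.
def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4.0
print(round(poisson_pmf(2, lam), 4))   # P(exactly 2 events) = 0.1465

# mean = variance = lam (checked numerically over a wide range of k):
mean = sum(k * poisson_pmf(k, lam) for k in range(100))
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(100))
print(round(mean, 4), round(var, 4))   # 4.0 4.0
```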

10
Q

Binomial distribution

A

probability distribution for data with two outcomes - success or failure
summarises the probability that a value will take one of two values under a given set of parameters or assumptions.

The underlying assumptions of the binomial distribution are that each trial has only two possible outcomes, each trial has the same probability of success, and the trials are independent of one another.

Defined by n (sample size) and π (true probability or proportion)
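The probability of exactly k successes is C(n, k) · π^k · (1 − π)^(n − k); a quick sketch with hypothetical n and π:

```python
import math

# Binomial probability: P(X = k) = C(n, k) * pi**k * (1 - pi)**(n - k)
# n and pi below are hypothetical values for illustration.
def binomial_pmf(k, n, pi):
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

n, pi = 10, 0.3
p3 = binomial_pmf(3, n, pi)
print(round(p3, 4))  # P(exactly 3 successes in 10 trials) = 0.2668

# the probabilities over all possible k sum to 1:
total = sum(binomial_pmf(k, n, pi) for k in range(n + 1))
print(round(total, 4))  # 1.0
```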

11
Q

normal distribution

A

probability distribution for continuous data
The normal distribution describes a symmetrical plot of data around its mean value, where the width of the curve is defined by the standard deviation
95% of values are within 1.96 SDs of the mean.
Many other distributions approximate towards the normal distribution as sample size increases.

12
Q

Standard normal distribution

A

has a mean of 0 and SD of 1
used to convert another data set to the standard normal to get a z score - shows how many SDs the result is from the mean
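The conversion is z = (x − mean) / SD; the blood-pressure numbers below are invented:

```python
# z score: (x - mean) / SD gives how many SDs x lies from the mean.
# Hypothetical example: systolic BP with population mean 120 and SD 10.
mean, sd = 120.0, 10.0
x = 139.6

z = (x - mean) / sd
print(round(z, 2))  # 1.96 - at the upper edge of the central 95% of values
```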

13
Q

central limit theorem

A

the sampling distribution of a statistic (e.g. the sample mean) approximates towards the normal distribution as sample size increases, whatever the shape of the underlying population distribution.
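A quick simulation illustrates this (the uniform distribution is chosen arbitrarily as a clearly non-normal starting point):

```python
import random
import statistics

# Central limit theorem sketch: means of repeated samples drawn from a
# non-normal (uniform) distribution cluster symmetrically around the
# population mean, and their spread shrinks as the sample size n grows.
random.seed(1)

def sample_means(n, reps=2000):
    """Draw `reps` samples of size n from Uniform(0, 1); return their means."""
    return [statistics.mean(random.uniform(0, 1) for _ in range(n))
            for _ in range(reps)]

means_small, means_large = sample_means(2), sample_means(30)
for n, means in [(2, means_small), (30, means_large)]:
    print(n, round(statistics.mean(means), 2), round(statistics.stdev(means), 3))
# the SD of the sample means approaches sigma / sqrt(n), here 0.289 / sqrt(n)
```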

14
Q

p value

A

the probability of obtaining a result at least as extreme as the one observed, if the null hypothesis were true

15
Q

Sample size calculations - what are they for and what do you need

A

Sample size calculations = ensure the study has a sufficient number of participants to answer the study question i.e. to detect an association if one truly exists. Depends on:
- the null and alternative hypotheses.
- The type of outcome variable (e.g. difference in mean, risk ratio)
- Effect size for clinically significant result (smaller needs larger sample)
- The variability in the outcome data – mean, SD, prevalence (from local data)
- Significance level
- Power
- Population proportion / prevalence of outcome (cohort studies) or exposure (case control) – smaller prevalence needs larger sample size
- Also consider dropout rates, design (clustered, multiple arms), ethics, budget
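One common closed-form calculation, for comparing two means, is n per group = 2(z_α/2 + z_β)² σ² / δ²; the inputs below are hypothetical:

```python
import math

# Sample size per group for comparing two means:
# n = 2 * (z_alpha/2 + z_beta)**2 * sd**2 / delta**2
# All inputs below are hypothetical illustrative values.
z_alpha = 1.96   # two-sided significance level of 5%
z_beta = 1.28    # 90% power
sd = 10.0        # expected SD of the outcome
delta = 5.0      # smallest clinically important difference in means

n_per_group = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
print(math.ceil(n_per_group))  # 84 participants per group
```

As the card notes, this would then be inflated for expected dropout, clustering and so on.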

16
Q

regression

A

a method which allows you to model the relationship between a dependent variable (target) and one or more independent variables (predictors).
It helps in predicting outcomes, identifying trends, and understanding the strength and nature of relationships between variables. It can be used to assess if there is an association between variables and to predict the value of one variable based on the value of another within the dataset

17
Q

Linear regression and assumptions

A

models the relationship between a continuous dependent variable and one or more (multiple linear regression) independent variables using a linear equation
additive scale
assumptions:
- linear relationship between the dependent and independent variables
- the residuals (the differences between observed and predicted values) are normally distributed
- homoscedasticity: the residuals have constant variance
- no multicollinearity: the independent variables are not too highly correlated with each other
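For a single predictor, the least-squares slope and intercept have closed forms; the (x, y) data below are made up:

```python
import statistics

# Least-squares fit of y = a + b*x for simple linear regression,
# using made-up (x, y) data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

mx, my = statistics.mean(xs), statistics.mean(ys)
# slope b = sum((x - mx)(y - my)) / sum((x - mx)^2); intercept a = my - b*mx
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
a = my - b * mx
print(round(a, 2), round(b, 2))  # 0.09 1.99
```

Python 3.10+ also offers `statistics.linear_regression(xs, ys)` for exactly this computation.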

18
Q

logistic regression and assumption

A

models the probability of a binary outcome (dependent variable - binary data) based on predictors (independent variables - can be any type)
log scale - output is the log of the odds
assumptions:
- independent observations (the observations should not come from repeated measurements or matched data)
- no multicollinearity among the independent variables (they should not be too highly correlated with each other)
- linearity between the independent variables and the log odds of the dependent variable

19
Q

Poisson regression and assumptions

A

models an outcome which is count data (rates)
output is on a log scale - rate ratio
assumes:
- the outcome follows a Poisson distribution - the mean equals the variance (overdispersion, where the variance exceeds the mean, violates this)

20
Q

cox regression

A

models the relationship between an outcome which is time-to-event data and independent variables (predictors)
output is on a log scale - hazards ratio
assumptions:
proportional hazards
censored data do not differ systematically / non informative censoring
independent observations

21
Q

Cluster randomised trials - why, pros and cons

A

why: feasibility - some interventions are implemented at the group level (media campaigns, policy, group counselling or education)
Some interventions require structural change in the delivery of care such that it is
not possible to randomise individuals to receive different types of care.
reduces risk of contamination between groups

cons:
harder to interpret results - needs additional skill to design, implement and analyse
requires larger sample size - more expensive
may be more complex to generalise

22
Q

analysis of clustered data

A

account for clustering in the regression models (e.g. random-effects / multilevel models)
still analyse all individuals but need to account for the ICC
or can do an aggregate analysis using the clusters as the experimental unit

23
Q

sample size for cluster trials

A

Calculate the intra-cluster correlation coefficient (ICC) - quantifies the homogeneity within clusters and informs how much you need to inflate the sample size
the factor you multiply by is the design effect: 1 + (m − 1) × ICC, where m is the average cluster size
generally around a 30% larger sample size
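The inflation can be sketched numerically; the ICC, cluster size and individually randomised sample size below are all invented:

```python
# Design effect for a cluster randomised trial: DE = 1 + (m - 1) * ICC.
# All inputs below are hypothetical.
icc = 0.02          # intra-cluster correlation coefficient
m = 20              # average cluster size
n_individual = 400  # n needed under individual randomisation

design_effect = 1 + (m - 1) * icc
n_cluster = round(n_individual * design_effect)
print(round(design_effect, 2), n_cluster)  # 1.38 552
```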

24
Q

explanations for study findings

A
  1. True association
  2. Chance finding (eg small numbers, sampling error)
  3. Confounding. For example social deprivation associated with higher crime rates
  4. Bias. Information bias due to inconsistent recording of results. Selection bias
25
Q

stages of evaluation

A
  1. Plan the evaluation before the intervention is implemented & conduct
    the evaluation as an integral part of implementation of the intervention;
    define the scope of the evaluation and the evaluation questions - what type of evaluation
  2. Follow a theoretical model for evaluation (e.g. ‘logic model’ or Donabedian
    model)
  3. Define key outcome measures before starting – SMART or similar
    principles
  4. Include a ‘control’ group for comparison where possible
  5. Agree the data to be collected and methods for collection and analysis
    before starting the intervention
  6. Agree who will use the results of the evaluation
  7. Disseminate results to decision makers and other interested parties
  8. Follow all relevant ethical, governance and legal principles
26
Q

systematic review

A

A systematic review attempts to identify, appraise and synthesise all the
empirical evidence that meets pre-specified eligibility criteria to answer a given research question
repeatable and robust process
+
useful for decision making - reliable findings, reduces bias
identifies studies for a meta-analysis

-
time consuming
publication bias
older studies often not on databases
bias towards english

27
Q

heterogeneity

A

variation across studies - clinical and methodological heterogeneity can be assessed qualitatively; statistical heterogeneity is assessed using tests (Cochran's Q (lacks power), I²)
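Cochran's Q and I² can be computed directly from study effect estimates and their variances, using I² = (Q − df) / Q; the effect sizes below are invented:

```python
# Cochran's Q and I^2 from study effect estimates and their variances
# (fixed-effect inverse-variance weights; all numbers are made up).
effects = [0.6, 0.1, 0.5, -0.1]      # e.g. log risk ratios from 4 studies
variances = [0.04, 0.02, 0.05, 0.03]

weights = [1 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100  # % of variation beyond chance
print(round(q, 2), round(i_squared, 1))
```

With these invented numbers I² comes out at roughly two-thirds, which would conventionally be read as substantial heterogeneity.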

28
Q

qual research pros and cons

A

strengths
answers the how and the why
complement quant data
can Focus on specific groups or settings
Rich, detailed data
Power of story telling
Generate hypotheses to test

weaknesses
no evidence of causality or association
cannot generalise findings
reflexivity - the researcher's own perspective may influence data collection and interpretation
depends on skills of researcher / interviewer
time intensive

29
Q

types of sampling methods from a population

A

Probability sampling:
simple random
systematic - every nth person is picked from a random point
stratified random sampling
cluster sampling

non probability sampling :
convenience sampling
quota sampling
purposive sampling (based on researcher knowledge)

30
Q

what is time series analysis and what are the pros and cons

A

research design in which measurements are made at several different times, thereby allowing trends to be detected e.g. ecological studies or descriptive studies of disease patterns.
requires baseline measurement and multiple points over time

can be done at population level (ecological - repeated cross sectional studies over time) or individual level with repeated measures

can be used for interrupted time series analyses or multiple groups analysis e.g. control

Need to be mindful of seasonal changes, autocorrelation, latency periods, secular trends, concurrent interventions/exposures and underlying changes in population structure (can control for confounding where data available e.g. seasonality)