Research tools Flashcards

1
Q

Study design

A

Plan study - decide on field and literature review
Design study - variables, hypothesis, type of study, appropriate population, experimental methodology, sample size
Ethical issues + approval
Sample and collect data
Use statistics to draw conclusions
Interpret study findings
Present study

2
Q

Interventional studies

A

Observed after subjected to intervention

Randomized controlled trial = best for intervention
- double-blinded (neither participants nor researcher aware)
- single-blinded (only researcher aware)
- non-blinded (both aware)

Non-randomized aka quasi-experiments, if randomization impossible or unethical
- interrupted time series analysis (observations before and after an intervention)
- case(intervention)-control (no intervention)

3
Q

Observational studies

A

Ecological
- populations not individuals
- susceptible to confounding factors (ecological bias)

Cohort
- group sharing common characteristics
- outcome (eg disease free) is followed over time, prospective or retrospective
- can look at rare exposures or multiple outcomes, best for prognosis
- but expensive, time consuming, high drop-out rates
- prospective are highest quality observational type

Cross-sectional
- characteristics at single time point studied, data for whole population
- for prevalence, absolute and relative risks, but not incidence, best for diagnostic tests

Case-control
- if studying association of disease with past exposure
- gives odds ratios, but not absolute or relative risk
- useful for rare diseases
- can be quick and inexpensive, but subject to recall bias

4
Q

Strength of evidence (low to high)

A

In vitro studies
Animal models
Expert reports and opinions
Non-analytical studies
Case-control studies, quasi-experiments
Cohort studies (prospective better than retrospective)
Randomized trials (double-blinded best)
Meta-analyses and systematic reviews

5
Q

Statistical bias

A

Where there is systematic distortion of collected data

  • selection / sampling bias
  • systematic bias
  • recall bias
  • bias of an estimator
  • measurement bias
6
Q

Descriptive vs inferential statistics

A

Descriptive - use observations made in sample to describe a population (eg sample mean estimates population mean)

Inferential - study patterns between variables in sample to generalise to population (eg hypothesis testing, correlations)

7
Q

Sampling

A

Random
Stratified/cluster - randomly select individuals from within specified groups
Multi-stage
Multi-phase

8
Q

Variables

A

Categorical - data can take only one of a finite number of categories
- nominal - no ranking, eg male/female
- ordinal - categories are ranked, but the differences between them are not equal along the scale, eg Apgar scores

Quantitative - data are numerical (continuous, eg BP, or discrete, eg number of pregnancies)
- interval - equally spaced intervals
- ratio - same, but a value of 0 means absence of the variable

Univariate = analysis of 1 variable, multivariate = analysis of associations between multiple variables

9
Q

Measures of location within a distribution, or spread of distribution

A

Mean - sensitive to outliers, and cannot be calculated for nominal or ordinal variables
Median - robust to outliers, not for nominal
Mode - can be used for all types of variable

Standard deviation - measure of the average distance of individual values from the sample mean
Coefficient of variation - ratio of standard deviation to the mean, no units
Range - difference between highest and lowest value
Interquartile range - range between 1st and 3rd quartile
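
The measures above can be sketched with Python's standard `statistics` module. The blood pressure readings are made-up illustrative values, chosen so that one outlier pulls the mean away from the median.

```python
import statistics

# Hypothetical sample: systolic blood pressure readings in mmHg (illustrative only).
values = [110, 115, 120, 120, 125, 130, 180]

mean = statistics.mean(values)        # pulled upwards by the outlier 180
median = statistics.median(values)    # robust to the outlier
mode = statistics.mode(values)        # most frequent value

sd = statistics.stdev(values)         # sample standard deviation (n - 1 denominator)
cv = sd / mean                        # coefficient of variation, no units
value_range = max(values) - min(values)

# Quartile cut points. statistics.quantiles defaults to the "exclusive" method,
# so other software may report slightly different quartiles.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(mean, median, mode, sd, cv, value_range, iqr)
```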

10
Q

Normal distribution

A

Bell shaped
Mean = median = mode
68% values lie within 1 SD of mean
95% values lie within 1.96 SDs of mean (normal range)
99% values lie within 2.58 SDs of mean
Non-normally distributed samples can be converted with logarithmic transformation

Standard error = estimate of how far away from the true population mean a sample mean is (aka the SD of the sample mean with respect to the population mean, = SD/ √sample size)
- depends on variability in sample, and sample size

Confidence interval is the range likely to include the true value of the parameter
- eg 95% CI means that, if the study were repeated many times, 95% of the intervals calculated would contain the true value
- level of CI indicates accuracy
- width of CI indicates precision
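
A minimal sketch of the standard error and a 95% confidence interval for a sample mean, using the normal approximation (mean ± 1.96 × SE). The sample values are invented for illustration; for a sample this small a t-distribution multiplier would normally be used instead of 1.96, so treat the interval as approximate.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 10 measurements (illustrative values only).
sample = [4.1, 3.8, 4.4, 4.0, 4.3, 3.9, 4.2, 4.1, 4.0, 4.2]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Standard error = SD / sqrt(sample size)
standard_error = sample_sd / math.sqrt(n)

# z multiplier that leaves 2.5% in each tail of the standard normal (about 1.96)
z = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

# Proportion of a normal distribution lying within 1 SD of the mean (about 68%)
within_1_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(sample_mean, standard_error, (ci_low, ci_high), within_1_sd)
```

A larger sample shrinks the standard error and so narrows the interval.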

11
Q

Statistical hypothesis testing

A

Null vs alternative hypothesis
Null hypothesis can never be accepted, only rejected or not rejected
Significance level is evidence required to reject null hypothesis, and conclude that event has NOT arisen by chance

P-value = probability of obtaining a result at least as extreme as the one observed, if the null hypothesis is true
- <0.05 is commonly accepted as significant, ie if the null hypothesis were true, a result this extreme would arise by chance in fewer than 1 in 20 repetitions of the study
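
As a rough illustration, a two-sided p-value can be computed from a z statistic, assuming the test statistic follows a standard normal distribution under the null hypothesis. The function name `two_sided_p_value` is a hypothetical helper, not part of any library.

```python
from statistics import NormalDist


def two_sided_p_value(z_statistic):
    """Probability, under the null, of a z statistic at least this extreme in either tail."""
    upper_tail = 1 - NormalDist().cdf(abs(z_statistic))
    return 2 * upper_tail


# A z statistic of 1.96 sits right at the conventional 0.05 threshold.
p_borderline = two_sided_p_value(1.96)
p_strong = two_sided_p_value(3.0)
print(p_borderline, p_strong)
```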

12
Q

Error

A

Type 1 (α, = false positive)
When null hypothesis is wrongly rejected, ie falsely detecting a difference
- related to the significance level + the p-value

Type 2 (β, = false negative)
When wrongly fail to reject the null hypothesis, ie failing to detect a true difference
- related to the power of a study

13
Q

Power of a study

A

= sensitivity
The probability that the test applied will correctly reject the null hypothesis
Higher power = lower probability of Type 2 error
Power = 1 - β

To calculate, first need to know the desired clinical difference to be detected, and the variability of the measured parameter

Power calculations are used to determine the minimum sample size needed to reject the null hypothesis at a particular significance level, or to predict the minimum detectable difference likely to be observed at a particular sample size
- larger effect or more frequent outcome means fewer numbers needed in a sample to prove a significant difference
- power usually set to 80-90%, with significance set at 1-5%
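
A sketch of one common power calculation: the sample size per group needed to compare two means, using the normal-approximation formula n = 2 × ((z for significance + z for power) × SD / difference)². The formula choice, the function name and the example numbers are assumptions for illustration; real studies typically use dedicated software and may need other formulas (eg for proportions).

```python
import math
from statistics import NormalDist


def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # power = 1 - beta
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)


# Hypothetical example: detect a 5 mmHg difference when the SD is 10 mmHg.
n_small_effect = sample_size_per_group(difference=5, sd=10)
# A larger difference to detect needs fewer subjects.
n_large_effect = sample_size_per_group(difference=10, sd=10)
# Asking for more power needs more subjects.
n_high_power = sample_size_per_group(difference=5, sd=10, power=0.90)
print(n_small_effect, n_large_effect, n_high_power)
```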

14
Q

Statistical hypothesis tests

A

Parametric - assumptions made about characteristics of probability distribution of variables, eg normally distributed
- higher statistical power, lower chance of Type 2 error

Non-parametric - no assumptions made about the probability distribution
- more robust, lower chance of Type 1 error

15
Q

Parametric statistical hypothesis test in normally distributed sample

A

t-test
- can be independent (2 unpaired distributions) or paired (effectively each pair acts as both case and control)
- can be from one or two samples

ANOVA
= analysis of variance
- very similar to t-test but for multiple distribution comparisons
- assumes that variance (amount of spread) in each distribution is the same
- because some comparisons might be significant by chance, apply Bonferroni correction (tests each individual comparison separately at smaller significance level)
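
The Bonferroni correction can be sketched as dividing the overall significance level by the number of comparisons and testing each p-value against that smaller threshold. The p-values below are invented, and `bonferroni_significant` is a hypothetical helper name.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each comparison as significant at the Bonferroni-corrected level."""
    threshold = alpha / len(p_values)  # smaller significance level per comparison
    return [p < threshold for p in p_values]


# Hypothetical p-values from three pairwise comparisons after an ANOVA.
p_values = [0.01, 0.02, 0.04]
flags = bonferroni_significant(p_values)
print(flags)
```

All three p-values are below 0.05, but only the first is below the corrected threshold of 0.05 / 3.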

16
Q

Non-parametric statistical hypothesis test in non-normally distributed sample

A

Wilcoxon’s signed rank test (if paired)
Mann-Whitney U (if un-paired)

17
Q

Chi-squared / χ^2 test

A

Tests if association between categorical variables
It is a measure of difference between expected count in each study group (as predicted by null) to experimentally observed ones
Need each group to have enough data
- every group needs an expected count of at least 1
- at least 80% of groups need an expected count of at least 5

Can only show association between categorical variables but cannot quantify strength or direction of association - can do this with risk estimation or odds calculations
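
A minimal sketch of the χ^2 statistic for a table of counts: expected counts come from the row and column totals under the null hypothesis, and the statistic sums (observed - expected)² / expected over all cells. The 2x2 counts are invented, and no continuity correction is applied here.

```python
def chi_squared_statistic(observed):
    """Chi-squared statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table: rows = exposed / not exposed, columns = event / no event.
table = [[20, 30],
         [10, 40]]
statistic = chi_squared_statistic(table)

# For a 2x2 table (1 degree of freedom) the critical value at the 5% level is 3.841.
significant = statistic > 3.841
print(statistic, significant)
```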

18
Q

Risk

A

Chance of the event being investigated occurring
Risk in given group = the number of events occurring in the group / the total population of the group

Absolute risk difference is the difference in risk between two groups
AR = risk observed group - risk control group

Relative risk is risk of 1 group relative to another
RR = risk observed group / risk control group
- if RR = 1, no difference between groups
- if RR > 1 then greater risk in observed group than control, if < 1 lower risk in observed group
- if confidence interval for RR crosses 1, then risk difference is not significant

19
Q

Odds

A

= probability of an event occurring vs the probability of the event not occurring
Odds in given group = the number of subjects with event occurring / the number of subjects without the event in the group

Odds ratio = odds of event in one group / odds of event in another group
- so quantifies how much more likely it is for an event to occur in one group vs the other
- advantages - can compare/combine ORs from different studies, amenable to logistic regression, can be calculated in case-control studies

20
Q

Odds and risk calculations summary

A

Number of subjects with event occurring in observed group = a
Number of subjects without event occurring in observed group = b
Risk for event in observed group = a / (a+b)
Odds for event = a / b

Number of subjects with event occurring in control group = c
Number of subjects without event occurring in control group = d
Risk for event occurring in control group = c / (c+d)
Odds for event = c / d

Absolute risk = (a/(a+b)) - (c/(c+d))
Relative risk = (a/(a+b)) / (c/(c+d))
Odds ratio = (a/b) / (c/d)
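
The formulas above, written out for a made-up 2x2 table (the counts a, b, c, d are illustrative only):

```python
# Hypothetical counts, using the same letters as the summary above.
a = 20  # observed group, event occurred
b = 80  # observed group, no event
c = 10  # control group, event occurred
d = 90  # control group, no event

risk_observed = a / (a + b)
risk_control = c / (c + d)
odds_observed = a / b
odds_control = c / d

absolute_risk_difference = risk_observed - risk_control
relative_risk = risk_observed / risk_control
odds_ratio = odds_observed / odds_control

print(absolute_risk_difference, relative_risk, odds_ratio)
```

With these counts the odds ratio (2.25) is further from 1 than the relative risk (2.0); the two move closer together as the event becomes rarer.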

21
Q

Number needed to treat

A

Measure of effectiveness of intervention
Lower NNT = more effective intervention
NNT = 1 / absolute risk difference between groups
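
A short sketch of the NNT calculation, taking the absolute difference in risk between control and treated groups. The risks are invented and `number_needed_to_treat` is a hypothetical helper name.

```python
def number_needed_to_treat(risk_control, risk_treated):
    """NNT = 1 / absolute difference in risk between the two groups."""
    absolute_difference = abs(risk_control - risk_treated)
    return 1 / absolute_difference


# Hypothetical trial: event risk falls from 20% (control) to 10% (treated).
nnt_effective = number_needed_to_treat(0.20, 0.10)
# A weaker intervention (20% to 18%) gives a higher NNT.
nnt_weak = number_needed_to_treat(0.20, 0.18)
print(nnt_effective, nnt_weak)
```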

22
Q

When comparing categorical variables…

A

Statistical hypothesis test (eg χ^2) shows if there is an association
Odds ratio quantifies association
Confidence interval for OR indicates how precise

23
Q

Correlation

A

Independent (fixed, eg treat or not treat) vs dependent (outcome) variables
Covariance if variables change together

Correlation is degree of association between variables
- Pearson’s correlation coefficient r most commonly used, just for linear associations
- r = 0 implies no association, r = +1 implies perfect positive linear association, -1 implies perfect negative
- calculated using SD so susceptible to outliers
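
Pearson's r can be sketched directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The data are invented, and the last example shows how a single outlier changes r.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations (the n - 1 terms cancel in the ratio)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])
r_perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])
# Same rising trend, but the last point is an outlier that weakens the linear fit.
r_with_outlier = pearson_r(x, [2, 4, 6, 8, -10])
print(r_perfect_positive, r_perfect_negative, r_with_outlier)
```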

Confounding variables correlate with both dependent and independent variables, lead to type 1 error

24
Q

Regression

A

Set of methods to establish relationship between variables
- to explain acquired data or predict other values

Linear regression if relationship between independent and dependent variable is linear
- parametric
- so variables need to be quantitative with normal distribution (but can undergo transformation to get to this)

Regression equation describes average relationship between variables (slope of line, regression coefficient) but does not give info re the closeness of association (how close points are to line). Residuals (distances of the points from the line) measure how well the line fits the points. Outliers at edges affect slope more than at the middle (leverage).
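
A minimal sketch of simple linear regression by ordinary least squares, one standard way to fit the line: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and residuals are observed minus fitted values. The data are invented.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # regression coefficient
    intercept = mean_y - slope * mean_x  # line passes through the point of means
    return slope, intercept


# Hypothetical data: independent variable x, dependent variable y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
# Residual = observed value minus value predicted by the line
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(slope, intercept, residuals)
```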

25
Q

Diagnostic test performance

A

True positive rate = sensitivity = power = 1 - β
- best for screening

True negative rate = specificity = 1 - α
- best for diagnostics

False positive rate = type 1 error = α = 1 - specificity

False negative rate = type 2 error = β = 1 - sensitivity

26
Q

Likelihood ratio

A

Measure of effectiveness of test
Usually LR of positive result, true positive rate / false positive rate
LR = sensitivity / (1-specificity) = TPR/FPR

If LR > 1 then post-test probability is higher than pretest probability
If <1 then lower

Can also do LR of negative result
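
A sketch of both likelihood ratios and of how an LR moves pre-test probability to post-test probability (convert probability to odds, multiply by the LR, convert back). The sensitivity, specificity and pre-test probability are invented values, and `post_test_probability` is a hypothetical helper name.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability using a likelihood ratio."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical test characteristics.
sensitivity = 0.90
specificity = 0.80

lr_positive = sensitivity / (1 - specificity)   # TPR / FPR
lr_negative = (1 - sensitivity) / specificity   # FNR / TNR

pre_test = 0.10
after_positive = post_test_probability(pre_test, lr_positive)
after_negative = post_test_probability(pre_test, lr_negative)
print(lr_positive, lr_negative, after_positive, after_negative)
```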

27
Q

Positive and negative predictive values

A

PPV is probability that a subject with a positive test, has the condition
= TP / (TP + FP)

NPV is probability that a subject with negative test does not have the condition
= TN / (TN + FN)

Accuracy is a measure of how many correct results the test gives in relation to all tests performed
= (TP + TN) / all tests performed
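
The three formulas above, applied to a made-up set of test results (the counts are illustrative only):

```python
# Hypothetical results of a test applied to 1000 subjects.
tp = 90   # true positives
fp = 40   # false positives
tn = 860  # true negatives
fn = 10   # false negatives

ppv = tp / (tp + fp)
npv = tn / (tn + fn)
accuracy = (tp + tn) / (tp + fp + tn + fn)

print(ppv, npv, accuracy)
```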

28
Q

Receiver operating characteristic curve

A

Because specificity and sensitivity are inversely related, which one is of more importance depends on consequences of missed diagnosis vs misdiagnosis
- ROC helps choose cut-off value for test
- graphical plot of sensitivity (TPR) vs 1 - specificity (FPR)
- top left (most area under curve) is most accurate
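
Area under the ROC curve can be approximated with the trapezoid rule over (FPR, TPR) points, one point per candidate cut-off. The points below are invented; a curve hugging the top left gives an area near 1, while the diagonal (a test no better than chance) gives 0.5.

```python
def area_under_curve(points):
    """Trapezoid-rule area under (fpr, tpr) points, which are sorted by FPR here."""
    ordered = sorted(points)
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ordered, ordered[1:]):
        area += (x2 - x1) * (y1 + y2) / 2
    return area


# Hypothetical ROC points: (1 - specificity, sensitivity) at different cut-offs.
roc_points = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.8), (1.0, 1.0)]
auc = area_under_curve(roc_points)

# A test that performs no better than chance lies on the diagonal.
auc_chance = area_under_curve([(0.0, 0.0), (1.0, 1.0)])
print(auc, auc_chance)
```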

29
Q

Incidence and prevalence

A

Incidence = risk of event occurring per unit time

Prevalence = total number of events in population (at specific time) / population size
- how common event is

Tests with same sensitivity and specificity may have different predictive values depending on background odds (prevalence)
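
The dependence of predictive value on prevalence can be sketched with Bayes' theorem: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 - specificity) × (1 - prevalence)). The test characteristics and prevalences are invented for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from test characteristics and prevalence, via Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical test (90% sensitive, 90% specific) in two populations.
ppv_rare = positive_predictive_value(0.90, 0.90, prevalence=0.01)
ppv_common = positive_predictive_value(0.90, 0.90, prevalence=0.20)
print(ppv_rare, ppv_common)
```

With these numbers most positive results in the low-prevalence population are false positives, even though the test itself is unchanged.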

Precision = reproducibility
- measure of test’s ability to produce same result every time it is repeated on same subject
- intra-observer variability = measure of how precise the test is if the same operator repeats it
- inter-observer variability = same, if different operators repeat it

30
Q

Screening programmes

A

Universal or targeted

  • condition should be important health problem
  • natural course of condition must be known
  • latent or early symptomatic stage must exist
  • should be treatment, which should be more effective early
  • facilities for diagnosis and treatment must be available
  • test must be acceptable to screened population
  • agreement on who to treat
  • should be cost-effective
  • should be continuously finding cases
31
Q

Maternal mortality rate

A

= number of women who die while pregnant or during the first 42 days post pregnancy per 100,000 women of reproductive age in a given year for any cause related to or aggravated by pregnancy, but not from accidental or incidental cause

Direct maternal deaths
Indirect maternal deaths - from previous existing disease or disease that developed during pregnancy which was aggravated by physiologic effects of pregnancy
Coincidental maternal deaths - from unrelated causes which happen to occur in pregnancy or the puerperium
Late Maternal Death - from direct or indirect obstetric causes, more than 42 days, but less than 1 year after termination of pregnancy

32
Q

Perinatal mortality rate

A

number of stillbirths and deaths in the first week of life per 1000 births

33
Q

Maternal mortality ratio

A

maternal deaths per 100,000 live births