Research tools Flashcards
Study design
Plan study - decide on field and literature review
Design study - variables, hypothesis, type of study, appropriate population, experimental methodology, sample size
Ethical issues + approval
Sample and data collect
Use statistics to draw conclusions
Interpret study findings
Present study
Interventional studies
Observed after subjected to intervention
Randomized controlled trial = best for intervention
- double-blinded (neither participants nor researcher aware)
- single-blinded (only researcher aware)
- non-blinded (both aware)
Non-randomized aka quasi-experiments, if randomization impossible or unethical
- interrupted time series analysis (observations before and after an intervention)
- intervention group compared with a non-randomized (no intervention) control group
Observational studies
Ecological
- populations not individuals
- susceptible to confounding factors (ecological bias)
Cohort
- group sharing common characteristics
- outcome (eg disease free) is followed over time, prospective or retrospective
- can look at rare exposures or multiple outcomes, best for prognosis
- but expensive, time consuming, high drop-out rates
- prospective are highest quality observational type
Cross-sectional
- characteristics at single time point studied, data for whole population
- for prevalence, absolute and relative risks, but not incidence, best for diagnostic tests
Case-control
- if studying association of disease with past exposure
- gives odds ratios, but not absolute or relative risk
- useful for rare diseases
- can be quick and inexpensive, but subject to recall bias
Strength of evidence (low to high)
In vitro studies
Animal models
Expert reports and opinions
Non-analytical studies
Case-control studies, quasi-experiments
Cohort studies (prospective better than retrospective)
Randomized trials (double-blinded best)
Meta-analyses and systematic reviews
Statistical bias
Where there is systematic distortion of collected data
- selection / sampling bias
- systematic bias
- recall bias
- bias of an estimator
- measurement bias
Descriptive vs inferential statistics
Descriptive - use observations made in sample to describe a population (eg sample mean estimates population mean)
Inferential - study patterns between variables in sample to generalise to population (eg hypothesis testing, correlations)
Sampling
Random
Stratified - randomly select individuals from within each specified subgroup (stratum)
Cluster - randomly select whole groups, then sample within them
Multi-stage
Multi-phase
Variables
Categorical - data can be only one of a finite number of categories and values
- nominal - no ranking eg male/female
- ordinal - ranked categories, but the intervals between ranks are not on a consistent scale eg Apgar scores
Quantitative - data are numerical (continuous eg BP, or discrete eg number of pregnancies)
- interval - equally spaced intervals
- ratio - same, but value 0 means true absence of the variable (a true zero)
Univariate = analysis of 1 variable, multivariate is analysis of multiple association between variables
Measures of location within a distribution, or spread of distribution
Mean - sensitive to outliers, and cannot be calculated for nominal or ordinal variables
Median - robust to outliers, not for nominal
Mode - can be used for all types of variable
Standard deviation - measures of average distance that individual values are from the sample mean
Coefficient of variation - ratio of standard deviation to the mean, no units
Range - difference between highest and lowest value
Interquartile range - range between 1st and 3rd quartile
Normal distribution
Bell shaped
Mean = median = mode
68% values lie within 1 SD of mean
95% values lie within 1.96 SDs of mean (normal range)
99% values lie within 2.58 SDs of mean
Non-normally distributed samples can be converted with logarithmic transformation
Standard error = estimate of how far away from the true population mean a sample mean is (aka the SD of the sample mean with respect to the population mean, = SD/ √sample size)
- depends on variability in sample, and sample size
Confidence interval is the area likely to include the true value of the parameter
- eg for a 95% CI, if sampling were repeated many times, 95% of such intervals would contain the true value
- level of CI indicates accuracy
- width of CI indicates precision
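The standard error and confidence interval formulas above can be sketched in Python (stdlib only); the sample values are illustrative, and the 1.96 multiplier assumes the normal approximation:

```python
from statistics import mean, stdev

def standard_error(sample):
    """SE = SD / sqrt(n): how far a sample mean is likely to be from the population mean."""
    return stdev(sample) / len(sample) ** 0.5

def ci_95(sample):
    """95% CI for the mean: mean +/- 1.96 * SE (normal approximation)."""
    m, se = mean(sample), standard_error(sample)
    return (m - 1.96 * se, m + 1.96 * se)

sample = [4.1, 5.0, 4.6, 4.8, 5.2, 4.4, 4.9, 5.1]  # illustrative data
lo, hi = ci_95(sample)  # interval likely to include the true population mean
```

Note how the SE (and hence the CI width) shrinks as sample size grows, reflecting the dependence on variability and sample size noted above.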
Statistical hypothesis testing
Null vs alternative hypothesis
Hypothesis can never be accepted, just rejected
Significance level is the threshold of evidence required to reject the null hypothesis and conclude that the result has NOT arisen by chance
P-value = probability of obtaining the observed result (or one more extreme) if the null hypothesis is true
- <0.05 is commonly accepted as significant, ie the observed result would arise by chance in fewer than 1 in 20 repetitions of the study
Error
Type 1 (α, = false positive)
When null hypothesis is wrongly rejected, ie falsely detecting a difference
- related to the significance level + the p-value
Type 2 (β, = false negative)
When wrongly fail to reject the null hypothesis, ie failing to detect a true difference
- related to the power of a study
Power of a study
= sensitivity
The probability that the test applied will correctly reject the null hypothesis when a true difference exists
Higher power = lower probability of Type 2 error
Power = 1 - β
To calculate, first need to know the desired clinical difference to be detected, and the variability of the measured parameter
Power calculations are used to find the minimum sample size needed to reject the null hypothesis at a particular significance level, or to predict the minimum detectable difference in the studied effect at a particular sample size
- larger effect or more frequent outcome means fewer subjects needed in a sample to show a significant difference
- power usually set to 80-90%, with significance set at 1-5%
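A minimal sketch of the sample-size side of a power calculation, using the standard normal-approximation formula for comparing two means; the function name is illustrative, and the defaults follow the 80% power / 5% significance convention above:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2 (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

n = n_per_group(delta=1, sd=1)  # detecting a difference of one SD needs ~16 per group
```

Doubling the detectable difference quarters the required sample size, which is the "larger effect means fewer numbers needed" point above.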
Statistical hypothesis tests
Parametric - assumptions made about characteristics of probability distribution of variables, eg normally distributed
- higher statistical power, lower chance of Type 2 error
Non-parametric - no assumptions made
- more robust, lower chance of Type 1 error
Parametric statistical hypothesis test in normally distributed sample
t-test
- can be independent (2 unpaired distributions) or paired (each subject effectively acts as their own control)
- can be from one or two samples
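A sketch of the independent two-sample t statistic with pooled variance (so assuming equal variances); the p-value would then come from the t distribution with n1 + n2 - 2 degrees of freedom (eg via scipy.stats.t), omitted here to stay stdlib-only:

```python
from statistics import mean, variance

def t_statistic(a, b):
    """Independent two-sample t statistic using the pooled variance estimate."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

t = t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])  # illustrative data: t = -1.0, 8 df
```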
ANOVA
= analysis of variance
- very similar to t-test but for multiple distribution comparisons
- assumes that variance (amount of spread) in each distribution is the same
- because some comparisons might be significant by chance, apply Bonferroni correction (tests each individual comparison separately at smaller significance level)
Non-parametric statistical hypothesis test in non-normally distributed sample
Wilcoxon’s signed rank test (if paired)
Mann-Whitney U (if un-paired)
Chi-squared / χ^2 test
Tests if association between categorical variables
It is a measure of the difference between the expected count in each study group (as predicted by the null hypothesis) and the experimentally observed count
Need each group to have enough data
- every group needs an expected count of at least 1
- at least 80% of groups need expected counts of at least 5
Can only show association between categorical variables but cannot quantify strength or direction of association - can do this with risk estimation or odds calculations
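A stdlib-Python sketch of the chi-squared test for a 2x2 table; with 1 degree of freedom the statistic is the square of a standard normal deviate, which lets us get the p-value from NormalDist without scipy. The function name and counts are illustrative:

```python
from statistics import NormalDist

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic and p-value (1 df) for a 2x2 table:
    rows = observed/control group, columns = event / no event."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # expected count = row total * column total / grand total (null prediction)
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # with 1 df, chi-squared is the square of a standard normal deviate
    p = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p

chi2, p = chi_squared_2x2(10, 90, 5, 95)  # here p > 0.05: no significant association
```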
Risk
Chance of the event being investigated occurring
Risk in given group = the number of events occurring in the group / the total population of the group
Absolute risk difference is the difference in risk between two groups
AR = risk observed group - risk control group
Relative risk is risk of 1 group relative to another
RR = risk observed group / risk control group
- if RR 1, no difference between groups
- if RR>1 then greater risk in observed group than control, if <1 lower risk in observed group
- if confidence interval for RR crosses 1, then risk difference is not significant
Odds
= probability of an event occurring divided by the probability of the event not occurring
Odds in given group = the number of subjects with event occurring / the number of subjects without the event in the group
Odds ratio = odds of event in one group / odds of event in another group
- so quantifies how much more likely it is for an event to occur in one group vs the other
- advantages - can compare/combine ORs from different studies, amenable to logistic regression, can be calculated in case-control studies
Odds and risk calculations summary
Number of subjects with event occurring in observed group = a
Number of subjects without event occurring in observed group = b
Risk for event in observed group = a / (a+b)
Odds for event = a / b
Number of subjects with event occurring in control group = c
Number of subjects without event occurring in control group = d
Risk for event occurring in control group = c / (c+d)
Odds for event = c / d
Absolute risk difference = (a/(a+b)) - (c/(c+d))
Relative risk = (a/(a+b)) / (c/(c+d))
Odds ratio = (a/b) / (c/d)
Number needed to treat
Measure of effectiveness of intervention
Lower NNT = more effective intervention
NNT = 1 / absolute risk difference
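The 2x2 summary above translates directly into code; a minimal sketch with illustrative counts (a=10, b=90 in the observed group; c=5, d=95 in controls):

```python
def two_by_two(a, b, c, d):
    """Risk, odds, and derived measures from a 2x2 table:
    a/b = events/non-events in observed group, c/d = events/non-events in controls."""
    risk_obs, risk_ctl = a / (a + b), c / (c + d)
    ard = risk_obs - risk_ctl          # absolute risk difference
    rr = risk_obs / risk_ctl           # relative risk
    odds_ratio = (a / b) / (c / d)     # odds ratio
    nnt = 1 / abs(ard)                 # number needed to treat
    return {"ARD": ard, "RR": rr, "OR": odds_ratio, "NNT": nnt}

m = two_by_two(10, 90, 5, 95)  # ARD = 0.05, RR = 2.0, OR ~ 2.11, NNT = 20
```

Note that the OR (2.11) is close to the RR (2.0) here because the event is uncommon, which is why the OR is often treated as an approximation to the RR for rare outcomes.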
When comparing categorical variables…
Statistical hypothesis test (eg χ^2) shows if association exists
Odds ratio quantifies association
Confidence interval for OR indicates how precise
Correlation
Independent (fixed, eg treat or not treat) vs dependent (outcome) variables
Covariance if variables change together
Correlation is degree of association between variables
- Pearson’s correlation coefficient r most commonly used, just for linear associations
- r = 0 implies no association, r = +1 implies perfect positive linear association, -1 implies perfect negative
- calculated using SD so susceptible to outliers
Confounding variables correlate with both dependent and independent variables, lead to type 1 error
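Pearson's r as described above can be sketched from its definition (covariance divided by the product of the SDs); the data points are illustrative:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance / (SD of x * SD of y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

r = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])  # perfectly linear, so r = 1.0
```

Because the calculation is built from squared deviations, a single outlier can swing r substantially, as noted above.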
Regression
Set of methods to establish relationship between variables
- to explain acquired data or predict other values
Linear regression if relationship between independent and dependent variable is linear
- parametric
- so variables need to be quantitative with normal distribution (but can undergo transformation to get to this)
Regression equation describes the average relationship between variables (slope of line = regression coefficient) but does not give info re the closeness of association (how close points are to the line)
- residual is a measure of how well the line fits each point
- outliers at the edges affect the slope more than those in the middle (leverage)
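A minimal least-squares sketch of the regression line y = intercept + slope * x described above; the data are illustrative (generated from y = 2x + 1, so the fit recovers those coefficients exactly):

```python
from statistics import mean

def linear_fit(x, y):
    """Least-squares line y = intercept + slope * x; slope is the regression coefficient."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope  # (intercept, slope)

intercept, slope = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
residual = 9 - (intercept + slope * 4)  # vertical distance of a point from the line
```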
Diagnostic test performance
True positive rate = sensitivity = power = 1 - β
- best for screening
True negative rate = specificity = 1 - α
- best for diagnostics
False positive rate = type 1 error = α = 1 - specificity
False negative rate = type 2 error = β = 1 - sensitivity
Likelihood ratio
Measure of effectiveness of test
Usually LR of positive result, true positive rate / false positive rate
LR = sensitivity / (1-specificity) = TPR/FPR
If LR > 1 then post-test probability is higher than pretest probability
If <1 then lower
Can also do LR of negative result
Positive and negative predictive values
PPV is probability that a subject with a positive test, has the condition
= TP / (TP + FP)
NPV is probability that a subject with negative test does not have the condition
= TN / (TN + FN)
Accuracy is a measure of how many correct results the test gives in relation to all tests performed
= (TP + TN) / all tests performed
Receiver operator characteristic curve
Because specificity and sensitivity are inversely related, which one is of more importance depends on consequences of missed diagnosis vs misdiagnosis
- ROC helps choose cut-off value for test
- graphical plot of either sensitivity vs (1-specificity) or TPR vs FPR
- curve closest to top left (greatest area under curve) is most accurate
Incidence and prevalence
Incidence = number of new events in a population per unit time
Prevalence = total number of events in population (at specific time) / population size
- how common event is
Tests with same sensitivity and specificity may have different predictive values depending on background odds (prevalence)
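The dependence of predictive value on prevalence can be shown with Bayes' theorem; a sketch with an illustrative test that is 90% sensitive and 90% specific:

```python
def ppv(sens, spec, prevalence):
    """Post-test probability of disease given a positive result (Bayes' theorem):
    PPV = sens*prev / (sens*prev + (1-spec)*(1-prev))."""
    true_pos = sens * prevalence              # diseased subjects testing positive
    false_pos = (1 - spec) * (1 - prevalence) # healthy subjects testing positive
    return true_pos / (true_pos + false_pos)

# same test, different background prevalence:
rare = ppv(0.9, 0.9, 0.01)    # ~0.08: most positives are false positives
common = ppv(0.9, 0.9, 0.20)  # ~0.69: the same positive result is now far more informative
```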
Precision = reproducibility
- measure of test’s ability to produce same result every time it is repeated on same subject
- intra-observer variability - variation when the same operator repeats the test
- inter-observer variability - variation between different operators
Screening programmes
Universal or targeted
- condition should be important health problem
- natural course of condition must be known
- latent or early symptomatic stage must exist
- should be treatment, which should be more effective early
- facilities for diagnosis and treatment must be available
- test must be acceptable to screened population
- agreement on who to treat
- should be cost-effective
- case-finding should be a continuous process
Maternal mortality rate
= number of women who die while pregnant or during the first 42 days post pregnancy per 100,000 women of reproductive age in a given year for any cause related to or aggravated by pregnancy, but not from accidental or incidental cause
Direct maternal deaths
Indirect maternal deaths - from previous existing disease or disease that developed during pregnancy which was aggravated by physiologic effects of pregnancy
Coincidental maternal deaths - from unrelated causes which happen to occur in pregnancy or the puerperium
Late Maternal Death - from direct or indirect obstetric causes, more than 42 days, but less than 1 year after termination of pregnancy
Perinatal mortality rate
number of stillbirths and deaths in the first week of life per 1000 births
Maternal mortality ratio
maternal deaths per 100,000 live births