Research tools Flashcards
Study design
Plan study - decide on field and literature review
Design study - variables, hypothesis, type of study, appropriate population, experimental methodology, sample size
Ethical issues + approval
Sample selection and data collection
Use statistics to draw conclusions
Interpret study findings
Present study
Interventional studies
Subjects are observed after being exposed to an intervention
Randomized controlled trial = best for intervention
- double-blinded (neither participants nor researcher aware)
- single-blinded (participants unaware; researcher aware)
- non-blinded (both aware)
Non-randomized aka quasi-experiments, if randomization impossible or unethical
- interrupted time series analysis (observations before and after an intervention)
- intervention group compared with a control group (no intervention)
Observational studies
Ecological
- populations not individuals
- susceptible to confounding factors (the ecological fallacy)
Cohort
- group sharing common characteristics
- outcome (eg disease free) is followed over time, prospective or retrospective
- can look at rare exposures or multiple outcomes, best for prognosis
- but expensive, time consuming, high drop-out rates
- prospective are highest quality observational type
Cross-sectional
- characteristics at single time point studied, data for whole population
- for prevalence (and prevalence ratios), but not incidence; best for diagnostic tests
Case-control
- if studying association of disease with past exposure
- give odds ratios, but not absolute or relative risk
- useful for rare diseases
- can be quick and inexpensive, but subject to recall bias
Strength of evidence (low to high)
In vitro studies
Animal models
Expert reports and opinions
Non-analytical studies
Case-control studies, quasi-experiments
Cohort studies (prospective better than retrospective)
Randomized trials (double-blinded best)
Meta-analyses and systematic reviews
Statistical bias
Where there is systematic distortion of collected data
- selection / sampling bias
- systematic bias
- recall bias
- bias of an estimator
- measurement bias
Descriptive vs inferential statistics
Descriptive - summarise and present the data observed in the sample (eg sample mean, standard deviation)
Inferential - use the sample to draw conclusions about the population (eg hypothesis testing, estimating a population mean from a sample mean)
Sampling
Random
Stratified - divide the population into groups (strata) and randomly sample from within each
Cluster - randomly select whole groups (clusters), then study their members
Multi-stage
Multi-phase
Variables
Categorical - data can be only one of a finite number of categories and values
- nominal - no ranking, eg male/female
- ordinal - ranked categories, but the intervals between ranks are not on a fixed scale, eg Apgar scores
Quantitative - data are numerical (continuous, eg BP, or discrete, eg number of pregnancies)
- interval - equally spaced intervals
- ratio - same, but a value of 0 means the variable is absent (true zero)
Univariate = analysis of 1 variable, multivariate is analysis of multiple association between variables
Measures of location within a distribution, or spread of distribution
Mean - sensitive to outliers, and cannot be calculated for nominal or ordinal variables
Median - robust to outliers, not for nominal
Mode - can be used for all types of variable
Standard deviation - measure of the average distance of individual values from the sample mean
Coefficient of variation - ratio of standard deviation to the mean, no units
Range - difference between highest and lowest value
Interquartile range - range between 1st and 3rd quartile
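The measures of location and spread above can be computed with Python's standard-library statistics module; the data here are hypothetical:

```python
import statistics

data = [4, 8, 15, 16, 23, 42, 8]  # hypothetical sample

mean = statistics.mean(data)
median = statistics.median(data)            # robust to the outlier 42
mode = statistics.mode(data)                # most frequent value
sd = statistics.stdev(data)                 # sample SD (n-1 denominator)
cv = sd / mean                              # coefficient of variation, unitless
rng = max(data) - min(data)                 # range
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1                               # interquartile range
```

Note the contrast: the mean (~16.6) is pulled up by the outlier 42, while the median stays at 15.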
Normal distribution
Bell shaped
Mean = median = mode
68% values lie within 1 SD of mean
95% values lie within 1.96 SDs of mean (normal range)
99% values lie within 2.58 SDs of mean
Non-normally distributed samples can be converted with logarithmic transformation
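The 68/95/99 figures can be checked against the standard normal CDF using the stdlib NormalDist class:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def within(k):
    """Proportion of values lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

p1 = within(1)       # ~0.68
p196 = within(1.96)  # ~0.95
p258 = within(2.58)  # ~0.99
```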
Standard error = estimate of how far away from the true population mean a sample mean is (aka the SD of the sample mean with respect to the population mean, = SD/ √sample size)
- depends on variability in sample, and sample size
Confidence interval is the range of values likely to include the true value of the parameter
- eg a 95% CI means that if the study were repeated many times, 95% of such intervals would contain the true value
- level of CI indicates accuracy
- width of CI indicates precision
Statistical hypothesis testing
Null vs alternative hypothesis
The null hypothesis is never accepted, only rejected or not rejected
Significance level is the strength of evidence required to reject the null hypothesis and conclude that the observed result has NOT arisen by chance
P-value = probability of obtaining a result at least as extreme as that observed, assuming the null hypothesis is true
- <0.05 is commonly accepted as significant, ie the observed result would arise by chance fewer than 1 in 20 times the study was performed
Error
Type 1 (α, = false positive)
When null hypothesis is wrongly rejected, ie falsely detecting a difference
- related to the significance level + the p-value
Type 2 (β, = false negative)
When wrongly fail to reject the null hypothesis, ie failing to detect a true difference
- related to the power of a study
Power of a study
= analogous to sensitivity
The probability that the test will correctly reject the null hypothesis when it is false
Higher power = lower probability of Type 2 error
Power = 1 - β
To calculate, first need to know the desired clinical difference to be detected, and the variability of the measured parameter
Power calculations are used to estimate the minimum sample size needed to detect a given effect at a particular significance level, or to predict the minimum detectable difference likely to be observed at a given sample size
- larger effect or more frequent outcome means fewer numbers needed in a sample to prove a significant difference
- power usually set to 80-90%, with significance set at 1-5%
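A sketch of the standard normal-approximation sample-size formula for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})²σ²/δ², where δ is the clinically important difference and σ the SD of the measured parameter; the numbers passed in are hypothetical:

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) ** 2) * sigma ** 2 / delta ** 2
    return math.ceil(n)

# detect a 5-unit difference when the SD is 10, at 80% power and alpha 0.05
n = sample_size_per_group(delta=5, sigma=10)
```

Doubling the detectable difference to 10 units cuts the required n per group sharply, illustrating the note above that a larger effect needs fewer subjects.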
Statistical hypothesis tests
Parametric - assumptions made about characteristics of probability distribution of variables, eg normally distributed
- higher statistical power, lower chance of Type 2 error
Non-parametric - no assumptions made
- more robust, lower chance of Type 1 error
Parametric statistical hypothesis test in normally distributed sample
t-test
- can be independent (2 unpaired distributions) or paired (effectively each pair acts as both case and control)
- can be from one or two samples
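The independent two-sample t statistic (pooled-variance form, which assumes equal variances) can be computed by hand with the stdlib; converting it to a p-value needs the t distribution (eg scipy.stats), so only the statistic is shown here, on hypothetical data:

```python
import math
import statistics

def pooled_t(a, b):
    """Independent two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

t = pooled_t([5.2, 5.8, 6.1, 5.5], [4.9, 5.0, 5.3, 4.7])
```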
ANOVA
= analysis of variance
- very similar to t-test but for multiple distribution comparisons
- assumes that variance (amount of spread) in each distribution is the same
- because some pairwise comparisons might appear significant by chance, apply a Bonferroni correction (test each individual comparison at a smaller significance level, eg α divided by the number of comparisons)
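A minimal sketch of the Bonferroni correction, with hypothetical pairwise p-values from post-hoc comparisons after an ANOVA:

```python
n_comparisons = 3  # eg three pairwise group comparisons after ANOVA
alpha = 0.05

# Bonferroni: divide the significance level by the number of comparisons
alpha_corrected = alpha / n_comparisons  # 0.05 / 3 ~= 0.0167

p_values = [0.030, 0.012, 0.200]  # hypothetical pairwise p-values
significant = [p < alpha_corrected for p in p_values]
```

Note that 0.030 would be "significant" at the uncorrected 0.05 level but not after correction.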