Practical research skills Flashcards
Order of research
Research question
Study design
Research methods
Data collection tools
Data analysis
Dissemination
PICOST
Population
Intervention or Exposure
Control
Outcome
Setting
Timing or study Type
SMART
Specific
Measurable
Achievable
Realistic
Timely
Types of observational studies
Cross sectional
Cohort
Case control
Cross sectional study definition and considerations
Snapshot at a single point in time, with exposure and disease assessed simultaneously
Faster and less expensive than cohort studies
Cannot determine time relation between exposure and outcome
Cohort study description and considerations
Population of exposed vs unexposed followed up prospectively over a period of time to look for development of the outcome
Good for rare exposures and can assess multiple outcomes
Challenging if rare outcome or long latency
Very resource intensive
Case control study description and considerations
Participants recruited based on the presence or absence of an outcome then assessed retrospectively
Can assess for multiple exposures
Relatively fast
Risk of recall bias, and controls must be chosen carefully
Three principles of confounders
- Associated with the exposure
- Risk factor for the outcome
- Not on the causal pathway between the exposure and outcome
Four options for dealing with confounders
e.g. ice cream sales, time spent at the beach and shark attacks (time spent at the beach is the confounder)
- Restricting - only recruit those who spend time at the beach
- Matching - match participants based on the amount of time they spend at the beach
- Statistical control - Incorporate time spent at the beach as a variable in the statistical analysis
- Randomisation
Examples of bias
- Selection bias - study population does not represent the whole population of interest, comparison groups are not comparable
- Information bias - any error in the measurement of exposure or the outcome means there are systematic differences in the accuracy of information collected
e.g. recall bias
Clinical trial - four examples of “interventions”
Preventative strategy
Treatment
Screening tool
Diagnostic test
Phase 1 of a clinical trial
SAFE
Pharmacokinetics/pharmacodynamics in healthy volunteers
Small sample
Phase 2 of a clinical trial
EFFICACY
How well does the treatment work
Bigger sample than phase 1
May recruit participants who have not responded to standard treatments
Phase 3 clinical trial
IMPACT
Is the new treatment better than existing treatments?
Including
QOL
Reduction in the risk of recurrence
Side effect profile
Cost
Acceptability/feasibility
Phase 4 clinical trial
Is there a better way of implementing this treatment?
e.g. dose reduction, optimal length of treatment…
Cluster randomised study design
Larger groups (clusters) are randomised rather than individuals
Used when an intervention may affect a whole group leading to “contamination”
e.g. Assessing the rate of smear test uptake after displaying posters about cervical cancer in a GP surgery
Blind study design
Do not know which intervention the participants receive
Single - only participants unaware
Double - both participants and investigators unaware
Crossover study design
Participants receive multiple treatments in a specific sequence, with each participant serving as their own control
May be useful if one particular treatment is known to be effective and you want everyone to have the opportunity to have it
Factorial study design
Evaluates multiple interventions simultaneously by using different combinations of treatments. This allows the study to assess both individual and interaction effects of the treatments
Adaptive study design
Allows modifications to the trial and/or statistical procedures of the trial after its initiation without undermining its validity and integrity, to make clinical trials more flexible, efficient and fast
e.g. MAMS (multi-arm, multi-stage)
Superiority trial
Is the new treatment better than the existing one?
Equivalence trial
There is no clinically important difference between the new treatment and the existing one, i.e. any difference lies within a pre-specified margin (clinical equivalence vs bioequivalence)
Non-inferiority trial
The new drug is not worse than the existing treatment by more than a specified margin, the non-inferiority margin (δ)
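A minimal sketch of how a non-inferiority margin might be checked. It assumes the outcome is a success proportion (higher is better) and uses a normal-approximation 95% confidence interval for the difference. The counts and margins are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def non_inferior(success_new, n_new, success_old, n_old, margin, level=0.95):
    """True if the lower CI bound of (new - old) stays above -margin."""
    p_new, p_old = success_new / n_new, success_old / n_old
    # Standard error of a difference between two independent proportions
    se = sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    lower = (p_new - p_old) - z * se
    return lower > -margin

# Hypothetical trial: 85/100 successes on the new drug, 88/100 on the existing one
wide_margin = non_inferior(85, 100, 88, 100, margin=0.15)
tight_margin = non_inferior(85, 100, 88, 100, margin=0.10)
```

The same data can support non-inferiority under a generous margin and fail under a stricter one, which is why the margin has to be specified before the trial.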
Any difference between two groups can occur as a result of….
Bias - reduced by randomisation
Confounding - addressed by randomisation
Chance - addressed by sample size calculations
A true difference in outcome
DSMB
Data and safety monitoring board
Ensure that the protocol of the trial is followed appropriately
In what situations is it not appropriate to use an RCT?
- The answer is obvious
- Unethical
- There are questions about aspects of healthcare that should not be influenced by the investigators
- Long interval between intervention and outcome
- Not financially feasible
Examples of poorly chosen outcomes
- Surrogate: An indirect measure of a clinical outcome which may not reflect the complexity of the disease process e.g. blood pressure as a surrogate for cardiovascular events
- Composite: A combination of multiple individual outcomes in one measure
- Subjective e.g. patient reported pain/QoL
- Complex: difficult to interpret
- Irrelevant (to patients and decision makers)
Why might a clinical trial fail to translate into improved outcomes
- Missing data
- Poorly specified outcomes
- Poorly interpreted
e.g. reporting relative measures such as relative risk reduction rather than number needed to treat, spinning the results to sound more positive, or multiplicity (conducting multiple statistical tests without proper correction)
- Outcomes are selectively reported
e.g. publication bias, reporting bias, under-reporting of adverse events
Definition of target population, accessible population and intended study sample
Target population - people for whom results are intended to be generalised
Accessible population - geographically, demographically and temporally defined
Intended study sample - actual sample
Four examples of probability sampling
- Random sample - highly representative but resource intensive
- Stratified sample - Random subset selected from predefined groups e.g. ethnic group - highly representative across all strata but labour intensive
- Systematic sample - predefined periodic process e.g. every 4th person on a list - not as good as randomisation, may have bias if there is an order to the list
- Cluster sample e.g. from a school - easier than individual, needs a larger sample size
Types of sampling bias
Selection bias
Survivorship bias - ignores those who drop out due to adverse effects which may overestimate efficacy
Volunteer bias
Undercoverage bias - misses a portion of the population e.g. phone survey excludes people without landlines
Non-response bias
Healthy worker effect
Convenience sampling - not random therefore lacks generalisability
Principles of sample size
Tells you about the statistical (chance) error
Bigger sample sizes give us better precision (i.e. a narrower confidence interval)
What is the standard error related to?
The spread of a continuous variable, or how evenly split a proportion is
What is type 1 error?
When you falsely reject the null hypothesis when it is actually true
i.e. you find an association when there isn’t one, often due to chance where you’ve sampled at the extremes of the population
What is type 2 error?
You do not reject the null hypothesis when it is false
i.e. you fail to detect an association between the variables when there actually is one (often because the study is underpowered)
What information do we need in order to decide a sample size?
- The size of change/risk that we wish to demonstrate with the study (e.g. prevalence <5%)
- The significance level (i.e. the p value threshold, alpha)
- The power of the study
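The three inputs above can be combined in a standard normal-approximation formula. The sketch below assumes a comparison of two means with equal group sizes and a known standard deviation. The function name and the example numbers are illustrative only.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate participants per arm needed to detect a mean difference `delta`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # power = 1 - type 2 error rate
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Hypothetical: detect a 5-unit difference when the SD is 10
n_large_effect = n_per_group(delta=5, sd=10)
# Halving the difference to detect roughly quadruples the sample size
n_small_effect = n_per_group(delta=2.5, sd=10)
```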
How can we separate the causal effect we are interested in from other causal effects which it is mixed up with?
Stratification: divide the study sample into strata (subgroups) within each of which individuals are similar with respect to the potential confounder.
Any association observed within the strata cannot be due to the stratifying variable
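As a sketch of stratification, take the ice cream, beach time and shark attack example used earlier. All counts below are invented so that ice cream has no effect within either beach-time stratum, yet the crude (unstratified) risk ratio suggests an association.

```python
def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

# Hypothetical counts: (events in ice cream eaters, ice cream eaters,
#                       events in non-eaters, non-eaters)
strata = {
    "high beach time": (80, 800, 20, 200),
    "low beach time": (2, 200, 8, 800),
}

# Crude analysis: add the strata together and ignore beach time
crude_counts = [sum(column) for column in zip(*strata.values())]
crude_rr = risk_ratio(*crude_counts)

# Stratified analysis: one risk ratio per level of the confounder
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}
```

Here the crude risk ratio is well above 1 while each stratum-specific ratio is 1, so the crude association is explained by beach time.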
What is the difference between confounders and effect modification?
Confounders are a nuisance and may muddy the waters when trying to find an association. There is no statistical test for confounding; instead, compare the crude and adjusted effect measures and decide whether the difference is important
Effect modification is useful to us and tells us how different factors may combine. There is a statistical test. If there is an important effect modification, presenting only a summary (average) estimate may not be very helpful: we recommend presenting stratum-specific estimates as well
Define standard deviation
The standard deviation is a measure of spread of the results in the given population sample by estimating the average difference in the measure from the mean. It is a characteristic of the population itself. Therefore increasing the sample size does not affect the degree of variability in the population.
FINER
Feasible
Interesting
Novel
Ethical
Relevant
Selection bias
The study population is not representative of the population of interest
Information bias
There is a systematic error in the measurement of the exposure or the outcome, which may differ between the two groups
Principles of inclusion and exclusion criteria
Inclusion:
Be specific
Demographic
Lab findings
Geographic
Temporal
Exclusion:
Be frugal
Ethical considerations
High likelihood of loss to follow up
Unable to provide good data
Haphazard sampling
No clear method
e.g. convenience sampling - chosen based on ease of access and availability
Volunteer sampling
Self-selected
Snowballing - existing participants recruit others - helpful for difficult to reach participants
Purposive sampling (qualitative)
Extreme case
Homogeneous - specific subgroup
Heterogeneous - diverse range
Critical case - particularly informative or crucial in the study
Typical case - representative
Theoretical - grounded theory - select cases based on emerging findings
Probability sampling
Simple random - everyone has an equal chance of being selected
Systematic random - regular intervals from an ordered list e.g. every 5th person
Stratified random - divided into subgroups with random samples taken from each
Cluster random - population divided into clusters and whole clusters randomly selected
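The four probability sampling methods can be sketched with Python's `random` module. The population of 100 people in 5 schools is hypothetical, and the fixed seed is only there to make the example repeatable.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [{"id": i, "school": f"school_{i // 20}"} for i in range(100)]

# Simple random: everyone has an equal chance of selection
simple = rng.sample(population, 10)

# Systematic random: random start, then every 5th person on the list
interval = 5
start = rng.randrange(interval)
systematic = population[start::interval]

# Stratified random: a random sample from within every subgroup
by_school = {}
for person in population:
    by_school.setdefault(person["school"], []).append(person)
stratified = [p for group in by_school.values() for p in rng.sample(group, 2)]

# Cluster random: pick whole clusters at random and take everyone in them
chosen_schools = rng.sample(sorted(by_school), 2)
cluster = [p for school in chosen_schools for p in by_school[school]]
```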
What makes an action morally justified?
Goal based approach - action is good if goal/outcome is good
Duty based approach - asks if an action accords with certain principles
Rights based approach - stresses those individual freedoms and claims protected in a given society by ‘rights’
Ethnography in qualitative research
Researchers immerse themselves into groups or organisations
Triangulation
Using multiple methods or data sources in qualitative research
Aims to test validity by using different sources. Can apply to the data source, investigator, theory and philosophical approach
Saturation and iteration in qualitative research
No longer bringing new insights
Flexible and iterative process, ready for unexpected results
Stages of field work
Proposal, approval, preparation
Enrollment, intervention, data collection, supervision, monitoring
Closing the site, analysis, write-up, dissemination
Positive confounder
Makes the association between exposure and outcome appear stronger (i.e. moves the estimate further from the null value, which is 1 for a ratio)
Negative confounder
Makes the association between exposure and outcome appear weaker (i.e. moves the estimate closer to the null value, which is 1 for a ratio)
Where is the mean/median in a positive skew?
Think of the tail of the graph being at the positive end
The mean is therefore also pulled towards the positive end, with the median to the left of that and the mode furthest left (the top of the hump)
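A quick check of that ordering on a small, invented right-skewed data set (the values are hypothetical, e.g. length of hospital stay in days):

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: most values are small, a few are large
stay_days = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

centre = {
    "mode": mode(stay_days),      # top of the hump
    "median": median(stay_days),  # middle value
    "mean": mean(stay_days),      # pulled towards the long right tail
}
```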
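What are all the bits of a box plot?
Middle line: Median
Box: IQR (25th to 75th percentile)
Whiskers: extend to the most extreme values within 1.5 x IQR of the box, i.e. no further than Q1 - (1.5 x IQR) and Q3 + (1.5 x IQR)
Dots: outliers (values beyond the whiskers)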
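A sketch of how the parts of a box plot could be computed, assuming the common 1.5 x IQR convention for whiskers. Quartile definitions vary between software packages; this uses the "inclusive" method from Python's `statistics` module, and the data are invented.

```python
from statistics import median, quantiles

def box_plot_parts(values):
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median(values),               # middle line
        "box": (q1, q3),                        # 25th to 75th percentile
        "whiskers": (min(inside), max(inside)), # most extreme non-outliers
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical data with one extreme value
parts = box_plot_parts([2, 4, 5, 6, 7, 8, 9, 10, 11, 30])
```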
Is range dependent on the sample size?
Yes, but inter-quartile range is not
1 standard deviation
68.3%
2 standard deviations
95.4%
3 standard deviations
99.7%
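These proportions are the share of a normal distribution lying within 1, 2 and 3 standard deviations of the mean. A minimal way to recompute them with the standard library:

```python
from statistics import NormalDist

def coverage_within(k_sd):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)

coverage = {k: coverage_within(k) for k in (1, 2, 3)}
```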
How to calculate the standard error
standard deviation / square root of n (number of observations)
What is the standard error
The standard deviation of the sample mean; it depends on the sample size. It can be reduced by increasing the sample size or by reducing the standard deviation of the results (by improving methodology)
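A short sketch of the standard error formula, showing that quadrupling the sample size halves the standard error when the standard deviation stays the same. The numbers are illustrative.

```python
from math import sqrt
from statistics import stdev

def standard_error(sd, n):
    return sd / sqrt(n)

# Same SD of 10, different sample sizes
se_25 = standard_error(10, 25)
se_100 = standard_error(10, 100)

# From raw (hypothetical) observations
sample = [4, 6, 8, 10, 12]
se_sample = standard_error(stdev(sample), len(sample))
```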
How does the confidence interval relate to the standard error?
95% confident that the true value lies within these values
The 95% CI is 1.96 standard errors either side of the mean estimate: 95% CI for the mean: sample mean ± (1.96 x SE)
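The formula can be applied directly to a sample. This sketch uses the normal multiplier (about 1.96) as on the card; for small samples a t-distribution multiplier would usually be used instead. The data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci95_mean(values):
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return m - z * se, m + z * se

low, high = ci95_mean([4, 6, 8, 10, 12])
```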
T test
Used for analysing normally distributed continuous variables (e.g. comparing the means of two groups)
How do you calculate degrees of freedom?
n1 + n2 – 2
e.g. 10 boys and 10 girls would have df = 18
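A sketch of the two-sample t statistic and its degrees of freedom, assuming equal variances in the two groups (the pooled-variance form). The scores are invented. Turning the statistic into a p value needs the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(group_1, group_2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group_1) - mean(group_2)) / se_diff, df

boys = list(range(1, 11))   # hypothetical scores 1..10
girls = list(range(3, 13))  # hypothetical scores 3..12
t_stat, df = two_sample_t(boys, girls)
```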
What does the power of a study depend on?
- the sample size
- the standard deviation
- the size of the difference you wish to detect
Paired T test
Used for normally distributed continuous variables when you’re comparing the change between two measurement occasions
e.g. at the start and the end of the year
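A paired t test works on the within-person differences. This is a minimal sketch with invented scores for five participants at the start and end of the year.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

start_of_year = [10, 12, 9, 14, 11]  # hypothetical scores
end_of_year = [12, 15, 10, 15, 14]
t_paired, df_paired = paired_t(start_of_year, end_of_year)
```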
One sample T test
Comparing the sample mean to the mean of the general population
e.g. is there evidence, based on a sample of 20 test scores, that literacy skills fall below the national average?
When to use Wilcoxon rank sum test?
Compare equality of distributions across two groups if the sample size is small or if it’s skewed
Less powerful
Does not use the numbers from the raw data but ranks them instead (to account for the skew)
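To illustrate the ranking step, the sketch below pools both groups, ranks the values (tied values share the average rank) and sums the ranks in the first group. It computes the rank-sum statistic only, not the p value, and the function name is illustrative.

```python
def rank_sum(group_a, group_b):
    """Sum of the pooled ranks belonging to group_a (ties get the average rank)."""
    pooled = sorted(list(group_a) + list(group_b))
    rank_of = {}
    for value in set(pooled):
        first = pooled.index(value) + 1  # 1-based rank of the first occurrence
        count = pooled.count(value)
        rank_of[value] = first + (count - 1) / 2
    return sum(rank_of[v] for v in group_a)

# A large outlier changes the raw mean a lot but its rank only slightly
w_no_outlier = rank_sum([1, 2, 3], [4, 5, 6])
w_outlier = rank_sum([1, 2, 3], [4, 5, 600])
w_ties = rank_sum([1, 2], [2, 3])
```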
Assumptions of a correlation
- Linearity of the association of X with Y.
- Approximate normal distribution of X and Y
When to use Fisher’s exact test?
- Total no. observations <20
- Expected individual cell counts <5
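Fisher's exact test calculates the probability of each possible 2x2 table with the same margins, which is why it suits small counts. The sketch below gives a two-sided p value by summing the probabilities of all tables no more likely than the observed one. The cell layout `a, b / c, d` is an assumption of this example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)  # tolerance for float rounding
    )

p_moderate = fisher_exact_two_sided(3, 1, 1, 3)
p_extreme = fisher_exact_two_sided(4, 0, 0, 4)
```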
How do RR and OR change when the prevalence increases?
The OR moves further from 1 than the RR, so it increasingly overstates the RR; the two are close only when the outcome is rare
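A small numerical sketch of this: hold the risk ratio fixed at 2 and raise the baseline risk in the unexposed group. The baseline risks are arbitrary illustrative values.

```python
def odds_ratio(risk_exposed, risk_unexposed):
    def odds(p):
        return p / (1 - p)
    return odds(risk_exposed) / odds(risk_unexposed)

fixed_risk_ratio = 2.0
# Key: risk in the unexposed group; value: odds ratio when the risk ratio is 2
comparison = {
    baseline: odds_ratio(baseline * fixed_risk_ratio, baseline)
    for baseline in (0.01, 0.10, 0.40)
}
```

With a rare outcome the odds ratio stays close to 2, but it drifts upward as the outcome becomes more common.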
When is OR best used?
Case control studies
When are risk ratios best used?
Cohort studies
RCTs
How to calculate number needed to treat?
Calculate the “cure rate” in each group and subtract to find out the difference
then 1/cure rate difference = NNT
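The calculation can be sketched from raw counts. Exact fractions are used to avoid floating-point rounding, and the result is rounded up to a whole person, which is a common convention. The counts are hypothetical.

```python
from fractions import Fraction
from math import ceil

def number_needed_to_treat(cured_treated, n_treated, cured_control, n_control):
    """NNT = 1 / (cure rate in treated - cure rate in control), rounded up."""
    difference = Fraction(cured_treated, n_treated) - Fraction(cured_control, n_control)
    if difference <= 0:
        raise ValueError("treatment shows no benefit, so NNT is undefined")
    return ceil(1 / difference)

nnt_a = number_needed_to_treat(30, 100, 20, 100)  # cure rates 30% vs 20%
nnt_b = number_needed_to_treat(45, 100, 30, 100)  # cure rates 45% vs 30%
```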
Population attributable fraction
How much does the exposure account for the outcome in this case?
Calculate the risk (not the risk ratio) in the unexposed group and in the total population
Then subtract the unexposed risk from the total risk and divide by the total risk: PAF = (total risk - unexposed risk) / total risk
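A worked sketch of the population attributable fraction from a 2x2 table. It uses the risk (events divided by people) in the whole population and in the unexposed group. The counts are invented for illustration.

```python
def population_attributable_fraction(exposed_events, exposed_total,
                                     unexposed_events, unexposed_total):
    """PAF = (risk in total population - risk in unexposed) / risk in total population."""
    total_risk = (exposed_events + unexposed_events) / (exposed_total + unexposed_total)
    unexposed_risk = unexposed_events / unexposed_total
    return (total_risk - unexposed_risk) / total_risk

# Hypothetical: 30/100 events in the exposed, 10/100 in the unexposed
paf = population_attributable_fraction(30, 100, 10, 100)
```

In this example half of the cases in the population would be attributed to the exposure, assuming the association is causal.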
Effect modification
The effect of the exposure on the outcome differs across levels of a third variable (e.g. the association is stronger in one subgroup than in another)
You may wish to then choose to present the effect separately by subgroups of the stratification variable
Logistic regression:
Used for binary outcomes (e.g., hypertension: yes/no).
Models the probability of an event occurring.
Estimates odds ratios (ORs) to measure association.
Can adjust for multiple variables.
Linear regression:
Used for continuous outcomes (e.g., blood pressure as a number).
Predicts mean outcomes instead of probabilities.
Estimates mean differences.