Stats Flashcards
Null hypothesis
A strawman hypothesis (usually "no effect") set up so that the data can argue against it
- If your results would be very unlikely were the null hypothesis true, you reject the null hypothesis
Type 1 error
Rejecting the null hypothesis when it is in fact true
A false positive result
You’re wrong, but you don’t realise it.
P value
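The probability of getting a result at least as extreme as the one observed if the null hypothesis is actually true
Not the same as the Type 1 error rate: that is the significance threshold (alpha) the p value is compared against
i.e. alpha is the chance that you’re wrong, but you don’t realise it, when the null hypothesis is actually true
Threshold usually set at <0.05, arbitrarily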
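A minimal sketch of what a p value measures, assuming a made-up coin-flip experiment (the counts and the helper name `binomial_p_value` are illustrative, not from these notes). It adds up the probability of every result at least as extreme as the one observed, under the null hypothesis that the coin is fair.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, null_prob: float = 0.5) -> float:
    """One-sided p value: P(X >= successes) if the null hypothesis is true."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 16 heads in 20 flips; the null hypothesis is a fair coin.
p_value = binomial_p_value(16, 20)
alpha = 0.05  # conventional, arbitrary significance threshold

print(f"p = {p_value:.4f}")
print("reject null" if p_value < alpha else "fail to reject null")
```

The p value is compared against alpha; it is alpha, not the p value, that fixes the Type 1 error rate.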
Type 2 Error
Failing to reject the null hypothesis when it is in fact false
A false negative result
You decided you weren’t right when you were
Power
Likelihood of finding an effect when it is present
Power = 1-p(Type 2 Error)
So if most studies aim for a power of 80%, it means that if the effect is there, it will be detected 80% of the time.
Alternatively, the type 2 error rate would be 20%
Modifiers of Power
Bigger is better:
- Size of effect
- Sample size
Lower is preferred:
- Stringency of the desired significance level (a stricter, smaller alpha lowers power)
- Standard deviation
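A rough sketch of how each modifier moves power, assuming a two-sided, two-sample z-test with equal group sizes and a normal approximation. The function `approx_power` and all the numbers are hypothetical, chosen only so the baseline lands near the usual 80% target.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test with equal group sizes."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # cut-off implied by the significance level
    standard_error = sd * sqrt(2 / n_per_group)  # SE of the difference in means
    # Ignores the negligible chance of rejecting in the wrong direction.
    return std_normal.cdf(abs(effect) / standard_error - z_crit)


# Hypothetical baseline: effect of 5 units, SD of 10, 64 per group, alpha 0.05.
baseline = approx_power(effect=5, sd=10, n_per_group=64)
bigger_effect = approx_power(effect=7, sd=10, n_per_group=64)
bigger_sample = approx_power(effect=5, sd=10, n_per_group=100)
stricter_alpha = approx_power(effect=5, sd=10, n_per_group=64, alpha=0.01)
bigger_sd = approx_power(effect=5, sd=15, n_per_group=64)

for label, value in [
    ("baseline", baseline),
    ("bigger effect", bigger_effect),
    ("bigger sample", bigger_sample),
    ("stricter alpha", stricter_alpha),
    ("bigger SD", bigger_sd),
]:
    print(f"{label:>14}: power = {value:.2f}")
```

Effect size and sample size push power up; a stricter alpha and a larger standard deviation pull it down.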
Risk
The chance of something occurring
E.g. A population has a 5% chance of dying when they present with PE
Odds
The chance of something occurring compared with it not occurring
E.g. A population has a 5% chance of dying compared to a 95% chance of surviving when they present with a PE.
Therefore: 5/95 = 1/19
For every 1 that dies, 19 survive
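The risk-to-odds conversion can be sketched as below, using the 5% PE example. The helper names `risk_to_odds` and `odds_to_risk` are illustrative; exact fractions are used so the result matches the hand calculation.

```python
from fractions import Fraction


def risk_to_odds(risk: Fraction) -> Fraction:
    """Odds = chance of the event divided by chance of no event."""
    return risk / (1 - risk)


def odds_to_risk(odds: Fraction) -> Fraction:
    """Inverse conversion: risk = odds / (1 + odds)."""
    return odds / (1 + odds)


risk_of_death = Fraction(5, 100)  # 5% risk of death with PE
odds_of_death = risk_to_odds(risk_of_death)

print(odds_of_death)                # 1/19
print(odds_to_risk(odds_of_death))  # 1/20, i.e. back to 5%
```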
Relative Risk
The chance of something occurring relative to the chance of it occurring under different circumstances.
E.G. A population has a 5% chance of dying when they present with PE and a 15% chance of dying when they present with PE + hypotension
Therefore: 5%/15% -> 1/3
Therefore: A person presenting with a PE without hypotension has 1/3 the risk of death relative to one presenting with a PE with hypotension
Absolute Risk Reduction
The absolute difference in chance of something occurring compared to the chance of it occurring under different circumstances
E.G. A population has a 5% chance of dying when they present with PE and a 15% chance of dying when they present with PE + hypotension
Therefore: 15% - 5% -> 10%
Therefore: hypotension raises the absolute risk of death by 10 percentage points (equivalently, its absence is an absolute risk reduction of 10%)
Number Needed to Treat
NNT = 1/ARR
E.g. ARR = 10%, so NNT = 1/0.10 = 10
Treating 10 people prevents 1 death
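The relative risk, absolute risk difference and NNT calculations can be sketched together with the same PE numbers. The NNT line assumes, purely for illustration, a treatment that brings the 15% risk down to 5%; the notes do not describe any such treatment.

```python
from fractions import Fraction

risk_pe = Fraction(5, 100)               # risk of death, PE without hypotension
risk_pe_hypotension = Fraction(15, 100)  # risk of death, PE with hypotension

# Relative risk: one risk divided by the other.
relative_risk = risk_pe / risk_pe_hypotension

# Absolute risk difference: one risk minus the other.
absolute_risk_difference = risk_pe_hypotension - risk_pe

# NNT = 1/ARR, treating the difference as a hypothetical treatment effect.
number_needed_to_treat = 1 / absolute_risk_difference

print(relative_risk)             # 1/3
print(absolute_risk_difference)  # 1/10
print(number_needed_to_treat)    # 10
```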
Odds Ratio
The Odds version of relative risk
The odds of something occurring relative to it occurring under different circumstances
E.G. A population has a 5% chance of dying when they present with PE and a 15% chance of dying when they present with PE + hypotension
Odds death nohypo = (1/20)/(19/20) = 1/19
Odds death hypo = (3/20)/(17/20) = 3/17
Odds ratio death hypo relative to death nohypo = (3/17)/(1/19)
Therefore: (3/17)x(19/1)
Therefore: (3x19)/17 -> 57/17 -> 3.35
The odds of death are about 3.35x as high with hypo as without
Note: it’s not 3x higher like with the relative risk
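A short check of the odds ratio arithmetic, with the relative risk in the same direction for comparison. The helper `odds` is an illustrative name.

```python
from fractions import Fraction


def odds(risk: Fraction) -> Fraction:
    """Convert a risk into odds."""
    return risk / (1 - risk)


risk_no_hypotension = Fraction(5, 100)
risk_hypotension = Fraction(15, 100)

odds_ratio = odds(risk_hypotension) / odds(risk_no_hypotension)
relative_risk = risk_hypotension / risk_no_hypotension

print(odds_ratio)                  # 57/17
print(f"{float(odds_ratio):.2f}")  # 3.35
print(relative_risk)               # 3
```

The odds ratio sits further from 1 than the relative risk, and the gap widens as the outcome becomes more common.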
Which study uses odds ratio instead of relative risk?
Case control
Cross Sectional Study
Measure the prevalence of disease and exposure in a random sample of a population at a time point
Pros
- Cheap and easy
- Questionnaires
Cons
- Recall bias from self reporting
- Can’t determine which came first (Temporality)
- Non response bias (Those who participate may differ from those who don’t)
- Bad for rare issues as it randomly samples a population
- Confounding
Case Control Study
Sample disease states and then ask retrospectively about exposure
Pros
- Efficient for rare diseases and outbreaks
Cons
- Hard to find matched controls
- Can’t determine which came first
- Cause/effect impossible to ascertain
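- Can’t be used for prevalence, incidence, or risk
- You choose how many controls there are per case, so the proportion with disease in the sample is set by the design, not by the population
- Report an odds ratio instead (from the 2x2 table, or from regression when adjusting for confounders)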
- Recall bias
- Confounding
Prospective Cohort study
Measure risk factors in people disease free at baseline
Follow them over time, wait for them to develop the outcome, and calculate risk/rates of developing disease
Pros
- Exposure occurs prior to outcome
- Able to study multiple outcomes
- Can be used for rare exposures
- Can be used for prevalence and incidence
- Usually generalisable due to sampling from general community
- Avoid recall bias
Cons
- Expensive
- Take a long time
- Confounding
- Loss to follow-up
Retrospective cohort study
Cohort assembled after an outcome has occurred using stored data
Pros
- Exposure occurs prior to outcome
- Cheaper and faster than prospective
Cons
- Data quality may be limited
Nested Case Control
Cases and controls drawn from a prospective cohort study
Cases who develop an outcome during follow-up are compared with matched controls
Controls taken from the cohort who didn’t develop the outcome
Pros
- Efficient for expensive measurements
- Gene assays, ELISA, etc
- Blood collected prior to disease
- Avoids reverse causality issues
Cons
- Similar to cohort
RCT
Gold standard
Participants are randomly allocated to groups that undergo different interventions, with (ideally) pre-specified outcomes to be observed
Pros
- Randomisation minimises confounding
- Blinding minimises bias
Cons
- Expensive
- Non-compliance
- Loss to follow up reduces statistical power
- Often only look at short-term outcomes
- Not always ethically possible
- E.g. randomising people to alcohol (ETOH) or cigarettes
- May not have generalisable results
- i.e. may be internally but not externally valid
Internal validity = the study was methodologically robust (well set up protocol, sound statistics), so its result can be trusted for the participants studied
External validity = the study’s results would actually be applicable to the population you want to intervene upon
Per Protocol
Only patients who followed protocol are included in the analysis
Loses the protection of randomisation (the groups analysed are no longer the groups that were randomised)
As Treated
Unlike per protocol, which removes the non-adherents, this approach analyses according to the treatment they received
Also loses the protection of randomisation
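A toy sketch of how the per protocol and as treated groupings differ. The eight participants, the `Participant` record and the helper names are all invented for illustration; no real trial data is implied.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Participant:
    assigned: str  # arm allocated by randomisation
    received: str  # treatment actually taken
    died: bool


# Hypothetical trial of eight participants.
trial = [
    Participant("drug", "drug", False),
    Participant("drug", "drug", False),
    Participant("drug", "drug", True),
    Participant("drug", "placebo", True),   # never took the drug
    Participant("placebo", "placebo", False),
    Participant("placebo", "placebo", True),
    Participant("placebo", "placebo", True),
    Participant("placebo", "drug", False),  # crossed over to the drug
]


def per_protocol(arm: str) -> list:
    """Keep only participants who received what they were assigned."""
    return [p for p in trial if p.assigned == arm and p.received == arm]


def as_treated(arm: str) -> list:
    """Group participants by the treatment they actually received."""
    return [p for p in trial if p.received == arm]


def death_rate(group: list) -> Fraction:
    return Fraction(sum(p.died for p in group), len(group))


for name, grouping in [("per protocol", per_protocol), ("as treated", as_treated)]:
    print(name, death_rate(grouping("drug")), death_rate(grouping("placebo")))
```

Per protocol drops the two non-adherent participants; as treated keeps them but moves them to the other group. Either way the groups compared are no longer the ones randomisation produced.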
Likelihood Ratio
The probability that a result would be expected in a patient with the disorder compared to the probability that the same result would be expected in a patient without the disorder
Allows adjusting pre-test probability via a given test result to post-test probability
Positive LR
(Probability individual with disease has a positive test)/(Probability individual without disease has a positive test)
Note: The numerator is the same definition as sensitivity
Note: The denominator is the complement of the specificity (1-spec)
Negative LR
(Probability individual with disease has a negative test)/(Probability individual without disease has a negative test)
Note: The numerator is the complement of the sensitivity (1-sens)
Note: The denominator is the same definition as specificity
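Both likelihood ratios follow directly from sensitivity and specificity, as sketched below. The 90% sensitivity and 80% specificity are hypothetical values for an unnamed test.

```python
from fractions import Fraction


def positive_lr(sensitivity: Fraction, specificity: Fraction) -> Fraction:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def negative_lr(sensitivity: Fraction, specificity: Fraction) -> Fraction:
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


# Hypothetical test: 90% sensitive, 80% specific.
sens = Fraction(9, 10)
spec = Fraction(8, 10)

print(positive_lr(sens, spec))  # 9/2
print(negative_lr(sens, spec))  # 1/8
```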
Kaplan Meier
Measures time to an event
The log-rank test (a chi-squared test) gives a p-value for the difference between curves
Censored participants (e.g. dropouts) are marked on the curve, often with a tick or an open shape (circle/triangle)
Hazard ratio compares the hazard rates (instantaneous event rates per unit of time) of two groups, averaged over the study period, whereas RR compares cumulative risk at a single time point
This can become an issue when the hazard ratio isn’t constant, like with a surgical intervention which has heavy early risk
If the curve has a clear crossover point think about there being distinct populations
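A bare-bones sketch of how a Kaplan Meier curve is built, assuming the standard product-limit estimator. The five follow-up times are made up and `kaplan_meier` is an illustrative helper, not a library function.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each time with an event.

    times  -- follow-up time for each participant
    events -- True if the event occurred at that time, False if censored
    """
    at_risk = len(times)
    survival = Fraction(1)
    curve = []
    for t in sorted(set(times)):
        had_event = sum(1 for ti, e in zip(times, events) if ti == t and e)
        censored = sum(1 for ti, e in zip(times, events) if ti == t and not e)
        if had_event:
            # Survival only steps down when an event happens.
            survival *= 1 - Fraction(had_event, at_risk)
            curve.append((t, survival))
        # Events and censored participants both leave the risk set.
        at_risk -= had_event + censored
    return curve


# Hypothetical follow-up (months); False marks a dropout (censored).
times = [2, 3, 3, 5, 8]
events = [True, False, True, True, False]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, s)
```

Dropouts do not step the curve down, but they shrink the number at risk, so later events produce bigger steps.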
Waterfall Plots
Each bar represents a patient with the y-axis representing response
Can be very helpful in visualising groups with differing underlying molecular mechanisms
Non inferiority
Used when it’d be unethical to test against placebo
or when you want to win on convenience rather than superiority, like DOACs vs warfarin (no INRs, far fewer interactions)
Non-inferiority should be analysed per protocol as well as by ITT, since ITT tends to blur differences between groups and so favours a non-inferiority conclusion
Selection Bias
The method for selecting participants produces a sample that is not representative of the population of interest
Implications for generalisability
Allocation Bias
May result if the investigators know or can predict which intervention the next eligible participant is supposed to receive
Channeling bias
When a patient’s prognosis or degree of illness influences which group he or she is put into in a study
Ascertainment bias
Some members of a target population are less likely to be included in the final results than others
Information Bias
Systematic difference in the way that information is collected between the 2 groups being compared
Interviewer Bias
Difference in the way that information is obtained or recorded when the interviewer is aware of the subject’s disease status, e.g. a retrospective case control study when trying to find an exposure
Chronological Bias
Differences between those recruited earlier in the process and those recruited later
Recall bias
Can’t remember experiences accurately
Transfer Bias/Non-response bias
When a sample that is representative is chosen but a subset cannot be contacted or does not respond and differs from responders
Attrition Bias
Participants leave a study, and those who leave differ from those who stay
Performance bias
Occurs due to knowledge of the intervention allocation in either the researcher or participant and can inflate the estimated effect of the intervention
Confirmation bias
When the researcher looks for and uses the information to support their own ideas or beliefs
Anchoring bias
When the researcher depends too heavily on an initial piece of info offered when making decisions
pre test odds
pre test odds = p/(1-p)
p = prevalence
Post test odds
post test odds = pre test odds x LR
post test probability
post test probability = post test odds/(1 + post test odds)
pre test probability
= prevalence
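The chain from pre test probability to post test probability can be sketched as one function. The 10% prevalence and the likelihood ratio of 9/2 are hypothetical inputs, and `post_test_probability` is an illustrative name.

```python
from fractions import Fraction


def post_test_probability(prevalence: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Pre test probability -> odds -> multiply by LR -> back to probability."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical: 10% prevalence and a positive test with LR+ of 9/2.
prevalence = Fraction(1, 10)
lr_positive = Fraction(9, 2)

result = post_test_probability(prevalence, lr_positive)
print(result)  # 1/3
```

Here the pre test odds are 1/9, the post test odds are 1/2, and the post test probability is 1/3. A likelihood ratio of 1 leaves the probability unchanged.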