Statistics Flashcards
Numerical data types
Discrete - whole numbers only
Continuous - any value, scale - weight, height, length
What % of a normally distributed population lies within 1 SD of the mean?
68%
What % lies within 2 SD?
95%
What % lies within 3 SD?
99.7%
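The three figures can be checked numerically. A minimal sketch using Python's standard library, assuming an exactly normal population; the function name `fraction_within` is illustrative only.

```python
from statistics import NormalDist

def fraction_within(k_sd: float) -> float:
    """Fraction of a normal distribution lying within k_sd SDs of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area between -k and +k on the standard normal curve
    return std_normal.cdf(k_sd) - std_normal.cdf(-k_sd)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```

The printed values are approximately 0.68, 0.95 and 0.997, which is where the rounded flashcard figures come from.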
Parametric data
- criteria?
- Advantages
Continuous numerical data
Population has normal distribution
Groups being compared have similar variance and SD (homogeneity of variance)
Parametric assessment has better power
Features of non-parametric tests
Emphasis on rank
Doesn’t require specific distribution
Less power
Central Limit theorem
For a skewed population, if sample size N > 30, the distribution of sample means can be assumed to be approximately normal
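A small simulation can illustrate the idea. This is a sketch under stated assumptions: the skewed population is taken to be exponential with mean 1 and SD 1, the sample size (40), number of repeats (2000) and seed are arbitrary choices, and `sample_means` is a hypothetical helper name.

```python
import random
from statistics import mean, stdev

def sample_means(n: int, repeats: int, seed: int = 0) -> list:
    """Means of `repeats` samples of size n from a skewed (exponential) population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(repeats)]

means = sample_means(n=40, repeats=2000)
# The sample means cluster around the population mean (1.0) with a spread
# close to population SD / sqrt(n), even though the population is skewed.
print(round(mean(means), 2), round(stdev(means), 2))
```

A histogram of `means` would look roughly bell-shaped, unlike a histogram of the raw exponential values.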
2 groups
Unpaired
Parametric
What tests?
Equal variance - Student's t test
Unequal variance - Welch's t test
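The difference between the two tests is in how the standard error and degrees of freedom are calculated. A minimal standard-library sketch of the two test statistics (no p value is computed); the function names are illustrative, and the Welch degrees of freedom use the Welch-Satterthwaite approximation, which the card does not mention.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(a, b):
    """Welch t statistic: each group keeps its own variance."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
print(student_t(group_a, group_b), welch_t(group_a, group_b))
```

With equal group sizes and equal variances, as in this made-up example, both functions give the same t and the same degrees of freedom; they diverge as the variances become unequal.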
2 groups
Unpaired
Non-parametric
Mann-Whitney U test
2 groups
Paired
Parametric
Paired t test
2 groups
Paired
Non-parametric
Wilcoxon signed rank test
3 or more groups
Unpaired
Parametric
One way ANOVA
3 or more groups
Unpaired
Non-parametric
Kruskal-Wallis test
3 or more groups
Paired
Parametric
One way repeated measures ANOVA
3 or more groups
Paired
Non-parametric
Friedman test
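The eight cards on comparing groups follow one pattern: number of groups, paired or unpaired, parametric or non-parametric. A sketch of that pattern as a lookup table; `TESTS` and `choose_test` are hypothetical names used only for revision, not part of any library.

```python
# Key: (three_or_more_groups, paired, parametric) -> test named on the cards
TESTS = {
    (False, False, True): "Student t test (equal variance) or Welch test (unequal variance)",
    (False, False, False): "Mann-Whitney U test",
    (False, True, True): "Paired t test",
    (False, True, False): "Wilcoxon signed rank test",
    (True, False, True): "One way ANOVA",
    (True, False, False): "Kruskal-Wallis test",
    (True, True, True): "One way repeated measures ANOVA",
    (True, True, False): "Friedman test",
}

def choose_test(groups: int, paired: bool, parametric: bool) -> str:
    """Return the test for the given design, following the cards above."""
    if groups < 2:
        raise ValueError("need at least 2 groups to compare")
    return TESTS[(groups >= 3, paired, parametric)]

print(choose_test(groups=2, paired=True, parametric=False))
print(choose_test(groups=4, paired=False, parametric=True))
```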
Test association between 2 qualitative variables
N > 50
Chi Squared test
Test association between 2 qualitative variables
N < 50
Fisher's exact test
Test linear relationship between 2 variables
Parametric
Pearson’s correlation
Test linear relationship between 2 variables
Non-parametric
Spearman’s rank correlation
Difference between test statistic and p value
Test statistic - standardised value used for hypothesis testing
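p value - probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis is true. Compared against the significance level, which is the type 1 error probability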
When to use Z statistic
Known population mean + SD
Sample size > 30
Z = z score. Need population mean and sd
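A minimal sketch of the Z statistic for a sample mean, assuming the population mean and SD are known as the card requires. The numbers in the example are made up for illustration.

```python
from math import sqrt

def z_statistic(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """z = (sample mean - population mean) / (population SD / sqrt(n))."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical example: sample of 36 with mean 105, population mean 100, SD 15
print(z_statistic(sample_mean=105, pop_mean=100, pop_sd=15, n=36))  # 2.0
```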
When to use t statistic
Population mean and SD unknown
When to use F statistic
ANOVA
Statistical Power
- What is it
Probability study will detect predetermined difference between 2 groups
= probability of correctly rejecting the null hypothesis (accepting the alternative hypothesis when it is true)
1- power = chance of false negative = probability of type 2 error
Changes that will increase power
Increase sample size
Increase significance level (e.g. from 0.05 to 0.1)
Increase the difference to be detected (larger effect size)
Reduce SD
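The four changes can be seen in an approximate power formula. This is a sketch under assumptions not on the card: a two-sided two-sample z test with equal group sizes and a known common SD, ignoring the negligible chance of rejecting in the wrong direction. `approx_power` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(diff: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # critical value for the chosen alpha
    std_error = sd * sqrt(2 / n_per_group)     # SE of the difference in means
    return z.cdf(abs(diff) / std_error - z_crit)

base = approx_power(diff=5, sd=10, n_per_group=64)
print(f"baseline power:     {base:.2f}")
print(f"larger sample:      {approx_power(5, 10, 128):.2f}")
print(f"alpha 0.10:         {approx_power(5, 10, 64, alpha=0.10):.2f}")
print(f"larger difference:  {approx_power(8, 10, 64):.2f}")
print(f"smaller SD:         {approx_power(5, 8, 64):.2f}")
```

Each of the four variations gives a higher power than the baseline, matching the list on the card.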
Deciding significance level
If consequences of type 1 error are serious - use small significance level, reduce type 1 error chance
If consequences of false negative are serious - use higher significance level, increase power, reduce type 2 error chance
Drawback of post-hoc analysis
Type 1 error chance increases (selectively looking for positives; each additional comparison adds its own chance of a false positive)
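How quickly the type 1 error chance grows can be sketched with the family-wise error rate. The formula 1 - (1 - alpha)^k assumes the k comparisons are independent, which is a simplification not stated on the card; the function name is illustrative.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10, 20):
    print(f"{k:>2} comparisons at 0.05: {familywise_error_rate(0.05, k):.2f}")
```

With 10 independent comparisons at 0.05 each, the chance of at least one false positive is already about 0.40.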
Drawback of trying to mitigate type 1 error risk in post-hoc analysis
Make total significance level smaller
Increase requirement for power - if N not increased, then type 2 error increases
Event rate is also called?
Absolute risk
NNT formula
1/ARR
i.e 1 divided by absolute risk reduction
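A minimal sketch of the NNT calculation. The event rates in the example are made up, and the function names are illustrative.

```python
def absolute_risk_reduction(control_event_rate: float, treatment_event_rate: float) -> float:
    """ARR = event rate in control group - event rate in treatment group."""
    return control_event_rate - treatment_event_rate

def number_needed_to_treat(control_event_rate: float, treatment_event_rate: float) -> float:
    """NNT = 1 / ARR."""
    arr = absolute_risk_reduction(control_event_rate, treatment_event_rate)
    if arr == 0:
        raise ValueError("no risk difference, NNT is undefined")
    return 1 / arr

# Hypothetical trial: 20% events on control, 15% on treatment -> ARR 0.05
print(round(number_needed_to_treat(0.20, 0.15), 1))
```

In this example about 20 patients need to be treated to prevent one event. In practice NNT is usually rounded up to a whole patient.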
Odds calculation
Number of events: number of non events
e.g
20% event rate
1:4 odds
Formula for probability from odds
Prob = odds/(odds+1)
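The two conversions as a short sketch, using the 20% event rate example from the odds card. Function names are illustrative.

```python
def odds_from_probability(p: float) -> float:
    """Odds = events : non-events = p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)

def probability_from_odds(odds: float) -> float:
    """Prob = odds / (odds + 1)."""
    return odds / (odds + 1)

odds = odds_from_probability(0.20)   # 20% event rate -> odds of 1:4, i.e. 0.25
print(round(odds, 2), round(probability_from_odds(odds), 2))
```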
Hazard vs hazard ratio
Hazard - conditional probability of event given patient has survived to that point in time. Used in time-to-event analysis
Hazard ratio - ratio of the two groups' hazards. Ratio should remain constant with time
Differences between hazard ratio and median ratio
Hazard ratio - odds of survival of one group to the other (odds on winning)
Median ratio - margin of victory (compares the points in time where probability of survival is 0.5 in each arm)
Tests used to compare Kaplan-Meier curves
Log rank test - compares two survival curves
Cox proportional hazards regression model - adjusts for other variables (covariates) that may explain the hazard
Study
Start with group
Trace backwards and determine exposure
Case Control
Study for rare disorders
Study for disease with long lag time between exposure and outcome
Case control
Bias affecting case control
Recall bias
Selection bias
Start with exposure
Measure if disease occurred
Cohort
Measure newly occurring disease = prospective
Look back in time at exposure, see if disease developed = retrospective
Study for rare risk factor/exposure
Cohort study
Study where only individuals who have experienced an event are included
Self controlled case series
Sampling to use in homogeneous population
Simple random sample
Sample to use in heterogeneous population with homogeneous subgroups
Stratified random sample
Sample to use for population that has heterogeneous subgroups, which are similar to each other
Cluster sampling
e.g geography
Screening detects disease earlier, but disease course no different
Survival appears longer when it is not
Lead time bias
Screening detects earlier cancers which might not develop into end disease
Length time bias
Assume subjects remain in randomised group, regardless of crossover
Intention to treat analysis
Limitation of intention to treat analysis
Underestimates treatment effect
Analyse patients who strictly adhered to protocol
(exclude those who dropped out)
Per protocol analysis
Likely to show exaggerated treatment effect
Way to reduce confounders
Randomisation
Inaccurate way to select patients for trial
Produce sample that is not representative of population
Selection bias
Investigator knows which arm next patient will receive - can change who is allocated
Allocation bias
Blind to reduce
Data for study collected so that some members of population are less likely to be included (e.g. email and old people)
Ascertainment bias = sampling bias
Difference in how information is obtained/recorded
Interviewer bias
Poor recollection of previous events
Recall bias
Lack of response from some patients changing sample characteristics
Non-response bias = transfer bias
Participants leave trial/lost to follow up
Attrition bias
Differences that occur due to knowledge of intervention allocation
Performance bias
Participants report positive effect if they know they are being observed
Hawthorne effect
Nocebo
Negative expectations cause control group to report more negative effects
Rely too heavily on initial piece of information for all subsequent decisions
Anchoring bias
Not all research makes it into analysis
Publication bias
Establish dosing, pharmacokinetics
Phase 1
Establish significant A/E
Phase 2
Establish efficacy
Phase 3
Establish long term f/u and surveillance of A/E
Phase 4
Ways to assess heterogeneity in meta-analysis
How different the results of different studies are
1) Overlap of confidence intervals
2) I2 statistic - >75% = high heterogeneity
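For the I2 statistic, a minimal sketch of the usual definition, I2 = (Q - df) / Q x 100, floored at 0. This formula is not stated on the card; Q (Cochran's Q) and the number of studies are assumed to come from the meta-analysis output, and the example numbers are made up.

```python
def i_squared(q: float, n_studies: int) -> float:
    """I2 (%) from Cochran's Q, with df = number of studies - 1."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are set to 0: no heterogeneity beyond chance
    return max(0.0, (q - df) / q) * 100

print(i_squared(q=20, n_studies=6))   # Q well above df
print(i_squared(q=3, n_studies=6))    # Q below df
```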
Random effects model
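Assumes the true effect varies between studies, with the included studies being a sample of all possible studies (including ones not in the meta-analysis); accounts for between-study variance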
Propensity score
- What is it
- What does it reduce
Score from 0-1, looks at other variables that predict whether a patient is assigned to a particular group (e.g. smoking -> age, socioeconomic status)
Aims to match groups
Reduces selection bias and confounding
When you use propensity score?
What does it reduce
No randomised studies
Reduces treatment selection bias due to knowledge of treatment
When can allocation bias occur?
When can allocation concealment be applied
Random allocation requires
- Generating random sequence - if this is not done, can lead to allocation bias
- Implementing random sequence so it is concealed - can use allocation concealment here
Example of allocation concealment
envelopes
Difference in allocation vs performance bias in terms of timing
Allocation bias occurs before allocation
- randomisation
- concealment
Performance occurs after randomisation
- recording of results due to knowledge of treatment
Coefficient of variation
Standard deviation:mean
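A minimal sketch of the ratio, assuming the sample SD (n - 1 denominator) is the one wanted; the card does not say which SD. The data are made up.

```python
from statistics import mean, stdev

def coefficient_of_variation(values) -> float:
    """SD divided by mean; often quoted as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, ratio is undefined")
    return stdev(values) / m

data = [10, 12, 14]   # mean 12, sample SD 2
print(f"{coefficient_of_variation(data):.1%}")
```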
I2 is a measure of difference in what between studies?
Variance
Formula for variance
Subtract the mean from each value, then square the result
Average of the squared values
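The two steps as a short sketch. Averaging over all n values gives the population variance; the sample variance divides by n - 1 instead, a distinction the card does not make. The data are made up and `population_variance` is an illustrative name.

```python
from statistics import pvariance

def population_variance(values) -> float:
    """Mean of the squared differences from the mean."""
    m = sum(values) / len(values)
    squared_diffs = [(x - m) ** 2 for x in values]   # step 1: (value - mean) squared
    return sum(squared_diffs) / len(squared_diffs)   # step 2: average them

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5
print(population_variance(data), pvariance(data))
```

Both the hand-written function and the standard library's `pvariance` give 4 for this data, so the SD is 2.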