Type 1 and Type 2 Errors Flashcards
Statistical error
We can never be completely certain that we are right when we reject or fail to reject the null hypothesis
Type 1 error = rejecting the null hypothesis when it is actually TRUE
Saying that the means are different when they are the same: FALSE POSITIVE
Type 2 error = failing to reject the null hypothesis when it is actually FALSE
Saying that the means are the same when they are different: FALSE NEGATIVE
Type 1 error = FALSE POSITIVE
problems
- Overcall positive results
- Identify a treatment effect when one doesn't exist
- Waste of time and effort on further development of an ineffective drug
- Consequences for patients
Type 2 error = FALSE NEGATIVE
problems
- Overcall negative results
- Fail to identify a treatment effect when one does exist
- Reject/lose a potentially effective treatment
- Waste resources used so far on drug development
Type 1 error rate
when it is more likely and how to reduce it
Rejecting the null hypothesis when it is actually true
- Designated by α (alpha), usually set to 0.05
- Implying that it is acceptable to have a 5% probability of incorrectly rejecting a true null hypothesis
Type 1 errors are more likely with:
• Multiple tests – if we do 20 tests at α = 0.05, on average one will falsely reject a true null hypothesis
• Higher alpha values (a more lenient threshold for rejecting the null hypothesis)
Reduced by:
• Pre-study analysis design – avoid multiple testing
• Setting a lower α, e.g. 0.01
• Reporting p values to 3 decimal places to give an accurate probability estimate
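A minimal simulation sketch of this error rate, assuming numpy and scipy are available: drawing two samples from the same distribution many times shows that a t-test at α = 0.05 wrongly rejects a true null in roughly 5% of experiments.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    a = rng.normal(loc=0.0, scale=1.0, size=30)  # group 1, true mean 0
    b = rng.normal(loc=0.0, scale=1.0, size=30)  # group 2, same true mean
    _, p = stats.ttest_ind(a, b)
    if p < alpha:  # the null is true here, so any rejection is a type 1 error
        false_positives += 1

print(f"Observed type 1 error rate: {false_positives / n_experiments:.3f}")
# expected to be close to alpha, i.e. about 0.05
```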
Type 2 error rate
when it is more likely and how to reduce it
FAILING to reject the null hypothesis when it is actually false
Denoted by the Greek letter β (beta)
'POWER' of a test is 1 − β
• Power = likelihood of a statistical test detecting an effect when there is one
• Greater power = less likely to be a false negative result
Type 2 errors more likely with:
• Small samples
• Small effect size - hard to detect
Reduced by:
• Larger sample size
• Larger effect size (choosing an outcome where a larger effect can be measured)
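A sketch of the pre-study sample size calculation this implies, assuming the statsmodels package; the effect size of 0.5 (Cohen's d) is a hypothetical value.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # hypothetical medium effect (Cohen's d)
    alpha=0.05,       # accepted type 1 error rate
    power=0.80,       # 1 - beta: 80% chance of detecting a real effect
)
print(f"Required sample size per group: {n_per_group:.1f}")  # about 64
```

A larger assumed effect size or a higher acceptable β would shrink the required sample; a smaller effect size would inflate it.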
Multiplicity
Performing many statistical tests within one clinical trial
Increases the risk of type 1 error (alpha)
• False positive result
• Rejecting null hypothesis when it is actually true
• α is set at 0.05 for a single comparison
Risk of type 1 error calculation
Calculated by:
1 − (1 − α)^n, where α is the per-test significance level and n is the number of tests
A type 1 error rate of 0.05 is accepted for a single test
but it is inappropriate for multiple tests
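A quick check of this formula in Python; the numbers reproduce those quoted elsewhere on these cards.

```python
alpha = 0.05  # per-test significance level
for n in (1, 5, 20, 30):
    familywise = 1 - (1 - alpha) ** n
    print(f"{n:2d} tests -> P(at least one false positive) = {familywise:.2f}")
# 1 test -> 0.05, 5 tests -> 0.23, 20 tests -> 0.64, 30 tests -> 0.79
```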
Multiplicity in clinical trials
5 examples
Multiple treatments
• More than 2 groups (drugs, doses, combinations)
Multiple endpoints
• Several outcomes of interest
Repeated measurements
• Measurements at multiple time points
Subgroup analyses
• Tests whether individuals with certain characteristics benefit more than those without (e.g. demographics, lifestyle)
Interim analyses
• Analysis of data conducted during the trial, before data collection has finished, e.g. for ethical and economic reasons
Dealing with multiplicity
- Make fewer comparisons
- Pre-define/prioritize the comparisons
- Adjust the p value
Make fewer comparisons - dealing with multiplicity
- MULTIPLE TREATMENTS: use analysis of variance (a single omnibus test compares all treatments at once rather than making multiple comparisons)
- MULTIPLE ENDPOINTS: use a single summary statistic, e.g. a questionnaire with many questions but one overall score
- MULTIPLE ENDPOINTS: use a composite endpoint, e.g. MACE - any major adverse cardiovascular event (stroke, heart attack, cardiovascular death, etc.) counts as one endpoint
- REPEATED MEASUREMENTS: do the analysis at a predefined timepoint, use a summary measure (e.g. area under the curve, as sketched below), or use a statistical mixed model
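A minimal sketch of the area-under-the-curve summary measure, assuming numpy and scipy; the timepoints and measurements below are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

timepoints = np.array([0, 4, 8, 12])  # hypothetical weeks of follow-up

def auc(measurements):
    # one area-under-the-curve value per participant, so only a single
    # between-group comparison is needed instead of one test per timepoint
    return trapezoid(measurements, x=timepoints, axis=1)

# hypothetical data: rows = participants, columns = timepoints
treatment = np.array([[5, 7, 9, 10], [4, 6, 8, 9], [5, 8, 9, 11]])
control = np.array([[5, 5, 6, 6], [4, 5, 5, 6], [5, 6, 6, 7]])

t_stat, p_value = stats.ttest_ind(auc(treatment), auc(control))
print(f"Single AUC comparison: t = {t_stat:.2f}, p = {p_value:.3f}")
```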
Pre-define/prioritize - dealing with multiplicity
Multiple treatments
– Pre-define the most important comparison
Multiple endpoints
– Specify primary and secondary endpoints in advance
• The study is powered to detect the primary endpoint, and the outcome is judged on the significance of the primary endpoint
Subgroup analyses
– Predefine a limited number of subgroups to be analyzed
Adjust the p value - dealing with multiplicity
Type 1 error rate is inflated by multiple tests
– Reduce the p value threshold for individual tests
– Overall level of significance can be kept at 0.05 for entire series of tests
e.g. Bonferroni correction
– divide 0.05 by the number of tests done to set the significance level for each subtest
– e.g. for 5 related tests, set α (the risk of a false positive result) at 0.05/5 = 0.01
– Very conservative; it tends to overcorrect and increases the risk of a false negative result
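A small sketch of applying the Bonferroni correction to a set of made-up p values:

```python
p_values = [0.003, 0.012, 0.021, 0.040, 0.380]  # 5 related tests (hypothetical)
alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni: 0.05 / 5 = 0.01 per test

for i, p in enumerate(p_values, start=1):
    verdict = "significant" if p < threshold else "not significant"
    print(f"test {i}: p = {p:.3f} -> {verdict} at corrected threshold {threshold}")
# only test 1 survives the correction; at the uncorrected 0.05 threshold
# tests 1-4 would all have been called significant
```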
No significance testing for baseline data
can be avoided to reduce multiplicity
Multiple tests will generate false positive results
• e.g. with 30 comparisons there is a 79% chance of at least one false positive result
Differences may be clinically important but not statistically significant
• negative tests may be falsely reassuring
The comparisons do not test a useful scientific hypothesis
repeated measurements
Outcome variable measured two or more times for each participant over a period of time
e.g. before, during and after treatment
How to compare repeated measures between groups?
Compare final measurement?
• Wastes a lot of valuable information
Compare every timepoint?
• Multiple comparisons – risk type 1 error
Some kind of regression?
• Within-subject correlation violates the independence assumption and can bias results
Summary measure approach?
• Each measure has limitations
WE CAN USE A REPEATED MEASURES MODEL (e.g. repeated measures ANOVA or a mixed model) or summary measures
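A sketch of the mixed-model option, assuming the statsmodels and pandas packages; the file name and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# long-format data, one row per participant per timepoint, with
# hypothetical columns: subject, group (treatment/control), time, outcome
df = pd.read_csv("trial_long.csv")  # hypothetical file

# a random intercept per subject models the within-subject correlation
# that would otherwise bias an ordinary regression
model = smf.mixedlm("outcome ~ group * time", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```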